The Problem
Tried to upgrade a Helm release and it got stuck. Every subsequent helm upgrade or helm rollback returned:
Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress
The release was in pending-upgrade state and nothing would clear it.
What Happened
The previous upgrade had been interrupted (pod got OOMKilled mid-deployment, Helm never got to mark the release as failed or deployed). The release was left in a zombie pending-upgrade state.
The Fix
# 1. Check the release status
helm history my-release -n production
# 2. Find the last successful revision
# REVISION STATUS DESCRIPTION
# 5 deployed Upgrade complete
# 6 pending-upgrade Interrupted!
# 3. Rollback to last good revision
helm rollback my-release 5 -n production
# If rollback also fails, use force flag
helm rollback my-release 5 -n production --force
# Nuclear option: update the secret directly
# (only if rollback truly won't work)
kubectl get secret -n production | grep sh.helm
# sh.helm.release.v1.my-release.v6
kubectl delete secret sh.helm.release.v1.my-release.v6 -n production
# Then re-run your upgradeRollback to revision 5 worked immediately. The release was back to deployed.
Root Cause
No --timeout set on the upgrade. When the OOMKill happened, Helm had no timeout to trigger failure handling. Set a timeout in CI to prevent this:
helm upgrade my-release ./chart \
--timeout 5m \
--atomic \ # auto-rollback on failure
--wait--atomic is the key flag — it auto-rolls back if the upgrade fails.