
AWS CloudFormation Stack Stuck in ROLLBACK_FAILED: Fix It Now

CloudFormation stack stuck in ROLLBACK_FAILED or UPDATE_ROLLBACK_FAILED state? Here's every cause and the exact steps to recover without losing your resources.

DevOpsBoys · May 12, 2026 · 4 min read

A CloudFormation stack in ROLLBACK_FAILED is one of the most frustrating AWS situations. You can't update it. You can't delete it normally. You can't redeploy over it. And the AWS console often gives you a vague error message that doesn't help.

Here's exactly how to diagnose and fix it.


Understanding the States

CREATE attempt fails     → ROLLBACK_IN_PROGRESS
                           → if rollback succeeds: ROLLBACK_COMPLETE        (delete and recreate)
                           → if rollback fails:   ROLLBACK_FAILED          (stuck, delete only)
UPDATE attempt fails     → UPDATE_ROLLBACK_IN_PROGRESS
                           → if rollback succeeds: UPDATE_ROLLBACK_COMPLETE (safe)
                           → if rollback fails:   UPDATE_ROLLBACK_FAILED   (stuck)
DELETE attempt fails     → DELETE_FAILED        (stuck)

UPDATE_ROLLBACK_FAILED is the most common stuck state. It means: your update failed, AND the attempt to revert to the previous state also failed.
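
As a rough aid, the mapping from status to next step can be written as a small shell function. This is only a sketch: `recovery_action` is a hypothetical helper name, not part of the AWS CLI, and the suggested actions assume the default rollback behaviour.

```shell
# recovery_action (hypothetical helper): map a stack status to the
# recovery path that usually applies to it.
recovery_action() {
  case "$1" in
    UPDATE_ROLLBACK_FAILED)            echo "continue-update-rollback" ;;
    ROLLBACK_FAILED|ROLLBACK_COMPLETE) echo "delete-stack, then recreate" ;;
    DELETE_FAILED)                     echo "delete-stack --retain-resources" ;;
    *_IN_PROGRESS)                     echo "wait" ;;
    *_COMPLETE)                        echo "nothing to recover" ;;
    *)                                 echo "inspect stack events" ;;
  esac
}

# Usage (needs AWS credentials and a real stack name):
# status=$(aws cloudformation describe-stacks --stack-name my-stack \
#   --query 'Stacks[0].StackStatus' --output text)
# recovery_action "$status"
```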


Step 1 — Find the Root Cause

Open the AWS Console → CloudFormation → your stack → Events tab.

Look for the red FAILED events and read their Status reason column. The console lists the newest events first, so read from bottom to top: the earliest failure is the root cause.

bash
# Via CLI: list every failed event (stack-level and resource-level) with its reason
aws cloudformation describe-stack-events \
  --stack-name my-stack \
  --query 'StackEvents[?contains(ResourceStatus, `FAILED`)].[Timestamp,LogicalResourceId,ResourceStatus,ResourceStatusReason]' \
  --output table
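
To pull out only the earliest failure, one option is to sort the events by timestamp. The sketch below assumes `--output text` emits tab-separated rows with the timestamp in the first column, and that all timestamps share the same ISO 8601 format so a plain text sort is chronological. `first_failure` is a hypothetical helper name.

```shell
# first_failure (hypothetical helper): read tab-separated event rows on stdin
# (timestamp, logical ID, status, reason) and print the earliest row whose
# status ends in FAILED.
first_failure() {
  awk -F '\t' '$3 ~ /FAILED$/' | LC_ALL=C sort | head -n 1
}

# Usage (needs AWS credentials and a real stack name):
# aws cloudformation describe-stack-events --stack-name my-stack \
#   --query 'StackEvents[].[Timestamp,LogicalResourceId,ResourceStatus,ResourceStatusReason]' \
#   --output text | first_failure
```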

Common error messages and what they mean:

| Error | Meaning |
| --- | --- |
| Resource is not in the state rollback | Resource was manually modified outside CloudFormation |
| DELETE_FAILED: DependencyViolation | Resource has dependencies that must be deleted first |
| The following resource(s) failed to rollback: [MyBucket] | S3 bucket not empty, or resource was deleted manually |
| Limit exceeded | Service quota hit during rollback |

Step 2 — Continue Update Rollback (Skip Failing Resources)

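For a stack in UPDATE_ROLLBACK_FAILED, AWS provides a way to skip specific resources during rollback. Only resources in the UPDATE_FAILED state can be skipped. Skipping is appropriate when the resource was manually deleted or modified outside CloudFormation, so CloudFormation cannot roll it back on its own.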

bash
aws cloudformation continue-update-rollback \
  --stack-name my-stack \
  --resources-to-skip LogicalResourceId1 LogicalResourceId2

Example — Skip a manually deleted S3 bucket:

bash
aws cloudformation continue-update-rollback \
  --stack-name my-stack \
  --resources-to-skip MyS3Bucket

This tells CloudFormation to mark the resource as UPDATE_COMPLETE without actually rolling it back.

After this succeeds, your stack goes to UPDATE_ROLLBACK_COMPLETE. The skipped resource may no longer match what the stack expects, so make a new update to bring it back in line before you rely on the stack again.
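
The rollback runs asynchronously, so a script usually has to wait for it. A minimal sketch, assuming the built-in `stack-rollback-complete` waiter fits your timing needs (`continue_and_wait` is a hypothetical wrapper name):

```shell
# continue_and_wait (hypothetical wrapper): resume the rollback, optionally
# skipping the logical IDs given after the stack name, then block until the
# stack reaches UPDATE_ROLLBACK_COMPLETE.
continue_and_wait() {
  stack="$1"
  shift
  if [ "$#" -gt 0 ]; then
    aws cloudformation continue-update-rollback \
      --stack-name "$stack" --resources-to-skip "$@" || return 1
  else
    aws cloudformation continue-update-rollback --stack-name "$stack" || return 1
  fi
  # The waiter polls the stack status and exits non-zero on failure or timeout.
  aws cloudformation wait stack-rollback-complete --stack-name "$stack"
}

# Usage: continue_and_wait my-stack MyS3Bucket
```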


Cause 1: Resource Manually Modified Outside CloudFormation

Someone went into the console and changed a security group, deleted a resource, or added a tag manually. Now CloudFormation can't revert it.

Diagnose:

bash
aws cloudformation describe-stack-resource \
  --stack-name my-stack \
  --logical-resource-id MySecurityGroup \
  --query 'StackResourceDetail.{Status:ResourceStatus,Reason:ResourceStatusReason}'

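Fix: Either skip the resource in continue-update-rollback, or manually revert the resource to its previous state and retry continue-update-rollback without skipping it. If the resource has ended up outside the stack, it can be imported back once the stack is stable again: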

bash
# Option 1: Skip the resource
aws cloudformation continue-update-rollback \
  --stack-name my-stack \
  --resources-to-skip MySecurityGroup
 
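# Option 2: Import the manually-changed resource back into CloudFormation.
# Import change sets need the stack in a stable state (for example
# UPDATE_ROLLBACK_COMPLETE), a resource that no stack currently manages,
# and a DeletionPolicy on that resource in template.yaml. Confirm the
# identifier key for your resource type with get-template-summary.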
aws cloudformation create-change-set \
  --stack-name my-stack \
  --change-set-name import-fix \
  --change-set-type IMPORT \
  --resources-to-import '[{"ResourceType":"AWS::EC2::SecurityGroup","LogicalResourceId":"MySecurityGroup","ResourceIdentifier":{"GroupId":"sg-abc123"}}]' \
  --template-body file://template.yaml

Cause 2: S3 Bucket Not Empty

CloudFormation can't delete a non-empty S3 bucket. If the update or its rollback has to delete a bucket that already contains objects, the operation fails.

Fix — Empty the bucket first:

bash
# List and delete all object versions
# (delete-objects accepts at most 1000 keys per call; repeat for larger buckets)
aws s3api list-object-versions --bucket my-bucket \
  --query '{Objects: Versions[].{Key:Key,VersionId:VersionId}, Quiet: true}' \
  --output json > delete.json
 
aws s3api delete-objects --bucket my-bucket --delete file://delete.json
 
# Delete delete markers
aws s3api list-object-versions --bucket my-bucket \
  --query '{Objects: DeleteMarkers[].{Key:Key,VersionId:VersionId}, Quiet: true}' \
  --output json > deletemarkers.json
 
aws s3api delete-objects --bucket my-bucket --delete file://deletemarkers.json
 
# Now retry rollback or delete
aws cloudformation continue-update-rollback --stack-name my-stack
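
One edge case: when a bucket has no versions or no delete markers, the `--query` above writes `"Objects": null`, and `delete-objects` typically rejects that input. A small guard, sketched here with the hypothetical helper name `delete_listed`, skips the call in that case. It relies on a plain text match for `"Key"` rather than real JSON parsing.

```shell
# delete_listed (hypothetical helper): call delete-objects only when the JSON
# file produced by list-object-versions actually contains at least one key.
delete_listed() {
  bucket="$1"
  file="$2"
  if grep -q '"Key"' "$file"; then
    aws s3api delete-objects --bucket "$bucket" --delete "file://$file"
  else
    echo "nothing to delete in $file"
  fi
}

# Usage:
# delete_listed my-bucket delete.json
# delete_listed my-bucket deletemarkers.json
```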

Cause 3: Stack in DELETE_FAILED State

You tried to delete the stack and it failed. Common reasons:

  • Non-empty S3 bucket
  • RDS with deletion protection enabled
  • VPC with attached resources (IGW, subnets with ENIs)

Fix — Delete with resource retention:

bash
# Delete stack but keep specific resources
# (--retain-resources is only accepted while the stack is in DELETE_FAILED)
aws cloudformation delete-stack \
  --stack-name my-stack \
  --retain-resources MyS3Bucket MyRDSInstance

These resources stay in your account but are removed from the stack. Clean them up manually.

Fix — Disable deletion protection first:

bash
# For RDS
aws rds modify-db-instance \
  --db-instance-identifier my-db \
  --no-deletion-protection
 
# Then retry stack deletion
aws cloudformation delete-stack --stack-name my-stack
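
Rather than typing the logical IDs by hand, you could look up which resources failed to delete and retain exactly those. A sketch under the assumption that the stack is already in DELETE_FAILED; both function names are hypothetical:

```shell
# failed_delete_ids (hypothetical helper): print the logical IDs of resources
# in DELETE_FAILED, separated by whitespace.
failed_delete_ids() {
  aws cloudformation list-stack-resources --stack-name "$1" \
    --query 'StackResourceSummaries[?ResourceStatus==`DELETE_FAILED`].LogicalResourceId' \
    --output text
}

# retain_failed_and_delete (hypothetical helper): retry the delete while
# retaining whatever failed last time.
retain_failed_and_delete() {
  ids=$(failed_delete_ids "$1") || return 1
  if [ -z "$ids" ]; then
    echo "no DELETE_FAILED resources in $1" >&2
    return 1
  fi
  # Word splitting on $ids is intentional: one argument per logical ID.
  aws cloudformation delete-stack --stack-name "$1" --retain-resources $ids
}

# Usage: retain_failed_and_delete my-stack
```

The retained resources still exist afterwards, so review the list before running it against anything that costs money.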

Cause 4: Service Quota Exceeded During Rollback

Rollback tried to re-create resources but hit an AWS service limit.

Diagnose:

bash
aws cloudformation describe-stack-events \
  --stack-name my-stack \
  --query 'StackEvents[?ResourceStatusReason && (contains(ResourceStatusReason, `limit`) || contains(ResourceStatusReason, `Limit`))].[LogicalResourceId,ResourceStatusReason]' \
  --output table

Fix:

  1. Request a limit increase via Service Quotas console
  2. Or skip the resource and add it back manually:
bash
aws cloudformation continue-update-rollback \
  --stack-name my-stack \
  --resources-to-skip MyResource
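
The limit increase can also be requested from the CLI through the Service Quotas API. The sketch below is illustrative only: `raise_quota_if_needed` is a hypothetical helper, and the service code and quota code are placeholders you would look up with `aws service-quotas list-service-quotas`.

```shell
# raise_quota_if_needed (hypothetical helper): request an increase only when
# the desired value is above the quota currently applied to the account.
raise_quota_if_needed() {
  service="$1"
  quota="$2"
  wanted="$3"
  current=$(aws service-quotas get-service-quota \
    --service-code "$service" --quota-code "$quota" \
    --query 'Quota.Value' --output text) || return 1
  # Quota values come back as decimals such as 5.0, so compare them in awk.
  if awk -v c="$current" -v w="$wanted" 'BEGIN { exit !(w + 0 > c + 0) }'; then
    aws service-quotas request-service-quota-increase \
      --service-code "$service" --quota-code "$quota" --desired-value "$wanted"
  else
    echo "current quota $current already covers $wanted"
  fi
}

# Usage (L-EXAMPLE is a placeholder quota code, not a real one):
# raise_quota_if_needed ec2 L-EXAMPLE 10
```

Increases are not always granted immediately, so skipping the resource may still be the faster way out of the stuck state.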

Nuclear Option: Force Delete a Stuck Stack

If nothing works and you need to remove the stack:

bash
# Requires a recent AWS CLI v2; --deletion-mode FORCE_DELETE_STACK applies to
# stacks that are already in DELETE_FAILED
aws cloudformation delete-stack \
  --stack-name my-stack \
  --deletion-mode FORCE_DELETE_STACK

Warning: Force delete removes the stack and every resource it is able to delete. Resources that fail to delete are left behind in your account and are no longer tracked by any stack, so you'll need to clean those up manually.
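
Because force delete is hard to undo, it can help to wrap it in a status check. A minimal sketch, with `force_delete_if_stuck` as a hypothetical helper name:

```shell
# force_delete_if_stuck (hypothetical helper): force-delete only when the
# stack is actually in DELETE_FAILED, otherwise refuse.
force_delete_if_stuck() {
  status=$(aws cloudformation describe-stacks --stack-name "$1" \
    --query 'Stacks[0].StackStatus' --output text) || return 1
  if [ "$status" != "DELETE_FAILED" ]; then
    echo "stack is $status, not DELETE_FAILED; try a normal delete first" >&2
    return 1
  fi
  aws cloudformation delete-stack --stack-name "$1" \
    --deletion-mode FORCE_DELETE_STACK
}

# Usage: force_delete_if_stuck my-stack
```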


Prevention Checklist

  1. Never modify CloudFormation-managed resources manually — always update via template
  2. Enable termination protection on production stacks:
    bash
    aws cloudformation update-termination-protection \
      --enable-termination-protection \
      --stack-name my-stack
  3. Use DeletionPolicy: Retain for stateful resources (S3, RDS, DynamoDB):
    yaml
    MyBucket:
      Type: AWS::S3::Bucket
      DeletionPolicy: Retain
      UpdateReplacePolicy: Retain
  4. Test updates in a staging stack first — never test risky changes in prod
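
Item 2 of the checklist can be verified from a deploy script. This sketch assumes `--output text` renders the boolean as `True`, and `require_termination_protection` is a hypothetical helper name:

```shell
# require_termination_protection (hypothetical pre-deploy check): succeed only
# if the stack has termination protection enabled.
require_termination_protection() {
  enabled=$(aws cloudformation describe-stacks --stack-name "$1" \
    --query 'Stacks[0].EnableTerminationProtection' --output text) || return 1
  if [ "$enabled" = "True" ]; then
    echo "$1: termination protection is on"
  else
    echo "$1: termination protection is OFF" >&2
    return 1
  fi
}

# Usage: require_termination_protection my-stack || exit 1
```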

CloudFormation stuck states are almost always recoverable. The key is reading the event log carefully to find the actual failing resource, then skipping or fixing it before retrying rollback.

