AWS ECS Task Keeps Stopping — How to Fix It (2026)
Your ECS task starts and then immediately stops or keeps restarting. Here's every reason this happens and how to debug and fix it.
You deploy to ECS and your task immediately goes into STOPPED state. Or it runs for 30 seconds and then dies. The service keeps trying to restart it, hitting the circuit breaker, and eventually gives up.
Here's how to find and fix the root cause.
Step 1: Find the Stop Reason
The most important first step — check why the task stopped:
```bash
# AWS CLI
aws ecs describe-tasks \
  --cluster your-cluster \
  --tasks <task-arn> \
  --query 'tasks[0].{status:lastStatus,stopCode:stopCode,reason:stoppedReason}'
```

Or in the console: ECS → Cluster → Service → Tasks → Stopped tab → click the task → check Stopped reason.
Common stop reasons:
- `Essential container in task exited` — your app container crashed
- `Task failed ELB health checks` — load balancer health check failing
- `CannotPullContainerError` — image pull failed
- `OutOfMemoryError: Container killed` — OOM
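The lookup above can be scripted to pull the stop reason and each container's exit code in one pass. A sketch assuming `jq` is installed — the JSON here is a hand-written sample in the shape `describe-tasks` returns, not real API output:

```bash
# Hand-written sample mirroring the shape of a describe-tasks response.
# In a real run, replace this with:
#   aws ecs describe-tasks --cluster your-cluster --tasks <task-arn> > task.json
cat <<'EOF' > task.json
{"tasks":[{"lastStatus":"STOPPED",
  "stopCode":"EssentialContainerExited",
  "stoppedReason":"Essential container in task exited",
  "containers":[{"name":"app","exitCode":137},
                {"name":"sidecar","exitCode":0}]}]}
EOF

# Print the stop reason, then each container's exit code.
jq -r '.tasks[0]
       | "reason: \(.stoppedReason)",
         (.containers[] | "container \(.name) exited with \(.exitCode)")' task.json
```

Exit code 137 means the container received SIGKILL, which often points at an OOM kill; an exit code of 1 is usually an ordinary application crash.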
Cause 1: Application Crash on Startup
Most common cause. The container starts but the app crashes immediately.
Check container logs:
```bash
# If using CloudWatch Logs
aws logs get-log-events \
  --log-group-name /ecs/your-service \
  --log-stream-name <log-stream-id> \
  --limit 50

# Or in console: ECS → Task → Logs tab
```

Look for: stack traces, missing environment variables, database connection failures, port conflicts.
Fix: Reproduce locally first:
```bash
docker run --rm \
  -e DATABASE_URL=your-db-url \
  -e PORT=8080 \
  your-image:tag
```

If it crashes locally too — fix the app, not ECS.
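To mirror the task's exact environment when reproducing locally, you can turn the task definition's `environment` array into `docker run -e` flags. A sketch assuming `jq`; the sample JSON stands in for real `describe-task-definition` output:

```bash
# Sample of the environment section a task definition contains.
# Real run: aws ecs describe-task-definition --task-definition your-task-def \
#   --query 'taskDefinition.containerDefinitions[0].environment' > env.json
cat <<'EOF' > env.json
[{"name":"DATABASE_URL","value":"postgres://db:5432/app"},
 {"name":"PORT","value":"8080"}]
EOF

# Convert each {name, value} pair into a -e flag and print the command to run.
flags=$(jq -r 'map("-e \(.name)=\(.value)") | join(" ")' env.json)
echo "docker run --rm $flags your-image:tag"
```

This prints the command instead of running it, so you can review (and avoid echoing secrets) before executing.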
Cause 2: Missing or Wrong Environment Variables
App starts fine locally but fails in ECS because a required env var is missing.
```bash
# Check task definition env vars
aws ecs describe-task-definition \
  --task-definition your-task-def \
  --query 'taskDefinition.containerDefinitions[0].environment'
```

Fix: Add the missing variables in your task definition. For secrets, use AWS Secrets Manager:
```json
"secrets": [
  {
    "name": "DATABASE_PASSWORD",
    "valueFrom": "arn:aws:secretsmanager:us-east-1:123:secret:prod/db-password"
  }
]
```

Cause 3: Health Check Failing
ECS or your load balancer is killing the task because it's not passing health checks.
```bash
# Check target group health
aws elbv2 describe-target-health \
  --target-group-arn your-tg-arn
```

Fix the ECS health check (in task definition):
```json
"healthCheck": {
  "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
  "interval": 30,
  "timeout": 10,
  "retries": 3,
  "startPeriod": 60
}
```

startPeriod is critical — it gives slow-starting apps a grace period before health check failures count against them. The default is 0 (no grace period), so an app that isn't ready when the first checks fire can be killed before it ever comes up.
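As a rough rule of thumb, the values above give the app up to startPeriod plus retries × interval seconds before ECS marks the container unhealthy (a simplification — exact timing depends on when each probe fires):

```bash
# Approximate worst case before the container is marked unhealthy,
# using the values from the healthCheck block above.
startPeriod=60; interval=30; retries=3
echo "$(( startPeriod + retries * interval )) seconds"
```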
Cause 4: OOM — Container Killed
The task is using more memory than its limit.
```bash
aws ecs describe-tasks \
  --cluster your-cluster \
  --tasks <task-arn> \
  --query 'tasks[0].stoppedReason'
# "OutOfMemoryError: Container killed"
```

Fix: Increase memory in task definition or reduce app memory usage:
```json
{
  "memory": 1024,
  "memoryReservation": 512
}
```

Add CloudWatch Container Insights to track memory usage over time.
Cause 5: Image Pull Failure
ECS can't pull the Docker image from ECR or Docker Hub.
```bash
# Check task stopped reason
"CannotPullContainerError: ref pull has been retried 5 time(s)"
```

Common causes:
- ECR repository doesn't exist
- Task execution role doesn't have ECR pull permissions
- Image tag doesn't exist
- ECR login expired (for cross-account)
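Before touching IAM, confirm the repository and tag actually exist. This sketch splits an image URI into its parts and prints the ECR command to run (the URI is a made-up example):

```bash
# Hypothetical image URI from a task definition.
image="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:v2"

# Split into repository name and tag using shell parameter expansion.
repo_and_tag="${image##*/}"      # everything after the last slash
repo="${repo_and_tag%%:*}"       # repository name
tag="${repo_and_tag##*:}"        # image tag

# Print the check to run; it fails with ImageNotFoundException if the tag is missing.
echo "aws ecr describe-images --repository-name $repo --image-ids imageTag=$tag"
```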
Fix for ECR permissions — add to your task execution role:
```json
{
  "Effect": "Allow",
  "Action": [
    "ecr:GetAuthorizationToken",
    "ecr:BatchCheckLayerAvailability",
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage"
  ],
  "Resource": "*"
}
```

Cause 6: Port Conflict
If you're using host network mode, another task might already be using the port.
Fix: Use awsvpc network mode (required on Fargate, recommended for ECS on EC2):

```json
"networkMode": "awsvpc"
```

Each task gets its own ENI and private IP. No port conflicts possible.
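With awsvpc mode, the service also needs a network configuration telling ECS which subnets and security groups to attach each task's ENI to. A minimal sketch — the subnet and security group IDs are placeholders:

```json
"networkConfiguration": {
  "awsvpcConfiguration": {
    "subnets": ["subnet-aaa111"],
    "securityGroups": ["sg-bbb222"],
    "assignPublicIp": "ENABLED"
  }
}
```

Set `assignPublicIp` to `DISABLED` for tasks in private subnets that reach ECR through a NAT gateway or VPC endpoints.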
Enable CloudWatch Container Insights
Turn this on — it gives you CPU, memory, network metrics per task:
```bash
aws ecs update-cluster-settings \
  --cluster your-cluster \
  --settings name=containerInsights,value=enabled
```

Quick Debug Checklist
| Symptom | Check | Fix |
|---|---|---|
| Immediate exit | Container logs | Fix app crash |
| Missing env var | Task def env section | Add variable |
| Health check fail | Target group health | Fix endpoint + startPeriod |
| OOM killed | Stopped reason | Increase memory limit |
| Can't pull image | ECR permissions | Fix task execution role IAM |
| Port in use | Network mode | Switch to awsvpc |