
AWS ECS Task Keeps Stopping — How to Fix It (2026)

Your ECS task starts and then immediately stops or keeps restarting. Here's every reason this happens and how to debug and fix it.

DevOpsBoys · Apr 5, 2026 · 3 min read

You deploy to ECS and your task immediately goes into STOPPED state. Or it runs for 30 seconds and then dies. The service keeps trying to restart it, hitting the circuit breaker, and eventually gives up.

Here's how to find and fix the root cause.


Step 1: Find the Stop Reason

The most important first step is to check why the task stopped:

bash
# AWS CLI
aws ecs describe-tasks \
  --cluster your-cluster \
  --tasks <task-arn> \
  --query 'tasks[0].{status:lastStatus,stopCode:stopCode,reason:stoppedReason}'

Or in the console: ECS → Cluster → Service → Tasks → Stopped tab → click the task → check Stopped reason.

Common stop reasons:

  • Essential container in task exited — your app container crashed
  • Task failed ELB health checks — health check failing
  • CannotPullContainerError — image pull failed
  • OutOfMemoryError: Container killed — OOM
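If you find yourself triaging these often, the stop reason string can be bucketed into a likely cause with a small script. This is a sketch: the `classify_stop_reason` helper is hypothetical, and the patterns it matches are just the messages listed above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an ECS stoppedReason string to a likely root cause.
classify_stop_reason() {
  case "$1" in
    CannotPullContainerError*)              echo "image-pull" ;;
    OutOfMemoryError*)                      echo "oom" ;;
    "Essential container in task exited"*)  echo "app-crash" ;;
    "Task failed ELB health checks"*)       echo "health-check" ;;
    *)                                      echo "other" ;;
  esac
}

classify_stop_reason "OutOfMemoryError: Container killed"   # prints "oom"
```

Feed it the `stoppedReason` from `describe-tasks` to jump straight to the right section below.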

Cause 1: Application Crash on Startup

Most common cause. The container starts but the app crashes immediately.

Check container logs:

bash
# If using CloudWatch Logs
aws logs get-log-events \
  --log-group-name /ecs/your-service \
  --log-stream-name <log-stream-id> \
  --limit 50
 
# Or in console: ECS → Task → Logs tab

Look for: stack traces, missing environment variables, database connection failures, port conflicts.

Fix: Reproduce locally first:

bash
docker run --rm \
  -e DATABASE_URL=your-db-url \
  -e PORT=8080 \
  your-image:tag

If it crashes locally too — fix the app, not ECS.
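When the container does exit locally, the exit code narrows things down. A quick decoder (the `explain_exit` helper is an illustration, not an ECS or Docker tool; the 128 + signal convention is standard):

```bash
#!/usr/bin/env bash
# Map a container exit code to its conventional meaning (128 + signal number).
explain_exit() {
  case "$1" in
    0)   echo "clean exit" ;;
    137) echo "SIGKILL (128 + 9) - often the OOM killer" ;;
    139) echo "SIGSEGV (128 + 11) - segfault" ;;
    143) echo "SIGTERM (128 + 15) - graceful stop requested" ;;
    *)   echo "application error (code $1)" ;;
  esac
}

explain_exit 137   # prints "SIGKILL (128 + 9) - often the OOM killer"
```

Usage: `docker run --rm your-image:tag; explain_exit $?` right after the container dies.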


Cause 2: Missing or Wrong Environment Variables

App starts fine locally but fails in ECS because a required env var is missing.

bash
# Check task definition env vars
aws ecs describe-task-definition \
  --task-definition your-task-def \
  --query 'taskDefinition.containerDefinitions[0].environment'

Fix: Add the missing variables in your task definition. For secrets, use AWS Secrets Manager:

json
"secrets": [
  {
    "name": "DATABASE_PASSWORD",
    "valueFrom": "arn:aws:secretsmanager:us-east-1:123:secret:prod/db-password"
  }
]
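Note that when a task definition references a secret, the task execution role also needs permission to read it, or the task fails to start with a `ResourceInitializationError`. A sketch of the policy statement (the secret ARN is a placeholder matching the example above):

```json
{
  "Effect": "Allow",
  "Action": ["secretsmanager:GetSecretValue"],
  "Resource": "arn:aws:secretsmanager:us-east-1:123:secret:prod/db-password*"
}
```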

Cause 3: Health Check Failing

ECS or your load balancer is killing the task because it's not passing health checks.

bash
# Check target group health
aws elbv2 describe-target-health \
  --target-group-arn your-tg-arn

Fix the ECS health check (in task definition):

json
"healthCheck": {
  "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
  "interval": 30,
  "timeout": 10,
  "retries": 3,
  "startPeriod": 60
}

startPeriod is critical: it gives slow-starting apps a grace period before failed health checks count against them. There is no grace period by default, so an app that isn't serving its health endpoint when checks begin can be killed before it finishes booting.
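With the settings above you can estimate the worst case before ECS marks the container unhealthy, assuming the grace period elapses and then every check fails consecutively:

```bash
# Values from the healthCheck example above
startPeriod=60; interval=30; retries=3

# Rough worst case: grace period, then `retries` consecutive failed checks
# spaced `interval` seconds apart.
echo $(( startPeriod + interval * retries ))   # prints 150
```

If your app reliably takes longer than that to become healthy, raise startPeriod rather than retries.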


Cause 4: OOM — Container Killed

The task is using more memory than its limit.

bash
aws ecs describe-tasks \
  --cluster your-cluster \
  --tasks <task-arn> \
  --query 'tasks[0].stoppedReason'
# "OutOfMemoryError: Container killed"

Fix: Increase memory in task definition or reduce app memory usage:

json
{
  "memory": 1024,
  "memoryReservation": 512
}
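If the app's runtime manages its own heap, cap it below the container limit too; otherwise the runtime can grow past the hard `memory` limit and get killed anyway. A sketch for a Node.js app (the `NODE_OPTIONS` value is an example; a common rule of thumb is roughly 75% of the hard limit):

```json
"environment": [
  { "name": "NODE_OPTIONS", "value": "--max-old-space-size=768" }
]
```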

Add CloudWatch Container Insights to track memory usage over time.


Cause 5: Image Pull Failure

ECS can't pull the Docker image from ECR or Docker Hub.

bash
# Example stoppedReason for a pull failure:
# "CannotPullContainerError: ref pull has been retried 5 time(s)"

Common causes:

  • ECR repository doesn't exist
  • Task IAM role doesn't have ECR pull permissions
  • Image tag doesn't exist
  • ECR login expired (for cross-account)

Fix for ECR permissions — add to your task execution role:

json
{
  "Effect": "Allow",
  "Action": [
    "ecr:GetAuthorizationToken",
    "ecr:BatchCheckLayerAvailability",
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage"
  ],
  "Resource": "*"
}

Cause 6: Port Conflict

If you're using host network mode, another task might already be using the port.

Fix: Use awsvpc network mode (required on Fargate, recommended for ECS on EC2):

json
"networkMode": "awsvpc"

Each task gets its own ENI and private IP. No port conflicts possible.
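With awsvpc, the service (or `run-task` call) must also supply a network configuration so ECS knows where to attach the ENI. A minimal sketch; the subnet and security group IDs are placeholders:

```json
"networkConfiguration": {
  "awsvpcConfiguration": {
    "subnets": ["subnet-0abc123"],
    "securityGroups": ["sg-0abc123"],
    "assignPublicIp": "DISABLED"
  }
}
```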


Enable CloudWatch Container Insights

Turn this on — it gives you CPU, memory, network metrics per task:

bash
aws ecs update-cluster-settings \
  --cluster your-cluster \
  --settings name=containerInsights,value=enabled

Quick Debug Checklist

| Symptom | Check | Fix |
| --- | --- | --- |
| Immediate exit | Container logs | Fix app crash |
| Missing env var | Task def env section | Add variable |
| Health check fail | Target group health | Fix endpoint + startPeriod |
| OOM killed | Stopped reason | Increase memory limit |
| Can't pull image | ECR permissions | Fix task execution role IAM |
| Port in use | Network mode | Switch to awsvpc |