
Why Your Docker Container Keeps Restarting (and How to Fix It)

CrashLoopBackOff, OOMKilled, exit code 1, exit code 137 — Docker containers restart for specific, diagnosable reasons. Here is how to identify the exact cause and fix it in minutes.

DevOpsBoys · Mar 11, 2026 · 6 min read

You deploy your container. It starts. Then it restarts. Then it restarts again. If you've been in DevOps for more than a week, you've seen this. The frustrating part isn't that containers crash — it's that the error messages feel cryptic until you know what they actually mean.

The good news: every restart has a reason, and every reason is diagnosable. This guide breaks down the most common causes, shows you exactly how to find which one is hitting you, and gives you the precise fix for each.


Why Containers Restart in the First Place

Docker containers are designed to run a single process. When that process exits — for any reason — the container stops. If your restart policy is set to always or on-failure, Docker immediately tries again. If the process keeps failing, you get an infinite restart loop.
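
Those policies are worth knowing how to adjust, because sometimes the first "fix" is just stopping the loop long enough to investigate. A quick sketch (container and image names are placeholders):

bash
# Use a bounded policy instead of an infinite loop
docker run -d --restart=on-failure:5 --name <container> <image-name>

# Temporarily turn restarts off on an existing container while you debug
docker update --restart=no <container>

# See which policy a container is using
docker inspect <container> --format='{{.HostConfig.RestartPolicy.Name}}'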

In Kubernetes, this surfaces as CrashLoopBackOff, which adds exponential backoff delays between restarts (starting at 10 seconds, doubling each time up to 5 minutes). It's Kubernetes protecting the cluster from a runaway process hammering resources.

Understanding the exit code is your first diagnostic step. That number tells you a lot.


Exit Code Cheat Sheet

Exit Code  Meaning
0          Clean exit — process finished successfully (should not restart unless the restart policy forces it)
1          General application error — your app crashed
137        OOMKilled — the process was killed with SIGKILL (128 + 9), most often by the kernel's out-of-memory killer
139        Segmentation fault (SIGSEGV, 128 + 11)
143        SIGTERM received (128 + 15) — a graceful shutdown was requested

Check the exit code with:

bash
docker inspect <container_id> --format='{{.State.ExitCode}}'
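
The Kubernetes equivalent reads the same number out of the pod status; this assumes the crashing container is the first one listed:

bash
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'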

Cause 1: Application Crash (Exit Code 1)

This is the most common cause. Your application threw an unhandled exception, couldn't find a required file, or encountered a startup error.

How to diagnose:

bash
# For Docker
docker logs <container_name>
 
# For Kubernetes — check current logs
kubectl logs <pod-name> -n <namespace>
 
# CRITICAL: Check logs from the PREVIOUS crashed container
kubectl logs <pod-name> -n <namespace> --previous

The --previous flag is the one people forget. When a pod is in CrashLoopBackOff, kubectl logs shows the current container, which usually hasn't logged anything useful yet. You need --previous to see what actually went wrong in the last run.
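
If the trace scrolls past faster than you can read it, limiting and timestamping the output helps; the line counts here are arbitrary:

bash
# Last 100 lines, with timestamps
docker logs --tail 100 --timestamps <container_name>

# Same idea for the previous crashed container in Kubernetes
kubectl logs <pod-name> -n <namespace> --previous --tail=100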

What to look for: Stack traces, Error: Cannot find module, Connection refused, ENOENT, database connection failures.

Fix: Read the error, fix the code or configuration. Most of the time it's a missing env var or a failed dependency connection on startup.


Cause 2: Missing Environment Variables

Your app expects DATABASE_URL or API_KEY. You forgot to set it. The app throws an error on boot and exits with code 1.

How to diagnose:

bash
# See what env vars the running container has
docker exec <container> env
 
# Or inspect
docker inspect <container> --format='{{.Config.Env}}'

In Kubernetes:

bash
kubectl describe pod <pod-name> -n <namespace>

Look at the Environment: section in the output.
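
On a pod with lots of configuration the describe output gets long; piping it through grep keeps just the relevant section (the number of lines after the match is arbitrary):

bash
kubectl describe pod <pod-name> -n <namespace> | grep -A 15 'Environment:'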

Fix: Add the missing variable to your docker run -e flags, docker-compose.yml env section, or Kubernetes env / envFrom fields in your deployment manifest.

yaml
# Kubernetes example
env:
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: app-secrets
        key: database-url
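
For plain Docker, the equivalent fix is a flag on the run command or an env file; the variable value and the env file name here are placeholders:

bash
# Single variable
docker run -d -e DATABASE_URL='postgres://user:pass@db:5432/app' <image-name>

# Or keep variables in a file, one KEY=value per line
docker run -d --env-file ./app.env <image-name>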

Cause 3: OOMKilled — Exit Code 137

Your container ran out of memory. The Linux kernel OOM killer stepped in and killed the process with SIGKILL. This is exit code 137 (128 + 9).

This is sneaky because the container looks like it just crashed — the logs often show nothing unusual right before the kill.

How to diagnose:

bash
# Docker — look for OOMKilled: true
docker inspect <container> --format='{{.State.OOMKilled}}'
 
# Kubernetes
kubectl describe pod <pod-name> -n <namespace>

In the kubectl describe output, look for:

Last State: Terminated
  Reason: OOMKilled
  Exit Code: 137

Fix: You have two options. First, increase the memory limit in your container spec. Second (and better long-term), profile your application to find the memory leak.

yaml
# Kubernetes resource limits
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"

Start with a reasonable limit and monitor actual usage with:

bash
kubectl top pod <pod-name> -n <namespace>
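
The plain-Docker equivalents are a memory cap on the run command and docker stats for live usage (the 512m limit is just an example):

bash
# Cap the container at 512 MB
docker run -d -m 512m <image-name>

# One-shot snapshot of current memory usage
docker stats --no-stream <container>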

Cause 4: Wrong Entrypoint or CMD

Your Dockerfile has a CMD that points to a script that doesn't exist, has wrong permissions, or has Windows-style line endings (\r\n) that Linux can't execute.

How to diagnose:

bash
# Run the container interactively to test
docker run -it --entrypoint /bin/sh <image-name>
 
# Then manually run your entrypoint script
/app/start.sh

This immediately reveals permission errors or "file not found" issues.
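
It also helps to confirm what the image is actually configured to launch, since a typo in CMD or ENTRYPOINT shows up right here:

bash
docker inspect <image-name> --format='{{.Config.Entrypoint}} {{.Config.Cmd}}'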

Common culprits:

bash
# Wrong line endings (Windows developers, this is usually you)
file start.sh
# Output: start.sh: ASCII text, with CRLF line terminators ← problem
 
# Fix line endings
sed -i 's/\r//' start.sh
 
# Wrong permissions
chmod +x start.sh
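
A related trap is a shebang that points at an interpreter the image doesn't ship (minimal images often have /bin/sh but not /bin/bash). A quick check, assuming the script is start.sh:

bash
# Which interpreter does the script ask for?
head -1 start.sh

# Does that interpreter actually exist in the image?
docker run --rm --entrypoint /bin/sh <image-name> -c 'ls -l /bin/bash /bin/sh'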

Cause 5: Health Check Failures

If you've configured a health check and it consistently fails, Docker marks the container as unhealthy (on its own, Docker won't restart it for that, though orchestrators like Swarm will). In Kubernetes, a failing livenessProbe triggers automatic restarts.

How to diagnose:

bash
# Docker health status
docker inspect <container> --format='{{.State.Health.Status}}'
docker inspect <container> --format='{{json .State.Health.Log}}' | jq

In Kubernetes:

bash
kubectl describe pod <pod-name>

Look for events like:

Liveness probe failed: HTTP probe failed with statuscode: 503

Fix: Test your health check endpoint manually first:

bash
curl -v http://localhost:8080/health
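
If the endpoint only listens inside the pod, run the same check from within the container itself; this assumes the image ships curl (swap in wget if it doesn't):

bash
kubectl exec -it <pod-name> -n <namespace> -- curl -v http://localhost:8080/health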

If the endpoint isn't ready when the probe fires, increase initialDelaySeconds:

yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30    # Give app time to boot
  periodSeconds: 10
  failureThreshold: 3

Cause 6: CrashLoopBackOff in Kubernetes

CrashLoopBackOff isn't a cause — it's a symptom. Kubernetes applies it when a container keeps failing. But you can get more detail:

bash
# Full event history for the pod
kubectl describe pod <pod-name> -n <namespace>
 
# Events at the namespace level
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
 
# Check which container in a multi-container pod is crashing
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.containerStatuses[*].name}'

The Events: section at the bottom of kubectl describe is gold. It tells you why the container is being killed.
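
Two more fields worth pulling directly are the restart count and the reason recorded for the last termination; this again assumes the failing container is the first one in the pod:

bash
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[0].restartCount}{"  "}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}'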


Debugging Checklist

When a container keeps restarting, run through this in order:

  1. Get the exit code: docker inspect or kubectl describe
  2. Read the previous logs: kubectl logs --previous or docker logs
  3. Check for OOMKilled: docker inspect for OOMKilled: true
  4. Verify environment variables — are all required vars present?
  5. Test the entrypoint manually: docker run -it --entrypoint /bin/sh
  6. Check health probe timing — increase initialDelaySeconds if app is slow to boot
  7. Check resource limits — are CPU/memory limits too aggressive?
  8. Look at Kubernetes events: kubectl get events --sort-by='.lastTimestamp'

Most container restarts are solved in steps 1–4. The rest cover the edge cases.


Keep Learning

Debugging containers is a core skill that separates junior engineers from senior ones. If you want to go deeper on Docker, Kubernetes troubleshooting, and production-grade container workflows, KodeKloud has hands-on labs that let you break things in a safe environment and practice fixing them — exactly the kind of muscle memory that makes these diagnoses second nature.

The next time a container restarts, you won't panic. You'll reach for kubectl logs --previous and know exactly what you're looking for.
