
Kubernetes ImagePullBackOff: Every Cause and Fix Explained

ImagePullBackOff is one of the most common Kubernetes errors. This guide covers every root cause — wrong image names, missing auth, network issues, rate limits — with step-by-step debugging and fixes.

DevOpsBoys · Mar 17, 2026 · 7 min read

Your pod is stuck in ImagePullBackOff. The container never starts. You check kubectl get pods and see the dreaded status sitting there, not changing.

This is one of the most frequent Kubernetes errors, especially for teams working with private registries or deploying to new clusters. The frustrating part is that Kubernetes doesn't always tell you why the image pull failed — it just backs off and retries.

This guide covers every cause I've seen in production and how to fix each one.


What ImagePullBackOff Actually Means

When Kubernetes tries to start a container, the kubelet on the node pulls the container image from a registry. If this pull fails, you get ErrImagePull. Kubernetes then applies exponential backoff — waiting 10s, 20s, 40s, up to 5 minutes between retries. During this backoff period, the status shows ImagePullBackOff.

The key insight: ImagePullBackOff is not the error itself. It means "I tried to pull the image, it failed, and I'm waiting before trying again." The real error is in the events.
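The schedule described above can be sketched in a few lines of shell (the doubling and the 300-second cap match the kubelet's defaults):

```shell
# The kubelet doubles the wait after each failed pull, capped at 300s (5 min)
delay=10
for attempt in 1 2 3 4 5 6 7; do
  echo "retry $attempt after ${delay}s"
  delay=$((delay * 2))
  [ "$delay" -gt 300 ] && delay=300
done
```

which prints waits of 10, 20, 40, 80, and 160 seconds, then 300 seconds for every retry after that.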


Step 1: Get the Real Error Message

Always start here:

bash
kubectl describe pod <pod-name> -n <namespace>

Scroll to the Events section at the bottom. You'll see something like:

Events:
  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  2m    default-scheduler  Successfully assigned...
  Normal   Pulling    90s   kubelet            Pulling image "myapp:latest"
  Warning  Failed     88s   kubelet            Failed to pull image "myapp:latest": ...
  Warning  Failed     88s   kubelet            Error: ErrImagePull
  Normal   BackOff    75s   kubelet            Back-off pulling image "myapp:latest"
  Warning  Failed     75s   kubelet            Error: ImagePullBackOff

The line that says Failed to pull image contains the actual reason. Read it carefully — it tells you exactly which cause below applies.


Cause 1: Wrong Image Name or Tag

This is the most common cause by far. The image reference in your pod spec doesn't match what exists in the registry.

What the error looks like:

Failed to pull image "myapp:v2.1": rpc error: code = NotFound
  desc = failed to pull and unpack image: failed to resolve reference:
  not found

Common mistakes:

  • Typo in the image name (ngingx instead of nginx)
  • Wrong tag (v2.1 when only v2.0 exists)
  • Missing registry prefix (using myapp:latest when the image is in a private registry like registry.example.com/myapp:latest)
  • Using latest tag when no image is tagged as latest

How to fix:

First, verify the image exists:

bash
# For Docker Hub
docker manifest inspect nginx:1.25
 
# For a private registry
docker manifest inspect registry.example.com/myapp:v2.1
 
# For ECR
aws ecr describe-images --repository-name myapp --image-ids imageTag=v2.1
 
# For GCR/Artifact Registry
gcloud artifacts docker images list us-docker.pkg.dev/project/repo/myapp --include-tags

Then update your deployment:

yaml
spec:
  containers:
    - name: myapp
      image: registry.example.com/myapp:v2.1  # Full path with correct tag

Best practice: Use image digests instead of tags for production. Tags are mutable — someone can push a different image to the same tag. Digests are immutable:

yaml
image: registry.example.com/myapp@sha256:abc123def456...

Cause 2: Missing Image Pull Secret

Your cluster doesn't have credentials to pull from a private registry.

What the error looks like:

Failed to pull image "registry.example.com/myapp:v1":
  rpc error: code = Unknown desc = failed to pull and unpack image:
  failed to resolve reference: pulling from host registry.example.com
  failed with status 401: UNAUTHORIZED

How to fix:

Create an imagePullSecret and reference it in your pod spec:

bash
# Create the secret
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=myuser \
  --docker-password=mypassword \
  --docker-email=me@example.com \
  -n <namespace>

Then add it to your deployment:

yaml
spec:
  imagePullSecrets:
    - name: regcred
  containers:
    - name: myapp
      image: registry.example.com/myapp:v1

For ECR specifically, the token expires every 12 hours, so a static secret goes stale. (On EKS, the node's IAM role authenticates to ECR automatically and no pull secret is needed.) For clusters outside AWS, refresh the secret on a schedule with a CronJob or a credential-helper tool:

bash
# Manual refresh (temporary)
TOKEN=$(aws ecr get-login-password --region us-east-1)
kubectl create secret docker-registry ecr-cred \
  --docker-server=123456789.dkr.ecr.us-east-1.amazonaws.com \
  --docker-username=AWS \
  --docker-password=$TOKEN \
  -n default --dry-run=client -o yaml | kubectl apply -f -
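To automate the refresh, a CronJob along these lines works (the image, region, account ID, and ServiceAccount names here are placeholders — the Job's ServiceAccount needs RBAC to manage secrets, and the pod needs IAM access to ECR):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ecr-cred-refresh
spec:
  schedule: "0 */8 * * *"   # every 8h, well inside the 12h token lifetime
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: ecr-refresher   # placeholder: SA with secret-management RBAC
          restartPolicy: OnFailure
          containers:
            - name: refresh
              image: your-registry/aws-kubectl:latest   # placeholder: image with aws CLI + kubectl
              command: ["/bin/sh", "-c"]
              args:
                - |
                  TOKEN=$(aws ecr get-login-password --region us-east-1)
                  kubectl create secret docker-registry ecr-cred \
                    --docker-server=123456789.dkr.ecr.us-east-1.amazonaws.com \
                    --docker-username=AWS \
                    --docker-password="$TOKEN" \
                    --dry-run=client -o yaml | kubectl apply -f -
```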

Pro tip: Attach the secret to the default ServiceAccount so every pod in the namespace uses it automatically:

bash
kubectl patch serviceaccount default -n <namespace> \
  -p '{"imagePullSecrets": [{"name": "regcred"}]}'

Cause 3: Docker Hub Rate Limits

Docker Hub enforces pull rate limits: at the time of writing, 100 pulls per 6 hours per IP address for anonymous users and 200 per 6 hours for authenticated free accounts. Docker adjusts these numbers periodically, so check the current policy.

What the error looks like:

Failed to pull image "nginx:1.25":
  rpc error: code = Unknown desc = failed to pull and unpack image:
  toomanyrequests: You have reached your pull rate limit.

How to fix:

Option 1 — Authenticate to Docker Hub (increases your limit):

bash
kubectl create secret docker-registry dockerhub-cred \
  --docker-server=https://index.docker.io/v1/ \
  --docker-username=yourusername \
  --docker-password=your-access-token \
  -n <namespace>
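Creating the secret isn't enough on its own — pods must reference it (or it must be attached to the ServiceAccount, as shown earlier). A minimal pod-spec fragment:

```yaml
spec:
  imagePullSecrets:
    - name: dockerhub-cred
  containers:
    - name: web
      image: nginx:1.25
```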

Option 2 — Use a pull-through cache (best for production):

bash
# If using Harbor as a registry proxy
# Configure Harbor to proxy Docker Hub
# Then use: harbor.internal/dockerhub-proxy/library/nginx:1.25

Option 3 — Mirror popular images to your private registry:

bash
# Pull and re-tag to ECR/GCR
docker pull nginx:1.25
docker tag nginx:1.25 123456789.dkr.ecr.us-east-1.amazonaws.com/nginx:1.25
docker push 123456789.dkr.ecr.us-east-1.amazonaws.com/nginx:1.25

Cause 4: Network or Firewall Issues

The node can't reach the container registry. Common in private clusters or air-gapped environments.

What the error looks like:

Failed to pull image "gcr.io/myproject/myapp:v1":
  rpc error: code = Unknown desc = failed to pull and unpack image:
  failed to do request: Head https://gcr.io/v2/myproject/myapp/manifests/v1:
  dial tcp: i/o timeout

How to debug:

SSH into the node (or use a debug pod) and test connectivity:

bash
# Test from a debug pod on the same node
kubectl run debug --image=busybox --restart=Never -- \
  wget -q --spider https://registry.example.com/v2/
kubectl logs debug
 
# Check DNS resolution
kubectl run dns-test --image=busybox --restart=Never -- \
  nslookup registry.example.com

Common fixes:

  • Allow outbound traffic to the registry IP/domain on port 443 in your firewall rules
  • For EKS private clusters, add a VPC endpoint for ECR:
    • com.amazonaws.region.ecr.api
    • com.amazonaws.region.ecr.dkr
    • com.amazonaws.region.s3 (ECR stores layers in S3)
  • For GKE private clusters, enable Cloud NAT or use Artifact Registry VPC Service Controls
  • Check if your corporate proxy is blocking registry traffic

Cause 5: Image Platform Mismatch

You built an ARM image but your nodes are AMD64, or vice versa.

What the error looks like:

Failed to pull image "myapp:latest":
  rpc error: code = NotFound desc = failed to pull and unpack image:
  no match for platform in manifest

How to fix:

Build multi-architecture images:

bash
# Using Docker Buildx
docker buildx create --use
docker buildx build --platform linux/amd64,linux/arm64 \
  -t registry.example.com/myapp:v1 --push .

Or specify the platform in your Dockerfile:

dockerfile
FROM --platform=linux/amd64 node:20-alpine
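If you can't rebuild right away and your cluster mixes architectures, you can also pin the workload to nodes that match the image, using the well-known kubernetes.io/arch node label:

```yaml
spec:
  nodeSelector:
    kubernetes.io/arch: amd64   # schedule only onto x86-64 nodes
```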

Cause 6: Image Doesn't Exist in the Registry

The repository exists but the specific tag you're referencing doesn't.

Quick verification:

bash
# List all tags for an image
# Docker Hub
curl -s "https://hub.docker.com/v2/repositories/library/nginx/tags?page_size=100" | jq '.results[].name'
 
# ECR
aws ecr list-images --repository-name myapp --query 'imageIds[].imageTag'
 
# GCR
gcloud container images list-tags gcr.io/myproject/myapp

Debugging Cheat Sheet

Run these commands in order when you see ImagePullBackOff:

bash
# 1. Get the exact error
kubectl describe pod <pod-name> -n <namespace> | grep -A5 "Events"
 
# 2. Verify the image exists
docker manifest inspect <full-image-reference>
 
# 3. Check if imagePullSecret exists and is correct
kubectl get secrets -n <namespace>
kubectl get secret <secret-name> -n <namespace> -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d
 
# 4. Check ServiceAccount has the secret
kubectl get sa default -n <namespace> -o yaml
 
# 5. Test pull from the node itself
kubectl debug node/<node-name> -it --image=busybox -- \
  wget -q --spider https://registry.example.com/v2/
 
# 6. Check node events for broader issues
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

Prevention: Stop ImagePullBackOff Before It Happens

  1. Use image digests in production — immutable references eliminate tag-related failures
  2. Pre-pull images — use a DaemonSet to pre-pull critical images to all nodes
  3. Set up a pull-through cache — Harbor or Nexus as a local mirror reduces external dependency
  4. Use admission controllers — tools like Kyverno or OPA can validate image references before pods are created
  5. Monitor registry quotas — set up alerts for Docker Hub rate limit headers
A Kyverno policy implementing item 4 (digest-only images) looks like this:

yaml
# Kyverno policy to require image digests
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-image-digest
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-digest
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must use digests, not tags"
        pattern:
          spec:
            containers:
              - image: "*@sha256:*"
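Item 2 (pre-pulling) can be sketched as a DaemonSet whose init containers pull the critical images and exit, leaving a tiny pause container running (the image names below are placeholders):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prepull-critical-images
spec:
  selector:
    matchLabels:
      app: prepull-critical-images
  template:
    metadata:
      labels:
        app: prepull-critical-images
    spec:
      initContainers:
        - name: pull-myapp
          image: registry.example.com/myapp:v1   # placeholder: image to warm on every node
          command: ["sh", "-c", "exit 0"]        # exit immediately; the pull is the point
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9       # minimal container that keeps the pod alive
```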

Wrapping Up

ImagePullBackOff almost always comes down to one of six things: wrong image name, missing auth, rate limits, network issues, platform mismatch, or a tag that doesn't exist. The kubectl describe pod events section tells you which one.

Start your debugging there, and you'll resolve it in minutes instead of hours.


Want to master Kubernetes troubleshooting? KodeKloud's hands-on labs let you practice debugging real cluster issues in a live environment — no local setup needed.

If you're running your clusters on cloud infrastructure, DigitalOcean's managed Kubernetes gives you a production-ready cluster in minutes with built-in monitoring and alerting.
