Today I Fixed

AWS ECR: 401 Unauthorized When Pulling Image in Kubernetes

kubectl | May 28, 2026 | 20 minutes to fix | aws, kubernetes, troubleshooting, eks

The Problem

EKS pods were stuck in ImagePullBackOff. kubectl describe pod showed:

Failed to pull image "123456789.dkr.ecr.us-east-1.amazonaws.com/my-app:latest": 
rpc error: code = Unknown 
desc = failed to pull and unpack image: 
failed to resolve reference "...": 
unexpected status code 401 Unauthorized

What Happened

The ECR authorization token had expired. ECR tokens are valid for 12 hours. The nodes had authenticated at startup, and once that token lapsed, every new image pull failed with 401.
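
To see the 12-hour lifetime for yourself, you can ask ECR when a freshly issued token expires. This is a minimal sketch, not something from the original fix: the function name `ecr_token_expiry` and the default region are my own placeholders, and it assumes the AWS CLI is installed with credentials that allow `ecr:GetAuthorizationToken`.

```shell
# Print the expiry timestamp of a freshly issued ECR authorization token.
# ecr_token_expiry is a hypothetical helper name; the region default is a placeholder.
ecr_token_expiry() {
  region="${1:-us-east-1}"
  aws ecr get-authorization-token \
    --region "$region" \
    --query 'authorizationData[0].expiresAt' \
    --output text
}
```

Calling `ecr_token_expiry us-east-1` should print a timestamp roughly 12 hours ahead. The exact timestamp format depends on your AWS CLI version.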

The Fix

Option 1: Restart the node group (forces fresh token)

Not ideal, but works in an emergency:

bash
# Force nodes to re-authenticate
aws ec2 reboot-instances --instance-ids i-xxxxx
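
The `i-xxxxx` above has to come from somewhere. On AWS, each node's `spec.providerID` ends with its EC2 instance ID, so one way to find it is sketched below. The helper names are hypothetical, and this assumes `kubectl` is pointed at the affected cluster.

```shell
# Extract the EC2 instance ID from a providerID such as
# aws:///us-east-1a/i-0123456789abcdef0 (everything after the last slash).
instance_id_from_provider_id() {
  printf '%s\n' "${1##*/}"
}

# Look up the instance ID behind a Kubernetes node name.
# node_instance_id is a hypothetical helper, not part of kubectl.
node_instance_id() {
  provider_id="$(kubectl get node "$1" -o jsonpath='{.spec.providerID}')" || return 1
  instance_id_from_provider_id "$provider_id"
}
```

With that in place, the reboot becomes `aws ec2 reboot-instances --instance-ids "$(node_instance_id <node-name>)"`, where `<node-name>` is whatever `kubectl get nodes` lists for the broken node.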

Option 2 (the proper fix): give the node IAM role ECR read access

On EKS, the nodes should automatically refresh ECR credentials if the node IAM role has the right permissions.

bash
# Look up the node group's IAM role (prints the role ARN)
aws eks describe-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name my-nodes \
  --query 'nodegroup.nodeRole' \
  --output text
 
# The role needs this policy attached:
# AmazonEC2ContainerRegistryReadOnly

bash
# Attach the policy if missing
aws iam attach-role-policy \
  --role-name eks-node-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
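
Before attaching anything, it helps to confirm the policy is actually missing. The sketch below is an assumption-laden helper rather than the exact check I ran: `role_name_from_arn` and `role_has_policy` are hypothetical names, and the caller needs `iam:ListAttachedRolePolicies`. Note that `describe-nodegroup` returns a role ARN, while the IAM commands want the bare role name.

```shell
# The role name is the last path segment of the role ARN,
# e.g. arn:aws:iam::123456789012:role/eks-node-role -> eks-node-role
role_name_from_arn() {
  printf '%s\n' "${1##*/}"
}

# Exit 0 if the named managed policy is attached to the role, non-zero otherwise.
role_has_policy() {
  role_name="$1"
  policy_name="$2"
  # --output text prints the policy names tab-separated on one line,
  # so split them onto separate lines and look for an exact match.
  aws iam list-attached-role-policies \
    --role-name "$role_name" \
    --query 'AttachedPolicies[].PolicyName' \
    --output text | tr '\t' '\n' | grep -qx "$policy_name"
}
```

For example, `role_has_policy eks-node-role AmazonEC2ContainerRegistryReadOnly || echo "policy missing"` would flag the situation described in this post.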

Option 3: For non-EKS clusters — create a refresh CronJob

bash
# Create or refresh the ECR pull secret.
# --dry-run=client + apply makes this safe to re-run when the secret already exists.
kubectl create secret docker-registry ecr-secret \
  --docker-server=123456789.dkr.ecr.us-east-1.amazonaws.com \
  --docker-username=AWS \
  --docker-password="$(aws ecr get-login-password --region us-east-1)" \
  --dry-run=client -o yaml | kubectl apply -f -

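Run this as a CronJob every 6 hours so the secret is always refreshed well inside the 12-hour token lifetime, and reference ecr-secret under imagePullSecrets in your pod specs.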
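
I did not need this on EKS, so treat the following as an untested sketch of what such a CronJob could look like. The function name, the CronJob name `ecr-secret-refresh`, the service account `ecr-refresher`, and the container image are all assumptions. The image must bundle both `aws` and `kubectl`, and the service account needs RBAC permission to create and update secrets plus AWS credentials that allow `ecr:GetAuthorizationToken`.

```shell
# Print a CronJob manifest that re-creates the ECR pull secret every 6 hours.
# All names below (function, CronJob, service account, image) are hypothetical.
render_ecr_refresh_cronjob() {
  registry="$1"   # e.g. 123456789.dkr.ecr.us-east-1.amazonaws.com
  region="$2"     # e.g. us-east-1
  image="$3"      # an image that contains both the aws and kubectl binaries
  # Unquoted heredoc: ${...} expands now, while \$ and \\ stay literal in the output.
  cat <<EOF
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ecr-secret-refresh
spec:
  schedule: "0 */6 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: ecr-refresher
          restartPolicy: OnFailure
          containers:
            - name: refresh
              image: ${image}
              command:
                - /bin/sh
                - -c
                - |
                  kubectl create secret docker-registry ecr-secret \\
                    --docker-server=${registry} \\
                    --docker-username=AWS \\
                    --docker-password="\$(aws ecr get-login-password --region ${region})" \\
                    --dry-run=client -o yaml | kubectl apply -f -
EOF
}
```

You could review the output first and then pipe it into the cluster with `render_ecr_refresh_cronjob <registry> <region> <image> | kubectl apply -f -`. Generating the manifest from a function keeps the registry and region in one place instead of hard-coding them twice.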

Root Cause

EKS nodes with the correct IAM role automatically handle ECR auth refresh. The issue was that the node group had been created with a custom IAM role that was missing AmazonEC2ContainerRegistryReadOnly. Attached the policy, new nodes picked it up, problem solved permanently.