
AWS EKS Node Group Not Scaling Up — Fix

Your EKS pods are Pending and nodes should be scaling up, but they aren't. Here are the common reasons EKS managed node groups fail to scale and how to fix each one.

DevOpsBoys · Jun 5, 2026 · 4 min read

Pods are stuck in Pending and the Cluster Autoscaler logs show it trying to scale, but no new nodes appear. Or new nodes launch but never join the cluster.

Here are the common causes and their fixes.


Diagnose First

bash
# Check pending pods and why they're pending
kubectl get pods -A --field-selector=status.phase=Pending
kubectl describe pod <pending-pod> -n <namespace> | grep -A10 "Events:"
# Look for: "0/3 nodes are available: 3 Insufficient cpu"
 
# Check cluster autoscaler logs (if using CA)
kubectl logs -n kube-system deployment/cluster-autoscaler | tail -50
# Look for: "Scale-up: setting group ... size to X"
 
# Check Karpenter logs (if using Karpenter; it may be installed in the karpenter namespace instead)
kubectl logs -n kube-system deployment/karpenter -c controller | tail -50
 
# Check node group status in AWS
aws eks describe-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --query 'nodegroup.{Status:status,Desired:scalingConfig.desiredSize,Min:scalingConfig.minSize,Max:scalingConfig.maxSize}'
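The Events message usually already points to one of the cases below. As a rough triage aid, here is a minimal sketch of a hypothetical shell function, classify_pending_reason. It sorts a FailedScheduling message into "capacity" (the nodes are full, so read Cases 1 to 6) or "placement" (no node is eligible, so read Case 7). It assumes the scheduler wording seen in recent Kubernetes versions and matches keywords only, so treat the result as a hint rather than a verdict.

```bash
#!/usr/bin/env bash
# Hypothetical triage helper: sort a FailedScheduling message into
# "capacity" (nodes are full) and/or "placement" (no node is eligible).
# Keyword matching only; scheduler wording varies between versions.
classify_pending_reason() {
  local msg="$1" found=""
  case $msg in
    *"Insufficient cpu"*|*"Insufficient memory"*|*"Too many pods"*)
      echo "capacity"; found=1 ;;
  esac
  case $msg in
    *"node affinity"*|*"untolerated taint"*|*"didn't tolerate"*)
      echo "placement"; found=1 ;;
  esac
  [ -n "$found" ] || echo "unknown"
}

# Usage: paste the message from the pod's Events section
classify_pending_reason "0/3 nodes are available: 3 Insufficient cpu."   # prints: capacity
```

A message can match both categories, for example when some nodes are tainted and the rest are full. In that case the function prints both lines.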

Case 1: Node Group Hit Max Size

This is the most common cause. The node group is already at its maxSize, so the autoscaler can't add any more nodes.

bash
# Check current scaling config
aws eks describe-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --query 'nodegroup.scalingConfig'
 
# Output:
# {
#   "minSize": 2,
#   "maxSize": 5,    ← Already at max
#   "desiredSize": 5
# }
 
# Fix: increase max size
aws eks update-nodegroup-config \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --scaling-config minSize=2,maxSize=20,desiredSize=5
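A quick estimate helps before you pick a new maxSize. The sketch below uses a hypothetical pair of functions, nodes_needed and check_max_size. It assumes you have summed the pending pods' CPU requests in millicores and read one node's allocatable CPU from kubectl describe node. It treats CPU as the only constraint and ignores DaemonSet overhead and bin-packing, so leave headroom on top of its answer. The 1930m figure is only an example value.

```bash
#!/usr/bin/env bash
# Hypothetical sizing helpers; CPU-only estimate, ignores DaemonSets and bin-packing.

# Ceiling division: extra nodes needed to cover the pending CPU requests.
# $1 = total pending CPU requests (millicores), $2 = allocatable CPU per node (millicores)
nodes_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Does desiredSize plus the extra nodes fit under maxSize?
# $1 = desiredSize, $2 = extra nodes, $3 = maxSize
check_max_size() {
  local target=$(( $1 + $2 ))
  if [ "$target" -le "$3" ]; then
    echo "ok: $target <= maxSize $3"
  else
    echo "raise maxSize to at least $target"
  fi
}

# Example values: 5000m pending, 1930m allocatable per node, desired=5, max=5
extra=$(nodes_needed 5000 1930)
check_max_size 5 "$extra" 5   # prints: raise maxSize to at least 8
```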

Case 2: IAM Role Missing Permissions

The node group IAM role needs specific permissions to join the cluster.

bash
# Check if new node exists but won't join
aws ec2 describe-instances \
  --filters "Name=tag:eks:cluster-name,Values=my-cluster" \
  --query 'Reservations[*].Instances[*].[InstanceId,State.Name,LaunchTime]'
 
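# If the instance exists but isn't in kubectl get nodes, read its bootstrap logs
# (Session Manager needs the SSM agent and AmazonSSMManagedInstanceCore on the node role)
aws ssm start-session --target <instance-id>
sudo tail -n 100 /var/log/cloud-init-output.log
sudo journalctl -u kubelet --no-pager | tail -50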

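Required AWS managed policies on the node role (AmazonEKS_CNI_Policy can instead be attached to the aws-node service account through IRSA):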

json
{
  "policies": [
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  ]
}

bash
# Check what policies are attached
aws iam list-attached-role-policies \
  --role-name eks-node-role \
  --query 'AttachedPolicies[*].PolicyName'
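The sketch below compares that output against the three required policies using a hypothetical helper, missing_node_policies. It reads the policy names on stdin, separated by tabs or newlines as --output text produces them, and prints any required policy that is missing. It assumes the CNI policy is attached to the node role. If you attach AmazonEKS_CNI_Policy to the aws-node service account through IRSA instead, ignore that line of output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each required AWS managed policy missing from
# the names on stdin. Feed it, for example:
#   aws iam list-attached-role-policies --role-name eks-node-role \
#     --query 'AttachedPolicies[*].PolicyName' --output text | missing_node_policies
missing_node_policies() {
  local attached p
  # --output text separates names with tabs; normalize to one name per line
  attached=$(tr -s '[:space:]' '\n')
  for p in AmazonEKSWorkerNodePolicy AmazonEKS_CNI_Policy AmazonEC2ContainerRegistryReadOnly; do
    printf '%s\n' "$attached" | grep -qxF "$p" || echo "$p"
  done
}

printf 'AmazonEKSWorkerNodePolicy\tAmazonEKS_CNI_Policy\n' | missing_node_policies
# prints: AmazonEC2ContainerRegistryReadOnly
```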

Case 3: Subnet Doesn't Have Available IPs

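The node group's subnets run out of free IP addresses. With the VPC CNI, every pod also takes a VPC IP, so busy subnets drain quickly. New nodes can't get an address and fail to launch.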

bash
# Check available IPs in the node group's subnets
aws ec2 describe-subnets \
  --subnet-ids $(aws eks describe-nodegroup --cluster-name my-cluster \
      --nodegroup-name my-nodegroup --query 'nodegroup.subnets' --output text) \
  --query 'Subnets[*].[SubnetId,AvailableIpAddressCount,CidrBlock]'
 
# If AvailableIpAddressCount is 0 or low:
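# Option 1: a managed node group's subnets can't be changed in place,
# so create a new node group in subnets that have free IPs
aws eks create-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup-v2 \
  --node-role arn:aws:iam::ACCOUNT:role/eks-node-role \
  --subnets subnet-new1 subnet-new2 \
  --scaling-config minSize=2,maxSize=20,desiredSize=2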
 
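# Option 2: enable VPC CNI prefix delegation (Nitro instances only). Each node gets
# /28 prefixes instead of single IPs, so it fits more pods and you need fewer nodes.
# It does not add addresses to the subnet, and it needs free contiguous /28 blocks.
kubectl set env daemonset aws-node -n kube-system \
  ENABLE_PREFIX_DELEGATION=true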
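Two numbers help you choose between the options above. First, AWS reserves five addresses in every subnet. Second, the VPC CNI's max-pods formula is ENIs × (IPv4 addresses per ENI - 1) + 2. Under prefix delegation, each slot holds a /28 (16 addresses) instead of one IP. The hypothetical functions usable_subnet_ips and max_pods below sketch both calculations. The m5.large figures (3 ENIs, 10 IPv4 addresses per ENI) come from EC2 instance limits. The cap of 110 passed for prefix delegation is an assumption based on EKS guidance for smaller instance types, so check the value that applies to yours.

```bash
#!/usr/bin/env bash
# Usable addresses in an AWS subnet: AWS reserves 5 IPs per subnet
# (the first four and the last one).
usable_subnet_ips() {
  local prefix="${1##*/}"              # "10.0.1.0/24" -> "24"
  echo $(( (1 << (32 - prefix)) - 5 ))
}

# Approximate max pods per node with the VPC CNI (hypothetical helper).
# $1 = ENIs for the instance type, $2 = IPv4 addresses per ENI,
# $3 = "true" if prefix delegation is on, $4 = cap applied under prefix delegation
max_pods() {
  local enis="$1" ips="$2" pd="$3" cap="$4" pods
  if [ "$pd" = true ]; then
    # each secondary-IP slot holds a /28 prefix (16 addresses)
    pods=$(( enis * (ips - 1) * 16 + 2 ))
    if [ "$pods" -gt "$cap" ]; then
      pods=$cap
    fi
  else
    pods=$(( enis * (ips - 1) + 2 ))
  fi
  echo "$pods"
}

usable_subnet_ips 10.0.1.0/24   # prints: 251
max_pods 3 10 false 110         # m5.large without prefix delegation, prints: 29
max_pods 3 10 true 110          # with prefix delegation, capped, prints: 110
```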

Case 4: EC2 Capacity Issues (Spot Instances)

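Spot capacity (and occasionally On-Demand capacity) can be unavailable for a specific instance type in a specific AZ. A node group that allows only one instance type then fails to launch new nodes.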

bash
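# Check node group health issues (e.g. AsgInstanceLaunchFailures)
aws eks describe-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --query 'nodegroup.health'
 
# The node group's Auto Scaling Group activity shows the EC2 error text
ASG=$(aws eks describe-nodegroup --cluster-name my-cluster --nodegroup-name my-nodegroup \
  --query 'nodegroup.resources.autoScalingGroups[0].name' --output text)
aws autoscaling describe-scaling-activities --auto-scaling-group-name "$ASG" --max-items 10
 
# Fix: instance types can't be changed on an existing managed node group,
# so create a new Spot node group with several similar-sized types
aws eks create-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name my-spot-nodegroup \
  --node-role arn:aws:iam::ACCOUNT:role/eks-node-role \
  --subnets subnet-a subnet-b \
  --capacity-type SPOT \
  --instance-types m5.large m5a.large m6i.large m5n.large \
  --scaling-config minSize=0,maxSize=20,desiredSize=1
 
# Or use Karpenter, which diversifies Spot instance types automatically.
# NodePool requirements:
#   - key: karpenter.sh/capacity-type
#     operator: In
#     values: [spot]
#   - key: karpenter.k8s.aws/instance-family
#     operator: In
#     values: [c5, m5, r5, c5a, m5a, c6i, m6i]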
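The scaling-activity status messages usually name the EC2 error. The sketch below is a hypothetical filter, capacity_errors, that counts a few EC2 error codes in that text. The code list (InsufficientInstanceCapacity, UnfulfillableCapacity, MaxSpotInstanceCountExceeded, InsufficientFreeAddressesInSubnet) is our assumption of the common ones, not an exhaustive list, and message wording can vary.

```bash
#!/usr/bin/env bash
# Hypothetical filter: count known EC2 capacity/IP error codes in ASG
# scaling-activity messages read from stdin. Feed it, for example:
#   aws autoscaling describe-scaling-activities --auto-scaling-group-name "$ASG" \
#     --query 'Activities[].StatusMessage' --output text | capacity_errors
# The code list is an assumption, not exhaustive.
capacity_errors() {
  grep -oE 'InsufficientInstanceCapacity|UnfulfillableCapacity|MaxSpotInstanceCountExceeded|InsufficientFreeAddressesInSubnet' \
    | sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Sample input made up for illustration, not real AWS output
printf '%s\n' \
  'launch failed: UnfulfillableCapacity' \
  'launch failed: InsufficientInstanceCapacity' \
  | capacity_errors
```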

Case 5: aws-auth ConfigMap Missing Node Role

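New instances launch but aren't authorized to join the cluster. EKS adds the node role to aws-auth automatically when it creates a managed node group. This case therefore usually points to an edited or overwritten ConfigMap, or to self-managed nodes.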

bash
# Check aws-auth configmap
kubectl describe configmap aws-auth -n kube-system
 
# Should contain your node role:
# mapRoles:
# - rolearn: arn:aws:iam::ACCOUNT:role/eks-node-role
#   username: system:node:{{EC2PrivateDNSName}}
#   groups:
#   - system:bootstrappers
#   - system:nodes
 
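# If missing, add the node role entry (keep the existing entries intact)
kubectl edit configmap aws-auth -n kube-system
 
# Clusters that use EKS access entries instead of aws-auth:
aws eks list-access-entries --cluster-name my-cluster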
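Two aws-auth details are easy to miss. The rolearn must not include an IAM path, and a near-match ARN does not count. The sketch below uses two hypothetical helpers. strip_role_path drops the path from a role ARN. aws_auth_has_role checks the mapRoles YAML on stdin for an exact rolearn match. Both assume the simple one-entry-per-list-item layout shown above.

```bash
#!/usr/bin/env bash
# aws-auth expects role ARNs without an IAM path, e.g.
#   arn:aws:iam::111122223333:role/eks/nodes/my-role -> arn:aws:iam::111122223333:role/my-role
strip_role_path() {
  local arn="$1"
  printf '%s:role/%s\n' "${arn%%:role/*}" "${arn##*/}"
}

# Hypothetical check: exit 0 if the mapRoles YAML on stdin has an exact rolearn match.
# Feed it, for example:
#   kubectl get configmap aws-auth -n kube-system -o jsonpath='{.data.mapRoles}' \
#     | aws_auth_has_role arn:aws:iam::ACCOUNT:role/eks-node-role && echo mapped
aws_auth_has_role() {
  local want
  want=$(strip_role_path "$1")
  awk -v want="$want" '
    { sub(/^[ \t]*-?[ \t]*/, ""); gsub(/"/, "") }   # drop indent, list dash, quotes
    $1 == "rolearn:" && $2 == want { found = 1 }
    END { exit !found }'
}

strip_role_path arn:aws:iam::111122223333:role/eks/nodes/my-role
# prints: arn:aws:iam::111122223333:role/my-role
```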

Case 6: Cluster Autoscaler Not Recognizing Node Group

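With tag-based auto-discovery, Cluster Autoscaler only manages Auto Scaling Groups that carry its discovery tags. EKS managed node groups are tagged automatically, so check this mainly for self-managed groups or for tags that someone removed.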

bash
# Check if node group ASG has required tags
aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names <asg-name> \
  --query 'AutoScalingGroups[0].Tags[*].[Key,Value]' \
  --output text | grep -E "k8s.io|kubernetes.io"
 
# Required tags:
# k8s.io/cluster-autoscaler/enabled = true
# k8s.io/cluster-autoscaler/<cluster-name> = owned
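The sketch below checks for both keys without scanning grep output by eye. It uses a hypothetical helper, missing_ca_tags, which reads the Key/Value text from the command above and prints any required tag key that is absent. It checks keys only, on the assumption that the autoscaler's default auto-discovery matches tag keys rather than values.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each Cluster Autoscaler discovery tag key missing
# from the ASG. $1 = cluster name; stdin = "Key<TAB>Value" lines, e.g. from:
#   aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names <asg-name> \
#     --query 'AutoScalingGroups[0].Tags[*].[Key,Value]' --output text
missing_ca_tags() {
  local keys k
  keys=$(cut -f1)     # keep only the tag key column
  for k in "k8s.io/cluster-autoscaler/enabled" "k8s.io/cluster-autoscaler/$1"; do
    printf '%s\n' "$keys" | grep -qxF "$k" || echo "$k"
  done
}

printf 'k8s.io/cluster-autoscaler/enabled\ttrue\n' | missing_ca_tags my-cluster
# prints: k8s.io/cluster-autoscaler/my-cluster
```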

Case 7: Pod Has Node Selector That Doesn't Match

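The pod may require a node label, affinity, or taint toleration that no node group provides. If no node group could produce a matching node, Cluster Autoscaler won't scale up and records a NotTriggerScaleUp event on the pod.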

bash
# Check pod's node requirements
kubectl describe pod <pending-pod> | grep -A5 "Node-Selectors:"
kubectl describe pod <pending-pod> | grep -A10 "Tolerations:"
 
# Check if any node matches the selector, and which taints the nodes carry
kubectl get nodes -l <key>=<value>
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

Quick Diagnosis Flow

1. kubectl describe pod <pending> → what does it need?
2. kubectl get nodes → how many nodes, in what state?
3. AWS Console → EC2 → Auto Scaling Groups → Activity → any errors?
4. cluster-autoscaler logs → is it trying to scale? Any errors?
5. New instance in EC2 but not in kubectl? → IAM or aws-auth issue
6. No new instance at all? → max size, capacity, or subnet issue
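The last two steps of the checklist form a small decision tree. The sketch below expresses it as a hypothetical shell function, triage_scale_up. It takes two yes/no answers (did a new EC2 instance launch, and did it register as a node) and names what to check next. It adds CA tags to the "no instance" branch, which the checklist leaves implicit.

```bash
#!/usr/bin/env bash
# Hypothetical encoding of the checklist's last two steps.
# $1 = did a new EC2 instance launch (yes/no)
# $2 = did it register as a Kubernetes node (yes/no)
triage_scale_up() {
  local launched="$1" registered="$2"
  if [ "$launched" != yes ]; then
    echo "no new instance: check maxSize, subnet IPs, EC2 capacity, CA tags"
  elif [ "$registered" != yes ]; then
    echo "instance not registered: check node IAM role and aws-auth"
  else
    echo "node registered: check pod selectors, affinity and tolerations"
  fi
}

triage_scale_up yes no
# prints: instance not registered: check node IAM role and aws-auth
```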

