AWS EKS Cluster Autoscaler Not Scaling — Every Fix (2026)
Your EKS Cluster Autoscaler isn't scaling up, scale-down isn't working, or nodes spin up but stay empty. Here's every cause and the exact fix.
Cluster Autoscaler on EKS feels like it should just work, but misconfigured IAM, wrong ASG tags, or eviction-blocking pod annotations can silently prevent scaling for hours.
How EKS Cluster Autoscaler Works
The Cluster Autoscaler (CA) watches for:
- Pending pods — pods that can't be scheduled because no node has enough resources
- Underutilized nodes — nodes where all pods could fit on fewer nodes
It communicates with AWS Auto Scaling Groups (ASGs) to add or remove nodes.
The full flow for scale-up:
Pod pending → CA detects → selects ASG → increases desired count → ASG launches EC2 → kubelet joins → pod scheduled
Symptom 1: Pods Pending but No Scale-Up
Check 1: Is CA Running?
kubectl get pods -n kube-system | grep cluster-autoscaler
# Check CA logs
kubectl logs -n kube-system \
-l app.kubernetes.io/name=cluster-autoscaler \
--tail=100
Look for lines like:
I0526 No candidates for scale up
W0526 Failed to get node group size
Check 2: Pod Actually Pending?
CA only scales up for pods that are Pending because no existing node can fit them, not for pods that are Pending for other reasons.
# Check pod status
kubectl describe pod <pending-pod>
# Look for Events section
Events:
Warning FailedScheduling 0/3 nodes available:
3 Insufficient cpu. ← CA will scale up for this
Warning FailedScheduling pod has unbound PVC
← CA will NOT scale up for this (storage issue, not resources)
CA scales for: insufficient CPU/memory, node selector not satisfied, no node with GPU.
CA does NOT scale for: PVC binding failures, image pull errors, RBAC issues.
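As a rough triage aid, the distinction above can be sketched as a small classifier over FailedScheduling messages. The substrings it matches are assumptions chosen for illustration, not an official list from the Cluster Autoscaler project, so treat the result as a hint rather than a verdict.

```python
# Heuristic sketch: both hint lists are assumptions for illustration only.
SCALABLE_HINTS = (
    "Insufficient cpu",
    "Insufficient memory",
    "Insufficient nvidia.com/gpu",
    "didn't match Pod's node affinity/selector",
)
NOT_SCALABLE_HINTS = (
    "unbound",           # e.g. unbound PVCs
    "ImagePullBackOff",
    "ErrImagePull",
    "forbidden",         # RBAC or admission failures
)


def may_trigger_scale_up(message: str) -> bool:
    """Return True if the event message looks like a capacity problem."""
    text = message.lower()
    # A non-capacity cause wins, even if a capacity hint is also present.
    if any(hint.lower() in text for hint in NOT_SCALABLE_HINTS):
        return False
    return any(hint.lower() in text for hint in SCALABLE_HINTS)


if __name__ == "__main__":
    print(may_trigger_scale_up("0/3 nodes available: 3 Insufficient cpu."))
    print(may_trigger_scale_up("pod has unbound PVC"))
```

Paste the message from the Events section of kubectl describe pod into the function. A False result means adding nodes is unlikely to help.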
Check 3: ASG Tags Are Missing or Wrong
CA discovers ASGs by looking for specific AWS tags. If your node group's ASG doesn't have these tags, CA ignores it.
Required ASG tags:
k8s.io/cluster-autoscaler/<cluster-name> = owned
k8s.io/cluster-autoscaler/enabled = true
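A minimal sketch of the tag check, assuming you have saved the tag list of one ASG in the shape the AWS CLI returns (a list of objects with Key and Value fields). The function name missing_ca_tags is hypothetical.

```python
def missing_ca_tags(asg_tags, cluster_name):
    """Return the auto-discovery tag keys that are absent from one ASG.

    asg_tags: list of {"Key": ..., "Value": ...} dicts for a single ASG.
    """
    required = [
        f"k8s.io/cluster-autoscaler/{cluster_name}",
        "k8s.io/cluster-autoscaler/enabled",
    ]
    present = {tag["Key"] for tag in asg_tags}
    return [key for key in required if key not in present]


if __name__ == "__main__":
    tags = [{"Key": "k8s.io/cluster-autoscaler/enabled", "Value": "true"}]
    # Reports the cluster-specific tag as missing.
    print(missing_ca_tags(tags, "my-cluster"))
```

An empty list means both tag keys are present on that ASG.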
Check your ASG:
# Get your node group ASG name
aws eks describe-nodegroup \
--cluster-name my-cluster \
--nodegroup-name my-nodegroup \
--query "nodegroup.resources.autoScalingGroups"
# Check tags on the ASG
aws autoscaling describe-auto-scaling-groups \
--auto-scaling-group-names my-cluster-nodegroup-xxx \
--query "AutoScalingGroups[].Tags"
Fix: add tags if missing
aws autoscaling create-or-update-tags \
--tags \
ResourceId=my-asg-name,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/my-cluster,Value=owned,PropagateAtLaunch=false \
ResourceId=my-asg-name,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=false
Check 4: IAM Permissions Missing
CA needs permissions to describe and update ASGs.
# Check CA's service account has IAM role
kubectl describe serviceaccount cluster-autoscaler -n kube-system
# Look for:
# Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::...
If the annotation is missing, CA is running without its own IAM role and cannot call the Auto Scaling APIs unless the node role grants them.
Required IAM policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"autoscaling:DescribeAutoScalingGroups",
"autoscaling:DescribeAutoScalingInstances",
"autoscaling:DescribeLaunchConfigurations",
"autoscaling:DescribeScalingActivities",
"autoscaling:DescribeTags",
"autoscaling:SetDesiredCapacity",
"autoscaling:TerminateInstanceInAutoScalingGroup",
"ec2:DescribeImages",
"ec2:DescribeInstanceTypes",
"ec2:DescribeLaunchTemplateVersions",
"ec2:GetInstanceTypesFromInstanceRequirements",
"eks:DescribeNodegroup"
],
"Resource": "*"
}
]
}
Fix: attach policy and annotate service account
# Create IAM role with IRSA
eksctl create iamserviceaccount \
--cluster my-cluster \
--namespace kube-system \
--name cluster-autoscaler \
--attach-policy-arn arn:aws:iam::123456789:policy/ClusterAutoscalerPolicy \
--override-existing-serviceaccounts \
--approve
Check 5: CA Version Doesn't Match Cluster Version
CA version must match your Kubernetes minor version.
# Check k8s version (the --short flag was removed in kubectl 1.28; the default output is already short)
kubectl version
# Check CA version
kubectl describe deployment cluster-autoscaler -n kube-system | grep Image
# CA 1.29 for k8s 1.29, CA 1.30 for k8s 1.30, etc.
Fix: update CA deployment
# Replace with correct version
kubectl set image deployment/cluster-autoscaler \
cluster-autoscaler=registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.3 \
-n kube-system
Find correct versions at: https://github.com/kubernetes/autoscaler/releases
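The version comparison in Check 5 can be sketched as follows. This is an illustration that assumes the CA image reference ends in a tag such as :v1.30.3 (not a digest) and that the server version looks like v1.30.2 with an optional suffix. Both helper names are hypothetical.

```python
import re


def minor_version(version: str) -> tuple:
    """Extract (major, minor) from strings like 'v1.30.3' or 'v1.30.2-eks-abc'."""
    match = re.search(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"no version found in {version!r}")
    return int(match.group(1)), int(match.group(2))


def ca_matches_cluster(ca_image: str, server_version: str) -> bool:
    """Compare the CA image tag with the Kubernetes server version by minor."""
    # The tag is whatever follows the last ':' in the image reference.
    tag = ca_image.rsplit(":", 1)[-1]
    return minor_version(tag) == minor_version(server_version)


if __name__ == "__main__":
    image = "registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.3"
    print(ca_matches_cluster(image, "v1.30.2-eks-abc"))
```

Patch versions are ignored on purpose, because only the minor version has to line up.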
Symptom 2: CA Scales Up but Nodes Stay NotReady or Empty
Cause: Node Labels Don't Match Pod nodeSelector
# Check what labels new nodes get
kubectl get node -l eks.amazonaws.com/nodegroup=my-nodegroup --show-labels
# Check what your pod requires
kubectl describe pod <pending-pod> | grep -A5 "Node-Selectors"
If pod requires node-type=gpu but nodes launch with node-type=standard, the pod won't schedule even after scale-up.
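The comparison can be sketched as a plain dictionary check. It covers only simple nodeSelector key/value pairs, not node affinity expressions, and the function name is hypothetical.

```python
def unmet_selector_terms(node_labels: dict, node_selector: dict) -> dict:
    """Return the nodeSelector entries that the node's labels do not satisfy."""
    return {
        key: value
        for key, value in node_selector.items()
        if node_labels.get(key) != value
    }


if __name__ == "__main__":
    labels = {"node-type": "standard"}
    # Reports node-type=gpu as unmet for a node labeled node-type=standard.
    print(unmet_selector_terms(labels, {"node-type": "gpu"}))
```

An empty result means every selector term is satisfied by that node.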
Fix — add label to launch template:
# In your EKS node group launch template user data:
--kubelet-extra-args '--node-labels=node-type=gpu'
Cause: Taints Not Matching Tolerations
# Check node taints
kubectl describe node <new-node> | grep Taints
# Check pod tolerations
kubectl describe pod <pending-pod> | grep -A5 Tolerations
If the node has a taint the pod doesn't tolerate, the pod won't schedule.
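To make the matching rule concrete, here is a sketch that follows the documented Kubernetes semantics: an empty toleration effect matches any effect, operator Exists ignores the value, and Equal is the default operator. It treats only NoSchedule and NoExecute taints as blocking. The dictionaries mirror the field names in the pod and node specs, and the function names are hypothetical.

```python
BLOCKING_EFFECTS = {"NoSchedule", "NoExecute"}


def tolerates(toleration: dict, taint: dict) -> bool:
    """Check one toleration against one taint."""
    # An empty effect on the toleration matches every taint effect.
    if toleration.get("effect") and toleration["effect"] != taint.get("effect"):
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key", "")
    if operator == "Exists":
        # An empty key combined with Exists matches every taint key.
        return key == "" or key == taint.get("key")
    same_value = toleration.get("value", "") == taint.get("value", "")
    return key == taint.get("key") and same_value


def untolerated_taints(taints: list, tolerations: list) -> list:
    """Return the blocking taints that no toleration covers."""
    return [
        taint for taint in taints
        if taint.get("effect") in BLOCKING_EFFECTS
        and not any(tolerates(tol, taint) for tol in tolerations)
    ]


if __name__ == "__main__":
    taints = [{"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}]
    print(untolerated_taints(taints, []))
```

Any taint left in the result keeps the pod off that node until a matching toleration is added or the taint is removed.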
Symptom 3: Scale-Down Not Working
Check: Pod Disruption Budgets (PDB)
# Check PDBs
kubectl get pdb -A
# Check if any PDB is blocking drain
kubectl describe pdb my-pdb
A PDB that requires minAvailable: 1 with only 1 replica will block scale-down permanently. Make sure your PDBs allow at least 1 pod to be disrupted.
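One way to spot blocking PDBs across the cluster is to read status.disruptionsAllowed from the JSON form of the PDB list. The sketch below assumes its input is the parsed output of kubectl get pdb -A -o json. The function name is hypothetical.

```python
def blocking_pdbs(pdb_list: dict) -> list:
    """Return (namespace, name) for each PDB that currently allows 0 disruptions."""
    blocked = []
    for item in pdb_list.get("items", []):
        # A missing status field is treated as 0 allowed disruptions here.
        allowed = item.get("status", {}).get("disruptionsAllowed", 0)
        if allowed == 0:
            meta = item["metadata"]
            blocked.append((meta["namespace"], meta["name"]))
    return blocked


if __name__ == "__main__":
    sample = {"items": [
        {"metadata": {"namespace": "default", "name": "my-pdb"},
         "status": {"disruptionsAllowed": 0}},
        {"metadata": {"namespace": "web", "name": "frontend"},
         "status": {"disruptionsAllowed": 1}},
    ]}
    print(blocking_pdbs(sample))
```

To run it against a live cluster, load the kubectl JSON with json.load(sys.stdin) and pass the result in.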
Check: Annotations Preventing Eviction
Some pods have annotations that tell CA not to evict them:
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
# Find pods with safe-to-evict=false
kubectl get pods -A -o json | \
python3 -c "
import sys, json
data = json.load(sys.stdin)
for item in data['items']:
    ann = item['metadata'].get('annotations', {})
    if ann.get('cluster-autoscaler.kubernetes.io/safe-to-evict') == 'false':
        print(item['metadata']['namespace'], item['metadata']['name'])
"
These pods keep nodes alive. Remove the annotation if eviction is safe.
Check: Scale-Down Delay
CA has a default cooldown before scale-down:
- scale-down-delay-after-add: 10 minutes after a node was added
- scale-down-unneeded-time: 10 minutes a node must be unneeded before removal
- scale-down-utilization-threshold: 0.5 (node must be below 50% utilized)
This is by design. Wait 15 minutes after a node becomes idle.
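The utilization threshold can be illustrated with a small calculation. This sketch assumes CA compares the larger of the CPU and memory request ratios (requests divided by allocatable) against the threshold. The function name and argument units are hypothetical, and any consistent unit works for each pair.

```python
def below_scale_down_threshold(cpu_requested, cpu_allocatable,
                               mem_requested, mem_allocatable,
                               threshold=0.5):
    """Return True if the node's request-based utilization is under the threshold."""
    # Assumption: the busier of the two resources decides the outcome.
    utilization = max(cpu_requested / cpu_allocatable,
                      mem_requested / mem_allocatable)
    return utilization < threshold


if __name__ == "__main__":
    # 500m of 2000m CPU and 1 GiB of 8 GiB memory requested.
    print(below_scale_down_threshold(500, 2000, 1, 8))
```

A node that passes this check still has to stay unneeded for the full scale-down-unneeded-time before CA removes it.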
Quick Diagnostics Script
#!/bin/bash
echo "=== CA Pod Status ==="
kubectl get pods -n kube-system -l app.kubernetes.io/name=cluster-autoscaler
echo ""
echo "=== Pending Pods ==="
kubectl get pods -A --field-selector=status.phase=Pending
echo ""
echo "=== Node Capacity ==="
kubectl describe nodes | grep -A4 "Allocated resources"
echo ""
echo "=== CA Recent Logs ==="
kubectl logs -n kube-system \
-l app.kubernetes.io/name=cluster-autoscaler \
--tail=30 | grep -E "scale|Scale|ERROR|WARN"
Consider Karpenter Instead
If you're fighting CA consistently, consider migrating to Karpenter — it's faster, supports spot instances better, and doesn't require pre-defined ASG node groups.
Karpenter scales in ~45 seconds vs CA's 2-3 minutes, and can provision any EC2 instance type that satisfies your pod requirements.
See: Karpenter vs Cluster Autoscaler