AWS EKS Worker Nodes Not Joining the Cluster: Complete Fix Guide
EKS worker nodes stuck in NotReady or not appearing at all? Here are all the causes and step-by-step fixes for node bootstrap failures.
You create an EKS node group. The EC2 instances launch. But kubectl get nodes shows nothing — or the nodes show up as NotReady and stay that way.
This is one of the most frustrating EKS issues because the failure happens on the node, not in the control plane, and the errors can come from IAM, networking, bootstrap scripts, or AMI issues.
Here's every cause and how to fix it.
Quick Diagnosis First
# Check if nodes appear at all
kubectl get nodes
# If nodes show but are NotReady
kubectl describe node <node-name>
# Look for: Conditions → Ready: False, and Events section
# Check node group status in AWS console
aws eks describe-nodegroup \
--cluster-name my-cluster \
--nodegroup-name my-nodegroup \
--query 'nodegroup.status'
# Check EC2 instances
aws ec2 describe-instances \
--filters "Name=tag:eks:cluster-name,Values=my-cluster" \
--query 'Reservations[].Instances[].{ID:InstanceId,State:State.Name,Status:NetworkInterfaces[0].Status}'
If nodes don't appear in kubectl get nodes at all, the issue is registration — the node can't reach the API server.
If nodes appear but are NotReady, the node joined but the kubelet is unhealthy.
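That two-way split drives everything that follows. As a sketch, the triage logic looks like this (the `triage` function name is made up for illustration; the inputs are what you observed, not live API calls):

```shell
#!/usr/bin/env bash
# Hypothetical triage helper: decide which class of failure to chase first.
triage() {
  local appears="$1"   # does the node appear in `kubectl get nodes`? yes/no
  local ready="$2"     # Ready condition from `kubectl describe node`: True/False
  if [ "$appears" = "no" ]; then
    echo "registration failure: check IAM, aws-auth, and networking (Causes 1-3)"
  elif [ "$ready" != "True" ]; then
    echo "kubelet unhealthy: check bootstrap logs and AMI (Causes 4-5)"
  else
    echo "node is healthy"
  fi
}
```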
Cause 1 — Missing IAM Role Permissions (Most Common)
Worker nodes need an IAM role with three specific policies to join EKS.
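The three-policy requirement can be checked mechanically. A sketch (the `check_node_policies` helper is hypothetical; feed it the output of the `list-attached-role-policies` command shown below):

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the three required EKS node policies
# are missing from a newline-separated list of attached policy names.
# In practice, produce that list with:
#   aws iam list-attached-role-policies --role-name <node-role> \
#     --query 'AttachedPolicies[].PolicyName' --output text | tr '\t' '\n'
REQUIRED_POLICIES=(AmazonEKSWorkerNodePolicy AmazonEKS_CNI_Policy AmazonEC2ContainerRegistryReadOnly)

check_node_policies() {
  local attached="$1" missing=0
  for p in "${REQUIRED_POLICIES[@]}"; do
    if ! grep -qx "$p" <<<"$attached"; then
      echo "MISSING: $p"
      missing=1
    fi
  done
  return $missing
}
```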
Diagnose
# Check node role
aws iam list-attached-role-policies \
--role-name my-eks-node-role \
--query 'AttachedPolicies[].PolicyName'
Fix — Attach Required Policies
# Required policies for EKS worker nodes
aws iam attach-role-policy \
--role-name my-eks-node-role \
--policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
aws iam attach-role-policy \
--role-name my-eks-node-role \
--policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
aws iam attach-role-policy \
--role-name my-eks-node-role \
--policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
Also add the node IAM role to the aws-auth ConfigMap:
# Check aws-auth ConfigMap
kubectl get configmap aws-auth -n kube-system -o yaml
If the node role isn't in aws-auth, nodes can't authenticate with the API server:
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::123456789012:role/my-eks-node-role
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
kubectl apply -f aws-auth.yaml
Cause 2 — Node in Wrong Subnet or Security Group
Nodes must be able to reach the EKS API server endpoint on port 443.
Diagnose
SSH into the failing node (or use SSM):
# Connect via SSM (no SSH key needed)
aws ssm start-session --target <instance-id>
# Test API server connectivity
curl -k https://<api-server-endpoint>:443
# Should return: 403 Forbidden (not a timeout)
If you get a timeout, the security group or route table is blocking the connection.
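The timeout-vs-403 distinction is the whole diagnostic, so it helps to script it. A sketch (the `classify_api_probe` name is made up; `curl` reports status 000 when the TCP connection itself fails):

```shell
#!/usr/bin/env bash
# Hypothetical helper: interpret the HTTP status curl reports for the API server.
# Produce the code with:
#   code=$(curl -sk -o /dev/null -w '%{http_code}' --max-time 5 https://<api-endpoint>)
classify_api_probe() {
  case "$1" in
    000)     echo "unreachable: check security groups and route tables" ;;
    401|403) echo "reachable: routing and TLS are fine (403 is expected unauthenticated)" ;;
    *)       echo "reachable: got HTTP $1" ;;
  esac
}
```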
Fix — Security Group Rules
The node security group needs:
Outbound:
Port 443 (HTTPS) → API server security group or CIDR
Port 10250 (kubelet) → within cluster
All traffic → 0.0.0.0/0 (for NAT/internet access)
Inbound:
Port 443 → from control plane
Port 10250 → from control plane
All traffic → within VPC CIDR
In Terraform:
resource "aws_security_group_rule" "node_to_control_plane" {
  type                     = "egress"
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.eks_control_plane.id
  security_group_id        = aws_security_group.eks_nodes.id
}
Cause 3 — Private Nodes Can't Reach API Server
If your nodes are in private subnets and the EKS endpoint is public-only:
# Check endpoint access config
aws eks describe-cluster --name my-cluster \
--query 'cluster.resourcesVpcConfig.{public:endpointPublicAccess,private:endpointPrivateAccess}'
# {"public": true, "private": false}
Nodes in private subnets need either:
- NAT Gateway to reach the public endpoint
- Private endpoint enabled
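The either/or above is easy to get wrong, so here is the decision as a sketch (the function name and the "has NAT route" input are hypothetical; the flags come from the `describe-cluster` output shown above):

```shell
#!/usr/bin/env bash
# Hypothetical helper: given the cluster's endpoint flags and whether the
# private subnets have a 0.0.0.0/0 route to a NAT gateway, decide if worker
# nodes in private subnets can reach the API server at all.
private_nodes_can_register() {
  local public="$1" private="$2" has_nat_route="$3"   # "true"/"false" each
  if [ "$private" = "true" ]; then
    echo "ok: private endpoint reachable inside the VPC"
  elif [ "$public" = "true" ] && [ "$has_nat_route" = "true" ]; then
    echo "ok: public endpoint reachable via NAT"
  else
    echo "broken: enable the private endpoint or add a NAT route"
  fi
}
```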
Fix Option 1 — Enable Private Endpoint
aws eks update-cluster-config \
--name my-cluster \
--resources-vpc-config endpointPrivateAccess=true,endpointPublicAccess=true
Fix Option 2 — Add NAT Gateway for Private Nodes
If your private subnets don't have a NAT Gateway route:
resource "aws_nat_gateway" "nat" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public.id
}
resource "aws_route" "private_nat_route" {
  route_table_id         = aws_route_table.private.id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.nat.id
}
Cause 4 — Kubelet Bootstrap Failure
The node joins but kubelet fails to start. Check the bootstrap log:
# On the node (via SSM)
sudo journalctl -u kubelet --since "10 minutes ago"
sudo cat /var/log/cloud-init-output.log
sudo cat /var/log/user-data.log  # if using custom launch template
Common kubelet errors:
"Failed to get node info: nodes not found"
The node can reach the API server but can't register. Usually an IAM or aws-auth issue. See Cause 1.
"certificate signed by unknown authority"
x509: certificate signed by unknown authority
The node can't verify the API server certificate. Usually happens with:
- Wrong cluster endpoint in bootstrap script
- Self-signed certs in private clusters
Fix — check the --apiserver-endpoint and --b64-cluster-ca in your bootstrap or launch template user data.
"failed to reserve container name"
The Docker or containerd socket isn't ready. Usually means the AMI is wrong or the node rebooted before setup finished.
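A small triage helper can map these log lines back to the causes in this guide. A sketch (the `classify_kubelet_error` function is hypothetical; the matched strings are the error messages discussed above):

```shell
#!/usr/bin/env bash
# Hypothetical helper: map common kubelet log lines to likely causes.
classify_kubelet_error() {
  case "$1" in
    *"nodes not found"*|*"Unauthorized"*)
      echo "auth: node role missing from aws-auth or IAM policies (Cause 1)" ;;
    *"certificate signed by unknown authority"*)
      echo "ca: check --apiserver-endpoint and --b64-cluster-ca in user data" ;;
    *"certificate has expired or is not yet valid"*)
      echo "clock: node time is skewed, check chrony (Cause 7)" ;;
    *"failed to reserve container name"*)
      echo "runtime: container runtime not ready, suspect the AMI (Cause 5)" ;;
    *)
      echo "unclassified: read the surrounding journalctl context" ;;
  esac
}

# Example usage against recent kubelet logs:
#   journalctl -u kubelet -n 100 | while read -r line; do classify_kubelet_error "$line"; done
```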
Cause 5 — Wrong AMI for EKS Version
Each EKS cluster version requires a matching EKS-optimized AMI. Using an AMI built for EKS 1.27 on a 1.30 cluster breaks bootstrap.
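Since the only thing that changes per cluster is the version (and possibly the AMI family), the SSM parameter path can be built once. A sketch (the `eks_ami_param` helper is made up; the path pattern matches the `get-parameter` call below, and any non-default family string should be verified against AWS's documented naming):

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the SSM parameter path for the EKS-optimized AMI
# so the cluster version is spelled out in exactly one place.
eks_ami_param() {
  local version="$1" family="${2:-amazon-linux-2}"
  echo "/aws/service/eks/optimized-ami/${version}/${family}/recommended/image_id"
}

# Usage:
#   aws ssm get-parameter --name "$(eks_ami_param 1.30)" \
#     --query Parameter.Value --output text
```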
Fix — Use the Correct AMI
# Get the correct AMI for your EKS version and region
aws ssm get-parameter \
--name /aws/service/eks/optimized-ami/1.30/amazon-linux-2/recommended/image_id \
--region ap-south-1 \
--query Parameter.Value \
--output text
# ami-0abc123def456
In eksctl cluster.yaml:
nodeGroups:
  - name: workers
    instanceType: t3.medium
    amiFamily: AmazonLinux2  # Let eksctl pick the right AMI automatically
Cause 6 — Max Pods Exceeded
Nodes join but refuse to schedule pods. Check:
kubectl describe node <node-name> | grep -i "pods"
kubectl describe node <node-name> | grep -A5 "Allocatable"
AWS VPC CNI limits pods per node based on instance type (secondary IPs per ENI). A t3.medium supports only 17 pods.
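The per-instance limit comes from a simple formula: each ENI reserves one primary IP, and two extra slots cover host-network pods like kube-proxy and aws-node. A sketch of the arithmetic (the `max_pods` function is illustrative; ENI and IP counts are per-instance-type values published by AWS):

```shell
#!/usr/bin/env bash
# Without prefix delegation, the VPC CNI caps pods at:
#   max_pods = ENIs * (IPv4 addresses per ENI - 1) + 2
max_pods() { echo $(( $1 * ($2 - 1) + 2 )); }

max_pods 3 6    # t3.medium:  3 ENIs x 6 IPs  -> 17
max_pods 4 15   # m5.xlarge:  4 ENIs x 15 IPs -> 58
```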
Fix — Use Prefix Delegation or Larger Instances
Enable prefix delegation (more IPs per ENI) for higher pod density:
kubectl set env daemonset aws-node \
-n kube-system \
ENABLE_PREFIX_DELEGATION=true
Or use m5.xlarge, which supports up to 58 pods.
Cause 7 — Clock Skew on Nodes
If a node's system clock is significantly off from the API server's, certificate validation fails:
x509: certificate has expired or is not yet valid
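A quick way to confirm skew is to compare the node's clock against a host you know is synced. A sketch (the `clock_skew_within` helper is hypothetical; feed it `date +%s` from the node and from a trusted reference):

```shell
#!/usr/bin/env bash
# Hypothetical helper: return success if two epoch timestamps are within
# a given tolerance in seconds.
clock_skew_within() {
  local node_epoch="$1" ref_epoch="$2" max_skew="$3"
  local diff=$(( node_epoch - ref_epoch ))
  diff=${diff#-}                      # absolute value
  [ "$diff" -le "$max_skew" ]
}

# Usage:
#   clock_skew_within "$(ssh node date +%s)" "$(date +%s)" 30 || echo "clock skew detected"
```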
Fix
The EKS-optimized AMI syncs time via chrony. If you're using a custom AMI:
# On the node
sudo timedatectl status
sudo chronyc sources
sudo chronyc tracking
# Force sync
sudo chronyc makestep
Step-by-Step Debugging Checklist
# 1. Does the node appear?
kubectl get nodes
# 2. Check node group status
aws eks describe-nodegroup --cluster-name <name> --nodegroup-name <name>
# 3. Check EC2 instance state
aws ec2 describe-instance-status --instance-ids <id>
# 4. SSH/SSM into node and check kubelet
sudo systemctl status kubelet
sudo journalctl -u kubelet -n 100
# 5. Check API server connectivity from node
curl -k https://<api-endpoint>:443
# 6. Verify IAM role policies
aws iam list-attached-role-policies --role-name <node-role>
# 7. Check aws-auth ConfigMap
kubectl get configmap aws-auth -n kube-system -o yaml
# 8. Check security groups allow port 443 outbound
aws ec2 describe-security-groups --group-ids <sg-id>
Summary
| Cause | Symptom | Fix |
|---|---|---|
| Missing IAM policies | Nodes don't appear | Attach 3 required policies + update aws-auth |
| Wrong security group | Nodes don't appear | Allow port 443 outbound to API server |
| Private nodes, no NAT | Nodes don't appear | Enable private endpoint or add NAT Gateway |
| Kubelet bootstrap error | Node joins but NotReady | Check /var/log/user-data.log + journalctl |
| Wrong AMI | Bootstrap failure | Use SSM to find correct EKS-optimized AMI |
| Max pods exceeded | Pods won't schedule | Enable prefix delegation or use larger instance |
| Clock skew | Certificate errors | Sync time with chrony |
IAM and networking (security groups + VPC routing) account for the vast majority of node join failures. Start there.
Build and test your EKS setup on DigitalOcean Kubernetes with $200 free credit, or practice EKS troubleshooting on KodeKloud hands-on labs.