AWS EKS Worker Nodes Not Joining the Cluster: Complete Fix Guide

EKS worker nodes stuck in NotReady or not appearing at all? Here are all the causes and step-by-step fixes for node bootstrap failures.

DevOpsBoys · Apr 21, 2026 · 6 min read

You create an EKS node group. The EC2 instances launch. But kubectl get nodes shows nothing — or the nodes show up as NotReady and stay that way.

This is one of the most frustrating EKS issues because the failure happens on the node, not in the control plane, and the errors can come from IAM, networking, bootstrap scripts, or AMI issues.

Here's every cause and how to fix it.


Quick Diagnosis First

bash
# Check if nodes appear at all
kubectl get nodes
 
# If nodes show but are NotReady
kubectl describe node <node-name>
# Look for: Conditions → Ready: False, and Events section
 
# Check node group status in AWS console
aws eks describe-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --query 'nodegroup.status'
 
# Check EC2 instances
aws ec2 describe-instances \
  --filters "Name=tag:eks:cluster-name,Values=my-cluster" \
  --query 'Reservations[].Instances[].{ID:InstanceId,State:State.Name,Status:NetworkInterfaces[0].Status}'

If nodes don't appear in kubectl get nodes at all, the problem is registration: the node can't reach or authenticate with the API server.

If nodes appear but are NotReady, the node joined but the kubelet is unhealthy.


Cause 1 — Missing IAM Role Permissions (Most Common)

Worker nodes need an IAM role with three specific policies to join EKS.

Diagnose

bash
# Check node role
aws iam list-attached-role-policies \
  --role-name my-eks-node-role \
  --query 'AttachedPolicies[].PolicyName'

Fix — Attach Required Policies

bash
# Required policies for EKS worker nodes
aws iam attach-role-policy \
  --role-name my-eks-node-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
 
aws iam attach-role-policy \
  --role-name my-eks-node-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
 
aws iam attach-role-policy \
  --role-name my-eks-node-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly

Also add the node IAM role to the aws-auth ConfigMap:

bash
# Check aws-auth ConfigMap
kubectl get configmap aws-auth -n kube-system -o yaml

If the node role isn't in aws-auth, nodes can't authenticate with the API server:

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::123456789012:role/my-eks-node-role
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes

bash
kubectl apply -f aws-auth.yaml
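
A quick way to sanity-check the mapping is to grep the rendered ConfigMap for the node role ARN. The contents below are a trimmed, illustrative sample; in practice, save the real output of kubectl get configmap aws-auth -n kube-system -o yaml and grep that instead:

```shell
# Illustrative aws-auth contents (sample only). In practice:
#   kubectl get configmap aws-auth -n kube-system -o yaml > /tmp/aws-auth.yaml
cat > /tmp/aws-auth.yaml <<'EOF'
data:
  mapRoles: |
    - rolearn: arn:aws:iam::123456789012:role/my-eks-node-role
      username: system:node:{{EC2PrivateDNSName}}
EOF

if grep -q 'role/my-eks-node-role' /tmp/aws-auth.yaml; then
  result="node role present in aws-auth"
else
  result="node role MISSING - nodes cannot authenticate"
fi
echo "$result"
```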

Cause 2 — Node in Wrong Subnet or Security Group

Nodes must be able to reach the EKS API server endpoint on port 443.

Diagnose

SSH into the failing node (or use SSM):

bash
# Connect via SSM (no SSH key needed)
aws ssm start-session --target <instance-id>
 
# Test API server connectivity
curl -k https://<api-server-endpoint>:443
# Should return: 403 Forbidden (not a timeout)

If you get a timeout, the security group or route table is blocking the connection.

Fix — Security Group Rules

The node security group needs:

Outbound:
  Port 443 (HTTPS) → API server security group or CIDR
  Port 10250 (kubelet) → within cluster
  All traffic → 0.0.0.0/0 (for NAT/internet access)

Inbound:
  Port 443 → from control plane
  Port 10250 → from control plane
  All traffic → within VPC CIDR
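
To confirm the egress rule actually exists, grep the security-group JSON for a port-443 rule. The JSON below is a trimmed, hypothetical sample of what aws ec2 describe-security-groups returns; substitute the real output:

```shell
# Hypothetical, trimmed describe-security-groups output (sample only)
cat > /tmp/sg.json <<'EOF'
{"SecurityGroups":[{"IpPermissionsEgress":[
  {"IpProtocol":"tcp","FromPort":443,"ToPort":443}]}]}
EOF

if grep -q '"FromPort": *443' /tmp/sg.json; then
  sg_check="egress 443 rule found"
else
  sg_check="no egress 443 rule - nodes cannot reach the API server"
fi
echo "$sg_check"
```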

In Terraform:

hcl
resource "aws_security_group_rule" "node_to_control_plane" {
  type                     = "egress"
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  source_security_group_id = aws_security_group.eks_control_plane.id
  security_group_id        = aws_security_group.eks_nodes.id
}

Cause 3 — Private Nodes Can't Reach API Server

If your nodes are in private subnets and the EKS endpoint is public-only:

bash
# Check endpoint access config
aws eks describe-cluster --name my-cluster \
  --query 'cluster.resourcesVpcConfig.{public:endpointPublicAccess,private:endpointPrivateAccess}'
# {"public": true, "private": false}

Nodes in private subnets need either:

  1. NAT Gateway to reach the public endpoint
  2. Private endpoint enabled
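
The decision between those two fixes can be scripted off the endpoint config. A minimal sketch, with the JSON hard-coded as a sample (in practice, capture it from the describe-cluster query shown above):

```shell
# Sample endpoint config; in practice:
#   cfg=$(aws eks describe-cluster --name my-cluster \
#     --query 'cluster.resourcesVpcConfig.{public:endpointPublicAccess,private:endpointPrivateAccess}')
cfg='{"public": true, "private": false}'

case "$cfg" in
  *'"private": false'*) decision="private endpoint disabled: enable it, or route nodes through a NAT Gateway" ;;
  *)                    decision="private endpoint enabled: check routing and security groups instead" ;;
esac
echo "$decision"
```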

Fix Option 1 — Enable Private Endpoint

bash
aws eks update-cluster-config \
  --name my-cluster \
  --resources-vpc-config endpointPrivateAccess=true,endpointPublicAccess=true

Fix Option 2 — Add NAT Gateway for Private Nodes

If your private subnets don't have a NAT Gateway route:

hcl
resource "aws_nat_gateway" "nat" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public.id
}
 
resource "aws_route" "private_nat_route" {
  route_table_id         = aws_route_table.private.id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.nat.id
}

Cause 4 — Kubelet Bootstrap Failure

The node joins but kubelet fails to start. Check the bootstrap log:

bash
# On the node (via SSM)
sudo journalctl -u kubelet --since "10 minutes ago"
sudo cat /var/log/cloud-init-output.log
sudo cat /var/log/user-data.log    # if using custom launch template

Common kubelet errors:

"Failed to get node info: nodes not found"

The node can reach the API server but can't register. Usually an IAM-policy or aws-auth mapping issue. See Cause 1.

"certificate signed by unknown authority"

x509: certificate signed by unknown authority

The node can't verify the API server certificate. Usually happens with:

  • Wrong cluster endpoint in bootstrap script
  • Self-signed certs in private clusters

Fix — check the --apiserver-endpoint and --b64-cluster-ca in your bootstrap or launch template user data.

"failed to reserve container name"

The Docker or containerd socket isn't ready. Usually means the AMI is wrong or the node rebooted before setup finished.
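
The three kubelet errors above can be triaged mechanically. A sketch that maps a log line to the likely cause; the sample line is hard-coded for illustration, and in practice you'd feed lines from journalctl -u kubelet:

```shell
# Sample kubelet log line (hard-coded for illustration)
line='x509: certificate signed by unknown authority'

case "$line" in
  *"nodes not found"*)             triage="registration failure: check IAM role and aws-auth (Cause 1)" ;;
  *"unknown authority"*)           triage="CA/endpoint mismatch: check --apiserver-endpoint and --b64-cluster-ca" ;;
  *"failed to reserve container"*) triage="container runtime not ready: check AMI and reboot timing" ;;
  *)                               triage="unclassified: read the full journalctl output" ;;
esac
echo "$triage"
```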


Cause 5 — Wrong AMI for EKS Version

Each EKS cluster version requires a matching EKS-optimized AMI. Using an AMI built for EKS 1.27 on a 1.30 cluster breaks bootstrap.

Fix — Use the Correct AMI

bash
# Get the correct AMI for your EKS version and region
aws ssm get-parameter \
  --name /aws/service/eks/optimized-ami/1.30/amazon-linux-2/recommended/image_id \
  --region ap-south-1 \
  --query Parameter.Value \
  --output text
# ami-0abc123def456
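
The SSM parameter path is predictable, so you can template it on the cluster version. A small helper; the function name is ours, not an AWS tool, and the path format follows AWS's documented convention for Amazon Linux 2 AMIs:

```shell
# Build the SSM parameter path for the EKS-optimized Amazon Linux 2 AMI
# (helper name is illustrative)
eks_ami_param() {
  echo "/aws/service/eks/optimized-ami/$1/amazon-linux-2/recommended/image_id"
}

eks_ami_param 1.30
# Then: aws ssm get-parameter --name "$(eks_ami_param 1.30)" \
#         --query Parameter.Value --output text
```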

In eksctl cluster.yaml:

yaml
nodeGroups:
  - name: workers
    instanceType: t3.medium
    amiFamily: AmazonLinux2    # Let eksctl pick the right AMI automatically

Cause 6 — Max Pods Exceeded

Nodes join but refuse to schedule pods. Check:

bash
kubectl describe node <node-name> | grep "max-pods"
kubectl describe node <node-name> | grep -A5 "Allocatable"

AWS VPC CNI limits pods per node based on instance type (secondary IPs per ENI). A t3.medium supports only 17 pods.
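
That limit comes from ENI arithmetic: max pods = ENIs x (IPv4 addresses per ENI - 1) + 2, where one IP per ENI is the primary address and the +2 accounts for the host-networked daemonset pods. A quick check of the numbers used in this section:

```shell
# max pods without prefix delegation: enis * (ips_per_eni - 1) + 2
max_pods() { echo $(( $1 * ($2 - 1) + 2 )); }

max_pods 3 6    # t3.medium: 3 ENIs x 6 IPs  -> 17
max_pods 4 15   # m5.xlarge: 4 ENIs x 15 IPs -> 58
```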

Fix — Use Prefix Delegation or Larger Instances

Enable prefix delegation (more IPs per ENI) for higher pod density:

bash
kubectl set env daemonset aws-node \
  -n kube-system \
  ENABLE_PREFIX_DELEGATION=true

Or use m5.xlarge which supports up to 58 pods.


Cause 7 — Clock Skew on Nodes

If the node's system clock is significantly off from the API server's, certificate validation fails:

x509: certificate has expired or is not yet valid

Fix

The EKS-optimized AMI syncs time via chrony. If you're using a custom AMI:

bash
# On the node
sudo timedatectl status
sudo chronyc sources
sudo chronyc tracking
 
# Force sync
sudo chronyc makestep
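
If you want to automate the check, compare the node's epoch time against a trusted reference and alert beyond a threshold. A minimal sketch; the 2-second offset is a hard-coded sample, and in practice the reference would come from chronyc tracking or an NTP server:

```shell
# Flag clock skew beyond a threshold (TLS tolerates only small skew)
node_epoch=$(date +%s)
ref_epoch=$(( node_epoch + 2 ))   # illustrative 2s offset; use a real NTP reference
skew=$(( ref_epoch - node_epoch ))
abs_skew=${skew#-}

if [ "$abs_skew" -le 300 ]; then
  skew_msg="skew ${skew}s: OK"
else
  skew_msg="skew ${skew}s: run chronyc makestep"
fi
echo "$skew_msg"
```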

Step-by-Step Debugging Checklist

bash
# 1. Does the node appear?
kubectl get nodes
 
# 2. Check node group status
aws eks describe-nodegroup --cluster-name <name> --nodegroup-name <name>
 
# 3. Check EC2 instance state
aws ec2 describe-instance-status --instance-ids <id>
 
# 4. SSH/SSM into node and check kubelet
sudo systemctl status kubelet
sudo journalctl -u kubelet -n 100
 
# 5. Check API server connectivity from node
curl -k https://<api-endpoint>:443
 
# 6. Verify IAM role policies
aws iam list-attached-role-policies --role-name <node-role>
 
# 7. Check aws-auth ConfigMap
kubectl get configmap aws-auth -n kube-system -o yaml
 
# 8. Check security groups allow port 443 outbound
aws ec2 describe-security-groups --group-ids <sg-id>

Summary

| Cause | Symptom | Fix |
|---|---|---|
| Missing IAM policies | Nodes don't appear | Attach 3 required policies + update aws-auth |
| Wrong security group | Nodes don't appear | Allow port 443 outbound to API server |
| Private nodes, no NAT | Nodes don't appear | Enable private endpoint or add NAT Gateway |
| Kubelet bootstrap error | Node joins but NotReady | Check /var/log/user-data.log + journalctl |
| Wrong AMI | Bootstrap failure | Use SSM to find the correct EKS-optimized AMI |
| Max pods exceeded | Pods won't schedule | Enable prefix delegation or use a larger instance |
| Clock skew | Certificate errors | Sync time with chrony |

IAM and networking (security groups + VPC routing) cause 90% of node join failures. Start there.

Build and test your EKS setup on DigitalOcean Kubernetes with $200 free credit, or practice EKS troubleshooting on KodeKloud hands-on labs.
