kubectl drain Stuck Forever? Here's the Exact Fix (PDB + Non-Evictable Pods)
kubectl drain hanging with no output? PodDisruptionBudget or DaemonSet pods are blocking it. Here's how to diagnose and fix it without nuking your cluster.
You run kubectl drain node-01 --ignore-daemonsets before a maintenance window. It starts, prints one or two pod names, then just... sits there. No progress. No error. Just a blinking cursor mocking you.
Twenty minutes later your maintenance window is half gone and you're sweating.
I've been here more times than I can count. Here's exactly what's happening and how to fix it.
Why kubectl drain Hangs
kubectl drain does two things: it cordons the node (marks it unschedulable) and then evicts every pod on it. The eviction is the part that hangs.
Kubernetes won't evict a pod if doing so would violate a PodDisruptionBudget (PDB). This is a feature, not a bug — PDBs exist to protect your application's availability during disruptions. The problem is when PDBs are misconfigured, or when the budget can't be satisfied because of how your pods are currently distributed.
Other common blockers:
- DaemonSet pods (requires --ignore-daemonsets)
- Pods with emptyDir volumes (requires --delete-emptydir-data)
- Pods in Terminating state that won't die
- Pods owned by nothing (no controller); these are never automatically rescheduled
Step 1: See What's Actually Blocking
kubectl drain node-01 --ignore-daemonsets --dry-run=client
This tells you what would be evicted without doing anything. Look for:
error: cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet
Or, in the output of a real drain, repeated every few seconds while kubectl retries the eviction (a client-side dry run never calls the eviction API, so it can't show this one):
error: Cannot evict pod as it would violate the pod's disruption budget.
These are your two most common culprits.
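If you saved the output of a drain attempt to a file, a small filter can tell you which of the two messages you are hitting. This is a minimal sketch: the function name classify_drain_blockers and the drain.log file are hypothetical, and it matches only the two message fragments quoted above, so other kubectl versions may word them differently.

```shell
# Count drain output lines that point at each of the two common blockers.
# Reads captured `kubectl drain` output on stdin.
# Note: it counts lines, not distinct pods; a PDB retry message repeats.
classify_drain_blockers() {
  awk '
    /disruption budget/ { pdb++ }      # eviction refused by a PDB
    /not managed by/    { orphan++ }   # pod has no controller
    END { printf "pdb_blocked=%d unmanaged_pods=%d\n", pdb, orphan }
  '
}

# Usage (assumes the output was captured first):
#   kubectl drain node-01 --ignore-daemonsets 2>&1 | tee drain.log
#   classify_drain_blockers < drain.log
```

A non-zero pdb_blocked count sends you to Step 2; a non-zero unmanaged_pods count sends you to Step 3.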
Step 2: Find the PDB That's Blocking
kubectl get pdb -A
Look at the ALLOWED DISRUPTIONS column:
NAMESPACE    NAME         MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS
production   api-pdb      3               N/A               0
staging      worker-pdb   1               N/A               1
ALLOWED DISRUPTIONS: 0 means Kubernetes cannot evict a single pod from that group without violating the budget. This happens when:
- You have minAvailable: 3 but only 3 replicas running, so any eviction would drop below the minimum
- You have maxUnavailable: 0 (zero pods allowed to be down; why would you do this?)
- A pod is already in a non-ready state, consuming the disruption budget
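On a cluster with dozens of PDBs, scanning that column by eye is error-prone. A minimal sketch that prints only the budgets currently allowing zero disruptions; the function name blocking_pdbs is hypothetical, and it assumes the input comes from kubectl get pdb -A --no-headers, where the fifth whitespace-separated field is ALLOWED DISRUPTIONS.

```shell
# Print namespace/name for every PDB that currently allows zero disruptions.
# Input on stdin: rows from `kubectl get pdb -A --no-headers`
# (fields: namespace, name, min available, max unavailable, allowed disruptions, ...).
blocking_pdbs() {
  awk 'NF >= 5 && $5 == "0" { print $1 "/" $2 }'
}

# Usage:
#   kubectl get pdb -A --no-headers | blocking_pdbs
```

Anything this prints is a candidate for the fix below.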
Fix: Check Current Pod Count vs PDB Min
# Check how many replicas are actually running
kubectl get deployment -n production api-deployment
# Compare against the PDB
kubectl describe pdb api-pdb -n production
If minAvailable equals your current replica count, scale up first:
kubectl scale deployment api-deployment -n production --replicas=4
Wait for the new pod to be Ready, then drain again.
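The arithmetic behind that scale-up is simple: you need at least one more ready replica than minAvailable. A rough sketch, assuming an integer minAvailable (percentage budgets are not handled); the function name scale_hint is hypothetical and it only prints a suggested command, it does not run it.

```shell
# Suggest the scale command that makes room for one eviction.
# Arguments: deployment name, namespace, ready replicas, PDB minAvailable (integer).
scale_hint() {
  deploy=$1
  ns=$2
  ready=$3
  min_available=$4
  if [ "$ready" -le "$min_available" ]; then
    # One replica above the minimum leaves exactly one allowed disruption.
    echo "kubectl scale deployment $deploy -n $ns --replicas=$((min_available + 1))"
  else
    echo "no scaling needed: $((ready - min_available)) disruption(s) already allowed"
  fi
}

# Usage:
#   scale_hint api-deployment production 3 3
```

Remember to scale back down once the maintenance window is over.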
Step 3: Find Non-Evictable Pods (No Controller)
kubectl get pods --field-selector=spec.nodeName=node-01 -A -o wide
Then check each pod's owner:
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.metadata.ownerReferences}'
If this prints nothing, that pod has no controller. Kubernetes won't reschedule it anywhere. You need to decide: delete it manually, or understand why it exists without a controller (probably a one-off debugging pod someone forgot about).
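Checking owners one pod at a time gets tedious on a busy node. A minimal sketch that does it in one pass; the function name orphan_pods and the column labels NS, NAME and OWNER are hypothetical, and it relies on kubectl printing <none> in a custom column when the field is missing.

```shell
# Print namespace/name of pods that report no owner.
# Input on stdin: three columns (namespace, name, first owner kind),
# with <none> in the third column when the pod has no ownerReferences.
orphan_pods() {
  awk '$3 == "<none>" { print $1 "/" $2 }'
}

# Usage:
#   kubectl get pods -A --field-selector=spec.nodeName=node-01 --no-headers \
#     -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind' \
#     | orphan_pods
```

Review each result before removing anything; once you are sure a pod is safe to lose, delete it by hand.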
kubectl delete pod <orphan-pod-name> -n <namespace>
Step 4: Stuck Terminating Pods
Sometimes pods get stuck in Terminating forever. This usually means the pod's finalizer isn't releasing.
kubectl get pods -n production | grep Terminating
Force delete (use with caution, and only after confirming the pod process is actually dead):
kubectl delete pod <pod-name> -n production --grace-period=0 --force
If that still doesn't work, the pod has a finalizer that's preventing deletion:
kubectl patch pod <pod-name> -n production -p '{"metadata":{"finalizers":[]}}' --type=merge
Step 5: The Nuclear Option (Don't Do This In Production Without Understanding Why)
If you've exhausted everything and truly need to proceed:
kubectl drain node-01 \
--ignore-daemonsets \
--delete-emptydir-data \
--disable-eviction \
--force \
--grace-period=30
--disable-eviction bypasses the PDB check entirely and directly deletes pods. --force handles pods with no controller. This will violate your PDBs, meaning your application may have reduced availability during the drain. Know what you're doing before using it.
Prevention: Write PDBs That Don't Block Drains
The ideal PDB allows at least one disruption at all times:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
  namespace: production
spec:
  maxUnavailable: 1  # At most 1 pod can be down
  selector:
    matchLabels:
      app: api
Using maxUnavailable: 1 instead of minAvailable: N is almost always the better choice. It scales with your replica count and allows one eviction whenever all replicas are healthy.
If you're running a 2-replica deployment, minAvailable: 2 will permanently block all drains. Don't do it.
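To see why, it helps to run the numbers. Kubernetes allows (healthy pods minus the number that must stay healthy) disruptions, floored at zero. A small sketch of that arithmetic for integer budgets only; both function names are hypothetical.

```shell
# Disruptions allowed under minAvailable.
# Arguments: healthy pods, minAvailable (integer).
allowed_min_available() {
  n=$(( $1 - $2 ))
  if [ "$n" -lt 0 ]; then n=0; fi
  echo "$n"
}

# Disruptions allowed under maxUnavailable.
# Arguments: expected replicas, healthy pods, maxUnavailable (integer).
allowed_max_unavailable() {
  # Pods that must stay healthy = expected replicas - maxUnavailable.
  n=$(( $2 - ($1 - $3) ))
  if [ "$n" -lt 0 ]; then n=0; fi
  echo "$n"
}

# Usage:
#   allowed_min_available 2 2        # 2 healthy pods, minAvailable: 2
#   allowed_max_unavailable 2 2 1    # 2 replicas, 2 healthy, maxUnavailable: 1
```

With two healthy replicas, minAvailable: 2 yields zero allowed disruptions while maxUnavailable: 1 yields one. The second budget still drops to zero if one of the two pods is already unhealthy.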
Quick Diagnosis Checklist
When kubectl drain hangs:
1. kubectl get pdb -A (check the ALLOWED DISRUPTIONS column)
2. kubectl get pods --field-selector=spec.nodeName=<node> -A (look for pods in weird states)
3. kubectl describe node <node> (check conditions and events)
4. kubectl get pods -A | grep Terminating (find stuck pods)
5. Scale up deployments if PDB minAvailable equals current replicas
Most drain hangs are solved by step 1 alone. The PDB is almost always the culprit.
Draining nodes is routine until it isn't. Understanding PDBs properly is what separates engineers who panic during maintenance windows from those who fix it in 5 minutes and still make it to standup on time.
Recommended reading: Kubernetes Resource Calculator — size your replicas correctly before setting PDB minimums.