Kubernetes PDB Blocking Node Drain — How to Fix It Without Breaking Availability
Fix PodDisruptionBudget misconfigurations that block kubectl drain during cluster upgrades, node maintenance, and autoscaler operations. Real scenarios and step-by-step solutions.
You run `kubectl drain node-3 --ignore-daemonsets --delete-emptydir-data` and it just hangs. No errors, no progress. You wait five minutes, ten minutes. Nothing. You check the output and see:

```
evicting pod default/my-app-7b4f6d8c9-x2k4m
error when evicting pods/"my-app-7b4f6d8c9-x2k4m" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
```
Congratulations — you have a PodDisruptionBudget (PDB) blocking your drain. This is one of the most common and frustrating issues during Kubernetes cluster upgrades, node maintenance windows, and Cluster Autoscaler scale-down events. Let's fix it.
What Is a PodDisruptionBudget?
A PDB tells Kubernetes how many pods of a given workload must remain available (or how many can be unavailable) during voluntary disruptions. Voluntary disruptions include:
- `kubectl drain` (node maintenance)
- Cluster Autoscaler removing underutilized nodes
- Cluster upgrades (EKS, GKE, AKS managed node group rolling updates)
PDBs do not protect against involuntary disruptions like node crashes, OOM kills, or hardware failures.
Here's a basic PDB:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
  namespace: default
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app
```

This says: "At least 2 pods matching `app: my-app` must be running at all times during voluntary disruptions."
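Once the PDB is applied, the controller publishes what it computed in the object's status. On a live cluster you would read it with `kubectl get pdb my-app-pdb -o jsonpath='{.status.disruptionsAllowed}'`; the sketch below runs the same extraction against a captured manifest (hypothetical status values) so it can be tried offline:

```shell
# Extract disruptionsAllowed from a captured PDB manifest.
# The status values here are hypothetical sample data, not live cluster output.
pdb_status=$(cat <<'EOF'
status:
  currentHealthy: 3
  desiredHealthy: 2
  disruptionsAllowed: 1
EOF
)
echo "$pdb_status" | awk '/disruptionsAllowed:/ {print $2}'
# prints 1: one pod may be evicted right now without violating the budget
```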
Sounds reasonable. Until it isn't.
The Three Scenarios That Block Everything
Scenario 1: minAvailable Equals Replica Count
This is the most common mistake. You have a Deployment with 3 replicas, and someone creates a PDB with minAvailable: 3.
```yaml
# Deployment
spec:
  replicas: 3
```

```yaml
# PDB
spec:
  minAvailable: 3
```

The math is simple: Kubernetes cannot evict any pod, because evicting one would drop the available count to 2 and violate the PDB's requirement of 3. The drain will hang forever.
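The controller's arithmetic is worth spelling out. A small shell sketch (with hypothetical counts) shows why minAvailable: 3 against 3 healthy replicas yields zero allowed disruptions, while minAvailable: 2 allows one:

```shell
# For a minAvailable PDB with all pods healthy:
#   disruptionsAllowed = healthy - minAvailable
healthy=3
for min_available in 3 2; do
  allowed=$((healthy - min_available))
  echo "minAvailable=$min_available -> allowed=$allowed"
done
# minAvailable=3 -> allowed=0   (every eviction refused; drain hangs)
# minAvailable=2 -> allowed=1   (drain can evict one pod at a time)
```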
How to spot it:
```shell
kubectl get pdb -A
```

Output:

```
NAMESPACE   NAME         MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
default     my-app-pdb   3               N/A               0                     30d
```
See that ALLOWED DISRUPTIONS: 0? That's your red flag. Zero allowed disruptions means nothing can be evicted.
The fix:
Option A: Lower minAvailable:

```yaml
spec:
  minAvailable: 2  # For 3 replicas, allow 1 pod to be evicted
```

Option B: Use maxUnavailable instead:

```yaml
spec:
  maxUnavailable: 1  # Allow 1 pod to be down at a time
```

Option C: Use percentages:

```yaml
spec:
  minAvailable: "66%"  # For 3 replicas, rounds up to keep 2 running
```

Scenario 2: Single-Replica Deployment with PDB
You have a single-replica Deployment and a PDB with minAvailable: 1. Again, the math doesn't work — you can't evict the only pod and still keep one available.
```yaml
# Deployment
spec:
  replicas: 1
```

```yaml
# PDB
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: my-singleton-app
```

```shell
kubectl get pdb my-singleton-pdb
```

```
NAME               MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
my-singleton-pdb   1               N/A               0                     15d
```
Zero allowed disruptions again.
The fix:
For single-replica workloads, you have a few options:
Option A: Remove the PDB entirely. If you only have one replica, a PDB doesn't add real value:

```shell
kubectl delete pdb my-singleton-pdb
```

Option B: Scale up to at least 2 replicas if the workload supports it:

```yaml
spec:
  replicas: 2
```

Option C: Use maxUnavailable: 1 (this actually works for single replicas):

```yaml
spec:
  maxUnavailable: 1
```

With `maxUnavailable: 1`, Kubernetes is told "1 pod can be unavailable," so it can evict the single pod during a drain.
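For maxUnavailable the controller computes a desired-healthy floor instead of a fixed minimum, which is why this works even for a single replica. A sketch with hypothetical values:

```shell
# With maxUnavailable:
#   desiredHealthy      = replicas - maxUnavailable
#   disruptionsAllowed  = currentHealthy - desiredHealthy
replicas=1
max_unavailable=1
current_healthy=1
desired_healthy=$((replicas - max_unavailable))   # 0
allowed=$((current_healthy - desired_healthy))    # 1
echo "allowed=$allowed"
# prints allowed=1: the lone pod can be evicted during drain
```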
Scenario 3: EKS/GKE Managed Node Group Upgrade Stuck
This is where it gets really painful. You trigger an EKS managed node group update, and it gets stuck at "Updating" for hours. Behind the scenes, EKS is trying to drain old nodes to move pods to new ones, but PDBs are blocking the drain.
The kicker: with EKS managed node groups, you don't see the drain output directly. You have to dig.
How to diagnose:
```shell
# Check node group update status
aws eks describe-update --name my-cluster --update-id <update-id>

# Check for nodes being drained
kubectl get nodes
# Look for nodes with SchedulingDisabled status

# Check PDBs across all namespaces
kubectl get pdb -A -o wide

# Look for pods stuck on the draining node
kubectl get pods -A --field-selector spec.nodeName=<draining-node-name>
```

The fix:
First, identify which PDB is blocking:
```shell
# Keep the header plus any PDB whose ALLOWED DISRUPTIONS column is 0
kubectl get pdb -A | awk 'NR==1 || $5 == 0'
```

This filters for PDBs with 0 allowed disruptions. Fix those PDBs using the approaches above, and the upgrade will proceed.
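A column-based awk filter is more precise than grepping for a bare "0", since values like an age of 30d also contain a zero. The sketch below runs the filter against captured sample output (mirroring the table shown earlier) so the field number can be sanity-checked offline before pointing it at a cluster:

```shell
# Sample 'kubectl get pdb -A' output captured in a heredoc; column 5 is
# the ALLOWED DISRUPTIONS value. Keep the header row plus any row where it is 0.
sample=$(cat <<'EOF'
NAMESPACE     NAME          MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
default       my-app-pdb    3               N/A               0                     30d
kube-system   coredns-pdb   N/A             1                 1                     90d
EOF
)
echo "$sample" | awk 'NR==1 || $5 == 0'
# prints the header and the my-app-pdb row only
```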
The Emergency Fix (When You Need to Drain Now)
Sometimes you're in an incident. A node is unhealthy, you need it drained now, and you can't wait for the PDB fix to propagate.
Option 1: Temporarily delete the PDB
```shell
# Save the PDB first
kubectl get pdb my-app-pdb -o yaml > pdb-backup.yaml

# Delete it
kubectl delete pdb my-app-pdb

# Drain the node
kubectl drain node-3 --ignore-daemonsets --delete-emptydir-data

# Recreate the PDB
kubectl apply -f pdb-backup.yaml
```

Option 2: Use the --disable-eviction flag
```shell
kubectl drain node-3 --ignore-daemonsets --delete-emptydir-data --disable-eviction
```

This bypasses the Eviction API entirely and deletes pods directly, so the PDB is never consulted. Use it carefully: it provides zero availability protection.
Option 3: Force drain with timeout
```shell
kubectl drain node-3 --ignore-daemonsets --delete-emptydir-data --timeout=120s --force
```

Two flags are doing separate jobs here. `--force` lets the drain delete pods that aren't managed by a ReplicationController, ReplicaSet, DaemonSet, StatefulSet, or Job; it does not override PDBs for managed pods. `--timeout=120s` makes the drain give up with an error after two minutes instead of retrying indefinitely. So if a PDB is blocking evictions, this command fails fast rather than forcing its way through, which at least tells you quickly that you need one of the other two options.
Cluster Autoscaler and PDB Conflicts
The Cluster Autoscaler has its own relationship with PDBs. When the autoscaler identifies an underutilized node for removal, it checks all PDBs. If any pod on the node has a PDB that would be violated by eviction, the autoscaler skips that node.
This leads to a common complaint: "My Cluster Autoscaler never scales down."
Diagnose it:
```shell
# Check Cluster Autoscaler logs
kubectl logs -n kube-system -l app=cluster-autoscaler --tail=100 | grep "pdb"
```

You'll see messages like:

```
pod default/my-app-7b4f6d8c9-x2k4m can't be evicted: would violate PDB
```
Fix it by auditing all PDBs:
```shell
# List all PDBs with their allowed disruptions
kubectl get pdb -A -o custom-columns=\
NAMESPACE:.metadata.namespace,\
NAME:.metadata.name,\
MIN-AVAILABLE:.spec.minAvailable,\
MAX-UNAVAILABLE:.spec.maxUnavailable,\
ALLOWED-DISRUPTIONS:.status.disruptionsAllowed,\
CURRENT-HEALTHY:.status.currentHealthy,\
DESIRED-HEALTHY:.status.desiredHealthy
```

Any row where ALLOWED-DISRUPTIONS is 0 while the workload is healthy points to a misconfiguration.
PDB Best Practices
Here's what I recommend after dealing with PDB issues across dozens of clusters:
1. Always Use maxUnavailable Instead of minAvailable
maxUnavailable: 1 is almost always the right choice. It scales naturally with your replica count and doesn't create the "equals replica count" trap.
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: my-app
```

2. Never Create PDBs for Single-Replica Workloads
If you only have one replica, a PDB with minAvailable: 1 will block all drains. Either scale up or don't use a PDB.
3. Add PDB Checks to Your CI/CD Pipeline
Use a policy engine to catch misconfigurations before they hit the cluster:
```yaml
# Kyverno policy: reject PDBs that can never allow a disruption.
# Note: comparing minAvailable against the Deployment's replica count would
# require an API lookup at admission time; this simpler rule catches the
# unambiguous case of maxUnavailable: 0.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: prevent-pdb-misconfiguration
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-pdb-max-unavailable
      match:
        any:
          - resources:
              kinds:
                - PodDisruptionBudget
      validate:
        message: "PDB maxUnavailable must not be 0; a budget that allows zero disruptions blocks node drains"
        deny:
          conditions:
            any:
              - key: "{{ request.object.spec.maxUnavailable || `1` }}"
                operator: Equals
                value: 0
```

4. Monitor PDB Status
Set up an alert for PDBs with zero allowed disruptions:
```yaml
# Prometheus alert
- alert: PDBZeroDisruptionsAllowed
  expr: kube_poddisruptionbudget_status_pod_disruptions_allowed == 0
  for: 30m
  labels:
    severity: warning
  annotations:
    summary: "PDB {{ $labels.poddisruptionbudget }} in {{ $labels.namespace }} has 0 allowed disruptions"
    description: "This PDB will block node drains and cluster upgrades."
```

5. Use unhealthyPodEvictionPolicy
Kubernetes added unhealthyPodEvictionPolicy to the PDB spec (beta since 1.27, GA in 1.31). Set it to AlwaysAllow to let unhealthy pods (CrashLoopBackOff, etc.) be evicted even when the PDB budget is exhausted:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  maxUnavailable: 1
  unhealthyPodEvictionPolicy: AlwaysAllow
  selector:
    matchLabels:
      app: my-app
```

This prevents the nightmare scenario where crashed pods block drains: unhealthy pods drop currentHealthy below desiredHealthy, disruptionsAllowed falls to 0, and under the default policy even the crashed pods themselves cannot be evicted.
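The arithmetic behind that nightmare is easy to reproduce. Under the default policy (IfHealthyBudget), unhealthy pods can only be evicted while currentHealthy is at least desiredHealthy, so once enough pods crash, everything freezes. A sketch with hypothetical values:

```shell
# 3 replicas, minAvailable: 2, and two pods stuck in CrashLoopBackOff.
current_healthy=1
desired_healthy=2   # derived from minAvailable: 2
allowed=$((current_healthy - desired_healthy))
if [ "$allowed" -lt 0 ]; then allowed=0; fi   # the controller clamps at zero
echo "disruptionsAllowed=$allowed"
# prints disruptionsAllowed=0: with the default policy even the crashed pods
# cannot be evicted; unhealthyPodEvictionPolicy: AlwaysAllow unblocks them
```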
Quick Reference: PDB Troubleshooting Checklist
- Run `kubectl get pdb -A` and look for ALLOWED DISRUPTIONS: 0
- Compare `minAvailable` against the actual replica count
- Check for single-replica deployments with PDBs
- Look at `currentHealthy` vs `desiredHealthy` in the PDB status
- For EKS upgrades, check for cordoned nodes stuck in SchedulingDisabled
- Emergency: back up the PDB → delete → drain → recreate
If you want to master Kubernetes operations and avoid these pitfalls entirely, KodeKloud has hands-on labs that walk you through PDB configurations, cluster upgrades, and node maintenance scenarios. It's the fastest way to build real muscle memory for these situations. For practicing in a live environment, DigitalOcean offers affordable managed Kubernetes clusters that are perfect for testing drain operations without risking production.
Final Thoughts
PodDisruptionBudgets are a critical safety mechanism — they exist to protect your application availability during maintenance. But a misconfigured PDB is worse than no PDB at all, because it creates the illusion of protection while actually blocking necessary operations.
The fix is almost always simple: use maxUnavailable: 1 instead of minAvailable, never attach PDBs to single-replica workloads, and monitor for PDBs with zero allowed disruptions. Do those three things, and you'll never have a stuck drain again.