
Kubernetes PDB Blocking Node Drain — How to Fix It Without Breaking Availability

Fix PodDisruptionBudget misconfigurations that block kubectl drain during cluster upgrades, node maintenance, and autoscaler operations. Real scenarios and step-by-step solutions.

DevOpsBoys · Mar 27, 2026 · 7 min read

You run kubectl drain node-3 --ignore-daemonsets --delete-emptydir-data and it just hangs. No errors, no progress. You wait five minutes, ten minutes — nothing. You check the output and see:

evicting pod default/my-app-7b4f6d8c9-x2k4m
error when evicting pods/"my-app-7b4f6d8c9-x2k4m" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

Congratulations — you have a PodDisruptionBudget (PDB) blocking your drain. This is one of the most common and frustrating issues during Kubernetes cluster upgrades, node maintenance windows, and Cluster Autoscaler scale-down events. Let's fix it.

What Is a PodDisruptionBudget?

A PDB tells Kubernetes how many pods of a given workload must remain available (or how many can be unavailable) during voluntary disruptions. Voluntary disruptions include:

  • kubectl drain (node maintenance)
  • Cluster Autoscaler removing underutilized nodes
  • Cluster upgrades (EKS, GKE, AKS managed node group rolling updates)

PDBs do not protect against involuntary disruptions like node crashes, OOM kills, or hardware failures.

Here's a basic PDB:

yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
  namespace: default
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app

This says: "At least 2 pods matching app: my-app must be running at all times during voluntary disruptions."

Sounds reasonable. Until it isn't.

The Three Scenarios That Block Everything

Scenario 1: minAvailable Equals Replica Count

This is the most common mistake. You have a Deployment with 3 replicas, and someone creates a PDB with minAvailable: 3.

yaml
# Deployment
spec:
  replicas: 3
 
# PDB
spec:
  minAvailable: 3

The math is simple: Kubernetes cannot evict any pod because evicting one would drop the available count to 2, violating the PDB's requirement of 3. The drain will hang forever.
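
The arithmetic behind that hang can be sketched in a few lines of Python (a simplified illustration of the controller's disruptionsAllowed calculation, not the actual Kubernetes source):

```python
# Simplified model of the PDB budget check for an integer minAvailable:
# disruptionsAllowed = currentHealthy - desiredHealthy, floored at zero.
def allowed_disruptions(current_healthy: int, min_available: int) -> int:
    return max(0, current_healthy - min_available)

# 3 healthy replicas with minAvailable: 3 -> zero evictions allowed, drain hangs
print(allowed_disruptions(3, 3))  # 0
# Lowering minAvailable to 2 frees up one eviction
print(allowed_disruptions(3, 2))  # 1
```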

How to spot it:

bash
kubectl get pdb -A

Output:

NAMESPACE   NAME         MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
default     my-app-pdb   3               N/A               0                     30d

See that ALLOWED DISRUPTIONS: 0? That's your red flag. Zero allowed disruptions means nothing can be evicted.

The fix:

Option A — Lower minAvailable:

yaml
spec:
  minAvailable: 2  # For 3 replicas, allow 1 to be evicted

Option B — Use maxUnavailable instead:

yaml
spec:
  maxUnavailable: 1  # Allow 1 pod to be down at a time

Option C — Use percentages:

yaml
spec:
  minAvailable: "66%"  # For 3 replicas, keeps 2 running
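
One subtlety worth knowing about the percentage form: Kubernetes rounds the resulting pod count up. A rough sketch of that rounding (illustrative only, not the real intstr helper):

```python
import math

# Illustrative: a percentage minAvailable resolves by rounding UP, so "66%"
# of 3 replicas means 2 pods must stay healthy, leaving 1 allowed disruption.
def desired_healthy(replicas: int, min_available_pct: int) -> int:
    return math.ceil(replicas * min_available_pct / 100)

print(desired_healthy(3, 66))        # 2 pods must stay up
print(3 - desired_healthy(3, 66))    # 1 eviction allowed
# Beware "100%": it blocks every drain, at any replica count
print(5 - desired_healthy(5, 100))   # 0
```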

Scenario 2: Single-Replica Deployment with PDB

You have a single-replica Deployment and a PDB with minAvailable: 1. Again, the math doesn't work — you can't evict the only pod and still keep one available.

yaml
# Deployment
spec:
  replicas: 1
 
# PDB
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: my-singleton-app

bash
kubectl get pdb my-singleton-pdb
NAME              MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
my-singleton-pdb  1               N/A               0                     15d

Zero allowed disruptions again.

The fix:

For single-replica workloads, you have a few options:

Option A — Remove the PDB entirely. If you only have one replica, a PDB doesn't add real value:

bash
kubectl delete pdb my-singleton-pdb

Option B — Scale up to at least 2 replicas if the workload supports it:

yaml
spec:
  replicas: 2

Option C — Use maxUnavailable: 1 (this actually works for single replicas):

yaml
spec:
  maxUnavailable: 1

With maxUnavailable: 1, Kubernetes is told "1 pod can be unavailable," so it can evict the single pod during drain.
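
A quick sketch of why maxUnavailable behaves differently here (a simplified model, not the real controller code):

```python
# Simplified model of the maxUnavailable path: the controller derives
# desiredHealthy = expectedPods - maxUnavailable, then
# disruptionsAllowed = currentHealthy - desiredHealthy.
def allowed_disruptions(expected: int, healthy: int, max_unavailable: int) -> int:
    desired = max(0, expected - max_unavailable)
    return max(0, healthy - desired)

# One healthy replica, maxUnavailable: 1 -> the lone pod is evictable
print(allowed_disruptions(1, 1, 1))  # 1
# The same workload with minAvailable: 1 would allow zero disruptions
```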

Scenario 3: EKS/GKE Managed Node Group Upgrade Stuck

This is where it gets really painful. You trigger an EKS managed node group update, and it gets stuck at "Updating" for hours. Behind the scenes, EKS is trying to drain old nodes to move pods to new ones, but PDBs are blocking the drain.

The kicker: with EKS managed node groups, you don't see the drain output directly. You have to dig.

How to diagnose:

bash
# Check node group update status
aws eks describe-update --name my-cluster --nodegroup-name my-nodegroup --update-id <update-id>
 
# Check for nodes being drained
kubectl get nodes
# Look for nodes with SchedulingDisabled status
 
# Check PDBs across all namespaces
kubectl get pdb -A -o wide
 
# Look for pods stuck on the draining node
kubectl get pods -A --field-selector spec.nodeName=<draining-node-name>

The fix:

First, identify which PDB is blocking:

bash
kubectl get pdb -A | awk 'NR > 1 && $5 == "0"'

This prints every PDB whose ALLOWED DISRUPTIONS column (the fifth field in the default output) is 0. A naive grep "0" would also match ages like 30d. Fix those PDBs using the approaches above, and the upgrade will proceed.

The Emergency Fix (When You Need to Drain Now)

Sometimes you're in an incident. A node is unhealthy, you need it drained now, and you can't wait for the PDB fix to propagate.

Option 1: Temporarily delete the PDB

bash
# Save the PDB first
kubectl get pdb my-app-pdb -o yaml > pdb-backup.yaml
 
# Delete it
kubectl delete pdb my-app-pdb
 
# Drain the node
kubectl drain node-3 --ignore-daemonsets --delete-emptydir-data
 
# Recreate the PDB
kubectl apply -f pdb-backup.yaml

Option 2: Use --disable-eviction flag (kubectl 1.18+)

bash
kubectl drain node-3 --ignore-daemonsets --delete-emptydir-data --disable-eviction

This bypasses the Eviction API entirely and deletes pods directly. The PDB is not consulted. Use this carefully — it provides zero availability protection.

Option 3: Force drain with timeout

bash
kubectl drain node-3 --ignore-daemonsets --delete-emptydir-data --timeout=120s --force

Despite its name, this combination does not override PDBs. The --timeout flag only bounds how long drain keeps retrying evictions before giving up, and --force only permits deleting pods that aren't managed by a ReplicationController, ReplicaSet, DaemonSet, StatefulSet, or Job. Managed pods protected by a PDB still won't be evicted — this option caps your wait time rather than forcing the drain through.

Cluster Autoscaler and PDB Conflicts

The Cluster Autoscaler has its own relationship with PDBs. When the autoscaler identifies an underutilized node for removal, it checks all PDBs. If any pod on the node has a PDB that would be violated by eviction, the autoscaler skips that node.

This leads to a common complaint: "My Cluster Autoscaler never scales down."

Diagnose it:

bash
# Check Cluster Autoscaler logs
kubectl logs -n kube-system -l app=cluster-autoscaler --tail=100 | grep -i "pdb"

You'll see messages like:

pod default/my-app-7b4f6d8c9-x2k4m can't be evicted: would violate PDB

Fix it by auditing all PDBs:

bash
# List all PDBs with their allowed disruptions
kubectl get pdb -A -o custom-columns=\
NAMESPACE:.metadata.namespace,\
NAME:.metadata.name,\
MIN-AVAILABLE:.spec.minAvailable,\
MAX-UNAVAILABLE:.spec.maxUnavailable,\
ALLOWED-DISRUPTIONS:.status.disruptionsAllowed,\
CURRENT-HEALTHY:.status.currentHealthy,\
DESIRED-HEALTHY:.status.desiredHealthy

Any row where ALLOWED-DISRUPTIONS is 0 and the workload is healthy is a misconfiguration.
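
If you save that output to a file (or pipe it in), a small script can flag the offenders. This is a hedged helper of my own, not a kubectl feature; it assumes the exact column names from the command above:

```python
# Parse `kubectl get pdb -A -o custom-columns=...` output (column names as in
# the command above) and flag PDBs that are healthy yet allow zero disruptions.
def zero_disruption_pdbs(table: str) -> list[tuple[str, str]]:
    lines = table.strip().splitlines()
    header = lines[0].split()
    rows = [dict(zip(header, line.split())) for line in lines[1:]]
    return [
        (r["NAMESPACE"], r["NAME"])
        for r in rows
        if r["ALLOWED-DISRUPTIONS"] == "0"
        and int(r["CURRENT-HEALTHY"]) >= int(r["DESIRED-HEALTHY"])
    ]

sample = """\
NAMESPACE   NAME         MIN-AVAILABLE   MAX-UNAVAILABLE   ALLOWED-DISRUPTIONS   CURRENT-HEALTHY   DESIRED-HEALTHY
default     my-app-pdb   3               <none>            0                     3                 3
default     ok-pdb       <none>          1                 1                     3                 2
"""
print(zero_disruption_pdbs(sample))  # [('default', 'my-app-pdb')]
```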

PDB Best Practices

Here's what I recommend after dealing with PDB issues across dozens of clusters:

1. Always Use maxUnavailable Instead of minAvailable

maxUnavailable: 1 is almost always the right choice. It scales naturally with your replica count and doesn't create the "equals replica count" trap.

yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: my-app

2. Never Create PDBs for Single-Replica Workloads

If you only have one replica, a PDB with minAvailable: 1 will block all drains. Either scale up or don't use a PDB.

3. Add PDB Checks to Your CI/CD Pipeline

Use a policy engine to catch misconfigurations before they hit the cluster:

yaml
# Kyverno policy sketch: reject PDBs with minAvailable: "100%", which can
# never allow a voluntary disruption. (Comparing minAvailable against the
# target workload's replica count would need an API lookup; omitted here.)
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: prevent-pdb-misconfiguration
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-pdb-min-available
      match:
        any:
          - resources:
              kinds:
                - PodDisruptionBudget
      validate:
        message: "minAvailable: 100% blocks all voluntary disruptions; prefer maxUnavailable"
        deny:
          conditions:
            any:
              - key: "{{ request.object.spec.minAvailable || '' }}"
                operator: Equals
                value: "100%"

4. Monitor PDB Status

Set up an alert for PDBs with zero allowed disruptions:

yaml
# Prometheus alert
- alert: PDBZeroDisruptionsAllowed
  expr: kube_poddisruptionbudget_status_pod_disruptions_allowed == 0
  for: 30m
  labels:
    severity: warning
  annotations:
    summary: "PDB {{ $labels.poddisruptionbudget }} in {{ $labels.namespace }} has 0 allowed disruptions"
    description: "This PDB will block node drains and cluster upgrades."

5. Use unhealthyPodEvictionPolicy

unhealthyPodEvictionPolicy graduated to stable in Kubernetes 1.31 (beta since 1.27). Set it to AlwaysAllow to let unhealthy pods (CrashLoopBackOff, etc.) be evicted even when the PDB budget is exhausted:

yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  maxUnavailable: 1
  unhealthyPodEvictionPolicy: AlwaysAllow
  selector:
    matchLabels:
      app: my-app

This prevents the nightmare scenario where crashed pods block drains: pods that aren't Ready don't count toward currentHealthy, so the budget check fails, and under the default IfHealthyBudget policy even the crashed pods themselves can't be evicted.
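
The eviction decision can be sketched roughly like this (a simplified model of the documented semantics, not the real eviction API code):

```python
# Simplified model: under the default IfHealthyBudget policy, an unhealthy
# (not Ready) pod may only be evicted while the budget is intact; under
# AlwaysAllow it is always evictable. Healthy pods always consume budget.
def can_evict(pod_ready: bool, current_healthy: int, desired_healthy: int,
              policy: str = "IfHealthyBudget") -> bool:
    if not pod_ready:
        if policy == "AlwaysAllow":
            return True
        return current_healthy >= desired_healthy
    return current_healthy - desired_healthy > 0

# 3 replicas, minAvailable: 3, one pod crash-looping: healthy=2, desired=3
print(can_evict(False, 2, 3))                        # False -> drain stuck
print(can_evict(False, 2, 3, policy="AlwaysAllow"))  # True
```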

Quick Reference: PDB Troubleshooting Checklist

  1. Run kubectl get pdb -A — look for ALLOWED DISRUPTIONS: 0
  2. Compare minAvailable against actual replica count
  3. Check for single-replica deployments with PDBs
  4. Look at currentHealthy vs desiredHealthy in PDB status
  5. For EKS upgrades, check for cordoned nodes stuck in SchedulingDisabled
  6. Emergency: backup PDB → delete → drain → recreate

If you want to master Kubernetes operations and avoid these pitfalls entirely, KodeKloud has hands-on labs that walk you through PDB configurations, cluster upgrades, and node maintenance scenarios. It's the fastest way to build real muscle memory for these situations. For practicing in a live environment, DigitalOcean offers affordable managed Kubernetes clusters that are perfect for testing drain operations without risking production.

Final Thoughts

PodDisruptionBudgets are a critical safety mechanism — they exist to protect your application availability during maintenance. But a misconfigured PDB is worse than no PDB at all, because it creates the illusion of protection while actually blocking necessary operations.

The fix is almost always simple: use maxUnavailable: 1 instead of minAvailable, never attach PDBs to single-replica workloads, and monitor for PDBs with zero allowed disruptions. Do those three things, and you'll never have a stuck drain again.
