Kubernetes Node DiskPressure Fix (2026)
Node shows DiskPressure condition and pods are getting evicted? Here's how to find what's eating disk space and fix it permanently.
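DiskPressure on a Kubernetes node means the kubelet has seen free disk space (or free inodes) on the node filesystem or image filesystem drop below its eviction threshold. By default on Linux, that means less than 10% available on the node filesystem or less than 15% on the image filesystem. The kubelet then starts evicting pods to reclaim space.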
Symptoms
# Node shows DiskPressure condition
kubectl get nodes
# NAME STATUS ROLES AGE VERSION
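# node-1 Ready <none> 10d v1.29.0 ← usually stays Ready (the node gets a node.kubernetes.io/disk-pressure:NoSchedule taint); can flip to NotReady if the disk fills completely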
kubectl describe node node-1 | grep -A10 "Conditions:"
# DiskPressure True ... kubelet has disk pressure
# Evicted pods show a message like:
# The node was low on resource: ephemeral-storage
# Threshold quantity: 10%, available: 4%
Step 1: Find What's Using Disk
SSH into the affected node:
# Get node's EC2/VM IP
kubectl get node node-1 -o wide
# SSH in
ssh -i key.pem ec2-user@<node-ip>
# Check overall disk usage
df -h
# Find large directories
du -sh /* 2>/dev/null | sort -rh | head -20
du -sh /var/lib/docker/* 2>/dev/null | sort -rh | head -10
du -sh /var/lib/containerd/* 2>/dev/null | sort -rh | head -10
Common culprits:
- /var/lib/docker or /var/lib/containerd: container images and layers
- /var/log: pod and system logs
- /var/lib/kubelet/pods: pod ephemeral storage
- Application data written to the node filesystem instead of PVCs
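To size the usual suspects in one pass, here is a small POSIX shell sketch. The function name `rank_dirs` is hypothetical and not part of any Kubernetes tooling. It measures each candidate directory that exists on the node and lists the largest first. The `-x` flag keeps `du` on one filesystem.

```shell
# rank_dirs: hypothetical helper, not part of any Kubernetes tooling.
# Prints "<KiB><TAB><path>" for each existing directory argument, largest first.
# With no arguments it checks the common DiskPressure culprits listed above.
rank_dirs() {
  if [ "$#" -eq 0 ]; then
    set -- /var/lib/docker /var/lib/containerd /var/log /var/lib/kubelet/pods
  fi
  for d in "$@"; do
    # Skip paths that don't exist here (e.g. no Docker on a containerd node)
    [ -d "$d" ] || continue
    # -s: one total per path, -k: size in KiB, -x: stay on this filesystem
    du -skx "$d" 2>/dev/null
  done | sort -rn
}
```

Run it as root (for example, after `sudo -i`). Directories like /var/lib/containerd are usually not readable by a regular user, and unreadable entries would be silently undercounted.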
Fix 1: Clean Up Unused Container Images
This is the most common cause — old images accumulate on nodes.
# On the node directly
# For Docker runtime:
docker image prune -a --force
# For containerd:
crictl images
crictl rmi --prune
# Check space freed
df -h
The kubelet already garbage-collects unused images, but only after disk usage crosses a threshold. The defaults are:
- Start GC at 85% disk usage
- Target 80% after GC
If your node fills up faster than GC can clean, lower the thresholds:
# In kubelet config (/etc/kubernetes/kubelet-config.yaml or equivalent)
imageGCHighThresholdPercent: 75 # trigger at 75% (default 85)
imageGCLowThresholdPercent: 70 # target 70% after GC (default 80)
Or, on nodes that still pass kubelet flags (these flags are deprecated in favor of the config file):
--image-gc-high-threshold=75
--image-gc-low-threshold=70
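To see when these thresholds actually kick in, here is a sketch of the arithmetic. As far as I know, it mirrors the kubelet's image GC manager. Usage is computed as 100 minus the percentage of the image filesystem still available. Once usage reaches the high threshold, the kubelet tries to free enough space to get back down to the low threshold. The function name and the whole-GiB inputs are assumptions for illustration.

```shell
# image_gc_plan: hypothetical helper approximating the kubelet's image GC decision.
# Args: capacity_gib available_gib high_pct low_pct (integers)
# Prints "no-gc usage=<N>%" or "gc usage=<N>% free=<GiB>GiB".
# Integer division rounds down, as the kubelet's own integer math does.
image_gc_plan() {
  capacity=$1 available=$2 high=$3 low=$4
  usage=$((100 - available * 100 / capacity))
  if [ "$usage" -lt "$high" ]; then
    echo "no-gc usage=${usage}%"
    return 0
  fi
  # Amount to free so that usage drops back to the low threshold
  to_free=$((capacity * (100 - low) / 100 - available))
  echo "gc usage=${usage}% free=${to_free}GiB"
}
```

Take a 50 GiB image filesystem with 8 GiB free. `image_gc_plan 50 8 85 80` prints `no-gc usage=84%`, while `image_gc_plan 50 8 75 70` prints `gc usage=84% free=7GiB`. With the defaults, GC waits until the disk is fuller, which leaves less room before the 10% free hard eviction threshold.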
Fix 2: Clean Up Stopped Containers
Stopped containers accumulate and hold disk space:
# List stopped containers
crictl ps --state EXITED
# Remove all stopped containers
crictl ps --state EXITED -q | xargs -r crictl rm # -r skips crictl rm when nothing is stopped
# For Docker:
docker container prune -f
Fix 3: Pod Log Rotation
Pods writing large logs fill /var/log/pods:
# Check log sizes per pod (directories are named <namespace>_<pod-name>_<pod-uid>)
du -sh /var/log/pods/* 2>/dev/null | sort -rh | head -10
Fix the application to log less, or cap log size with rotation. On nodes still running the Docker runtime, set limits in the Docker daemon config:
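// /etc/docker/daemon.json (for Docker runtime)
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}
With containerd (and other CRI runtimes), the kubelet rotates container logs, so set the limits in the kubelet config rather than /etc/containerd/config.toml. containerd's max_container_log_line_size only caps the length of a single log line; it does not rotate files.
# In kubelet config
containerLogMaxSize: 50Mi # rotate a container's log file at this size (default 10Mi)
containerLogMaxFiles: 3 # keep at most this many log files per container (default 5)
Fix 4: Ephemeral Storage Limits on Pods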
If a specific pod is filling disk with its own data:
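containers:
- name: app # example container name
  resources:
    limits:
      ephemeral-storage: 2Gi # cap on this container's local disk use
    requests:
      ephemeral-storage: 500Mi # used by the scheduler when placing the pod
The container's writable layer and logs count against its own limit. emptyDir volumes count against the pod's total (the sum of its containers' limits). When a pod goes over, the kubelet evicts just that pod instead of letting it fill the node until other pods get evicted too.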
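Because container logs count toward a container's ephemeral-storage usage, check that the kubelet's log rotation budget fits inside the limit. Here is a rough sketch. It assumes the worst case is about containerLogMaxSize × containerLogMaxFiles per container. This is an approximation: the live file can briefly overshoot between rotation checks. The function name, the MiB inputs, and the 50% "tight" cutoff are arbitrary choices for illustration.

```shell
# log_budget: hypothetical helper.
# Args: max_size_mib max_files limit_mib (integers)
# Prints the approximate worst-case log usage per container, the share of the
# ephemeral-storage limit it could take (rounded down), and a verdict.
log_budget() {
  size=$1 files=$2 limit=$3
  logs=$((size * files))
  share=$((logs * 100 / limit))
  # 50% is an arbitrary rule of thumb, leaving room for the writable layer
  if [ "$share" -ge 50 ]; then verdict=tight; else verdict=ok; fi
  echo "logs=${logs}Mi share=${share}% ${verdict}"
}
```

With the kubelet defaults and a 2Gi limit, `log_budget 10 5 2048` prints `logs=50Mi share=2% ok`. Raising rotation to 250Mi × 5 gives `log_budget 250 5 2048`, which prints `logs=1250Mi share=61% tight`. At that point logs alone could push the container toward eviction.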
Fix 5: Increase Node Disk Size
If you're consistently hitting disk pressure, the node volume is undersized.
AWS EKS — resize existing node group:
# Increase EBS volume in Terraform
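resource "aws_eks_node_group" "main" {
  ...
  launch_template {
    id      = aws_launch_template.nodes.id
    version = aws_launch_template.nodes.latest_version
  }
}
resource "aws_launch_template" "nodes" {
  ...
  block_device_mappings {
    device_name = "/dev/xvda"
    ebs {
      volume_size = 50 # increase from 20 to 50
      volume_type = "gp3"
    }
  }
}
Bumping the launch template version makes EKS roll the node group onto new nodes with the larger volume. Alternatively, you can grow an existing node's EBS volume in place (EBS Elastic Volumes, e.g. aws ec2 modify-volume) and then extend the partition and filesystem on the node. Nodes launched later still get their size from the launch template.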
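To pick a new size instead of guessing, one rough approach is to size the volume so that steady-state usage stays at or below the image GC high threshold (85% by default). Steady-state usage means image cache, logs, and pod ephemeral data, from the du output in Step 1. This is a sketch under that assumption; the helper name is hypothetical.

```shell
# min_volume_gib: hypothetical sizing helper.
# Args: expected_usage_gib target_pct (integers)
# Prints the smallest whole-GiB volume where expected usage stays <= target_pct.
min_volume_gib() {
  used=$1 target=$2
  # Integer ceiling of used * 100 / target
  echo $(( (used * 100 + target - 1) / target ))
}
```

For example, `min_volume_gib 18 85` prints `22`, so a 20 GiB default volume is already too small for 18 GiB of steady usage. `min_volume_gib 30 75` prints `40` if you also lowered the GC threshold to 75%.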
Fix 6: Move Logs to External Storage
For high-log workloads, use a DaemonSet to ship logs off-node before they fill disk:
# Fluent Bit DaemonSet sends logs to CloudWatch/Loki
# Logs are still written to /var/log/pods first; shipping them off-node lets you keep on-node log retention small
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    ...
With logs shipped to CloudWatch or Loki, you no longer need much log history on the node, so you can tighten log rotation limits and keep disk usage low.
Prevent Recurrence: Set Up Alerts
# Prometheus alert for disk pressure
- alert: NodeDiskPressureWarning
  expr: |
    (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) < 0.20
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Node {{ $labels.instance }} disk below 20%"
- alert: NodeDiskPressureCritical
  expr: |
    (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) < 0.10
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Node {{ $labels.instance }} disk below 10% — evictions imminent"
Alert at 20% free so you have time to act before the kubelet starts evicting pods. If kubelet or container runtime data lives on a separate volume (e.g. /var/lib), point the mountpoint selector at that volume instead of /.
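For a quick check on the node itself (for instance from cron where Prometheus isn't available), you can apply the same 20%/10% thresholds as the alert rules to `df` output. This is a sketch that assumes POSIX `df -P` output; both function names are hypothetical.

```shell
# free_pct: percent of a filesystem still available (rounded down), from `df -P`.
free_pct() {
  # df -P columns: Filesystem 1024-blocks Used Available Capacity Mounted-on
  df -P "${1:-/}" | awk 'NR==2 { printf "%d\n", $4 * 100 / $2 }'
}

# classify_free: map a free percentage to the same levels as the alert rules.
classify_free() {
  if [ "$1" -lt 10 ]; then
    echo critical
  elif [ "$1" -lt 20 ]; then
    echo warning
  else
    echo ok
  fi
}
```

Typical use is `classify_free "$(free_pct /)"`. Like `node_filesystem_avail_bytes`, the Available column excludes blocks reserved for root, so the two should roughly agree.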
Debugging Checklist
# 1. Check which nodes have DiskPressure
kubectl get nodes -o custom-columns='NAME:.metadata.name,DISKPRESSURE:.status.conditions[?(@.type=="DiskPressure")].status'
# 2. SSH to node and check disk
df -h
du -sh /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/
# 3. Check kubelet logs for eviction messages
journalctl -u kubelet | grep -i "evict\|disk\|pressure" | tail -30
# 4. Check which pods were evicted
kubectl get pods -A --field-selector=status.phase=Failed | grep Evicted
# 5. Clean images
crictl rmi --prune
Quick summary:
- Immediate fix → crictl rmi --prune to free image cache
- Persistent issue → add ephemeral-storage limits to pods, configure log rotation
- Structural fix → increase node disk size or add log shipping DaemonSet
- Prevention → alert at 20% free disk, before evictions start