Kubernetes Node DiskPressure Fix (2026)
Node shows DiskPressure condition and pods are getting evicted? Here's how to find what's eating disk space and fix it permanently.
DiskPressure on a Kubernetes node means the node is running out of disk space, and the kubelet starts evicting pods to reclaim space. Here's how to fix it.
Symptoms
# Node shows DiskPressure condition
kubectl get nodes
# NAME STATUS ROLES AGE VERSION
# node-1 Ready <none> 10d v1.29.0 ← may show NotReady during pressure
kubectl describe node node-1 | grep -A10 "Conditions:"
# DiskPressure True ... kubelet has disk pressure
# Pods being evicted with message
# The node was low on resource: ephemeral-storage
# Threshold quantity: 10%, available: 4%
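To see whether pods are being evicted across the whole cluster, check recent events (events are retained only for a short time, one hour by default, so this catches only fresh evictions):
# List recent pod eviction events, newest last
kubectl get events -A --field-selector reason=Evicted --sort-by=.lastTimestamp | tail -20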
Step 1: Find What's Using Disk
SSH into the affected node:
# Get node's EC2/VM IP
kubectl get node node-1 -o wide
# SSH in
ssh -i key.pem ec2-user@<node-ip>
# Check overall disk usage
df -h
# Find large directories
du -sh /* 2>/dev/null | sort -rh | head -20
du -sh /var/lib/docker/* 2>/dev/null | sort -rh | head -10
du -sh /var/lib/containerd/* 2>/dev/null | sort -rh | head -10
Common culprits (you can size them all in one pass with the command after this list):
- /var/lib/docker or /var/lib/containerd — container images and layers
- /var/log — pod and system logs
- /var/lib/kubelet/pods — pod ephemeral storage
- Application data written to the node filesystem instead of PVCs
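If a full du over / is slow, size just these directories directly (paths vary by runtime, so some may not exist on your nodes):
du -sh /var/lib/docker /var/lib/containerd /var/log /var/lib/kubelet/pods 2>/dev/null | sort -rh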
Fix 1: Clean Up Unused Container Images
This is the most common cause — old images accumulate on nodes.
# On the node directly
# For Docker runtime:
docker image prune -a --force
# For containerd:
crictl images
crictl rmi --prune
# Check space freed
df -h
Kubernetes has garbage collection for images, but it triggers only when thresholds are crossed. The defaults are:
- Start GC at 85% disk usage
- Target 80% after GC
If your node fills up faster than GC can clean, lower the thresholds:
# In kubelet config (/etc/kubernetes/kubelet-config.yaml or equivalent)
imageGCHighThresholdPercent: 75 # trigger at 75% (default 85)
imageGCLowThresholdPercent: 70 # target 70% after GC (default 80)
Or in the kubelet args:
--image-gc-high-threshold=75
--image-gc-low-threshold=70
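To confirm which thresholds the kubelet is actually running with, read its live configuration through the API server's node proxy (node-1 is an example name; your credentials need access to the nodes/proxy resource):
# Dump the kubelet's active config and pull out the image GC thresholds
kubectl get --raw "/api/v1/nodes/node-1/proxy/configz" | python3 -m json.tool | grep -i imagegc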
Fix 2: Clean Up Stopped Containers
Stopped containers accumulate and hold disk space:
# List stopped containers
crictl ps -a --state exited
# Remove all stopped containers
crictl rm $(crictl ps -a --state exited -q)
# For Docker:
docker container prune -f
Fix 3: Pod Log Rotation
Pods writing large logs fill /var/log/pods:
# Check log sizes
du -sh /var/log/pods/* | sort -rh | head -10
# Find which pod is writing most logs
du -sh /var/log/pods/*/ | sort -rh | head -5
Fix the application to log less, or configure log rotation for the container runtime:
// /etc/docker/daemon.json (for Docker runtime)
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}
For containerd, log rotation is handled by the kubelet rather than the runtime: set containerLogMaxSize and containerLogMaxFiles in the kubelet config (defaults are 10Mi and 5 files). In /etc/containerd/config.toml you can additionally cap the length of individual log lines:
[plugins."io.containerd.grpc.v1.cri".containerd]
snapshotter = "overlayfs"
[plugins."io.containerd.grpc.v1.cri"]
max_container_log_line_size = 16384
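To check whether rotation is actually keeping individual files small, list the largest container log files on the node (GNU find assumed, as shipped on most Linux node images):
# Largest individual container log files — rotation should cap these
find /var/log/pods -name "*.log" -printf "%s\t%p\n" 2>/dev/null | sort -nr | head -10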
Fix 4: Ephemeral Storage Limits on Pods
If a specific pod is filling disk with its own data:
resources:
  limits:
    ephemeral-storage: 2Gi # max local disk usage for this pod
  requests:
    ephemeral-storage: 500Mi
When a pod exceeds its ephemeral-storage limit, it's evicted cleanly — instead of filling the node and evicting everything.
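To pick sensible limits, check how much ephemeral storage pods are actually consuming via the kubelet's Summary API. A sketch, assuming jq is installed and node-1 is a placeholder node name:
# Per-pod ephemeral-storage usage reported by the kubelet, largest first
kubectl get --raw "/api/v1/nodes/node-1/proxy/stats/summary" \
  | jq -r '.pods[] | [.podRef.namespace, .podRef.name, (."ephemeral-storage".usedBytes // 0)] | @tsv' \
  | sort -k3 -nr | head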
Fix 5: Increase Node Disk Size
If you're consistently hitting disk pressure, the node volume is undersized.
AWS EKS — resize existing node group:
# Increase EBS volume in Terraform
resource "aws_eks_node_group" "main" {
  ...
  launch_template {
    id      = aws_launch_template.nodes.id
    version = aws_launch_template.nodes.latest_version
  }
}

resource "aws_launch_template" "nodes" {
  ...
  block_device_mappings {
    device_name = "/dev/xvda"
    ebs {
      volume_size = 50 # increase from 20 to 50
      volume_type = "gp3"
    }
  }
}
Replace node group or use in-place resize (EKS supports this for gp3 volumes).
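For a one-off fix you can also grow an existing node's EBS volume in place and expand the filesystem. A sketch: the volume ID and device names are placeholders, and the filesystem commands assume an XFS root partition as on Amazon Linux:
# Grow the EBS volume (gp2/gp3 support online resize)
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 50
# On the node: grow the partition, then the filesystem
sudo growpart /dev/nvme0n1 1
sudo xfs_growfs /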
Fix 6: Move Logs to External Storage
For high-log workloads, use a DaemonSet to ship logs off-node before they fill disk:
# Fluent Bit DaemonSet sends logs to CloudWatch/Loki
# Reduces disk usage to near-zero for log data
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    ...
With log shipping to CloudWatch or Loki, you can also reduce the log rotation window on nodes to keep disk usage low.
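If you'd rather not maintain the DaemonSet manifest yourself, the Fluent project publishes a Helm chart. A sketch; you still need to configure an output (CloudWatch, Loki, etc.) through the chart values:
# Install Fluent Bit from the upstream Helm chart
helm repo add fluent https://fluent.github.io/helm-charts
helm repo update
helm install fluent-bit fluent/fluent-bit --namespace logging --create-namespace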
Prevent Recurrence: Set Up Alerts
# Prometheus alert for disk pressure
- alert: NodeDiskPressureWarning
  expr: |
    (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) < 0.20
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Node {{ $labels.instance }} disk below 20%"

- alert: NodeDiskPressureCritical
  expr: |
    (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) < 0.10
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Node {{ $labels.instance }} disk below 10% — evictions imminent"
Alert at 20% free so you have time to act before the kubelet starts evicting pods.
Debugging Checklist
# 1. Check which nodes have DiskPressure
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}'
# 2. SSH to node and check disk
df -h
du -sh /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/
# 3. Check kubelet logs for eviction messages
journalctl -u kubelet | grep -i "evict\|disk\|pressure" | tail -30
# 4. Check which pods were evicted
kubectl get pods -A --field-selector=status.phase=Failed | grep Evicted
# 5. Clean images
crictl rmi --prune
Quick summary:
- Immediate fix → crictl rmi --prune to free image cache
- Persistent issue → add ephemeral-storage limits to pods, configure log rotation
- Structural fix → increase node disk size or add log shipping DaemonSet
- Prevention → alert at 20% free disk, before evictions start