
Kubernetes Node DiskPressure Fix (2026)

Node shows DiskPressure condition and pods are getting evicted? Here's how to find what's eating disk space and fix it permanently.

DevOpsBoys · May 8, 2026 · 4 min read

A DiskPressure condition means the node has crossed the kubelet's disk eviction threshold, so the kubelet starts evicting pods to reclaim space. Here's how to diagnose it and fix it for good.
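By default the kubelet evicts when less than 10% of the node filesystem (nodefs) or 15% of the image filesystem (imagefs) remains available. You can check the thresholds a node is actually running via the kubelet's configz endpoint (assumes jq is installed; node-1 is a placeholder):

bash
# Show the active hard-eviction thresholds for a node
kubectl get --raw "/api/v1/nodes/node-1/proxy/configz" | jq '.kubeletconfig.evictionHard'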


Symptoms

bash
# Node shows DiskPressure condition
kubectl get nodes
# NAME         STATUS   ROLES   AGE   VERSION
# node-1       Ready    <none>  10d   v1.29.0  ← may show NotReady during pressure
 
kubectl describe node node-1 | grep -A10 "Conditions:"
# DiskPressure   True   ...   kubelet has disk pressure
 
# Pods being evicted with message
# The node was low on resource: ephemeral-storage
# Threshold quantity: 10%, available: 4%
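Evictions also show up as cluster events, which is a quick way to confirm the blast radius (note that events expire after about an hour by default):

bash
# List recent eviction events across all namespaces
kubectl get events -A --field-selector reason=Evicted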

Step 1: Find What's Using Disk

SSH into the affected node:

bash
# Get node's EC2/VM IP
kubectl get node node-1 -o wide
 
# SSH in
ssh -i key.pem ec2-user@<node-ip>
 
# Check overall disk usage
df -h
 
# Find large directories
du -sh /* 2>/dev/null | sort -rh | head -20
du -sh /var/lib/docker/* 2>/dev/null | sort -rh | head -10
du -sh /var/lib/containerd/* 2>/dev/null | sort -rh | head -10

Common culprits:

  • /var/lib/docker or /var/lib/containerd — container images and layers
  • /var/log — pod and system logs
  • /var/lib/kubelet/pods — pod ephemeral storage
  • Application data written to the node filesystem instead of PVCs (see the find command below)
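The last culprit is easiest to spot by searching for large individual files rather than directories:

bash
# Find individual files over 500MB on the root filesystem
find / -xdev -type f -size +500M -exec ls -lh {} \; 2>/dev/null | sort -k5 -rh | head -10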

Fix 1: Clean Up Unused Container Images

This is the most common cause — old images accumulate on nodes.

bash
# On the node directly
# For Docker runtime:
docker image prune -a --force
 
# For containerd:
crictl images
crictl rmi --prune
 
# Check space freed
df -h

Kubernetes has garbage collection for images, but it triggers only when thresholds are crossed. The defaults are:

  • Start GC at 85% disk usage
  • Target 80% after GC

If your node fills up faster than GC can clean, lower the thresholds:

yaml
# In kubelet config (/etc/kubernetes/kubelet-config.yaml or equivalent)
imageGCHighThresholdPercent: 75  # trigger at 75% (default 85)
imageGCLowThresholdPercent: 70   # target 70% after GC (default 80)

Or in the kubelet args:

--image-gc-high-threshold=75
--image-gc-low-threshold=70
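Kubelet config changes take effect only after a restart (assuming a systemd-based node; the service name may vary by distro):

bash
# On the node, after editing the kubelet config
sudo systemctl restart kubelet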

Fix 2: Clean Up Stopped Containers

Stopped containers accumulate and hold disk space:

bash
# List stopped containers
crictl ps -a --state exited
 
# Remove all stopped containers
crictl rm $(crictl ps -a --state exited -q)
 
# For Docker:
docker container prune -f

Fix 3: Pod Log Rotation

Pods writing large logs fill /var/log/pods:

bash
# Check which pods are writing the most logs
# (directory names follow <namespace>_<podname>_<uid>)
du -sh /var/log/pods/* 2>/dev/null | sort -rh | head -10

Fix the application to log less, or configure log rotation. For the Docker runtime, set it in /etc/docker/daemon.json:

json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}
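daemon.json changes require a Docker restart, and the new log options apply only to containers created afterwards:

bash
# On the node
sudo systemctl restart docker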

For containerd and other CRI runtimes, log rotation is handled by the kubelet, not the runtime. Set it in the kubelet config:

yaml
# In kubelet config (same file as the image GC thresholds above)
containerLogMaxSize: 100Mi    # rotate each container log at 100Mi (default 10Mi)
containerLogMaxFiles: 3       # keep at most 3 rotated files per container (default 5)

Fix 4: Ephemeral Storage Limits on Pods

If a specific pod is filling disk with its own data:

yaml
resources:
  limits:
    ephemeral-storage: 2Gi    # max local disk usage for this container
  requests:
    ephemeral-storage: 500Mi

When a pod exceeds its ephemeral-storage limit, it's evicted cleanly — instead of filling the node and evicting everything.
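To see how much ephemeral storage each pod is actually using, the kubelet's stats summary endpoint reports per-pod usage (assumes jq is installed; node-1 is a placeholder):

bash
# Per-pod ephemeral storage usage as reported by the kubelet
kubectl get --raw "/api/v1/nodes/node-1/proxy/stats/summary" \
  | jq '.pods[] | {pod: .podRef.name, usedBytes: .["ephemeral-storage"].usedBytes}'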


Fix 5: Increase Node Disk Size

If you're consistently hitting disk pressure, the node volume is undersized.

AWS EKS — resize existing node group:

hcl
# Increase EBS volume in Terraform
resource "aws_eks_node_group" "main" {
  ...
  launch_template {
    id      = aws_launch_template.nodes.id
    version = aws_launch_template.nodes.latest_version
  }
}
 
resource "aws_launch_template" "nodes" {
  ...
  block_device_mappings {
    device_name = "/dev/xvda"
    ebs {
      volume_size = 50  # increase from 20 to 50
      volume_type = "gp3"
    }
  }
}

Updating the launch template version triggers a rolling replacement of the node group's instances. Alternatively, gp3 (and gp2) EBS volumes can be resized in place while attached.
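For the in-place route, grow the volume live and then expand the filesystem on the node (vol-0abc123 and the device name are placeholders; use resize2fs instead of xfs_growfs for ext4):

bash
# Grow the EBS volume (gp3 and gp2 support live resize)
aws ec2 modify-volume --volume-id vol-0abc123 --size 50
 
# On the node: grow the partition, then the filesystem
sudo growpart /dev/xvda 1
sudo xfs_growfs /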


Fix 6: Move Logs to External Storage

For high-log workloads, use a DaemonSet to ship logs off-node before they fill disk:

yaml
# Fluent Bit DaemonSet tails container logs and ships them to CloudWatch/Loki,
# so on-node log retention can be kept short
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    ...

With logs safely shipped to CloudWatch or Loki, you can tighten the containerLogMaxSize and containerLogMaxFiles settings from Fix 3 so logs barely accumulate on the node.


Prevent Recurrence: Set Up Alerts

yaml
# Prometheus alert for disk pressure
- alert: NodeDiskPressureWarning
  expr: |
    (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) < 0.20
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Node {{ $labels.instance }} disk below 20%"
 
- alert: NodeDiskPressureCritical
  expr: |
    (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) < 0.10
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Node {{ $labels.instance }} disk below 10% — evictions imminent"

Alert at 20% free so you have time to act before the kubelet starts evicting pods.


Debugging Checklist

bash
# 1. Check which nodes have DiskPressure
kubectl get nodes -o custom-columns='NAME:.metadata.name,DISK_PRESSURE:.status.conditions[?(@.type=="DiskPressure")].status'
 
# 2. SSH to node and check disk
df -h
du -sh /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/ 
 
# 3. Check kubelet logs for eviction messages
journalctl -u kubelet | grep -i "evict\|disk\|pressure" | tail -30
 
# 4. Check which pods were evicted
kubectl get pods -A --field-selector=status.phase=Failed | grep Evicted
 
# 5. Clean images
crictl rmi --prune

Quick summary:

  • Immediate fix → crictl rmi --prune to free image cache
  • Persistent issue → add ephemeral-storage limits to pods, configure log rotation
  • Structural fix → increase node disk size or add log shipping DaemonSet
  • Prevention → alert at 20% free disk, before evictions start