Kubernetes Node DiskPressure Fix (2026)

DiskPressure on a Kubernetes node means the node is running low on disk space and the kubelet has started evicting pods to reclaim it. Here's how to find what's eating disk space and fix it permanently.

Symptoms

bash

# Node shows DiskPressure condition
kubectl get nodes
# NAME         STATUS   ROLES   AGE   VERSION
# node-1       Ready    <none>  10d   v1.29.0  ← STATUS usually stays Ready; DiskPressure is a separate condition
 
kubectl describe node node-1 | grep -A10 "Conditions:"
# DiskPressure   True   ...   kubelet has disk pressure
 
# Pods being evicted with message
# The node was low on resource: ephemeral-storage
# Threshold quantity: 10%, available: 4%
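On a cluster with many nodes it helps to print only the nodes whose DiskPressure condition is True. The sketch below is one way to do that; `pressured_nodes` is a made-up helper name, and it assumes one `<node> <status>` pair per line on stdin, as produced by the jsonpath query in the usage comment.

```shell
# pressured_nodes: read "<node-name> <DiskPressure-status>" lines on stdin
# and print the names of nodes whose status is True.
# (Hypothetical helper name, not part of kubectl.)
pressured_nodes() {
  awk '$2 == "True" { print $1 }'
}

# Usage against a live cluster (requires kubectl access):
# kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}' | pressured_nodes
```

An empty result means no node is currently reporting disk pressure.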

Step 1: Find What's Using Disk

SSH into the affected node:

bash

# Get node's EC2/VM IP
kubectl get node node-1 -o wide
 
# SSH in
ssh -i key.pem ec2-user@<node-ip>
 
# Check overall disk usage
df -h
 
# Find large directories
du -sh /* 2>/dev/null | sort -rh | head -20
du -sh /var/lib/docker/* 2>/dev/null | sort -rh | head -10
du -sh /var/lib/containerd/* 2>/dev/null | sort -rh | head -10

Common culprits:

/var/lib/docker or /var/lib/containerd — container images and layers
/var/log — pod and system logs
/var/lib/kubelet/pods — pod ephemeral storage
Application data written to the node filesystem instead of PVCs
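The culprits above can be compared in one pass. This is a small sketch, not a standard tool: `rank_dirs` is a hypothetical helper that sizes each directory with `du -sk` and sorts numerically, so it does not depend on GNU `sort -h`.

```shell
# rank_dirs: print the given directories sorted by disk usage, largest first.
# Output columns: size in KiB, then path. Directories that do not exist
# (for example /var/lib/docker on a containerd node) are skipped.
# (Hypothetical helper name.)
rank_dirs() {
  for dir in "$@"; do
    [ -d "$dir" ] && du -sk "$dir" 2>/dev/null
  done | sort -rn
}

# Usage on a node (run as root so du can read every directory):
# rank_dirs /var/lib/docker /var/lib/containerd /var/log /var/lib/kubelet/pods
```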

Fix 1: Clean Up Unused Container Images

This is the most common cause — old images accumulate on nodes.

bash

# On the node directly
# For Docker runtime:
docker image prune -a --force
 
# For containerd:
crictl images
crictl rmi --prune
 
# Check space freed
df -h

Kubernetes has garbage collection for images, but it triggers only when thresholds are crossed. The defaults are:

Start GC at 85% disk usage
Target 80% after GC

If your node fills up faster than GC can clean, lower the thresholds:

yaml

# In kubelet config (/etc/kubernetes/kubelet-config.yaml or equivalent)
imageGCHighThresholdPercent: 75  # trigger at 75% (default 85)
imageGCLowThresholdPercent: 70   # target 70% after GC (default 80)

Or in the kubelet args (these flags are deprecated in favor of the config file):

--image-gc-high-threshold=75
--image-gc-low-threshold=70
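To see what the two thresholds mean in practice, here is a simplified model of the calculation: once usage reaches the high threshold, image GC tries to free enough to get back down to the low threshold. The function name and the integer units are assumptions for illustration; the real kubelet works in bytes and only removes images that no container is using.

```shell
# image_gc_bytes_to_free CAPACITY USED HIGH_PCT LOW_PCT
# Prints how much image GC would try to free, or 0 if usage is below the
# high threshold. All sizes are integers in the same unit (GiB here).
# (Hypothetical helper; a simplified model of the kubelet's behaviour.)
image_gc_bytes_to_free() {
  capacity=$1; used=$2; high=$3; low=$4
  usage_pct=$(( used * 100 / capacity ))
  if [ "$usage_pct" -ge "$high" ]; then
    echo $(( used - capacity * low / 100 ))
  else
    echo 0
  fi
}

# 100 GiB disk with 90 GiB used, default thresholds (85/80):
image_gc_bytes_to_free 100 90 85 80   # prints 10
# Same disk at 78 GiB used: GC is idle with the defaults, active with 75/70.
image_gc_bytes_to_free 100 78 85 80   # prints 0
image_gc_bytes_to_free 100 78 75 70   # prints 8
```

Lowering the thresholds starts GC earlier, which leaves more headroom before the eviction threshold is reached.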

Fix 2: Clean Up Stopped Containers

Stopped containers accumulate and hold disk space:

bash

# List stopped containers
crictl ps --state EXITED
 
# Remove all stopped containers (xargs -r skips the call when there are none)
crictl ps -a --state exited -q | xargs -r crictl rm
 
# For Docker:
docker container prune -f

Fix 3: Pod Log Rotation

Pods writing large logs fill /var/log/pods:

bash

# Check log sizes
du -sh /var/log/pods/* | sort -rh | head -10
 
# Find which container is writing the most logs
du -sh /var/log/pods/*/* | sort -rh | head -5

Fix the application to log less, or configure log rotation. For the Docker runtime, set it in /etc/docker/daemon.json (JSON does not allow comments, so keep the file free of them):

json

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}

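With containerd, the runtime does not rotate logs; the kubelet does. Set the limits in the kubelet config:

yaml

# In kubelet config (/etc/kubernetes/kubelet-config.yaml or equivalent)
containerLogMaxSize: 50Mi   # rotate a container log at 50Mi (default 10Mi)
containerLogMaxFiles: 3     # keep at most 3 log files per container (default 5)

The max_container_log_line_size setting in /etc/containerd/config.toml (default 16384) only caps the length of a single log line. It does not rotate anything.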
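Rotation settings put a ceiling on log disk usage: roughly the number of containers times the maximum file size times the number of files kept. The sketch below works that out for the Docker settings shown earlier, treating `100m` as 100 MiB; `max_log_mib` is a hypothetical helper and the container count is an example value.

```shell
# max_log_mib CONTAINERS MAX_SIZE_MIB MAX_FILES
# Worst-case log footprint in MiB, reached only if every container
# fills every rotated file. (Hypothetical helper name.)
max_log_mib() {
  echo $(( $1 * $2 * $3 ))
}

# 40 containers with max-size 100m and max-file 3:
max_log_mib 40 100 3   # prints 12000 (about 11.7 GiB)
```

If that ceiling is a large share of the node's disk, lower the size or file count rather than relying on eviction to clean up.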

Fix 4: Ephemeral Storage Limits on Pods

If a specific pod is filling disk with its own data:

yaml

resources:
  limits:
    ephemeral-storage: 2Gi    # max local disk usage for this pod
  requests:
    ephemeral-storage: 500Mi

When a pod exceeds its ephemeral-storage limit, it's evicted cleanly — instead of filling the node and evicting everything.

Fix 5: Increase Node Disk Size

If you're consistently hitting disk pressure, the node volume is undersized.

AWS EKS — resize existing node group:

hcl

# Increase EBS volume in Terraform
resource "aws_eks_node_group" "main" {
  ...
  launch_template {
    id      = aws_launch_template.nodes.id
    version = aws_launch_template.nodes.latest_version
  }
}
 
resource "aws_launch_template" "nodes" {
  ...
  block_device_mappings {
    device_name = "/dev/xvda"
    ebs {
      volume_size = 50  # increase from 20 to 50
      volume_type = "gp3"
    }
  }
}

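For a managed node group, applying the new launch template version rolls the nodes and replaces them with ones that have the larger volume. To grow a running node in place instead, modify its EBS volume and then extend the partition and filesystem on the node.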

Fix 6: Move Logs to External Storage

For high-log workloads, use a DaemonSet to ship logs off-node before they fill disk:

yaml

# Fluent Bit DaemonSet sends logs to CloudWatch/Loki
# Reduces disk usage to near-zero for log data
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    ...

With log shipping to CloudWatch or Loki, you can also reduce log rotation window on nodes to keep disk usage low.

Prevent Recurrence: Set Up Alerts

yaml

# Prometheus alert for disk pressure
- alert: NodeDiskPressureWarning
  expr: |
    (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) < 0.20
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Node {{ $labels.instance }} has less than 20% disk free"
 
- alert: NodeDiskPressureCritical
  expr: |
    (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) < 0.10
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Node {{ $labels.instance }} has less than 10% disk free, evictions imminent"

Alert at 20% free so you have time to act. By default the kubelet starts hard evictions when nodefs.available drops below 10% (or imagefs.available below 15%).
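The same cut-offs can be checked by hand on a node without Prometheus. This is a minimal sketch that mirrors the two alert rules; `disk_alert_level` is a made-up name, and it takes available and total size in any matching unit.

```shell
# disk_alert_level AVAIL SIZE
# Classify free space using the same cut-offs as the alert rules:
# below 10% free is critical, below 20% is warning, otherwise ok.
# (Hypothetical helper name.)
disk_alert_level() {
  avail=$1; size=$2
  free_pct=$(( avail * 100 / size ))
  if [ "$free_pct" -lt 10 ]; then
    echo critical
  elif [ "$free_pct" -lt 20 ]; then
    echo warning
  else
    echo ok
  fi
}

# Usage on a node: feed it the root filesystem's numbers from df.
# df -Pk / | awk 'NR == 2 { print $4, $2 }' | { read -r avail size; disk_alert_level "$avail" "$size"; }
```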

Debugging Checklist

bash

# 1. Check which nodes have DiskPressure (True = under pressure)
kubectl get nodes -o custom-columns='NAME:.metadata.name,DISK_PRESSURE:.status.conditions[?(@.type=="DiskPressure")].status'
 
# 2. SSH to node and check disk
df -h
du -sh /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/ 
 
# 3. Check kubelet logs for eviction messages
journalctl -u kubelet | grep -i "evict\|disk\|pressure" | tail -30
 
# 4. Check which pods were evicted
kubectl get pods -A --field-selector=status.phase=Failed | grep Evicted
 
# 5. Clean images
crictl rmi --prune

Quick summary:

Immediate fix → crictl rmi --prune to free image cache
Persistent issue → add ephemeral-storage limits to pods, configure log rotation
Structural fix → increase node disk size or add log shipping DaemonSet
Prevention → alert at 20% free disk, before evictions start
