What Are Resource Limits and Requests in Kubernetes? (2026)

Kubernetes requests and limits control how much CPU and memory a pod gets. Get them wrong and your pods get throttled, OOMKilled, or evicted. Here's how they actually work.

DevOpsBoys · Apr 29, 2026 · 4 min read

Resource requests and limits are one of the most important — and most misunderstood — settings in Kubernetes. Get them wrong and your pods get throttled, OOMKilled, or cause other pods to be evicted.


The Simple Version

Request = what your pod is guaranteed to get. Kubernetes uses this to decide which node to schedule the pod on.

Limit = the maximum your pod is allowed to use. If it goes over, bad things happen.

```yaml
resources:
  requests:
    memory: "128Mi"
    cpu: "250m"
  limits:
    memory: "256Mi"
    cpu: "500m"
```

What Happens Without Them

If you don't set requests and limits:

  • Kubernetes can't schedule pods intelligently — it has no idea what resources they need
  • One runaway pod can eat all CPU on a node and starve other pods
  • During a memory pressure event, your pods are the first to be evicted (no resources puts them in the lowest QoS class, covered below)
  • Your cluster will feel unpredictably slow

Always set them. No exceptions.
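
A quick way to audit an existing cluster: pods with no requests and no limits end up in the BestEffort QoS class (explained later in this article), which is visible right on the pod status:

```bash
# List every pod's QoS class; BestEffort means no requests or limits are set
kubectl get pods -A -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,QOS:.status.qosClass'
```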


CPU: Requests vs Limits in Detail

CPU is measured in millicores: 1000m = 1 CPU core, so 250m is a quarter of a core. Fractional notation also works (cpu: "0.5" is equivalent to "500m").

CPU Request:

  • Kubernetes places your pod on a node with at least this much CPU available
  • If the node has spare CPU, your pod can use more than its request
  • This is the scheduling guarantee

CPU Limit:

  • Your pod cannot use more CPU than this
  • If it tries to use more, the kernel's CFS scheduler throttles it — it keeps running, just slower
  • CPU throttling is silent and hard to detect
```yaml
resources:
  requests:
    cpu: "250m"    # needs at least 0.25 cores to schedule
  limits:
    cpu: "500m"    # will be throttled if it tries to use more than 0.5 cores
```

Common mistake: Setting CPU limits too low causes mysterious slowness in your app because it's constantly being throttled.

To check CPU throttling:

```bash
# kubectl top shows current usage, but it won't reveal throttling;
# for that, watch container_cpu_cfs_throttled_seconds_total in Prometheus
kubectl top pods -n your-namespace
```

Memory: Requests vs Limits in Detail

Memory is measured in bytes. Use Mi (mebibytes) or Gi (gibibytes).

Memory Request:

  • Scheduling guarantee — pod goes to a node with at least this much free memory

Memory Limit:

  • Hard ceiling — if your pod tries to use more, the kernel kills the process and the container shows OOMKilled
  • There is no "soft" memory limit like CPU throttling — it just gets killed
```yaml
resources:
  requests:
    memory: "128Mi"   # needs 128MiB free to schedule
  limits:
    memory: "256Mi"   # will be OOMKilled if it uses more than 256MiB
```

OOMKilled = Out Of Memory Killed. Your pod restarts. If it happens repeatedly, you get CrashLoopBackOff.

```bash
# Check if a pod was OOMKilled
kubectl describe pod <pod-name> | grep -A5 "OOMKilled"
kubectl get events --field-selector reason=OOMKilling
```
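
You can also read the last termination reason straight from the pod status. A jsonpath sketch; adjust the container index if the pod runs more than one container:

```bash
# Prints "OOMKilled" if the first container's last termination was an OOM kill
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
```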

How Kubernetes Uses Requests for Scheduling

When you create a pod, the scheduler finds a node where:

  • Allocatable CPU minus the sum of all pod CPU requests already on the node >= the new pod's CPU request
  • Allocatable memory minus the sum of all pod memory requests already on the node >= the new pod's memory request

Important: Kubernetes schedules based on requests, not actual usage. A node can look "full" to the scheduler even if its pods are nearly idle: a node with 4000m allocatable CPU whose pods request 3600m in total cannot accept a new pod requesting 500m, even at near-zero actual usage.

```bash
# See how much capacity is allocated on each node
kubectl describe nodes | grep -A5 "Allocated resources"
```

QoS Classes — How Kubernetes Prioritizes Eviction

Kubernetes assigns every pod a Quality of Service class based on its resources:

Guaranteed — requests = limits for every container (setting only limits also qualifies, since requests then default to the limits). Safest. Evicted last.

```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "500m"
  limits:
    memory: "256Mi"
    cpu: "500m"
```

Burstable — at least one container has a request or limit set, but the pod doesn't meet the Guaranteed criteria (typically requests lower than limits, or only requests set). Middle priority.

```yaml
resources:
  requests:
    memory: "128Mi"
  limits:
    memory: "256Mi"
```

BestEffort — no requests or limits set at all. Evicted first under memory pressure.

During a memory crunch, Kubernetes evicts BestEffort pods first, then Burstable, then Guaranteed. This is why production pods should always be Guaranteed or Burstable.
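
Kubernetes records the class it assigned in the pod's status, so you can verify it directly:

```bash
# Prints Guaranteed, Burstable, or BestEffort
kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'
```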


How to Pick the Right Values

If you don't know where to start, deploy without limits first (ideally somewhere non-production), monitor actual usage, then set:

  • Request = P50 (median) actual usage
  • Limit = P99 actual usage with 20% headroom
```bash
# Check actual resource usage with kubectl top
kubectl top pods -n production --sort-by=memory

# Or use Vertical Pod Autoscaler in recommendation mode
kubectl describe vpa <vpa-name>
```
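
If you go the VPA route, a minimal recommendation-only setup looks something like this. A sketch, assuming the VPA components and CRDs are installed in your cluster; my-app is a placeholder Deployment name:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app          # placeholder: the workload to analyze
  updatePolicy:
    updateMode: "Off"     # recommend only; never evict or resize pods
```

With updateMode: "Off", the VPA only publishes recommendations (visible via kubectl describe vpa above) and never touches running pods.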

LimitRange — Set Defaults for a Namespace

Instead of setting limits on every pod, set a namespace default:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: my-app
spec:
  limits:
  - default:
      cpu: "500m"
      memory: "256Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    type: Container
```

Now any pod deployed without explicit resource settings gets these defaults.
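
To double-check what a namespace will apply:

```bash
# Shows the default requests/limits the namespace injects into new containers
kubectl describe limitrange default-limits -n my-app
```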


Key takeaways:

  • Requests = scheduling guarantee, limits = hard ceiling
  • CPU over limit = throttled (slow), memory over limit = killed (OOMKilled)
  • No resources set = BestEffort = first to be evicted
  • Set requests based on actual P50 usage, limits on P99 + headroom