Kubernetes HPA Not Scaling — How to Fix It (2026)

Your Horizontal Pod Autoscaler is configured but pods aren't scaling up or down. Here's every reason HPA fails and the exact fix for each.

DevOpsBoys · Apr 6, 2026 · 4 min read

You set up HPA expecting it to scale automatically. Load spikes. Nothing happens. Or it scales up but never scales down. Here's what's wrong and how to fix it.


Step 1: Check HPA Status

bash
kubectl get hpa -n your-namespace
kubectl describe hpa your-hpa -n your-namespace

Look for the Conditions section (AbleToScale, ScalingActive, ScalingLimited) and the Events at the bottom. In most cases the describe output states why HPA isn't acting.


Cause 1: Metrics Server Not Installed

HPA needs the Metrics Server to get CPU and memory metrics. Without it, HPA shows <unknown>:

NAME   REFERENCE         TARGETS         MINPODS   MAXPODS   REPLICAS
myapp  Deployment/myapp  <unknown>/50%   2         10        2

Check:

bash
kubectl get deployment metrics-server -n kube-system
kubectl top pods -n your-namespace  # If this fails, metrics-server is missing

Fix — Install Metrics Server:

bash
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

For local or test clusters whose kubelets use self-signed serving certificates, let metrics-server skip certificate verification (avoid this in production):

bash
# Add --kubelet-insecure-tls to metrics-server deployment args
kubectl patch deployment metrics-server -n kube-system \
  --type='json' \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'

Cause 2: No Resource Requests Set

HPA scales based on CPU/memory utilization percentage. If your pods don't have resource requests set, HPA can't calculate utilization — it has nothing to compare against.

bash
kubectl describe hpa your-hpa -n your-namespace
# "failed to get cpu utilization: missing request for cpu"

Fix — add resource requests to your Deployment:

yaml
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

Without requests, a Utilization target cannot be computed, so HPA will not scale on it.


Cause 3: HPA Configured Wrong

yaml
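# Correct: CPU utilization target (the metrics field requires apiVersion: autoscaling/v2)
spec:
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50  # Target: average CPU at 50% of requests

Common mistakes:

  • Reading averageUtilization: 50 as 50% of the limit. It is 50% of the requested CPU.
  • Mixing target types. type: Utilization takes a percentage in averageUtilization; type: AverageValue takes an absolute quantity in averageValue (e.g., 500m).
  • Pointing scaleTargetRef at the wrong kind or name. HPA then cannot find the workload and never scales it.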
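
The request-versus-limit distinction is easy to check by hand. The sketch below uses the 250m request and 500m limit from Cause 2; the function name utilization_pct and the 200m usage figure are illustrative assumptions, not kubectl output.

```shell
# utilization_pct USAGE_MILLICORES BASE_MILLICORES
# Prints usage as an integer percentage of the base value (truncated).
utilization_pct() {
  echo $(( $1 * 100 / $2 ))
}

utilization_pct 200 250   # usage vs request: 80, above a 50% target
utilization_pct 200 500   # usage vs limit:   40, a number HPA does not use
```

A pod using 200m therefore counts as 80% utilized, even though it is well under its limit.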

Verify HPA is reading metrics:

bash
kubectl get hpa your-hpa -n your-namespace
# TARGETS should show: 25%/50% (current/target) not <unknown>/50%
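
With many HPAs in a namespace, scanning the TARGETS column by eye gets tedious. A minimal sketch that lists the HPAs still reporting <unknown> follows. The helper name hpas_missing_metrics is hypothetical, and the sample text stands in for real `kubectl get hpa` output (the api row is invented for illustration).

```shell
# Reads `kubectl get hpa` output on stdin and prints the name of every
# HPA whose row contains <unknown>. The header row is skipped.
hpas_missing_metrics() {
  awk 'NR > 1 && /<unknown>/ { print $1 }'
}

# Sample input; on a real cluster, pipe kubectl into the function instead:
#   kubectl get hpa -n your-namespace | hpas_missing_metrics
sample='NAME   REFERENCE         TARGETS         MINPODS   MAXPODS   REPLICAS
myapp  Deployment/myapp  <unknown>/50%   2         10        2
api    Deployment/api    25%/50%         2         10        3'

printf '%s\n' "$sample" | hpas_missing_metrics
```

Any name it prints is an HPA to revisit under Cause 1 or Cause 2.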

Cause 4: Not Scaling Up — Load Not High Enough

HPA only scales up when average utilization across all pods exceeds the target. One pod at 100% won't trigger scale-up if other pods are at 10%.

bash
# Check per-pod CPU usage
kubectl top pods -n your-namespace -l app=myapp
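
The averaging rule follows from the documented HPA formula, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with a default tolerance of 10% around the target. Below is a rough sketch of that arithmetic. The function name desired_replicas is an assumption, and the sketch ignores minReplicas/maxReplicas clamping, unready pods, and missing metrics.

```shell
# desired_replicas CURRENT_REPLICAS AVG_UTILIZATION TARGET_UTILIZATION
# Applies ceil(current * avg / target), with no change inside a 10% tolerance.
desired_replicas() {
  awk -v r="$1" -v cur="$2" -v tgt="$3" 'BEGIN {
    ratio = cur / tgt
    diff = ratio - 1; if (diff < 0) diff = -diff
    if (diff <= 0.1) { print r; exit }   # close enough to target: keep replicas
    d = r * ratio
    n = int(d); if (n < d) n++           # round up to a whole pod
    print n
  }'
}

desired_replicas 4 32.5 50   # one pod at 100%, three at 10%: average is 32.5%
desired_replicas 4 100 50    # every pod at 100%
```

In the first call the average sits below the target, so the result is 3 rather than a scale-up, even though one pod is saturated. The second call returns 8.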

Also check the scale-up behavior, which caps how quickly HPA may add replicas:

yaml
spec:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0   # Default 0 — scales up immediately
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
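
To see what a Percent policy permits, here is a small sketch of the ceiling it puts on one period. It assumes the policy allows adding up to value percent of the replica count at the start of the period, rounded up; the function name max_after_percent_policy is hypothetical.

```shell
# max_after_percent_policy CURRENT_REPLICAS PERCENT
# Upper bound on replicas after one policy period, rounded up.
max_after_percent_policy() {
  echo $(( ($1 * (100 + $2) + 99) / 100 ))
}

max_after_percent_policy 3 100   # 100% policy: 3 pods can grow to 6
max_after_percent_policy 5 50    # 50% policy: 5 pods can grow to 8
```

With value: 100 and periodSeconds: 15, the replica count can at most double every 15 seconds, so a large spike still takes several periods to absorb.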

Cause 5: Not Scaling Down — Stabilization Window

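HPA has a default 5-minute (300-second) stabilization window for scale-down to prevent flapping. During that window it acts on the highest replica recommendation it has computed, so pods are only removed once load has stayed low for the full 5 minutes.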

bash
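kubectl describe hpa your-hpa -n your-namespace | grep "ScaleDown"
# AbleToScale  True  ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation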
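
The stabilization window can be pictured as taking the maximum over recent recommendations. The sketch below assumes the arguments are the replica recommendations HPA computed inside the window, oldest first; the function name stabilized_replicas is hypothetical.

```shell
# stabilized_replicas REC1 REC2 ...
# Prints the highest recommendation seen inside the window.
stabilized_replicas() {
  max=$1
  for rec in "$@"; do
    if [ "$rec" -gt "$max" ]; then
      max=$rec
    fi
  done
  echo "$max"
}

stabilized_replicas 8 6 4 3   # load dropped, but the 8 is still in the window
```

Until the recommendation of 8 ages out of the window, HPA keeps 8 replicas, which is why scale-down appears stuck right after a spike.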

If you want faster scale-down:

yaml
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60   # Reduce to 1 minute
      policies:
        - type: Pods
          value: 1
          periodSeconds: 60

Cause 6: Already at minReplicas or maxReplicas

HPA can't scale below minReplicas or above maxReplicas. Check:

bash
kubectl get hpa your-hpa -n your-namespace
# If REPLICAS = MAXPODS, you've hit the ceiling

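Fix: Increase maxReplicas if the load is legitimate. If utilization stays high regardless of traffic, check the app for a memory leak (when scaling on memory) or a runaway CPU loop before raising the ceiling.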


Cause 7: Custom Metrics Not Working (KEDA/Prometheus Adapter)

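If you're scaling on custom metrics (queue depth, request count), you need a metrics adapter. Prometheus Adapter typically serves the custom metrics API, while KEDA serves the external metrics API.

bash
# Check if custom metrics are available (Prometheus Adapter)
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
# Check if external metrics are available (KEDA)
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

If neither API exists, install Prometheus Adapter or KEDA.

KEDA is usually simpler for custom metrics, because a ScaledObject creates and manages the HPA for you: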

yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: myapp-scaler
spec:
  scaleTargetRef:
    name: myapp
  minReplicaCount: 0
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        metricName: http_requests_total
        threshold: "100"
        query: sum(rate(http_requests_total{app="myapp"}[1m]))

Quick Debug Commands

bash
# HPA status + conditions
kubectl describe hpa myapp -n production
 
# Are metrics available?
kubectl top pods -n production
 
# Events explaining why HPA isn't acting
kubectl get events -n production --field-selector involvedObject.kind=HorizontalPodAutoscaler
 
# Current resource usage vs requests
kubectl describe pod <pod> -n production | grep -A 4 "Requests\|Limits"

HPA Best Practices

  1. Always set resources.requests — HPA can't work without them
  2. Set minReplicas >= 2 for production — single pod = single point of failure
  3. Don't set maxReplicas too low — defeats the purpose
  4. Use scaleDown.stabilizationWindowSeconds to prevent flapping
  5. For queue-based scaling (SQS, Kafka), use KEDA rather than a hand-written HPA; it manages the HPA for you