
Kubernetes HPA Not Scaling: Why Your Pods Refuse to Scale and How to Fix It

Kubernetes HPA not scaling your pods? Walk through every root cause — missing metrics, wrong resource requests, cooldown periods — and fix each one systematically.

DevOpsBoys · Mar 30, 2026 · 5 min read

Your Horizontal Pod Autoscaler is configured. Your traffic is spiking. And your pods... just sit there. HPA not scaling is one of the most frustrating Kubernetes issues because there are at least eight different root causes, and each one looks the same from the outside.

This guide walks through every reason HPA fails to scale and exactly how to fix it.

The Fast Diagnostic Command

Before anything else, run this:

bash
kubectl describe hpa <hpa-name> -n <namespace>

Look at the Conditions and Events sections. Roughly 80% of the time, this alone tells you why the HPA is stuck.

Common condition messages:

  • unable to get metrics for resource cpu → metrics-server missing or broken
  • invalid metrics (1 invalid out of 1) → resource requests not set
  • DesiredReplicas below minimum → already at min replicas, nothing to scale down
  • ScalingActive=false → HPA is disabled or misconfigured

Root Cause 1: Metrics Server Not Installed

HPA needs metrics-server to read CPU and memory. If it's not installed, HPA can't do anything.

Check:

bash
kubectl top pods -n <namespace>

If you get error: Metrics API not available, metrics-server is missing.

Fix:

bash
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm install metrics-server metrics-server/metrics-server \
  --namespace kube-system \
  --set args[0]="--kubelet-insecure-tls"

The --kubelet-insecure-tls flag is for dev and homelab clusters where kubelets use self-signed certificates; drop it where kubelet certs are properly signed. Wait about 60 seconds, then verify:

bash
kubectl top nodes
kubectl top pods -n <namespace>

Affiliate tip: KodeKloud has excellent Kubernetes labs where you practice HPA hands-on in real clusters.

Root Cause 2: Resource Requests Not Set

HPA calculates utilization as actual_usage / requested_amount. If you haven't set resource requests on your containers, the denominator is missing: HPA can't compute utilization and refuses to scale.
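As a sketch of that arithmetic (plain Python for illustration, not the controller's actual code, and the helper name is made up):

```python
def cpu_utilization_percent(usage_millicores: float, request_millicores: float) -> float:
    """Utilization the way the HPA computes it: actual usage / requested amount.

    With no CPU request set there is no denominator, which is why the HPA
    reports an error instead of a number.
    """
    if request_millicores <= 0:
        raise ValueError("missing request for cpu")
    return 100.0 * usage_millicores / request_millicores

# A pod using 250m CPU against a 100m request is at 250% utilization,
# which is what `kubectl describe hpa` would report as the current metric.
print(cpu_utilization_percent(250, 100))
```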

Check:

bash
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].resources}'

If this returns {} or shows no requests, that's your problem.

Fix — add resource requests to your deployment:

yaml
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

Rule of thumb: set requests to your average load, limits to your peak load.

Root Cause 3: HPA Cooldown Periods

HPA has built-in cooldown to prevent thrashing. By default:

  • Scale-up cooldown: 0 seconds (scales up immediately)
  • Scale-down cooldown: 5 minutes (waits 5 min before scaling down)

If you scaled up recently and are now expecting scale-down, wait 5 minutes. It's not broken.

Check the last scale time:

bash
kubectl describe hpa <hpa-name> | grep "Last Scale Time"

Adjust cooldown behavior (Kubernetes 1.18+):

yaml
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Pods
        value: 4
        periodSeconds: 15

Root Cause 4: Already at Min/Max Replicas

If HPA wants to scale down but you're at minReplicas, or wants to scale up but you're at maxReplicas, it simply won't do anything.

Check:

bash
kubectl get hpa <hpa-name> -n <namespace>

Look at the MINPODS, MAXPODS, and REPLICAS columns.

Fix: Adjust min/max to match your actual needs:

yaml
spec:
  minReplicas: 2
  maxReplicas: 20

Root Cause 5: CPU Usage Below Target Threshold

HPA only acts when currentUtilization / targetUtilization deviates from 1.0 by more than the default 10% tolerance. If the target is 50% and actual usage is 55%, HPA won't scale up.
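The replica math behind this is ceil(currentReplicas * currentUtilization / targetUtilization), with no action inside the tolerance band. A small Python sketch of that rule (illustrative only, not the controller's code):

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     tolerance=0.1):
    """HPA scaling formula: ceil(currentReplicas * currentMetric / targetMetric).

    If the ratio stays within the default 10% tolerance of 1.0, the HPA
    leaves the replica count alone.
    """
    ratio = current_utilization / target_utilization
    if (1.0 - tolerance) <= ratio <= (1.0 + tolerance):
        return current_replicas  # inside the tolerance band: no change
    return math.ceil(current_replicas * ratio)

# Target 50%, actual 55%: ratio 1.1 sits on the tolerance edge, so no change.
# Target 50%, actual 90%: ratio 1.8, so 4 replicas become ceil(4 * 1.8) = 8.
```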

Check current utilization:

bash
kubectl describe hpa <hpa-name> | grep -A5 "Metrics:"

If actual usage is close to target but within the tolerance band, that's by design. Lower your target:

yaml
spec:
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 40  # was 50, now lower

Root Cause 6: Wrong API Version

The old autoscaling/v1 HPA only supports CPU. For memory or custom metrics, you need autoscaling/v2.

Check what you're using:

bash
kubectl get hpa <hpa-name> -o yaml | grep "apiVersion"

Fix — upgrade to v2:

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70

Root Cause 7: Custom Metrics Not Available (Prometheus Adapter)

If you're scaling on custom metrics (requests-per-second, queue depth, etc.), you need Prometheus Adapter or KEDA. Without it, HPA shows no metrics found.

Check:

bash
kubectl get apiservice v1beta1.custom.metrics.k8s.io

If it's missing, install KEDA instead — it's far easier than Prometheus Adapter:

bash
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace

Then use a ScaledObject instead of HPA for custom metric scaling.
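For illustration, a minimal ScaledObject scaling a Deployment on request rate might look like this; the Prometheus address, metric name, and threshold are placeholders for your environment:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: my-app               # the Deployment to scale
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc:9090  # placeholder address
      query: sum(rate(http_requests_total{app="my-app"}[2m]))  # placeholder metric
      threshold: "100"         # target value per replica
```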

Root Cause 8: Deployment Selector Mismatch

HPA targets a Deployment by name via spec.scaleTargetRef. If the Deployment was renamed or the reference is wrong, the HPA can't find anything to scale and its events show FailedGetScale.

Check:

bash
kubectl describe hpa <hpa-name> | grep "Reference:"

Verify the deployment name matches exactly:

bash
kubectl get deployments -n <namespace>
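To cross-check in one step, you can read the target straight out of the HPA spec (substitute your HPA name and namespace, as in the commands above):

```bash
# Resolve the HPA's scale target, then ask Kubernetes whether it exists
TARGET=$(kubectl get hpa <hpa-name> -n <namespace> \
  -o jsonpath='{.spec.scaleTargetRef.kind}/{.spec.scaleTargetRef.name}')
kubectl get "$TARGET" -n <namespace>
```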

Complete HPA Troubleshooting Checklist

bash
# 1. Check HPA status and events
kubectl describe hpa <name> -n <ns>
 
# 2. Verify metrics-server works
kubectl top pods -n <ns>
 
# 3. Check resource requests on pods
kubectl get pod <pod> -o jsonpath='{.spec.containers[*].resources}'
 
# 4. Check current vs desired replicas
kubectl get hpa <name> -n <ns>
 
# 5. Check pod events for resource issues
kubectl get events -n <ns> --sort-by='.lastTimestamp'

Production HPA Template

A solid HPA config for most web services:

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 25
        periodSeconds: 60

Summary

Problem → Fix:

  • HPA shows unable to get metrics → Install metrics-server
  • invalid metrics error → Set resource requests on containers
  • HPA not scaling down → Wait for the 5-min cooldown
  • Stuck at min/max replicas → Adjust minReplicas/maxReplicas
  • CPU near target but no scale → Lower the averageUtilization target
  • Memory scaling not working → Use the autoscaling/v2 API
  • Custom metrics not found → Install KEDA or Prometheus Adapter

Fix the metrics-server and resource requests first — they cover 90% of HPA failures. Everything else is tuning.


Building a Kubernetes-heavy platform? DigitalOcean Kubernetes offers managed clusters with autoscaling built-in — great for production workloads.
