
What is Kubernetes HPA? Horizontal Pod Autoscaler Explained Simply

HPA in Kubernetes explained from scratch — what it does, how it works, how to set it up, and common mistakes to avoid. No jargon.

DevOpsBoys · Apr 19, 2026 · 5 min read

Your app is running fine when 10 users hit it. But when 1,000 users arrive during a launch or sale, it crashes. Adding pods manually isn't realistic.

That's what HPA (Horizontal Pod Autoscaler) solves. It automatically scales the number of pods up or down based on load.

What Does HPA Actually Do?

HPA watches your pods and asks: "Is this workload under too much pressure?" If yes, it adds more pods. When load drops, it removes them.

Low traffic    → HPA sees CPU < 30%  → scales DOWN to 2 pods
Normal traffic → HPA sees CPU = 50%  → keeps 3 pods
High traffic   → HPA sees CPU = 90%  → scales UP to 8 pods

It does this automatically, every 15 seconds by default.

HPA vs VPA vs KEDA

Before going further — there are three autoscalers in Kubernetes:

       What it scales                 Based on
HPA    Number of pod replicas         CPU, memory, custom metrics
VPA    CPU/memory requests per pod    Historical usage
KEDA   Number of replicas             External events (queue depth, Kafka lag, cron)

HPA = scale out (more pods). VPA = scale up (bigger pods). KEDA = scale from zero based on events.

This post covers HPA.

How HPA Works Internally

HPA talks to the Metrics Server to get CPU/memory usage. Every 15 seconds it calculates:

desiredReplicas = ceil( currentReplicas × (currentMetricValue / targetMetricValue) )

Example:

  • 3 pods running
  • Current CPU: 90%
  • Target CPU: 50%
  • Desired = ceil(3 × 90/50) = ceil(5.4) = 6 pods

HPA then tells the Deployment to scale to 6 replicas.
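The formula above can be sketched in a few lines of Python (the values mirror the example and are hypothetical):

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """HPA's core scaling formula: ceil(current × current/target)."""
    return math.ceil(current_replicas * (current_metric / target_metric))

# 3 pods at 90% CPU against a 50% target -> scale to 6
print(desired_replicas(3, 90, 50))  # → 6
```

Note that the real controller also applies a tolerance (10% by default), so it skips scaling when the current/target ratio is close enough to 1.0 that the change would just be noise.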

Prerequisites: Metrics Server

HPA needs the Metrics Server installed. Check if it's running:

bash
kubectl get deployment metrics-server -n kube-system

If it's not there:

bash
# Install Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

For local clusters (minikube):

bash
minikube addons enable metrics-server

Verify it works:

bash
kubectl top pods
kubectl top nodes

If kubectl top pods shows data, you're ready.

Creating Your First HPA

Step 1 — Create a Deployment

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web
        image: nginx
        resources:
          requests:
            cpu: "100m"       # HPA needs requests set
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "256Mi"

Important: HPA won't work without resources.requests.cpu set. It uses requests as the baseline for percentage calculations.
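To see why requests matter: the "Utilization" target compares actual usage against `resources.requests`, not limits. A rough sketch with hypothetical values:

```python
def cpu_utilization_percent(usage_millicores: float, requests_millicores: float) -> float:
    """HPA's Utilization metric: usage as a percentage of resources.requests."""
    return usage_millicores / requests_millicores * 100

# A pod using 90m CPU with requests of 100m reads as 90% utilization
print(cpu_utilization_percent(90, 100))  # → 90.0
```

Without `requests.cpu` there is no denominator, which is exactly why HPA reports `<unknown>` in that case.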

Step 2 — Create the HPA

Imperative (fast):

bash
kubectl autoscale deployment web-app \
  --cpu-percent=50 \
  --min=2 \
  --max=10

Declarative YAML (recommended):

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50    # Target 50% CPU usage

Apply it:

bash
kubectl apply -f hpa.yaml

Step 3 — Check HPA Status

bash
kubectl get hpa
# NAME          REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS
# web-app-hpa   Deployment/web-app   22%/50%   2         10        2
 
kubectl describe hpa web-app-hpa

The TARGETS column shows currentValue/targetValue. If it shows <unknown>/50%, Metrics Server isn't working.


Scale Based on Memory Too

yaml
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 60
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 70

When multiple metrics are defined, HPA calculates a desired replica count for each metric separately and scales to the largest of them. In practice, that means pods are added if either CPU or memory exceeds its target.
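The per-metric-then-max behaviour can be sketched like this (hypothetical numbers):

```python
import math

def desired_for_metric(current_replicas: int, current: float, target: float) -> int:
    """Desired replicas for a single metric: ceil(current × current/target)."""
    return math.ceil(current_replicas * current / target)

current_replicas = 3
cpu = desired_for_metric(current_replicas, 40, 60)     # CPU under target -> 2
memory = desired_for_metric(current_replicas, 85, 70)  # memory over target -> 4

# HPA takes the largest per-metric recommendation
print(max(cpu, memory))  # → 4
```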


Testing HPA

Generate load to see HPA scale up:

bash
# In one terminal — run a load generator
# (assumes a Service named web-app exists in front of the Deployment)
kubectl run -it load-generator --image=busybox --rm -- \
  /bin/sh -c "while true; do wget -q -O- http://web-app; done"
 
# In another terminal — watch HPA react
kubectl get hpa web-app-hpa --watch

You should see REPLICAS increase as CPU climbs above 50%.


Scale-Down Behaviour (Stabilization)

HPA doesn't scale down immediately after load drops. By default it waits 5 minutes before scaling down to avoid thrashing (scaling down then back up rapidly).

You can tune this:

yaml
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 min before scaling down
      policies:
      - type: Pods
        value: 1                         # Remove max 1 pod at a time
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0     # Scale up immediately
      policies:
      - type: Pods
        value: 4                         # Add max 4 pods at a time
        periodSeconds: 15
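The scale-down stabilization window works by remembering the recommendations from the last N seconds and acting on the highest one, so a brief dip in load doesn't shrink the fleet. Roughly (hypothetical values):

```python
# Recommendations computed over the last 5 minutes, newest last.
# Load is dropping, but the window still contains the earlier highs.
recent_recommendations = [8, 8, 6, 4, 3]

# Within the stabilization window, HPA scales down only to the MAX
# of the recent recommendations — so it holds at 8 for now.
print(max(recent_recommendations))  # → 8
```

Only once the old high values age out of the window does the replica count actually come down.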

Common Mistakes

1. No resource requests set

bash
# HPA shows <unknown>
kubectl get hpa
# TARGETS: <unknown>/50%

Fix: add resources.requests.cpu to your container spec.

2. Metrics Server not installed

HPA can't get metrics. Install Metrics Server first.

3. Min replicas = 1

If your min is 1 and HPA scales down during low traffic, you have a single point of failure. Set min replicas to at least 2 for production workloads.

4. HPA fighting with a static replicas field in the Deployment YAML

If your Deployment YAML hardcodes replicas: 3 and you apply it repeatedly (via CI/CD), every apply resets the replica count and undoes HPA's scaling decisions.

Fix: remove replicas from your Deployment spec entirely, or use server-side apply (kubectl apply --server-side) so field ownership of replicas can pass to HPA. Most teams just remove replicas from the Deployment and let HPA manage it.
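For example, the Deployment from earlier with the replicas field dropped, so HPA alone controls the count:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  # no 'replicas' field — HPA owns the replica count
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web
        image: nginx
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
```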


HPA with Custom Metrics (Advanced)

You can scale on any metric — queue depth, HTTP requests per second, anything exposed via the custom metrics API. This requires an adapter like:

  • KEDA (easiest, supports 60+ event sources)
  • Prometheus Adapter (for Prometheus metrics)

Example with Prometheus Adapter:

yaml
metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "100"

For event-driven scaling (RabbitMQ, SQS, Kafka), use KEDA instead — it's simpler.


Quick Reference

bash
# Create HPA
kubectl autoscale deployment my-app --cpu-percent=50 --min=2 --max=10
 
# View HPA
kubectl get hpa
kubectl describe hpa my-app
 
# Delete HPA
kubectl delete hpa my-app
 
# Check metrics (verify Metrics Server works)
kubectl top pods
kubectl top nodes

Summary

HPA automatically scales your pod count based on CPU or memory usage. To get it working:

  1. Install Metrics Server
  2. Set resources.requests.cpu in your Deployment
  3. Create an HPA with min/max replicas and target utilization
  4. Test with a load generator and watch it scale

That's it. For most web apps, CPU-based HPA with 50–70% target utilization is the right starting point.

Practice HPA on a real multi-node cluster — DigitalOcean Kubernetes gives $200 free credit. Spin up a 3-node cluster and test autoscaling end-to-end. Also check KodeKloud for hands-on K8s autoscaling labs.
