What is Kubernetes HPA? Horizontal Pod Autoscaler Explained Simply
HPA in Kubernetes explained from scratch — what it does, how it works, how to set it up, and common mistakes to avoid. No jargon.
Your app is running fine when 10 users hit it. But when 1,000 users arrive during a launch or sale, it crashes. Adding pods manually isn't realistic.
That's what HPA (Horizontal Pod Autoscaler) solves. It automatically scales the number of pods up or down based on load.
What Does HPA Actually Do?
HPA watches your pods and asks: "Is this workload under too much pressure?" If yes, it adds more pods. When load drops, it removes them.
Low traffic → HPA sees CPU < 30% → scales DOWN to 2 pods
Normal traffic → HPA sees CPU = 50% → keeps 3 pods
High traffic → HPA sees CPU = 90% → scales UP to 8 pods
It does this automatically, every 15 seconds by default.
HPA vs VPA vs KEDA
Before going further — there are three autoscalers in Kubernetes:
| | What it scales | Based on |
|---|---|---|
| HPA | Number of pod replicas | CPU, memory, custom metrics |
| VPA | CPU/memory requests per pod | Historical usage |
| KEDA | Number of replicas | External events (queue depth, Kafka lag, cron) |
HPA = scale out (more pods). VPA = scale up (bigger pods). KEDA = scale from zero based on events.
This post covers HPA.
How HPA Works Internally
HPA talks to the Metrics Server to get CPU/memory usage. Every 15 seconds it calculates:
desiredReplicas = ceil( currentReplicas × (currentMetricValue / targetMetricValue) )
Example:
- 3 pods running
- Current CPU: 90%
- Target CPU: 50%
- Desired = ceil(3 × 90/50) = ceil(5.4) = 6 pods
HPA then tells the Deployment to scale to 6 replicas.
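The formula above can be sketched in plain shell arithmetic. Since `$(( ))` has no `ceil`, the snippet uses the integer ceiling-division trick:

```shell
# Sketch of the HPA core formula:
#   desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
current_replicas=3
current_cpu=90   # observed average utilization (%)
target_cpu=50    # target utilization from the HPA spec (%)

# Integer ceiling division: ceil(a/b) == (a + b - 1) / b for positive ints
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "$desired"   # → 6, matching the worked example above
```

Note that the real controller also skips scaling when the current/target ratio is already close to 1.0 (within a 10% tolerance by default), so small fluctuations don't cause churn.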
Prerequisites: Metrics Server
HPA needs the Metrics Server installed. Check if it's running:
```
kubectl get deployment metrics-server -n kube-system
```

If it's not there:

```
# Install Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```

For local clusters (minikube):

```
minikube addons enable metrics-server
```

Verify it works:

```
kubectl top pods
kubectl top nodes
```

If `kubectl top pods` shows data, you're ready.
Creating Your First HPA
Step 1 — Create a Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web
          image: nginx
          resources:
            requests:
              cpu: "100m"       # HPA needs requests set
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "256Mi"
```

Important: HPA won't work without `resources.requests.cpu` set. It uses requests as the baseline for percentage calculations.
Step 2 — Create the HPA
Imperative (fast):
```
kubectl autoscale deployment web-app \
  --cpu-percent=50 \
  --min=2 \
  --max=10
```

Declarative YAML (recommended):
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # Target 50% CPU usage
```

Apply it:

```
kubectl apply -f hpa.yaml
```

Step 3 — Check HPA Status
```
kubectl get hpa
# NAME          REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS
# web-app-hpa   Deployment/web-app   22%/50%   2         10        2

kubectl describe hpa web-app-hpa
```

The TARGETS column shows currentValue/targetValue. If it shows `<unknown>/50%`, Metrics Server isn't working.
Scale Based on Memory Too
```yaml
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
```

With multiple metrics, HPA scales up if either CPU or memory exceeds its target: it computes a desired replica count for each metric and acts on the largest.
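A quick arithmetic sketch of how multiple metrics combine — the controller computes a desired count per metric and takes the maximum (the utilization numbers here are made up for illustration):

```shell
replicas=3

# Per-metric desired counts via ceil(replicas * current / target),
# using integer ceiling division: ceil(a/b) == (a + b - 1) / b
cpu_desired=$(( (replicas * 55 + 60 - 1) / 60 ))   # CPU at 55%, target 60% → 3
mem_desired=$(( (replicas * 90 + 70 - 1) / 70 ))   # memory at 90%, target 70% → 4

# HPA acts on the maximum across metrics
desired=$(( cpu_desired > mem_desired ? cpu_desired : mem_desired ))
echo "$desired"   # → 4: memory drives the scale-up even though CPU is fine
```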
Testing HPA
Generate load to see HPA scale up:
```
# In one terminal — run a load generator
kubectl run -it load-generator --image=busybox --rm -- \
  /bin/sh -c "while true; do wget -q -O- http://web-app; done"

# In another terminal — watch HPA react
kubectl get hpa web-app-hpa --watch
```

You should see REPLICAS increase as CPU climbs above 50%.
Scale-Down Behaviour (Stabilization)
HPA doesn't scale down immediately after load drops. By default it waits 5 minutes before scaling down to avoid thrashing (scaling down then back up rapidly).
You can tune this:
```yaml
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 min before scaling down
      policies:
        - type: Pods
          value: 1                      # Remove at most 1 pod at a time
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0     # Scale up immediately
      policies:
        - type: Pods
          value: 4                      # Add at most 4 pods at a time
          periodSeconds: 15
```

Common Mistakes
1. No resource requests set
```
# HPA shows <unknown>
kubectl get hpa
# TARGETS: <unknown>/50%
```

Fix: add `resources.requests.cpu` to your container spec.
2. Metrics Server not installed
HPA can't get metrics. Install Metrics Server first.
3. Min replicas = 1
If your min is 1 and HPA scales down during low traffic, you have a single point of failure. Set min replicas to at least 2 for production workloads.
4. HPA fighting a static `replicas` field in the Deployment YAML
If your Deployment YAML hardcodes `replicas: 3` and you apply it repeatedly (via CI/CD), each apply resets the count and overrides HPA's scaling decisions. Solution:
Either remove `replicas` from your Deployment spec, or use `kubectl apply` with server-side apply so the HPA keeps ownership of the field. Most teams just remove `replicas` from the Deployment and let HPA manage it.
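A minimal sketch of the "remove `replicas`" fix, reusing the Deployment from earlier in this post with the field simply omitted (a config fragment, not a complete production manifest):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  # no "replicas:" here — the HPA sets and owns the live replica count,
  # so repeated applies from CI/CD no longer reset it
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web
          image: nginx
```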
HPA with Custom Metrics (Advanced)
You can scale on any metric — queue depth, HTTP requests per second, anything exposed via the custom metrics API. This requires an adapter like:
- KEDA (easiest, supports 60+ event sources)
- Prometheus Adapter (for Prometheus metrics)
Example with Prometheus Adapter:
```yaml
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
```

For event-driven scaling (RabbitMQ, SQS, Kafka), use KEDA instead — it's simpler.
Quick Reference
```
# Create HPA
kubectl autoscale deployment my-app --cpu-percent=50 --min=2 --max=10

# View HPA
kubectl get hpa
kubectl describe hpa my-app

# Delete HPA
kubectl delete hpa my-app

# Check metrics (verify Metrics Server works)
kubectl top pods
kubectl top nodes
```

Summary
HPA automatically scales your pod count based on CPU or memory usage. To get it working:
- Install Metrics Server
- Set `resources.requests.cpu` in your Deployment
- Create an HPA with min/max replicas and target utilization
- Test with a load generator and watch it scale
That's it. For most web apps, CPU-based HPA with 50–70% target utilization is the right starting point.
Practice HPA on a real multi-node cluster — DigitalOcean Kubernetes gives $200 free credit. Spin up a 3-node cluster and test autoscaling end-to-end. Also check KodeKloud for hands-on K8s autoscaling labs.