What is Kubernetes HPA? Horizontal Pod Autoscaler Explained Simply
HPA in Kubernetes explained from scratch — what it does, how it works, how to set it up, and common mistakes to avoid. No jargon.
Your app is running fine when 10 users hit it. But when 1,000 users arrive during a launch or sale, it crashes. Adding pods manually isn't realistic.
That's what HPA (Horizontal Pod Autoscaler) solves. It automatically scales the number of pods up or down based on load.
What Does HPA Actually Do?
HPA watches your pods and asks: "Is this workload under too much pressure?" If yes, it adds more pods. When load drops, it removes them.
Low traffic → HPA sees CPU < 30% → scales DOWN to 2 pods
Normal traffic → HPA sees CPU = 50% → keeps 3 pods
High traffic → HPA sees CPU = 90% → scales UP to 8 pods
It does this automatically, every 15 seconds by default.
HPA vs VPA vs KEDA
Before going further — there are three autoscalers in Kubernetes:
| | What it scales | Based on |
|---|---|---|
| HPA | Number of pod replicas | CPU, memory, custom metrics |
| VPA | CPU/memory requests per pod | Historical usage |
| KEDA | Number of replicas | External events (queue depth, Kafka lag, cron) |
HPA = scale out (more pods). VPA = scale up (bigger pods). KEDA = scale from zero based on events.
This post covers HPA.
How HPA Works Internally
HPA talks to the Metrics Server to get CPU/memory usage. Every 15 seconds it calculates:
desiredReplicas = ceil( currentReplicas × (currentMetricValue / targetMetricValue) )
Example:
- 3 pods running
- Current CPU: 90%
- Target CPU: 50%
- Desired = ceil(3 × 90/50) = ceil(5.4) = 6 pods
HPA then tells the Deployment to scale to 6 replicas.
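The formula above can be sketched in plain shell arithmetic. Since `$(( ))` has no `ceil`, the snippet uses the integer ceiling-division trick:

```shell
# Sketch of the HPA core formula:
#   desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
current_replicas=3
current_cpu=90   # observed average utilization (%)
target_cpu=50    # target utilization from the HPA spec (%)

# Integer ceiling division: ceil(a/b) == (a + b - 1) / b for positive ints
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "$desired"   # → 6, matching the worked example above
```

Note that the real controller also skips scaling when the current/target ratio is already close to 1.0 (within a 10% tolerance by default), so small fluctuations don't cause churn.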
Prerequisites: Metrics Server
HPA needs the Metrics Server installed. Check if it's running:
```
kubectl get deployment metrics-server -n kube-system
```

If it's not there:

```
# Install Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```

For local clusters (minikube):

```
minikube addons enable metrics-server
```

Verify it works:

```
kubectl top pods
kubectl top nodes
```

If `kubectl top pods` shows data, you're ready.
Creating Your First HPA
Step 1 — Create a Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web
          image: nginx
          resources:
            requests:
              cpu: "100m"       # HPA needs requests set
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "256Mi"
```

Important: HPA won't work without `resources.requests.cpu` set. It uses requests as the baseline for percentage calculations.
Step 2 — Create the HPA
Imperative (fast):
```
kubectl autoscale deployment web-app \
  --cpu-percent=50 \
  --min=2 \
  --max=10
```

Declarative YAML (recommended):
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # Target 50% CPU usage
```

Apply it:

```
kubectl apply -f hpa.yaml
```

Step 3 — Check HPA Status
```
kubectl get hpa
# NAME          REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS
# web-app-hpa   Deployment/web-app   22%/50%   2         10        2

kubectl describe hpa web-app-hpa
```

The TARGETS column shows currentValue/targetValue. If it shows `<unknown>/50%`, Metrics Server isn't working.
Scale Based on Memory Too
```yaml
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
```

With multiple metrics, HPA scales up if either CPU or memory exceeds its target: it computes a desired replica count for each metric and acts on the largest.
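A quick arithmetic sketch of how multiple metrics combine — the controller computes a desired count per metric and takes the maximum (the utilization numbers here are made up for illustration):

```shell
replicas=3

# Per-metric desired counts via ceil(replicas * current / target),
# using integer ceiling division: ceil(a/b) == (a + b - 1) / b
cpu_desired=$(( (replicas * 55 + 60 - 1) / 60 ))   # CPU at 55%, target 60% → 3
mem_desired=$(( (replicas * 90 + 70 - 1) / 70 ))   # memory at 90%, target 70% → 4

# HPA acts on the maximum across metrics
desired=$(( cpu_desired > mem_desired ? cpu_desired : mem_desired ))
echo "$desired"   # → 4: memory drives the scale-up even though CPU is fine
```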
Testing HPA
Generate load to see HPA scale up:
```
# In one terminal — run a load generator
kubectl run -it load-generator --image=busybox --rm -- \
  /bin/sh -c "while true; do wget -q -O- http://web-app; done"

# In another terminal — watch HPA react
kubectl get hpa web-app-hpa --watch
```

You should see REPLICAS increase as CPU climbs above 50%.
Scale-Down Behaviour (Stabilization)
HPA doesn't scale down immediately after load drops. By default it waits 5 minutes before scaling down to avoid thrashing (scaling down then back up rapidly).
You can tune this:
```yaml
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 min before scaling down
      policies:
        - type: Pods
          value: 1                      # Remove at most 1 pod at a time
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0     # Scale up immediately
      policies:
        - type: Pods
          value: 4                      # Add at most 4 pods at a time
          periodSeconds: 15
```

Common Mistakes
1. No resource requests set
```
# HPA shows <unknown>
kubectl get hpa
# TARGETS: <unknown>/50%
```

Fix: add `resources.requests.cpu` to your container spec.
2. Metrics Server not installed
HPA can't get metrics. Install Metrics Server first.
3. Min replicas = 1
If your min is 1 and HPA scales down during low traffic, you have a single point of failure. Set min replicas to at least 2 for production workloads.
4. HPA fighting a static `replicas` field in the Deployment YAML
If your Deployment YAML hardcodes `replicas: 3` and you apply it repeatedly (via CI/CD), each apply resets the count and overrides HPA's scaling decisions. Solution:
Either remove `replicas` from your Deployment spec, or use `kubectl apply` with server-side apply so the HPA keeps ownership of the field. Most teams just remove `replicas` from the Deployment and let HPA manage it.
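A minimal sketch of the "remove `replicas`" fix, reusing the Deployment from earlier in this post with the field simply omitted (a config fragment, not a complete production manifest):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  # no "replicas:" here — the HPA sets and owns the live replica count,
  # so repeated applies from CI/CD no longer reset it
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web
          image: nginx
```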
HPA with Custom Metrics (Advanced)
You can scale on any metric — queue depth, HTTP requests per second, anything exposed via the custom metrics API. This requires an adapter like:
- KEDA (easiest, supports 60+ event sources)
- Prometheus Adapter (for Prometheus metrics)
Example with Prometheus Adapter:
```yaml
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
```

For event-driven scaling (RabbitMQ, SQS, Kafka), use KEDA instead — it's simpler.
Quick Reference
```
# Create HPA
kubectl autoscale deployment my-app --cpu-percent=50 --min=2 --max=10

# View HPA
kubectl get hpa
kubectl describe hpa my-app

# Delete HPA
kubectl delete hpa my-app

# Check metrics (verify Metrics Server works)
kubectl top pods
kubectl top nodes
```

Summary
HPA automatically scales your pod count based on CPU or memory usage. To get it working:
- Install Metrics Server
- Set `resources.requests.cpu` in your Deployment
- Create an HPA with min/max replicas and target utilization
- Test with a load generator and watch it scale
That's it. For most web apps, CPU-based HPA with 50–70% target utilization is the right starting point.
Practice HPA on a real multi-node cluster — DigitalOcean Kubernetes gives $200 free credit. Spin up a 3-node cluster and test autoscaling end-to-end. Also check KodeKloud for hands-on K8s autoscaling labs.