Kubernetes Vertical Pod Autoscaler (VPA) — Complete Guide (2026)
Everything you need to know about Kubernetes VPA. Covers installation, recommendation modes, right-sizing strategies, VPA vs HPA, and production best practices for resource optimization.
Every Kubernetes cluster has the same problem: developers guess resource requests, and they guess wrong. Memory requests are 3x what pods actually use. CPU requests are set to "1 core" because someone copied it from a tutorial. The result? Wasted money and scheduling inefficiency.
The Vertical Pod Autoscaler (VPA) fixes this by automatically right-sizing pod resource requests based on actual usage. It analyzes historical metrics and either recommends or automatically applies optimal CPU and memory values.
If you're managing more than a handful of deployments, VPA should be in your toolkit. Here's the complete guide.
What VPA Does
VPA adjusts the resource requests and limits of containers in a pod based on observed usage:
Without VPA:

```
requests.cpu:    1000m   (actual usage: 150m)    → 85% waste
requests.memory: 1Gi     (actual usage: 256Mi)   → 75% waste
```

With VPA:

```
requests.cpu:    250m    (target based on P90)   → right-sized
requests.memory: 384Mi   (target with headroom)  → right-sized
```
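The waste figures above are just (requested − used) / requested; a quick shell sanity check:

```shell
# CPU: 1000m requested, 150m actually used
awk 'BEGIN { printf "%.0f%% waste\n", (1000 - 150) / 1000 * 100 }'   # → 85% waste

# Memory: 1Gi (1024Mi) requested, 256Mi actually used
awk 'BEGIN { printf "%.0f%% waste\n", (1024 - 256) / 1024 * 100 }'   # → 75% waste
```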
VPA has three operating modes:
| Mode | Behavior | Risk Level |
|---|---|---|
| Off | VPA calculates recommendations but does nothing | None |
| Initial | Sets resources only when pods are created (not on running pods) | Low |
| Auto | Evicts and recreates pods with updated resources | Medium |
Most teams start with Off (recommend only), then graduate to Auto for non-critical workloads.
Installing VPA
Prerequisites
- Kubernetes 1.26+
- Metrics Server installed (`kubectl top pods` must work)
Installation
```shell
# Clone the VPA repo
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler

# Install VPA components
./hack/vpa-up.sh
```

Or with Helm:
```shell
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install vpa fairwinds-stable/vpa \
  --namespace vpa \
  --create-namespace \
  --set recommender.enabled=true \
  --set updater.enabled=true \
  --set admissionController.enabled=true
```

Verify:

```shell
kubectl get pods -n vpa
```

```
NAME                                       READY   STATUS    RESTARTS   AGE
vpa-admission-controller-6b5c4f6d8-x2k4m   1/1     Running   0          2m
vpa-recommender-7d9bc5f8b-9j3kl            1/1     Running   0          2m
vpa-updater-5f9d8b7c6-l8m2p                1/1     Running   0          2m
```
Three components:
- Recommender — watches pod metrics, calculates optimal resources
- Updater — evicts pods that need resource updates (only in Auto mode)
- Admission Controller — mutates pod specs during creation with VPA recommendations
Creating a VPA Resource
Recommend-Only Mode (Start Here)
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"   # Recommend only, don't change anything
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
```

```shell
kubectl apply -f vpa.yaml

# Wait 5-10 minutes for recommendations, then check
kubectl describe vpa api-server-vpa -n production
```

Output:
```
Recommendation:
  Container Recommendations:
    Container Name:  api
    Lower Bound:
      Cpu:     125m
      Memory:  196Mi
    Target:
      Cpu:     250m
      Memory:  384Mi
    Uncapped Target:
      Cpu:     250m
      Memory:  384Mi
    Upper Bound:
      Cpu:     1200m
      Memory:  1536Mi
```

Key values:
- Target — the recommended requests (roughly P90 usage plus a safety margin)
- Lower Bound — minimum safe value (roughly P50 usage)
- Upper Bound — handles traffic spikes (roughly P95 usage)
- Uncapped Target — the target without minAllowed/maxAllowed constraints
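For scripting, the same numbers can be pulled straight from the VPA status. This assumes the `api-server-vpa` object above exists and kubectl is pointed at the cluster:

```shell
# Extract just the target recommendation as JSON; the jsonpath mirrors
# the status fields shown by kubectl describe above.
kubectl get vpa api-server-vpa -n production \
  -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'
```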
Auto Mode (For Production)
Once you trust the recommendations, enable auto mode:
```yaml
spec:
  updatePolicy:
    updateMode: "Auto"
    minReplicas: 2   # Don't evict if fewer than 2 replicas are running
```

In Auto mode, VPA:
- Monitors actual resource usage
- Calculates optimal requests
- Evicts pods whose current requests fall outside the recommended lower/upper bounds
- The Admission Controller sets the new requests when the pod is recreated
Warning: Auto mode evicts pods! Ensure you have:
- Multiple replicas
- PodDisruptionBudgets
- Proper readiness probes
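To audit what the updater is actually doing in Auto mode, you can watch for its eviction events (the `EvictedByVPA` event reason is what upstream VPA emits; verify against your version):

```shell
# List recent VPA-triggered evictions across the cluster
kubectl get events -A --field-selector reason=EvictedByVPA \
  --sort-by=.lastTimestamp
```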
VPA vs HPA — When to Use Which
| Feature | VPA | HPA |
|---|---|---|
| Scales | Resources per pod (vertical) | Number of pods (horizontal) |
| Best for | Right-sizing requests | Handling traffic changes |
| Metric | Historical CPU/memory usage | Current CPU/memory/custom |
| Disruption | Evicts pods (in Auto) | Adds/removes pods |
Can you use both? Yes, but with caveats:
```yaml
# VPA controls memory only; HPA handles CPU-based scaling
spec:
  resourcePolicy:
    containerPolicies:
      - containerName: api
        controlledResources: ["memory"]   # VPA handles memory
```

The rule: don't let VPA and HPA compete on the same metric. If HPA scales on CPU, let VPA handle memory (or vice versa).
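As a sketch of that split, here is a companion HPA (hypothetical name, targeting the `api-server` Deployment from earlier) that scales replicas on CPU utilization while the VPA owns memory:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu           # HPA's metric; VPA must not control CPU here
        target:
          type: Utilization
          averageUtilization: 70
```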
Production Configuration Patterns
Pattern 1: Conservative Right-Sizing
For critical workloads where stability matters more than cost:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payment-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-service
  updatePolicy:
    updateMode: "Initial"   # Only set on new pods, don't evict
  resourcePolicy:
    containerPolicies:
      - containerName: payment
        minAllowed:
          cpu: 500m         # High minimum for critical service
          memory: 512Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
```

Pattern 2: Aggressive Cost Optimization
For non-critical workloads (dev/staging, batch jobs, internal tools):
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: internal-tool-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: internal-dashboard
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: dashboard
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi
```

Pattern 3: VPA for Every Deployment (Goldilocks)
Fairwinds Goldilocks creates VPA objects for every deployment in a namespace and provides a dashboard:
```shell
helm install goldilocks fairwinds-stable/goldilocks \
  --namespace goldilocks \
  --create-namespace

# Enable for a namespace
kubectl label namespace production goldilocks.fairwinds.com/enabled=true
```

Access the dashboard:

```shell
kubectl port-forward -n goldilocks svc/goldilocks-dashboard 8080:80
```

Goldilocks shows recommended requests vs current requests for every deployment, with estimated cost savings.
Monitoring VPA
Prometheus Metrics
The per-object recommendation metrics below come from kube-state-metrics, not from VPA itself (since kube-state-metrics v2.9 they require a CustomResourceState configuration); the VPA components additionally expose their own /metrics endpoints. An example alert:

```yaml
# Alert: VPA recommendation significantly different from current requests.
# Metric names vary with your kube-state-metrics version and configuration.
- alert: VPARecommendationDrift
  expr: |
    (
      kube_vpa_containerrecommendations_target{resource="cpu"}
      /
      kube_pod_container_resource_requests{resource="cpu"}
    ) > 2 or (
      kube_vpa_containerrecommendations_target{resource="cpu"}
      /
      kube_pod_container_resource_requests{resource="cpu"}
    ) < 0.3
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "VPA recommendation for {{ $labels.container }} differs significantly (>2x or <0.3x) from current requests"
```

Key Metrics to Watch
- `vpa_recommender_vpa_objects_count` — number of VPA objects
- `vpa_recommender_recommendation_latency` — time to calculate recommendations
- `vpa_updater_evictions_total` — pods evicted by VPA (Auto mode)
Common Pitfalls
1. VPA evicts your single-replica deployment
VPA doesn't check if eviction will cause downtime. Always set:
```yaml
updatePolicy:
  minReplicas: 2
```

And have a PodDisruptionBudget:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: api-server
```

2. VPA recommends tiny resources because pods are idle
VPA analyzes the last 8 days of metrics by default. If your workload has weekly peaks (e.g., Monday batch processing), VPA might right-size based on the quiet days.
Fix: ensure at least 2 weeks of metrics before enabling Auto mode.
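If you run the recommender against Prometheus, you can also widen its lookback window. A hypothetical values snippet for the Fairwinds chart (the flag names are upstream VPA recommender options; the `extraArgs` key and duration format should be verified against your chart and VPA versions):

```yaml
recommender:
  extraArgs:
    storage: prometheus
    prometheus-address: "http://prometheus.monitoring.svc:9090"
    history-length: "2w"   # look back two weeks instead of the 8-day default
```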
3. OOMKilled after VPA lowers memory
VPA targets P90 usage plus headroom, but memory spikes can exceed this. Set a reasonable minAllowed for memory:
```yaml
minAllowed:
  memory: 256Mi   # Never go below this regardless of recommendation
```

Quick Start Checklist
```shell
# 1. Install Metrics Server (if not present)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# 2. Verify metrics work
kubectl top pods -A

# 3. Install VPA
# (see installation section above)

# 4. Create VPA in "Off" mode for your biggest deployments
# (see recommend-only pattern above)

# 5. Wait 1-2 days, review recommendations
kubectl get vpa -A -o custom-columns="NAME:.metadata.name,TARGET-CPU:.status.recommendation.containerRecommendations[0].target.cpu,TARGET-MEM:.status.recommendation.containerRecommendations[0].target.memory"

# 6. Apply recommendations manually or switch to Auto
```

Learn More
VPA is one of the most impactful cost-saving tools in Kubernetes. For hands-on practice with autoscaling and resource management, KodeKloud's Kubernetes courses cover both HPA and VPA in real lab environments.
The cheapest resource is the one you don't allocate. VPA ensures you allocate exactly what you need — no more, no less.