
Kubernetes Vertical Pod Autoscaler (VPA) — Complete Guide (2026)

Everything you need to know about Kubernetes VPA. Covers installation, recommendation modes, right-sizing strategies, VPA vs HPA, and production best practices for resource optimization.

DevOpsBoys · Mar 29, 2026 · 5 min read

Every Kubernetes cluster has the same problem: developers guess resource requests, and they guess wrong. Memory requests are 3x what pods actually use. CPU requests are set to "1 core" because someone copied it from a tutorial. The result? Wasted money and scheduling inefficiency.

The Vertical Pod Autoscaler (VPA) fixes this by automatically right-sizing pod resource requests based on actual usage. It analyzes historical metrics and either recommends or automatically applies optimal CPU and memory values.

If you're managing more than a handful of deployments, VPA should be in your toolkit. Here's the complete guide.

What VPA Does

VPA adjusts the resource requests and limits of containers in a pod based on observed usage:

Without VPA:
  requests.cpu: 1000m    (actual usage: 150m)    → 85% waste
  requests.memory: 1Gi   (actual usage: 256Mi)   → 75% waste

With VPA:
  requests.cpu: 250m     (target based on P90)    → right-sized
  requests.memory: 384Mi (target with headroom)   → right-sized
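The waste figures above are simple arithmetic: (requested − actually used) / requested. A quick shell check:

```shell
# Waste = (requested - used) / requested, as a percentage.
requested_cpu_m=1000; used_cpu_m=150      # millicores
requested_mem_mi=1024; used_mem_mi=256    # MiB (1Gi = 1024Mi)

cpu_waste=$(( (requested_cpu_m - used_cpu_m) * 100 / requested_cpu_m ))
mem_waste=$(( (requested_mem_mi - used_mem_mi) * 100 / requested_mem_mi ))

echo "cpu waste: ${cpu_waste}%"   # cpu waste: 85%
echo "mem waste: ${mem_waste}%"   # mem waste: 75%
```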

VPA has three operating modes:

| Mode | Behavior | Risk level |
|------|----------|------------|
| Off | VPA calculates recommendations but does nothing | None |
| Initial | Sets resources only when pods are created (not on running pods) | Low |
| Auto | Evicts and recreates pods with updated resources | Medium |

Most teams start with Off (recommend only), then graduate to Auto for non-critical workloads.

Installing VPA

Prerequisites

  • Kubernetes 1.26+
  • Metrics Server installed (kubectl top pods must work)

Installation

bash
# Clone the VPA repo
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
 
# Install VPA components
./hack/vpa-up.sh

Or with Helm:

bash
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install vpa fairwinds-stable/vpa \
  --namespace vpa \
  --create-namespace \
  --set recommender.enabled=true \
  --set updater.enabled=true \
  --set admissionController.enabled=true

Verify:

bash
kubectl get pods -n vpa
NAME                                        READY   STATUS    RESTARTS   AGE
vpa-admission-controller-6b5c4f6d8-x2k4m   1/1     Running   0          2m
vpa-recommender-7d9bc5f8b-9j3kl             1/1     Running   0          2m
vpa-updater-5f9d8b7c6-l8m2p                 1/1     Running   0          2m

Three components:

  • Recommender — watches pod metrics, calculates optimal resources
  • Updater — evicts pods that need resource updates (only in Auto mode)
  • Admission Controller — mutates pod specs during creation with VPA recommendations

Creating a VPA Resource

Recommend-Only Mode (Start Here)

yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"  # Recommend only, don't change anything
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
bash
kubectl apply -f vpa.yaml
 
# Wait 5-10 minutes for recommendations, then check
kubectl describe vpa api-server-vpa -n production

Output:

yaml
Recommendation:
  Container Recommendations:
    Container Name: api
    Lower Bound:
      Cpu:     125m
      Memory:  196Mi
    Target:
      Cpu:     250m
      Memory:  384Mi
    Uncapped Target:
      Cpu:     250m
      Memory:  384Mi
    Upper Bound:
      Cpu:     1200m
      Memory:  1536Mi

Key values:

  • Target — the recommended requests (roughly P90 of observed usage)
  • Lower Bound — minimum safe value (roughly P50 of observed usage)
  • Upper Bound — covers traffic spikes (roughly P95 of observed usage)
  • Uncapped Target — the target without minAllowed/maxAllowed constraints applied

Auto Mode (For Production)

Once you trust the recommendations, enable auto mode:

yaml
spec:
  updatePolicy:
    updateMode: "Auto"
    minReplicas: 2  # Don't evict if fewer than 2 replicas

In Auto mode, VPA:

  1. Monitors actual resource usage
  2. Calculates optimal requests
  3. Evicts pods whose current requests fall outside the recommended bounds
  4. Relies on the Admission Controller to set the new requests when the pod is recreated
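The eviction decision in step 3 can be sketched as a range check. This is a simplification — the real updater also honors PodDisruptionBudgets, minReplicas, and eviction rate limits — and the function name here is illustrative:

```shell
# Evict when the current request falls outside [lowerBound, upperBound].
# Arguments: current request, lower bound, upper bound (same unit, e.g. millicores).
needs_eviction() {
  local current=$1 lower=$2 upper=$3
  if [ "$current" -lt "$lower" ] || [ "$current" -gt "$upper" ]; then
    echo "evict"
  else
    echo "keep"
  fi
}

needs_eviction 1000 125 1200   # keep  (within the recommended range)
needs_eviction 100 125 1200    # evict (below the 125m lower bound)
```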

Warning: Auto mode evicts pods! Ensure you have:

  • Multiple replicas
  • PodDisruptionBudgets
  • Proper readiness probes

VPA vs HPA — When to Use Which

| Feature | VPA | HPA |
|---------|-----|-----|
| Scales | Resources per pod (vertical) | Number of pods (horizontal) |
| Best for | Right-sizing requests | Handling traffic changes |
| Metric | Historical CPU/memory usage | Current CPU/memory/custom |
| Disruption | Evicts pods (in Auto) | Adds/removes pods |

Can you use both? Yes, but with caveats:

yaml
# VPA controls memory only
spec:
  resourcePolicy:
    containerPolicies:
      - containerName: api
        controlledResources: ["memory"]  # VPA handles memory
        # HPA handles CPU-based scaling

The rule: don't let VPA and HPA compete on the same metric. If HPA scales on CPU, let VPA handle memory (or vice versa).
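As a concrete example, the memory-only VPA above can be paired with an HPA that scales the same Deployment on CPU. This is a sketch — the name and utilization threshold are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # HPA adds pods on CPU; VPA right-sizes memory
```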

Production Configuration Patterns

Pattern 1: Conservative Right-Sizing

For critical workloads where stability matters more than cost:

yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payment-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-service
  updatePolicy:
    updateMode: "Initial"  # Only set on new pods, don't evict
  resourcePolicy:
    containerPolicies:
      - containerName: payment
        minAllowed:
          cpu: 500m      # High minimum for critical service
          memory: 512Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi
        controlledResources: ["cpu", "memory"]

Pattern 2: Aggressive Cost Optimization

For non-critical workloads (dev/staging, batch jobs, internal tools):

yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: internal-tool-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: internal-dashboard
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: dashboard
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi

Pattern 3: VPA for Every Deployment (Goldilocks)

Fairwinds Goldilocks creates VPA objects for every deployment in a namespace and provides a dashboard:

bash
helm install goldilocks fairwinds-stable/goldilocks \
  --namespace goldilocks \
  --create-namespace
 
# Enable for a namespace
kubectl label namespace production goldilocks.fairwinds.com/enabled=true

Access the dashboard:

bash
kubectl port-forward -n goldilocks svc/goldilocks-dashboard 8080:80

Goldilocks shows recommended requests vs current requests for every deployment, with estimated cost savings.

Monitoring VPA

Prometheus Metrics

VPA exposes metrics that you should monitor:

yaml
# Alert: VPA recommendation significantly different from current requests.
# Requires kube-state-metrics exporting VPA metrics (via its
# CustomResourceState config in v2.9+); depending on your label setup,
# the division may need on()/ignoring() matching clauses.
- alert: VPARecommendationDrift
  expr: |
    (
      kube_verticalpodautoscaler_status_recommendation_containerrecommendations_target{resource="cpu"}
      /
      kube_pod_container_resource_requests{resource="cpu"}
    ) > 2 or (
      kube_verticalpodautoscaler_status_recommendation_containerrecommendations_target{resource="cpu"}
      /
      kube_pod_container_resource_requests{resource="cpu"}
    ) < 0.3
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "VPA recommendation for {{ $labels.container }} differs >2x from current requests"

Key Metrics to Watch

  • vpa_recommender_vpa_objects_count — number of VPA objects being tracked
  • vpa_recommender_recommendation_latency_seconds — time to calculate recommendations
  • vpa_updater_evicted_pods_total — pods evicted by VPA (Auto mode)

(Exact metric names can vary between VPA versions — confirm against each component's /metrics endpoint.)

Common Pitfalls

1. VPA evicts your single-replica deployment

VPA doesn't check if eviction will cause downtime. Always set:

yaml
updatePolicy:
  minReplicas: 2

And have a PodDisruptionBudget:

yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: api-server

2. VPA recommends tiny resources because pods are idle

VPA analyzes the last 8 days of metrics by default. If your workload has weekly peaks (e.g., Monday batch processing), VPA might right-size based on the quiet days.

Fix: ensure at least 2 weeks of metrics before enabling Auto mode.
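If you run the recommender against Prometheus for historical data, you can also widen its lookback window beyond the in-memory default. These are vpa-recommender flags — verify them against your installed VPA version:

```yaml
# Excerpt from the vpa-recommender Deployment spec (illustrative)
containers:
  - name: recommender
    args:
      - --storage=prometheus
      - --prometheus-address=http://prometheus.monitoring.svc:9090
      - --history-length=14d   # default is 8d
```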

3. OOMKilled after VPA lowers memory

VPA targets P90 usage plus headroom, but memory spikes can exceed this. Set a reasonable minAllowed for memory:

yaml
minAllowed:
  memory: 256Mi  # Never go below this regardless of recommendation

Quick Start Checklist

bash
# 1. Install Metrics Server (if not present)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
 
# 2. Verify metrics work
kubectl top pods -A
 
# 3. Install VPA
# (see installation section above)
 
# 4. Create VPA in "Off" mode for your biggest deployments
# (see recommend-only pattern above)
 
# 5. Wait 1-2 days, review recommendations
kubectl get vpa -A -o custom-columns="NAME:.metadata.name,TARGET-CPU:.status.recommendation.containerRecommendations[0].target.cpu,TARGET-MEM:.status.recommendation.containerRecommendations[0].target.memory"
 
# 6. Apply recommendations manually or switch to Auto

Learn More

VPA is one of the most impactful cost-saving tools in Kubernetes. For hands-on practice with autoscaling and resource management, KodeKloud's Kubernetes courses cover both HPA and VPA in real lab environments.


The cheapest resource is the one you don't allocate. VPA ensures you allocate exactly what you need — no more, no less.
