
Kubernetes Vertical Pod Autoscaler (VPA) — Complete Guide (2026)

Everything you need to know about Kubernetes VPA. Covers installation, recommendation modes, right-sizing strategies, VPA vs HPA, and production best practices for resource optimization.

DevOpsBoys · Mar 29, 2026 · 5 min read

Every Kubernetes cluster has the same problem: developers guess resource requests, and they guess wrong. Memory requests are 3x what pods actually use. CPU requests are set to "1 core" because someone copied it from a tutorial. The result? Wasted money and scheduling inefficiency.

The Vertical Pod Autoscaler (VPA) fixes this by automatically right-sizing pod resource requests based on actual usage. It analyzes historical metrics and either recommends or automatically applies optimal CPU and memory values.

If you're managing more than a handful of deployments, VPA should be in your toolkit. Here's the complete guide.

What VPA Does

VPA adjusts the resource requests and limits of containers in a pod based on observed usage:

Without VPA:
  requests.cpu: 1000m    (actual usage: 150m)    → 85% waste
  requests.memory: 1Gi   (actual usage: 256Mi)   → 75% waste

With VPA:
  requests.cpu: 250m     (target based on P90)    → right-sized
  requests.memory: 384Mi (target with headroom)   → right-sized
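The waste figures above are simple arithmetic: (requested − actually used) / requested. A quick shell check:

```shell
# Waste = (requested - used) / requested, as a percentage.
requested_cpu_m=1000; used_cpu_m=150      # millicores
requested_mem_mi=1024; used_mem_mi=256    # MiB (1Gi = 1024Mi)

cpu_waste=$(( (requested_cpu_m - used_cpu_m) * 100 / requested_cpu_m ))
mem_waste=$(( (requested_mem_mi - used_mem_mi) * 100 / requested_mem_mi ))

echo "cpu waste: ${cpu_waste}%"   # cpu waste: 85%
echo "mem waste: ${mem_waste}%"   # mem waste: 75%
```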

VPA has three operating modes:

| Mode | Behavior | Risk level |
|------|----------|------------|
| Off | VPA calculates recommendations but does nothing | None |
| Initial | Sets resources only when pods are created (not on running pods) | Low |
| Auto | Evicts and recreates pods with updated resources | Medium |

Most teams start with Off (recommend only), then graduate to Auto for non-critical workloads.

Installing VPA

Prerequisites

  • Kubernetes 1.26+
  • Metrics Server installed (kubectl top pods must work)

Installation

bash
# Clone the VPA repo
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
 
# Install VPA components
./hack/vpa-up.sh

Or with Helm:

bash
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install vpa fairwinds-stable/vpa \
  --namespace vpa \
  --create-namespace \
  --set recommender.enabled=true \
  --set updater.enabled=true \
  --set admissionController.enabled=true

Verify:

bash
kubectl get pods -n vpa
NAME                                        READY   STATUS    RESTARTS   AGE
vpa-admission-controller-6b5c4f6d8-x2k4m   1/1     Running   0          2m
vpa-recommender-7d9bc5f8b-9j3kl             1/1     Running   0          2m
vpa-updater-5f9d8b7c6-l8m2p                 1/1     Running   0          2m

Three components:

  • Recommender — watches pod metrics, calculates optimal resources
  • Updater — evicts pods that need resource updates (only in Auto mode)
  • Admission Controller — mutates pod specs during creation with VPA recommendations

Creating a VPA Resource

Recommend-Only Mode (Start Here)

yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"  # Recommend only, don't change anything
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
bash
kubectl apply -f vpa.yaml
 
# Wait 5-10 minutes for recommendations, then check
kubectl describe vpa api-server-vpa -n production

Output:

yaml
Recommendation:
  Container Recommendations:
    Container Name: api
    Lower Bound:
      Cpu:     125m
      Memory:  196Mi
    Target:
      Cpu:     250m
      Memory:  384Mi
    Uncapped Target:
      Cpu:     250m
      Memory:  384Mi
    Upper Bound:
      Cpu:     1200m
      Memory:  1536Mi

Key values:

  • Target — the recommended requests (roughly P90 of observed usage)
  • Lower Bound — minimum safe value (roughly P50 of observed usage)
  • Upper Bound — covers traffic spikes (roughly P95 of observed usage)
  • Uncapped Target — the target without minAllowed/maxAllowed constraints applied

Auto Mode (For Production)

Once you trust the recommendations, enable auto mode:

yaml
spec:
  updatePolicy:
    updateMode: "Auto"
    minReplicas: 2  # Don't evict if fewer than 2 replicas

In Auto mode, VPA:

  1. Monitors actual resource usage
  2. Calculates optimal requests
  3. Evicts pods whose current requests fall outside the recommended bounds
  4. Relies on the Admission Controller to set the new requests when the pod is recreated
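The eviction decision in step 3 can be sketched as a range check. This is a simplification — the real updater also honors PodDisruptionBudgets, minReplicas, and eviction rate limits — and the function name here is illustrative:

```shell
# Evict when the current request falls outside [lowerBound, upperBound].
# Arguments: current request, lower bound, upper bound (same unit, e.g. millicores).
needs_eviction() {
  local current=$1 lower=$2 upper=$3
  if [ "$current" -lt "$lower" ] || [ "$current" -gt "$upper" ]; then
    echo "evict"
  else
    echo "keep"
  fi
}

needs_eviction 1000 125 1200   # keep  (within the recommended range)
needs_eviction 100 125 1200    # evict (below the 125m lower bound)
```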

Warning: Auto mode evicts pods! Ensure you have:

  • Multiple replicas
  • PodDisruptionBudgets
  • Proper readiness probes

VPA vs HPA — When to Use Which

| Feature | VPA | HPA |
|---------|-----|-----|
| Scales | Resources per pod (vertical) | Number of pods (horizontal) |
| Best for | Right-sizing requests | Handling traffic changes |
| Metric | Historical CPU/memory usage | Current CPU/memory/custom |
| Disruption | Evicts pods (in Auto) | Adds/removes pods |

Can you use both? Yes, but with caveats:

yaml
# VPA controls memory only
spec:
  resourcePolicy:
    containerPolicies:
      - containerName: api
        controlledResources: ["memory"]  # VPA handles memory
        # HPA handles CPU-based scaling

The rule: don't let VPA and HPA compete on the same metric. If HPA scales on CPU, let VPA handle memory (or vice versa).
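As a concrete example, the memory-only VPA above can be paired with an HPA that scales the same Deployment on CPU. This is a sketch — the name and utilization threshold are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # HPA adds pods on CPU; VPA right-sizes memory
```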

Production Configuration Patterns

Pattern 1: Conservative Right-Sizing

For critical workloads where stability matters more than cost:

yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payment-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-service
  updatePolicy:
    updateMode: "Initial"  # Only set on new pods, don't evict
  resourcePolicy:
    containerPolicies:
      - containerName: payment
        minAllowed:
          cpu: 500m      # High minimum for critical service
          memory: 512Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi
        controlledResources: ["cpu", "memory"]

Pattern 2: Aggressive Cost Optimization

For non-critical workloads (dev/staging, batch jobs, internal tools):

yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: internal-tool-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: internal-dashboard
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: dashboard
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi

Pattern 3: VPA for Every Deployment (Goldilocks)

Fairwinds Goldilocks creates VPA objects for every deployment in a namespace and provides a dashboard:

bash
helm install goldilocks fairwinds-stable/goldilocks \
  --namespace goldilocks \
  --create-namespace
 
# Enable for a namespace
kubectl label namespace production goldilocks.fairwinds.com/enabled=true

Access the dashboard:

bash
kubectl port-forward -n goldilocks svc/goldilocks-dashboard 8080:80

Goldilocks shows recommended requests vs current requests for every deployment, with estimated cost savings.

Monitoring VPA

Prometheus Metrics

VPA exposes metrics that you should monitor:

yaml
# Alert: VPA recommendation significantly different from current requests.
# Requires kube-state-metrics exporting VPA metrics (via its
# CustomResourceState config in v2.9+); depending on your label setup,
# the division may need on()/ignoring() matching clauses.
- alert: VPARecommendationDrift
  expr: |
    (
      kube_verticalpodautoscaler_status_recommendation_containerrecommendations_target{resource="cpu"}
      /
      kube_pod_container_resource_requests{resource="cpu"}
    ) > 2 or (
      kube_verticalpodautoscaler_status_recommendation_containerrecommendations_target{resource="cpu"}
      /
      kube_pod_container_resource_requests{resource="cpu"}
    ) < 0.3
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "VPA recommendation for {{ $labels.container }} differs >2x from current requests"

Key Metrics to Watch

  • vpa_recommender_vpa_objects_count — number of VPA objects being tracked
  • vpa_recommender_recommendation_latency_seconds — time to calculate recommendations
  • vpa_updater_evicted_pods_total — pods evicted by VPA (Auto mode)

(Exact metric names can vary between VPA versions — confirm against each component's /metrics endpoint.)

Common Pitfalls

1. VPA evicts your single-replica deployment

VPA doesn't check if eviction will cause downtime. Always set:

yaml
updatePolicy:
  minReplicas: 2

And have a PodDisruptionBudget:

yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: api-server

2. VPA recommends tiny resources because pods are idle

VPA analyzes the last 8 days of metrics by default. If your workload has weekly peaks (e.g., Monday batch processing), VPA might right-size based on the quiet days.

Fix: ensure at least 2 weeks of metrics before enabling Auto mode.
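If you run the recommender against Prometheus for historical data, you can also widen its lookback window beyond the in-memory default. These are vpa-recommender flags — verify them against your installed VPA version:

```yaml
# Excerpt from the vpa-recommender Deployment spec (illustrative)
containers:
  - name: recommender
    args:
      - --storage=prometheus
      - --prometheus-address=http://prometheus.monitoring.svc:9090
      - --history-length=14d   # default is 8d
```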

3. OOMKilled after VPA lowers memory

VPA targets P90 usage plus headroom, but memory spikes can exceed this. Set a reasonable minAllowed for memory:

yaml
minAllowed:
  memory: 256Mi  # Never go below this regardless of recommendation

Quick Start Checklist

bash
# 1. Install Metrics Server (if not present)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
 
# 2. Verify metrics work
kubectl top pods -A
 
# 3. Install VPA
# (see installation section above)
 
# 4. Create VPA in "Off" mode for your biggest deployments
# (see recommend-only pattern above)
 
# 5. Wait 1-2 days, review recommendations
kubectl get vpa -A -o custom-columns="NAME:.metadata.name,TARGET-CPU:.status.recommendation.containerRecommendations[0].target.cpu,TARGET-MEM:.status.recommendation.containerRecommendations[0].target.memory"
 
# 6. Apply recommendations manually or switch to Auto

Learn More

VPA is one of the most impactful cost-saving tools in Kubernetes. For hands-on practice with autoscaling and resource management, KodeKloud's Kubernetes courses cover both HPA and VPA in real lab environments.


The cheapest resource is the one you don't allocate. VPA ensures you allocate exactly what you need — no more, no less.
