
How to Set Up Vertical Pod Autoscaler (VPA) with In-Place Resize in Kubernetes

Step-by-step guide to setting up Kubernetes VPA with In-Place Pod Resize. Auto-scale CPU and memory without pod restarts. Full tutorial with YAML examples.

DevOpsBoys · Mar 21, 2026 · 5 min read

Horizontal Pod Autoscaler (HPA) adds more pods when load increases. But what about when a single pod needs more CPU or memory? That's what Vertical Pod Autoscaler (VPA) does. With In-Place Pod Resize, which is generally available in Kubernetes 1.35, it can now usually do this without restarting your pods.

This guide walks you through setting up VPA with in-place resize from scratch.

Prerequisites

  • Kubernetes 1.35+ cluster
  • kubectl configured and connected
  • git installed
  • Metrics Server running in the cluster

Check your Kubernetes version:

```bash
# The --short flag was removed in kubectl 1.28; the default output is already short.
kubectl version
```

Check if Metrics Server is running:

```bash
kubectl get deployment metrics-server -n kube-system
```

If Metrics Server isn't installed:

```bash
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```
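If you want to script the version check (in CI, for example), here's a minimal sketch. The function names server_git_version and k8s_version_at_least are hypothetical, and it assumes your credentials can read the API server's /version endpoint.

```bash
# Hypothetical helpers, not part of any Kubernetes tooling.

# Print the API server's gitVersion, e.g. "v1.35.0" or "v1.35.1-eks-abc123".
server_git_version() {
  kubectl get --raw /version | grep -o '"gitVersion": *"[^"]*"' | cut -d'"' -f4
}

# Succeed (exit 0) if a gitVersion string is at least MAJOR.MINOR.
# Usage: k8s_version_at_least VERSION MAJOR MINOR
k8s_version_at_least() {
  local ver major rest minor
  ver=${1#v}                 # drop the leading "v"
  major=${ver%%.*}           # text before the first dot
  rest=${ver#*.}
  minor=${rest%%.*}          # text before the second dot
  minor=${minor%%[!0-9]*}    # drop any non-numeric suffix
  [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# Usage:
#   k8s_version_at_least "$(server_git_version)" 1 35 && echo "in-place resize is GA"
```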

Step 1 — Install VPA

Clone the autoscaler repository and install VPA components:

```bash
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
```

Run the installation script:

```bash
./hack/vpa-up.sh
```

This installs three components:

  • VPA Recommender — analyzes resource usage and generates recommendations
  • VPA Updater — applies recommendations to pods
  • VPA Admission Controller — sets initial resources for new pods

Verify everything is running:

```bash
kubectl get pods -n kube-system | grep vpa
```

You should see something like:

```
vpa-admission-controller-xxx   1/1   Running
vpa-recommender-xxx            1/1   Running
vpa-updater-xxx                1/1   Running
```
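For a scripted check that blocks until the components are actually ready, here's a small sketch. The helper name wait_for_vpa is hypothetical, and the Deployment names are an assumption inferred from the pod names above.

```bash
# Hypothetical helper: wait for each VPA Deployment (names assumed from the
# vpa-* pods in kube-system) to finish rolling out, failing after the timeout.
wait_for_vpa() {
  for d in vpa-recommender vpa-updater vpa-admission-controller; do
    kubectl -n kube-system rollout status "deployment/$d" --timeout=120s || return 1
  done
}

# Usage:
#   wait_for_vpa && echo "VPA is ready"
```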

Step 2 — Deploy a Sample Application

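Create a simple Deployment to test VPA, plus a Service so the load generator in Step 5 has something to call:

```yaml
# demo-app.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: app
        image: nginx:1.27
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
          limits:
            cpu: 100m
            memory: 128Mi
        resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired
        - resourceName: memory
          restartPolicy: NotRequired
---
# Service used by the load generator in Step 5
apiVersion: v1
kind: Service
metadata:
  name: demo-app
spec:
  selector:
    app: demo-app
  ports:
  - port: 80
    targetPort: 80
```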

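The resizePolicy tells Kubernetes that CPU and memory can both be changed in place without restarting the container. NotRequired is already the default for both resources, so these lines mainly make the intent explicit. Use RestartContainer instead for a resource your app only reads at startup (for example, a JVM whose heap size is derived from the memory limit).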

Apply it:

```bash
kubectl apply -f demo-app.yaml
```
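Before bringing VPA in, you can confirm in-place resize works by resizing a pod by hand. Here's a minimal sketch; the helper name resize_cpu_in_place is hypothetical, and it assumes kubectl 1.32 or newer (which added the --subresource resize option to kubectl patch). It keeps the request below the limit because an in-place resize can't change the pod's QoS class, and this demo pod is Burstable.

```bash
# Hypothetical helper: change one container's CPU request and limit in place
# through the pod's "resize" subresource.
# Usage: resize_cpu_in_place POD CONTAINER REQUEST LIMIT
resize_cpu_in_place() {
  local pod=$1 container=$2 request=$3 limit=$4 patch
  patch=$(printf '{"spec":{"containers":[{"name":"%s","resources":{"requests":{"cpu":"%s"},"limits":{"cpu":"%s"}}}]}}' \
    "$container" "$request" "$limit")
  kubectl patch pod "$pod" --subresource resize --patch "$patch"
}

# Usage: bump the first demo pod from 50m/100m to 80m/160m, then check that
# its restart count is still 0.
#   pod=$(kubectl get pod -l app=demo-app -o jsonpath='{.items[0].metadata.name}')
#   resize_cpu_in_place "$pod" app 80m 160m
#   kubectl get pod "$pod" -o jsonpath='{.status.containerStatuses[0].restartCount}'
```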

Step 3 — Create a VPA with InPlaceOrRecreate Mode

```yaml
# vpa-demo.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: demo-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app
  updatePolicy:
    updateMode: "InPlaceOrRecreate"
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: 25m
        memory: 32Mi
      maxAllowed:
        cpu: 1000m
        memory: 1Gi
      controlledResources: ["cpu", "memory"]
```

Key configuration:

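  • updateMode: InPlaceOrRecreate — tries an in-place resize first and falls back to evicting the pod (so it is recreated with the new resources) when the resize can't be applied. The mode is relatively new: older VPA releases ship it as an alpha feature behind the InPlaceOrRecreate feature gate, so check the release notes for the VPA version you installed.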
  • minAllowed / maxAllowed — sets bounds so VPA doesn't over-provision or under-provision
  • controlledResources — which resources VPA manages
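To make the bounds concrete: VPA caps its recommendation to the minAllowed/maxAllowed range, and the VPA status shows the raw value as uncappedTarget next to the bounded target. The sketch below only illustrates that clamping arithmetic for CPU; it is not VPA's code, and the helper names cpu_to_millicores and clamp are hypothetical.

```bash
# Hypothetical helpers illustrating how minAllowed/maxAllowed bound a CPU value.

# Convert a CPU quantity to millicores: "150m" -> 150, "2" -> 2000, "0.5" -> 500.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v c="$1" 'BEGIN { printf "%d\n", c * 1000 + 0.5 }' ;;
  esac
}

# Clamp VALUE into [MIN, MAX]; all three in the same unit.
# Usage: clamp VALUE MIN MAX
clamp() {
  if [ "$1" -lt "$2" ]; then echo "$2"
  elif [ "$1" -gt "$3" ]; then echo "$3"
  else echo "$1"
  fi
}

# With the demo policy (minAllowed 25m, maxAllowed 1000m), a raw 1.5 CPU
# recommendation is capped:
#   clamp "$(cpu_to_millicores 1.5)" "$(cpu_to_millicores 25m)" "$(cpu_to_millicores 1000m)"   # prints 1000
```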

Apply it:

```bash
kubectl apply -f vpa-demo.yaml
```

Step 4 — Understand VPA Update Modes

VPA supports these update modes (there is also a legacy Auto mode, which currently behaves like Recreate):

| Mode | Behavior |
|---|---|
| Off | VPA only recommends; it never changes pods |
| Initial | Sets resources only when a pod is first created |
| Recreate | Evicts the pod and recreates it with new resources (the older behavior) |
| InPlaceOrRecreate | Resizes in place when possible, evicts if not (newest mode) |

On Kubernetes 1.35+, prefer InPlaceOrRecreate over Recreate whenever you want VPA to apply changes: most resizes then happen without a restart, and VPA only evicts a pod when an in-place resize isn't possible.
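Switching modes on an existing VPA is a one-field change. Here's a minimal sketch; the helper name set_vpa_mode is hypothetical, and it deliberately leaves out the legacy Auto mode. Because VPA is a custom resource, it uses a JSON merge patch (kubectl's default strategic merge patch isn't supported for custom resources).

```bash
# Hypothetical helper: switch an existing VPA's updateMode.
# Usage: set_vpa_mode VPA_NAME MODE
set_vpa_mode() {
  local vpa=$1 mode=$2
  case "$mode" in
    Off|Initial|Recreate|InPlaceOrRecreate) ;;
    *) echo "unsupported updateMode: $mode" >&2; return 1 ;;
  esac
  kubectl patch vpa "$vpa" --type merge \
    -p "{\"spec\":{\"updatePolicy\":{\"updateMode\":\"$mode\"}}}"
}

# Usage: observe first, then let VPA act.
#   set_vpa_mode demo-app-vpa Off
#   set_vpa_mode demo-app-vpa InPlaceOrRecreate
```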

Step 5 — Generate Load and Watch VPA React

Generate some CPU load on the demo app:

```bash
kubectl run load-generator --image=busybox:1.36 --restart=Never -- /bin/sh -c \
  "while true; do wget -q -O- http://demo-app.default.svc.cluster.local; done"
```

Watch VPA recommendations update (this takes 5-10 minutes):

```bash
kubectl get vpa demo-app-vpa -o yaml | grep -A 20 recommendation
```

You'll see something like:

```yaml
recommendation:
  containerRecommendations:
  - containerName: app
    lowerBound:
      cpu: 25m
      memory: 64Mi
    target:
      cpu: 150m
      memory: 128Mi
    upperBound:
      cpu: 300m
      memory: 256Mi
```

Check if pods have been resized in-place:

```bash
kubectl get pod -l app=demo-app -o jsonpath='{range .items[*]}{.metadata.name}: CPU={.spec.containers[0].resources.requests.cpu}, Memory={.spec.containers[0].resources.requests.memory}{"\n"}{end}'
```
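During an in-place resize, the pod spec holds the desired resources, while status.containerStatuses[].resources reports what the kubelet has actually applied. Comparing the two shows whether a resize has landed yet. A minimal sketch, assuming single-container pods; the helper name desired_vs_actual_cpu is hypothetical.

```bash
# Hypothetical helper: print each pod's desired CPU request (spec) next to the
# CPU request the kubelet has actually applied (status). Assumes one container.
# Usage: desired_vs_actual_cpu LABEL_SELECTOR
desired_vs_actual_cpu() {
  kubectl get pod -l "$1" -o jsonpath='{range .items[*]}{.metadata.name}{": desired="}{.spec.containers[0].resources.requests.cpu}{" actual="}{.status.containerStatuses[0].resources.requests.cpu}{"\n"}{end}'
}

# Usage:
#   desired_vs_actual_cpu app=demo-app
```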

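Check resize status. Since Kubernetes 1.33, resize progress is reported through the PodResizePending and PodResizeInProgress pod conditions; the older status.resize field is deprecated:

```bash
kubectl get pod -l app=demo-app -o jsonpath='{range .items[*]}{.metadata.name}{": pending="}{.status.conditions[?(@.type=="PodResizePending")].reason}{", inProgress="}{.status.conditions[?(@.type=="PodResizeInProgress")].status}{"\n"}{end}'
```

If both values are empty, no resize is pending or running. A pending reason of Deferred means the node can't fit the new size right now and the kubelet will retry; Infeasible means the node can never fit it, for example because the new request is larger than the node's capacity.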
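In a script, you may want to wait until a resize has settled. On Kubernetes 1.33+ the resize state lives in the PodResizePending and PodResizeInProgress pod conditions, so a simple poll over those works. This is a sketch; the helper name wait_for_resize, the 5-second interval, and the 120-second default timeout are all arbitrary choices.

```bash
# Hypothetical helper: poll a pod's resize conditions until no resize is
# pending or in progress. Fails fast on Infeasible, or after TIMEOUT seconds.
# Usage: wait_for_resize POD [TIMEOUT_SECONDS]
wait_for_resize() {
  local pod=$1 timeout=${2:-120} waited=0 state=""
  while [ "$waited" -lt "$timeout" ]; do
    # Prints the pending reason (Deferred/Infeasible) and/or "PodResizeInProgress".
    state=$(kubectl get pod "$pod" -o jsonpath='{.status.conditions[?(@.type=="PodResizePending")].reason}{.status.conditions[?(@.type=="PodResizeInProgress")].type}')
    if [ -z "$state" ]; then return 0; fi
    case "$state" in
      Infeasible*) echo "resize is infeasible on this node" >&2; return 1 ;;
    esac
    sleep 5
    waited=$((waited + 5))
  done
  echo "timed out waiting for resize (last state: $state)" >&2
  return 1
}
```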

Step 6 — Set Up VPA for a Real Application

Here's a VPA configuration you can adapt for a production web application (tune the bounds to your workload):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "InPlaceOrRecreate"
    minReplicas: 2  # Updater won't evict pods unless at least 2 replicas are alive
  resourcePolicy:
    containerPolicies:
    - containerName: web
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2000m
        memory: 4Gi
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits
```

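controlledValues: RequestsAndLimits (the default) means VPA adjusts both requests and limits, keeping the limit-to-request ratio from your original spec. If your container requests 100m CPU with a 200m limit (a 2:1 ratio) and VPA raises the request to 150m, the limit becomes 300m. Use RequestsOnly if you want VPA to leave limits unchanged.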
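The proportional rule is simple arithmetic: new limit = new request × original limit ÷ original request. Here's a sketch of it in integer millicores; the helper name scale_limit is hypothetical, it rounds down, and VPA's own rounding may differ.

```bash
# Hypothetical helper: scale a limit so the limit-to-request ratio is preserved.
# Integer millicores; rounds down.
# Usage: scale_limit ORIGINAL_REQUEST ORIGINAL_LIMIT NEW_REQUEST
scale_limit() {
  echo $(( $3 * $2 / $1 ))
}

# The example from the text: 100m request / 200m limit, request raised to 150m.
#   scale_limit 100 200 150   # prints 300
```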

Step 7 — Combine VPA with HPA

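You can use VPA and HPA together, but don't let VPA manage a resource that HPA scales on. HPA's utilization target is a percentage of the pod's request, so if VPA raises the CPU request, measured utilization drops, HPA scales in, and the two controllers end up fighting.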

Common pattern: VPA manages memory, HPA manages CPU-based scaling:

```yaml
# VPA - controls memory only
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "InPlaceOrRecreate"
  resourcePolicy:
    containerPolicies:
    - containerName: app
      controlledResources: ["memory"]
      minAllowed:
        memory: 128Mi
      maxAllowed:
        memory: 4Gi
---
# HPA - controls replicas based on CPU
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```
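A quick pre-flight check can catch the overlap before the two autoscalers start fighting. This is a sketch with hypothetical helper names; it assumes one container policy per VPA, and it treats an empty controlledResources as "cpu and memory", because VPA manages both when the field is omitted.

```bash
# Hypothetical helpers for spotting VPA/HPA overlap.

# vpa_controls RESOURCE CONTROLLED_RESOURCES_JSON  (e.g. cpu '["memory"]')
vpa_controls() {
  case "$2" in
    "") return 0 ;;            # omitted: VPA manages cpu and memory by default
    *"\"$1\""*) return 0 ;;
    *) return 1 ;;
  esac
}

# hpa_scales_on RESOURCE "SPACE SEPARATED METRIC NAMES"
hpa_scales_on() {
  case " $2 " in *" $1 "*) return 0 ;; *) return 1 ;; esac
}

# check_overlap VPA_CONTROLLED HPA_METRICS
check_overlap() {
  for r in cpu memory; do
    if vpa_controls "$r" "$1" && hpa_scales_on "$r" "$2"; then
      echo "conflict: VPA and HPA both act on $r" >&2
      return 1
    fi
  done
  echo "ok"
}

# Usage (names from the manifests above):
#   vpa=$(kubectl get vpa my-app-vpa -o jsonpath='{.spec.resourcePolicy.containerPolicies[0].controlledResources}')
#   hpa=$(kubectl get hpa my-app-hpa -o jsonpath='{.spec.metrics[*].resource.name}')
#   check_overlap "$vpa" "$hpa"
```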

Troubleshooting

VPA Not Recommending

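VPA produces a first recommendation within a few minutes, but its bounds stay wide until it has seen more history (a day or more of representative traffic), so treat early numbers as rough. To see the current recommendation:

```bash
kubectl get vpa demo-app-vpa -o jsonpath='{.status.recommendation}'
```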

If empty after 5 minutes, check VPA recommender logs:

```bash
kubectl logs -n kube-system deployment/vpa-recommender
```

Resize Stuck in "Deferred"

The node doesn't have enough allocatable resources:

```bash
# Check node allocatable vs allocated
kubectl describe node <node-name> | grep -A 5 "Allocated resources"
```
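To reason about the numbers from that output, here's a simplified sketch of the node-fit check behind Deferred and Infeasible. It is not the kubelet's actual code (which also covers memory and other resources); the helper names are hypothetical, and "Fits" is just this sketch's label, not a kubelet state. The key point is that the node only has to absorb the increase, because the pod's current request is already counted in the node's total.

```bash
# Hypothetical, simplified CPU fit check for an in-place resize.
# All values in millicores for one node:
#   ALLOCATABLE  node allocatable CPU
#   REQUESTED    sum of CPU requests already on the node (includes this pod)
#   CURRENT      this pod's current CPU request
#   PROPOSED     the request VPA wants to apply
fits_after_resize() {   # fits_after_resize ALLOCATABLE REQUESTED CURRENT PROPOSED
  [ $(( $2 - $3 + $4 )) -le "$1" ]
}

resize_outcome() {      # same arguments as fits_after_resize
  if [ "$4" -gt "$1" ]; then echo Infeasible   # can never fit on this node
  elif fits_after_resize "$@"; then echo Fits
  else echo Deferred                           # might fit once space frees up
  fi
}

# Example: 3920m allocatable, 3800m requested, pod going from 150m to 400m.
#   resize_outcome 3920 3800 150 400   # prints Deferred
```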

Solutions:

  • Scale up the node pool
  • Move less critical pods to other nodes
  • Reduce maxAllowed in VPA policy

Pods Still Restarting

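Check the container's resizePolicy:

```bash
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].resizePolicy}'
```

If a resource's restartPolicy is RestartContainer, resizing that resource restarts the container. A missing resizePolicy defaults to NotRequired, so it isn't the cause by itself. Other common causes: the resize was Infeasible or stayed Deferred too long, so InPlaceOrRecreate fell back to eviction (look for new pod names rather than a rising restart count), or the container was OOM-killed because its memory limit was too low.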

Wrapping Up

VPA with In-Place Resize fills a long-standing gap in Kubernetes vertical scaling. Your pods get the resources they need automatically, in most cases without restarts. Combined with HPA for horizontal scaling, you get a complete autoscaling strategy.

Start with updateMode: Off to see what VPA recommends for your workloads, then switch to InPlaceOrRecreate once you're comfortable with the recommendations.

Want to master Kubernetes autoscaling — VPA, HPA, KEDA, and cluster autoscaling — with hands-on practice? The KodeKloud Kubernetes course covers all scaling strategies with real cluster labs. For a quick test cluster, DigitalOcean Kubernetes runs Kubernetes 1.35 and makes it easy to test VPA in-place resize.
