How to Set Up Vertical Pod Autoscaler (VPA) with In-Place Resize in Kubernetes
Step-by-step guide to setting up Kubernetes VPA with In-Place Pod Resize. Auto-scale CPU and memory without pod restarts. Full tutorial with YAML examples.
Horizontal Pod Autoscaler (HPA) adds more pods when load increases. But what about when a single pod needs more CPU or memory? That's what Vertical Pod Autoscaler (VPA) does — and with Kubernetes 1.35's In-Place Pod Resize, it can now do it without restarting your pods.
This guide walks you through setting up VPA with in-place resize from scratch.
Prerequisites
- Kubernetes 1.35+ cluster
- kubectl configured and connected
- git installed
- Metrics Server running in the cluster
Check your Kubernetes version:
kubectl version
Check if Metrics Server is running:
kubectl get deployment metrics-server -n kube-system
If Metrics Server isn't installed:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Step 1 — Install VPA
Clone the autoscaler repository and install VPA components:
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
Run the installation script:
./hack/vpa-up.sh
This installs three components:
- VPA Recommender — analyzes resource usage and generates recommendations
- VPA Updater — applies recommendations to pods
- VPA Admission Controller — sets initial resources for new pods
Verify everything is running:
kubectl get pods -n kube-system | grep vpa
You should see:
vpa-admission-controller-xxx 1/1 Running
vpa-recommender-xxx 1/1 Running
vpa-updater-xxx 1/1 Running
Step 2 — Deploy a Sample Application
Create a simple deployment to test VPA:
# demo-app.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: app
        image: nginx:1.27
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
          limits:
            cpu: 100m
            memory: 128Mi
        resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired
        - resourceName: memory
          restartPolicy: NotRequired
The resizePolicy tells Kubernetes that both CPU and memory can be changed in-place without restarting the container.
Apply it:
kubectl apply -f demo-app.yaml
Step 3 — Create a VPA with InPlaceOrRecreate Mode
# vpa-demo.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: demo-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app
  updatePolicy:
    updateMode: "InPlaceOrRecreate"
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: 25m
        memory: 32Mi
      maxAllowed:
        cpu: 1000m
        memory: 1Gi
      controlledResources: ["cpu", "memory"]
Key configuration:
- updateMode: InPlaceOrRecreate tries an in-place resize first and falls back to a pod restart
- minAllowed / maxAllowed set bounds so VPA doesn't over-provision or under-provision
- controlledResources lists which resources VPA manages
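To make the effect of minAllowed and maxAllowed concrete, here is a minimal Python sketch of the clamping idea. The function name clamp_recommendation and the use of integer millicores are assumptions for illustration only; this is not VPA's actual code.

```python
def clamp_recommendation(raw_m: int, min_allowed_m: int, max_allowed_m: int) -> int:
    """Clamp a raw CPU recommendation (in millicores) into [minAllowed, maxAllowed].

    Hypothetical helper: it only illustrates how the bounds cap a recommendation.
    """
    if min_allowed_m > max_allowed_m:
        raise ValueError("minAllowed must not exceed maxAllowed")
    return max(min_allowed_m, min(raw_m, max_allowed_m))


# Bounds taken from vpa-demo.yaml: minAllowed cpu 25m, maxAllowed cpu 1000m.
for raw in (10, 150, 2500):
    print(f"raw {raw}m -> applied {clamp_recommendation(raw, 25, 1000)}m")
```

A recommendation inside the bounds passes through unchanged; anything outside is pulled to the nearest bound.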
Apply it:
kubectl apply -f vpa-demo.yaml
Step 4 — Understand VPA Update Modes
VPA supports four update modes:
| Mode | Behavior |
|---|---|
| Off | VPA only recommends, never applies changes |
| Initial | Sets resources only when pod is first created |
| Recreate | Evicts pod and recreates with new resources (old behavior) |
| InPlaceOrRecreate | Resizes in-place if possible, evicts if not (new, recommended) |
Prefer InPlaceOrRecreate on Kubernetes 1.35+ once you trust the recommendations. Most resource changes no longer need a pod restart, and VPA still falls back to eviction when an in-place resize isn't possible.
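The table above can be read as a small decision function. The sketch below is a simplification under stated assumptions: the function name updater_action and its return strings are hypothetical, and the real updater also weighs how far current requests are from the recommendation, which is ignored here.

```python
def updater_action(mode: str, pod_is_new: bool, in_place_possible: bool) -> str:
    """Map a VPA update mode to the action described in the table (simplified)."""
    if mode == "Off":
        return "recommend-only"
    if pod_is_new:
        # Every mode except Off sets resources when the pod is created.
        return "set-at-creation"
    if mode == "Initial":
        return "leave-running-pod-alone"
    if mode == "Recreate":
        return "evict-and-recreate"
    if mode == "InPlaceOrRecreate":
        return "resize-in-place" if in_place_possible else "evict-and-recreate"
    raise ValueError(f"unknown update mode: {mode}")


for mode in ("Off", "Initial", "Recreate", "InPlaceOrRecreate"):
    print(mode, "->", updater_action(mode, pod_is_new=False, in_place_possible=True))
```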
Step 5 — Generate Load and Watch VPA React
The demo Deployment has no Service yet, so expose it first (this assumes the default namespace):
kubectl expose deployment demo-app --port=80
Then generate some load on the demo app:
kubectl run load-generator --image=busybox:1.36 --restart=Never -- /bin/sh -c \
"while true; do wget -q -O- http://demo-app.default.svc.cluster.local; done"
Watch VPA recommendations update (this takes 5-10 minutes):
kubectl get vpa demo-app-vpa -o yaml | grep -A 20 recommendation
You'll see something like:
recommendation:
  containerRecommendations:
  - containerName: app
    lowerBound:
      cpu: 25m
      memory: 64Mi
    target:
      cpu: 150m
      memory: 128Mi
    upperBound:
      cpu: 300m
      memory: 256Mi
Check if pods have been resized in-place:
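kubectl get pod -l app=demo-app -o jsonpath='{range .items[*]}{.metadata.name}: CPU={.spec.containers[0].resources.requests.cpu}, Memory={.spec.containers[0].resources.requests.memory}{"\n"}{end}'
Check resize status:
kubectl get pod -l app=demo-app -o jsonpath='{range .items[*]}{.metadata.name}: resize={.status.resize}{"\n"}{end}'
If resize is empty, no resize is pending. If it shows Deferred, the node doesn't have enough free resources right now and the resize will be retried later. Note that Kubernetes 1.33 and later deprecate the status.resize field and report resize state through the PodResizePending and PodResizeInProgress pod conditions, so on a 1.35 cluster also check the pod's conditions, for example with kubectl describe pod.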
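If you want to interpret resize state in a script, the following Python sketch reads a pod object that has already been loaded as a dict (for example from the output of kubectl get pod with -o json). The function name resize_state is hypothetical. It assumes the PodResizePending and PodResizeInProgress condition types used by recent Kubernetes releases and falls back to the older status.resize field.

```python
def resize_state(pod: dict) -> str:
    """Summarize the in-place resize state of a pod given as a parsed JSON dict."""
    status = pod.get("status", {})
    for cond in status.get("conditions", []):
        if cond.get("status") != "True":
            continue
        if cond.get("type") == "PodResizePending":
            # The reason is typically Deferred or Infeasible.
            return f"pending ({cond.get('reason', 'unknown')})"
        if cond.get("type") == "PodResizeInProgress":
            return "in progress"
    legacy = status.get("resize")  # older clusters only
    if legacy:
        return f"legacy status.resize={legacy}"
    return "no resize pending"


# Hand-written sample pods, not real cluster output.
deferred_pod = {"status": {"conditions": [
    {"type": "PodResizePending", "status": "True", "reason": "Deferred"}]}}
idle_pod = {"status": {"conditions": [{"type": "Ready", "status": "True"}]}}

print(resize_state(deferred_pod))
print(resize_state(idle_pod))
```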
Step 6 — Set Up VPA for a Real Application
Here's a production-ready VPA configuration for a web application:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "InPlaceOrRecreate"
    minReplicas: 2 # the updater only acts while at least 2 replicas are alive
  resourcePolicy:
    containerPolicies:
    - containerName: web
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2000m
        memory: 4Gi
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits
controlledValues: RequestsAndLimits means VPA adjusts both requests and limits, maintaining the same ratio. If your requests are 100m CPU with limits at 200m (a 2:1 limit-to-request ratio), VPA will maintain that ratio when scaling.
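As a worked example of the ratio rule, this small Python sketch computes the limit that keeps the original limit-to-request ratio. The helper name scaled_limit and the integer-millicore representation are assumptions for illustration; VPA's own rounding may differ.

```python
def scaled_limit(old_request_m: int, old_limit_m: int, new_request_m: int) -> int:
    """Return the limit (millicores) that preserves the old limit-to-request ratio."""
    if old_request_m <= 0:
        raise ValueError("old request must be positive")
    # Integer arithmetic; fractional millicores are truncated in this sketch.
    return new_request_m * old_limit_m // old_request_m


# Requests 100m with limits 200m is a 2:1 ratio, so a 250m request gets a 500m limit.
print(scaled_limit(100, 200, 250))
```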
Step 7 — Combine VPA with HPA
You can use VPA and HPA together, but with a rule: don't let both control the same resource.
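One way to check this rule before applying manifests is to compare the resources each autoscaler touches. The Python sketch below works on manifests already parsed into dicts; the function names are hypothetical, and it assumes that a VPA without an explicit controlledResources list manages both cpu and memory. It only looks at HPA metrics of type Resource.

```python
def vpa_controlled(vpa: dict) -> set:
    """Resources the VPA manages, defaulting to cpu and memory when unspecified."""
    policies = vpa.get("spec", {}).get("resourcePolicy", {}).get("containerPolicies", [])
    if not policies:
        return {"cpu", "memory"}
    controlled = set()
    for policy in policies:
        controlled.update(policy.get("controlledResources", ["cpu", "memory"]))
    return controlled


def hpa_resource_metrics(hpa: dict) -> set:
    """Resource names the HPA scales on (Resource-type metrics only)."""
    return {
        metric["resource"]["name"]
        for metric in hpa.get("spec", {}).get("metrics", [])
        if metric.get("type") == "Resource"
    }


def conflicting_resources(vpa: dict, hpa: dict) -> list:
    """Resources that both autoscalers would act on, sorted for stable output."""
    return sorted(vpa_controlled(vpa) & hpa_resource_metrics(hpa))


# Illustrative manifests reduced to the fields the check needs.
vpa_memory_only = {"spec": {"resourcePolicy": {"containerPolicies": [
    {"containerName": "app", "controlledResources": ["memory"]}]}}}
vpa_default = {"spec": {}}
hpa_on_cpu = {"spec": {"metrics": [{"type": "Resource", "resource": {"name": "cpu"}}]}}

print(conflicting_resources(vpa_memory_only, hpa_on_cpu))
print(conflicting_resources(vpa_default, hpa_on_cpu))
```

An empty result means the two autoscalers do not overlap; any listed resource should be removed from one of them.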
Common pattern: VPA manages memory, HPA manages CPU-based scaling:
# VPA - controls memory only
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "InPlaceOrRecreate"
  resourcePolicy:
    containerPolicies:
    - containerName: app
      controlledResources: ["memory"]
      minAllowed:
        memory: 128Mi
      maxAllowed:
        memory: 4Gi
---
# HPA - controls replicas based on CPU
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
Troubleshooting
VPA Not Recommending
VPA needs at least 24 hours of metrics to generate good recommendations. For testing, you can check short-term recommendations:
kubectl get vpa demo-app-vpa -o jsonpath='{.status.recommendation}'
If empty after 5 minutes, check VPA recommender logs:
kubectl logs -n kube-system deployment/vpa-recommender
Resize Stuck in "Deferred"
The node doesn't have enough allocatable resources:
# Check node allocatable vs allocated
kubectl describe node <node-name> | grep -A 5 "Allocated resources"
Solutions:
- Scale up the node pool
- Move less critical pods to other nodes
- Reduce maxAllowed in VPA policy
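To reason about why a resize is deferred, it can help to do the headroom arithmetic by hand. The Python sketch below is a rough approximation under stated assumptions: the function name resize_outcome is hypothetical, all values are CPU millicores, and the kubelet's real admission check considers more than this single comparison.

```python
def resize_outcome(allocatable_m: int, node_requested_m: int,
                   current_m: int, target_m: int) -> str:
    """Approximate whether a CPU request increase fits on the node."""
    if target_m > allocatable_m:
        # Larger than the whole node: it can never fit here.
        return "Infeasible"
    free_m = allocatable_m - node_requested_m
    if target_m - current_m > free_m:
        # Not enough room right now, but it may fit once other pods leave.
        return "Deferred"
    return "fits"


# Node with 2000m allocatable and 1900m already requested by all pods.
print(resize_outcome(2000, 1900, current_m=50, target_m=150))
print(resize_outcome(2000, 1900, current_m=50, target_m=300))
print(resize_outcome(2000, 1900, current_m=50, target_m=2500))
```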
Pods Still Restarting
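Check which resizePolicy is set on the pod spec:
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].resizePolicy}'
If a resource shows RestartContainer, resizing it restarts the container. Set restartPolicy: NotRequired (also the default when resizePolicy is omitted) to allow in-place resize. If the policy is already NotRequired, the restarts are likely VPA falling back to eviction because the in-place resize could not be applied.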
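The same check can be scripted against a container spec that has been parsed into a dict. This Python sketch uses a hypothetical helper name, restart_on_resize, and treats a missing restartPolicy as NotRequired.

```python
def restart_on_resize(container: dict) -> list:
    """List the resources whose resize would restart this container."""
    return sorted(
        policy["resourceName"]
        for policy in container.get("resizePolicy", [])
        if policy.get("restartPolicy", "NotRequired") == "RestartContainer"
    )


# Hand-written container specs for illustration.
in_place = {"name": "app", "resizePolicy": [
    {"resourceName": "cpu", "restartPolicy": "NotRequired"},
    {"resourceName": "memory", "restartPolicy": "NotRequired"}]}
memory_restarts = {"name": "app", "resizePolicy": [
    {"resourceName": "cpu", "restartPolicy": "NotRequired"},
    {"resourceName": "memory", "restartPolicy": "RestartContainer"}]}

print(restart_on_resize(in_place))
print(restart_on_resize(memory_restarts))
```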
Wrapping Up
VPA with In-Place Resize is the vertical scaling solution Kubernetes has been missing. Your pods get the resources they need, automatically, without restarts. Combined with HPA for horizontal scaling, you get a complete autoscaling strategy.
Start with updateMode: Off to see what VPA recommends for your workloads, then switch to InPlaceOrRecreate once you're comfortable with the recommendations.