How to Set Up Vertical Pod Autoscaler (VPA) with In-Place Resize in Kubernetes
Step-by-step guide to setting up Kubernetes VPA with In-Place Pod Resize. Auto-scale CPU and memory without pod restarts. Full tutorial with YAML examples.
Horizontal Pod Autoscaler (HPA) adds more pods when load increases. But what about when a single pod needs more CPU or memory? That's what Vertical Pod Autoscaler (VPA) does — and with Kubernetes 1.35's In-Place Pod Resize, it can now do it without restarting your pods.
This guide walks you through setting up VPA with in-place resize from scratch.
Prerequisites
- Kubernetes 1.35+ cluster
- `kubectl` configured and connected
- `git` installed
- Metrics Server running in the cluster
Check your Kubernetes version:
```shell
kubectl version
```

Check if Metrics Server is running:
```shell
kubectl get deployment metrics-server -n kube-system
```

If Metrics Server isn't installed:
```shell
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```

Step 1 — Install VPA
Clone the autoscaler repository and install VPA components:
```shell
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
```

Run the installation script:
```shell
./hack/vpa-up.sh
```

This installs three components:
- VPA Recommender — analyzes resource usage and generates recommendations
- VPA Updater — applies recommendations to pods
- VPA Admission Controller — sets initial resources for new pods
Verify everything is running:
```shell
kubectl get pods -n kube-system | grep vpa
```

You should see:

```
vpa-admission-controller-xxx   1/1   Running
vpa-recommender-xxx            1/1   Running
vpa-updater-xxx                1/1   Running
```
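As an extra sanity check, you can confirm the VPA custom resource definitions were registered by the install script:

```shell
# Should list verticalpodautoscalers.autoscaling.k8s.io and
# verticalpodautoscalercheckpoints.autoscaling.k8s.io
kubectl get crds | grep verticalpodautoscaler
```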
Step 2 — Deploy a Sample Application
Create a simple deployment to test VPA:
```yaml
# demo-app.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: app
        image: nginx:1.27
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
          limits:
            cpu: 100m
            memory: 128Mi
        resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired
        - resourceName: memory
          restartPolicy: NotRequired
```

The `resizePolicy` tells Kubernetes that both CPU and memory can be changed in-place without restarting the container.
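Note that `NotRequired` is also the default when `resizePolicy` is omitted, so the stanza above mainly makes the intent explicit. For runtimes that size internal structures from the memory limit at startup (a JVM heap, for example), a common variant keeps CPU resizes in-place but restarts on memory changes. A sketch:

```yaml
resizePolicy:
- resourceName: cpu
  restartPolicy: NotRequired
- resourceName: memory
  restartPolicy: RestartContainer  # restart so the runtime re-reads its memory limit
```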
Apply it:
```shell
kubectl apply -f demo-app.yaml
```

Step 3 — Create a VPA with InPlaceOrRecreate Mode
```yaml
# vpa-demo.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: demo-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app
  updatePolicy:
    updateMode: "InPlaceOrRecreate"
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: 25m
        memory: 32Mi
      maxAllowed:
        cpu: 1000m
        memory: 1Gi
      controlledResources: ["cpu", "memory"]
```

Key configuration:
- `updateMode: InPlaceOrRecreate` — tries in-place resize first, falls back to pod restart
- `minAllowed` / `maxAllowed` — sets bounds so VPA doesn't over-provision or under-provision
- `controlledResources` — which resources VPA manages
Apply it:
```shell
kubectl apply -f vpa-demo.yaml
```

Step 4 — Understand VPA Update Modes
VPA supports four update modes:
| Mode | Behavior |
|---|---|
| Off | VPA only recommends, never applies changes |
| Initial | Sets resources only when pod is first created |
| Recreate | Evicts pod and recreates with new resources (old behavior) |
| InPlaceOrRecreate | Resizes in-place if possible, evicts if not (new, recommended) |
On Kubernetes 1.35+, prefer `InPlaceOrRecreate`: in most cases there is no longer any reason to restart pods just to change their resources.
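If you want to observe before enforcing, the same VPA in recommendation-only mode looks like this (recommendations appear in the object's status, but nothing is applied to pods):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: demo-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app
  updatePolicy:
    updateMode: "Off"  # recommend only; never evict or resize
```

Switching `updateMode` later is a one-line change, so this is a low-risk way to evaluate VPA on an existing workload.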
Step 5 — Generate Load and Watch VPA React
Generate some CPU load on the demo app:
```shell
# Expose the Deployment first so the service URL below resolves
kubectl expose deployment demo-app --port=80
kubectl run load-generator --image=busybox:1.36 --restart=Never -- /bin/sh -c \
  "while true; do wget -q -O- http://demo-app.default.svc.cluster.local; done"
```

Watch VPA recommendations update (this takes 5-10 minutes):
```shell
kubectl get vpa demo-app-vpa -o yaml | grep -A 20 recommendation
```

You'll see something like:
```yaml
recommendation:
  containerRecommendations:
  - containerName: app
    lowerBound:
      cpu: 25m
      memory: 64Mi
    target:
      cpu: 150m
      memory: 128Mi
    upperBound:
      cpu: 300m
      memory: 256Mi
```

Check if pods have been resized in-place:
```shell
kubectl get pod -l app=demo-app -o jsonpath='{range .items[*]}{.metadata.name}: CPU={.spec.containers[0].resources.requests.cpu}, Memory={.spec.containers[0].resources.requests.memory}{"\n"}{end}'
```

Check resize status:
```shell
kubectl get pod -l app=demo-app -o jsonpath='{range .items[*]}{.metadata.name}: resize={.status.resize}{"\n"}{end}'
```

If `resize` is empty, the resize completed successfully. If it shows `Deferred`, the node doesn't have enough resources right now.
Step 6 — Set Up VPA for a Real Application
Here's a production-ready VPA configuration for a web application:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "InPlaceOrRecreate"
    minReplicas: 2  # never resize if it would leave fewer than 2 pods
  resourcePolicy:
    containerPolicies:
    - containerName: web
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2000m
        memory: 4Gi
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits
```

`controlledValues: RequestsAndLimits` means VPA adjusts both requests and limits, maintaining the same ratio. If your requests are 100m CPU with limits at 200m (2:1 ratio), VPA will maintain that ratio when scaling.
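A quick sketch of that ratio math, using the hypothetical numbers above:

```shell
# Starting point from the example: requests 100m CPU, limits 200m (2:1 ratio).
request_m=100
limit_m=200
# Suppose VPA's new target request is 300m. RequestsAndLimits preserves the
# 2:1 limit-to-request ratio, so the new limit works out to 600m.
new_request_m=300
new_limit_m=$(( new_request_m * limit_m / request_m ))
echo "new request: ${new_request_m}m, new limit: ${new_limit_m}m"
```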
Step 7 — Combine VPA with HPA
You can use VPA and HPA together, but with a rule: don't let both control the same resource.
Common pattern: VPA manages memory, HPA manages CPU-based scaling:
```yaml
# VPA - controls memory only
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "InPlaceOrRecreate"
  resourcePolicy:
    containerPolicies:
    - containerName: app
      controlledResources: ["memory"]
      minAllowed:
        memory: 128Mi
      maxAllowed:
        memory: 4Gi
---
# HPA - controls replicas based on CPU
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

Troubleshooting
VPA Not Recommending
VPA needs at least 24 hours of metrics to generate good recommendations. For testing, you can check short-term recommendations:
```shell
kubectl get vpa demo-app-vpa -o jsonpath='{.status.recommendation}'
```

If empty after 5 minutes, check VPA recommender logs:
```shell
kubectl logs -n kube-system deployment/vpa-recommender
```

Resize Stuck in "Deferred"
The node doesn't have enough allocatable resources:
# Check node allocatable vs allocated
```shell
# Check node allocatable vs allocated
kubectl describe node <node-name> | grep -A 5 "Allocated resources"
```

Solutions:
- Scale up the node pool
- Move less critical pods to other nodes
- Reduce `maxAllowed` in the VPA policy
Pods Still Restarting
Check if resizePolicy is set on the pod spec:
```shell
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].resizePolicy}'
```

If the output lists `RestartContainer` for a resource, resizing that resource always restarts the container. When `resizePolicy` is omitted entirely, the default is `NotRequired`, so restarts in that case usually have another cause; confirm the VPA's update mode isn't `Recreate` and check the updater logs. Setting `resizePolicy` to `NotRequired` explicitly, as in Step 2, makes the intent unambiguous.
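One way to set the policy without editing the manifest is a JSON patch against the Deployment (a sketch; adjust the Deployment name and container index to match your spec):

```shell
kubectl patch deployment demo-app --type='json' -p='[
  {"op": "add",
   "path": "/spec/template/spec/containers/0/resizePolicy",
   "value": [
     {"resourceName": "cpu",    "restartPolicy": "NotRequired"},
     {"resourceName": "memory", "restartPolicy": "NotRequired"}
   ]}]'
```

Patching the pod template triggers one final rolling restart, after which subsequent resource changes can happen in-place.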
Wrapping Up
VPA with In-Place Resize is the vertical scaling solution Kubernetes has been missing. Your pods get the resources they need, automatically, without restarts. Combined with HPA for horizontal scaling, you get a complete autoscaling strategy.
Start with `updateMode: Off` to see what VPA recommends for your workloads, then switch to `InPlaceOrRecreate` once you're comfortable with the recommendations.
Want to master Kubernetes autoscaling — VPA, HPA, KEDA, and cluster autoscaling — with hands-on practice? The KodeKloud Kubernetes course covers all scaling strategies with real cluster labs. For a quick test cluster, DigitalOcean Kubernetes runs Kubernetes 1.35 and makes it easy to test VPA in-place resize.