How to Achieve Zero-Downtime Kubernetes Deployments in 2026
A complete guide to rolling updates, PodDisruptionBudgets, readiness probes, preStop hooks, and graceful shutdown — everything you need to deploy without dropping a single request.
Nothing kills user trust like a deployment that takes your service down. In a Kubernetes world, there's no excuse for it — the platform gives you everything you need to deploy without dropping a single request. But you have to configure it correctly.
"Zero-downtime deployment" isn't one thing. It's a combination of several mechanisms working together. Miss one, and you'll still see connection errors and 502s during deploys, even with a "rolling update" configured.
This guide walks through every piece you need — from rolling update strategy to graceful shutdown — with the configuration that actually works in production.
Why Rolling Updates Alone Aren't Enough
Most people know about rolling updates (strategy: RollingUpdate). Kubernetes replaces pods one at a time, so there's always capacity available. That's the starting point.
But rolling updates don't guarantee zero downtime on their own. Here's what can still go wrong:
- Traffic gets routed to a pod that hasn't finished starting up — if you don't have a readiness probe, Kubernetes adds the new pod to the Service endpoints as soon as it starts, before your app is ready.
- Traffic gets sent to a pod that's shutting down — when Kubernetes terminates a pod, it removes it from Service endpoints and sends SIGTERM at roughly the same time. Requests already in flight (or arriving in the milliseconds before the endpoint update propagates) hit a pod that's in the process of shutting down.
- All old pods are removed too fast — without a PodDisruptionBudget, a node drain could terminate multiple pods simultaneously.
- The app ignores SIGTERM — if your application doesn't handle SIGTERM and shut down gracefully, Kubernetes will force-kill it after `terminationGracePeriodSeconds`, dropping in-flight requests.
Fixing all four of these is what actually achieves zero downtime.
Step 1: Configure the Rolling Update Strategy
This is the foundation. Without it, Kubernetes might use the Recreate strategy (kill all old pods, then start new ones — guaranteed downtime).
```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # Allow 1 extra pod during the update
      maxUnavailable: 0    # Never reduce capacity below the desired count
```

Understanding the parameters:

- `maxUnavailable: 0` — This is the key setting. Kubernetes will not terminate an old pod until a new pod is Ready. With this set to 0, you always have at least `replicas` pods running.
- `maxSurge: 1` — Allows creating one extra pod during the update. The update process becomes: start new pod → wait for it to be Ready → terminate old pod → repeat.
With a nonzero maxUnavailable (the default is 25%), Kubernetes may terminate an old pod before its replacement is Ready — leaving you with fewer pods than desired during the update.
For production, always set `maxUnavailable: 0`.
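If you want to convince yourself of the invariant these two settings buy you, here's a toy simulation of the rolling-update loop. This is an illustration, not the real controller: it assumes every new pod passes its readiness probe instantly.

```python
def rolling_update(replicas, max_surge, max_unavailable):
    """Toy model of a RollingUpdate rollout.

    Returns the minimum number of Ready pods observed at any point, plus the
    (old, new) pod counts after each step.
    Assumption: new pods become Ready immediately after creation.
    """
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    old, new = replicas, 0
    timeline = [(old, new)]
    min_ready = replicas
    while old > 0:
        # Create new pods, up to the surge ceiling (replicas + maxSurge total)
        create = min(replicas - new, replicas + max_surge - (old + new))
        new += create
        # Terminate old pods without letting Ready pods drop below
        # replicas - maxUnavailable
        kill = min(old, (old + new) - (replicas - max_unavailable))
        old -= kill
        timeline.append((old, new))
        min_ready = min(min_ready, old + new)
    return min_ready, timeline
```

With `maxUnavailable: 0` the pod count never dips below `replicas`; with `maxUnavailable: 1` it does — which is exactly the capacity loss described above.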
Step 2: Add Readiness and Liveness Probes
A readiness probe tells Kubernetes when a pod is actually ready to receive traffic. Until the readiness probe passes, the pod is excluded from Service endpoints — no traffic reaches it.
A liveness probe tells Kubernetes if the pod is alive. If it fails repeatedly, the pod is restarted.
These are different and both matter:
```yaml
containers:
  - name: app
    image: myapp:v2.0
    ports:
      - containerPort: 8080
    # Readiness probe — controls traffic routing
    readinessProbe:
      httpGet:
        path: /ready            # Should return 200 only when app is fully initialized
        port: 8080
      initialDelaySeconds: 10   # Wait 10s before first check (startup time)
      periodSeconds: 5          # Check every 5 seconds
      failureThreshold: 3       # Mark unready after 3 consecutive failures
      successThreshold: 1       # Mark ready after 1 success
    # Liveness probe — controls pod restart
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 30   # Give app time to start before liveness kicks in
      periodSeconds: 10
      failureThreshold: 3
```

Critical: Your `/ready` endpoint should return a non-200 status code if your app isn't fully ready to serve traffic — database connection not established, cache not warmed, etc. Many apps just return 200 immediately, which defeats the purpose.
Common mistake: Setting initialDelaySeconds too low. If your app takes 20 seconds to initialize and you check after 5 seconds, the probe will fail and your pod will be marked unhealthy before it even has a chance to start.
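To make the distinction concrete, here's a minimal sketch of an app exposing both endpoints. The dependency names (`db_connected`, `cache_warmed`) are illustrative placeholders, not a real API — the point is that `/ready` reflects dependency state while `/health` only says the process is up.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class AppState:
    """Hypothetical app state — field names are illustrative."""
    def __init__(self):
        self.db_connected = False
        self.cache_warmed = False

    @property
    def ready(self):
        return self.db_connected and self.cache_warmed

state = AppState()

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/ready":
            # Readiness: 200 only once all dependencies are usable
            code = 200 if state.ready else 503
        elif self.path == "/health":
            # Liveness: the process is up, even if dependencies aren't yet
            code = 200
        else:
            code = 404
        self.send_response(code)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep request logging quiet

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
```

Until the initialization code flips those flags, the readiness probe fails and the pod stays out of the Service endpoints — while the liveness probe keeps passing, so Kubernetes doesn't restart it.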
For apps with variable startup times, use a startup probe:
```yaml
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30   # Allow up to 30 × 10s = 5 minutes to start
  periodSeconds: 10
```

Liveness and readiness probes don't start until the startup probe succeeds once. This prevents Kubernetes from killing slow-starting pods.
Step 3: Handle SIGTERM Gracefully in Your Application
When Kubernetes wants to terminate a pod, it sends SIGTERM to the main process. Your application needs to:
- Stop accepting new connections
- Finish processing in-flight requests
- Close database connections cleanly
- Exit with code 0
If you don't handle SIGTERM, your application will continue accepting requests until Kubernetes force-kills it after terminationGracePeriodSeconds (default: 30s). Any requests in progress at that moment are dropped.
Node.js example:
const server = app.listen(8080)
process.on('SIGTERM', () => {
console.log('SIGTERM received — shutting down gracefully')
server.close(() => {
console.log('All connections closed — exiting')
process.exit(0)
})
// Force exit if graceful shutdown takes too long
setTimeout(() => process.exit(1), 25000)
})Python (FastAPI/uvicorn) example:
```python
from contextlib import asynccontextmanager

from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup
    yield
    # Shutdown — uvicorn traps SIGTERM and runs this before exiting
    await database.disconnect()
    await redis_client.close()

app = FastAPI(lifespan=lifespan)
```

Go example:
```go
quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)
<-quit

ctx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
defer cancel()
if err := srv.Shutdown(ctx); err != nil {
    log.Fatal("Server forced to shutdown:", err)
}
log.Println("Server exiting")
```

Step 4: Add a preStop Hook to Handle the Race Condition
Here's a subtle but important problem: when Kubernetes terminates a pod, it does two things simultaneously:
- Sends SIGTERM to the pod
- Removes the pod from Service endpoints
The endpoint update has to propagate to kube-proxy on every node, which takes a few seconds. During those seconds, traffic is still being routed to the pod — but the pod has already started shutting down.
The fix is a preStop hook that adds a small sleep before SIGTERM is processed:
```yaml
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 5"]
```

What this does:
- Kubernetes calls the preStop hook (`sleep 5`)
- During those 5 seconds, the pod is still running and serving traffic
- Kubernetes simultaneously removes the pod from endpoints — propagation completes in ~2–3 seconds
- After `sleep 5`, Kubernetes sends SIGTERM
- Your app shuts down gracefully, with no more new traffic arriving
This 5-second sleep is a small but critical piece. Without it, you'll see occasional 502 errors during rolling deployments even with everything else configured correctly.
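It helps to put numbers on the whole termination sequence before tuning it: the preStop sleep and the app's own shutdown window both burn from the same grace-period budget, and the kubelet sends SIGKILL once that budget is exhausted. A trivial helper (the 5-second safety margin is my assumption, not a Kubernetes default):

```python
def termination_grace_period(pre_stop_s: int, shutdown_timeout_s: int,
                             margin_s: int = 5) -> int:
    """Smallest safe terminationGracePeriodSeconds: the preStop sleep plus
    the app's shutdown timeout, plus a safety margin (an assumption here,
    not a Kubernetes default). The kubelet force-kills when this expires."""
    return pre_stop_s + shutdown_timeout_s + margin_s
```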
Make sure terminationGracePeriodSeconds is long enough to accommodate the preStop hook + your app's shutdown time:
```yaml
spec:
  terminationGracePeriodSeconds: 60   # preStop (5s) + app shutdown (up to 55s)
```

Step 5: Configure a PodDisruptionBudget
A PodDisruptionBudget (PDB) protects your service during voluntary disruptions — node drains, cluster upgrades, maintenance operations.
Without a PDB, if you drain a node for maintenance and all your pods happen to be on that node, they can all be terminated simultaneously.
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
  namespace: production
spec:
  minAvailable: 2   # Always keep at least 2 pods running
  selector:
    matchLabels:
      app: my-app
```

Or expressed as a percentage:
```yaml
spec:
  minAvailable: "50%"   # Keep at least 50% of pods running at all times
```

With this PDB, a `kubectl drain` will wait for pods to be rescheduled elsewhere before proceeding — it won't drain a node in a way that would violate your minimum available count.
Important: PDBs only protect against voluntary disruptions. They don't protect against node failures.
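You can reason about what a PDB permits at any moment with a simplified model of its `disruptionsAllowed` status field. This is a sketch: the real controller also accounts for desired replica counts and pods that aren't currently healthy.

```python
def disruptions_allowed(healthy_pods: int, min_available: int) -> int:
    """Simplified model of a PDB's disruptionsAllowed: how many pods a
    voluntary eviction (e.g. kubectl drain) may remove right now under a
    minAvailable budget."""
    return max(0, healthy_pods - min_available)
```

With 3 healthy pods and `minAvailable: 2`, a drain may evict one pod at a time and must wait for a replacement to become Ready before it can evict the next.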
The Complete Deployment Manifest
Putting it all together — this is what a zero-downtime production deployment looks like:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0   # ← Never reduce below desired count
  template:
    metadata:
      labels:
        app: my-app
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: app
          image: myorg/my-app:v2.0
          ports:
            - containerPort: 8080
          # Resources (required for proper scheduling)
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          # Readiness — controls traffic routing
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 3
          # Liveness — controls pod restart
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 3
          # Startup probe for slow-starting apps
          startupProbe:
            httpGet:
              path: /health
              port: 8080
            failureThreshold: 30
            periodSeconds: 10
          # preStop hook — handles endpoint propagation delay
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 5"]
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app
```

Validating Your Zero-Downtime Setup
After deploying this configuration, test it properly:
```bash
# Terminal 1: Continuously hit your service during a deployment
while true; do
  status=$(curl -s -o /dev/null -w "%{http_code}" https://your-service/health)
  echo "$(date +%H:%M:%S) — $status"
  sleep 0.5
done

# Terminal 2: Trigger a rolling update
kubectl set image deployment/my-app app=myorg/my-app:v2.1 -n production
kubectl rollout status deployment/my-app -n production
```

You should see 200s throughout the entire rollout. If you see any 502s or 503s, check the component that's failing:
- 502 at start of update → preStop hook missing or too short
- 502 during update → maxUnavailable is too high, or readiness probe is wrong
- 502 at end of update → graceful shutdown not handling SIGTERM properly
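If you'd rather script the check than eyeball terminal output, here's a hypothetical Python equivalent of the curl loop. The `fetch` parameter is injectable so the tally logic can be exercised without a live service; by default it does a real HTTP GET.

```python
import time
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

def watch_rollout(url, checks=120, interval_s=0.5, fetch=None):
    """Poll `url` and tally HTTP status codes seen during a rollout.
    Any key other than 200 in the result maps to one of the failure
    modes above (0 means the connection was refused or reset)."""
    def http_fetch(u):
        try:
            return urlopen(u, timeout=2).status
        except HTTPError as e:
            return e.code   # e.g. 502/503 from the ingress
        except (URLError, OSError):
            return 0        # refused/reset connection — counts as downtime
    fetch = fetch or http_fetch
    tally = {}
    for _ in range(checks):
        code = fetch(url)
        tally[code] = tally.get(code, 0) + 1
        time.sleep(interval_s)
    return tally
```

Run it in one terminal while `kubectl set image` runs in another; a clean rollout produces a tally containing only the key 200.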
Beyond Rolling Updates: Blue-Green and Canary
Rolling updates work well for most cases. But sometimes you need more control.
Blue-Green Deployment: Run two identical environments (blue = current, green = new). Switch traffic instantly by updating the Service selector.
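For that selector switch to work, each Deployment's pods carry a version label and the Service selects on it. A hypothetical sketch (names and ports assumed):

```yaml
# Service initially routing to the "blue" Deployment's pods
apiVersion: v1
kind: Service
metadata:
  name: my-app-svc
spec:
  selector:
    app: my-app
    version: blue   # patching this to "green" cuts traffic over instantly
  ports:
    - port: 80
      targetPort: 8080
```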
```bash
# Blue is live, green is the new version
kubectl patch service my-app-svc \
  -p '{"spec":{"selector":{"version":"green"}}}'
```

Canary Deployment: Send a percentage of traffic to the new version. Using Nginx Ingress:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"   # 10% to canary
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app-canary-svc
                port:
                  number: 80
```

For more advanced progressive delivery (automatic promotion/rollback based on metrics), look at Argo Rollouts — it handles blue-green and canary deployments with automatic analysis gates.
Checklist: Zero-Downtime Deployment
Before deploying a service to production, verify:
- `strategy.type: RollingUpdate` with `maxUnavailable: 0`
- Readiness probe configured with correct path and timing
- Liveness probe configured (separate from readiness)
- Startup probe for apps with long initialization
- Application handles SIGTERM gracefully (finishes in-flight requests)
- `lifecycle.preStop` sleep of 5 seconds
- `terminationGracePeriodSeconds` ≥ preStop time + app shutdown time
- PodDisruptionBudget configured
- At least 2 replicas (a single replica means no rolling update — just a restart)
- Tested with live traffic during a deploy
Keep Learning
Zero-downtime deployments are a prerequisite for any serious Kubernetes production deployment. If you want to go deeper into Kubernetes production patterns:
- KodeKloud — CKA and CKAD courses cover rolling updates, disruption budgets, and production deployment patterns with hands-on labs
- DigitalOcean Managed Kubernetes — A solid choice for hosting production K8s clusters without the overhead of managing the control plane yourself
Zero downtime in Kubernetes isn't magic — it's configuration. Every piece serves a specific purpose. Rolling update strategy prevents capacity loss. Readiness probes prevent premature traffic. The preStop hook closes the race condition. Graceful shutdown protects in-flight requests. PodDisruptionBudgets protect against maintenance disasters. Together, they give you true zero-downtime deployments.