
How to Achieve Zero-Downtime Kubernetes Deployments in 2026

A complete guide to rolling updates, PodDisruptionBudgets, readiness probes, preStop hooks, and graceful shutdown — everything you need to deploy without dropping a single request.

DevOpsBoys · Mar 11, 2026 · 9 min read

Nothing kills user trust like a deployment that takes your service down. In a Kubernetes world, there's no excuse for it — the platform gives you everything you need to deploy without dropping a single request. But you have to configure it correctly.

"Zero-downtime deployment" isn't one thing. It's a combination of several mechanisms working together. Miss one, and you'll still see connection errors and 502s during deploys, even with a "rolling update" configured.

This guide walks through every piece you need — from rolling update strategy to graceful shutdown — with the configuration that actually works in production.


Why Rolling Updates Alone Aren't Enough

Most people know about rolling updates (strategy: RollingUpdate). Kubernetes replaces pods one at a time, so there's always capacity available. That's the starting point.

But rolling updates don't guarantee zero downtime on their own. Here's what can still go wrong:

  1. Traffic gets routed to a pod that hasn't finished starting up — if you don't have a readiness probe, Kubernetes adds the new pod to the Service endpoints as soon as it starts, before your app is ready.

  2. Traffic gets sent to a pod that's shutting down — when Kubernetes terminates a pod, it removes it from Service endpoints and sends SIGTERM at roughly the same time. Requests already in flight (or arriving in the milliseconds before endpoint update propagates) hit a pod that's in the process of shutting down.

  3. All old pods are removed too fast — without a PodDisruptionBudget, a node drain could terminate multiple pods simultaneously.

  4. The app ignores SIGTERM — if your application doesn't handle SIGTERM and shut down gracefully, Kubernetes will force-kill it after terminationGracePeriodSeconds, dropping in-flight requests.

Fixing all four of these is what actually achieves zero downtime.


Step 1: Configure the Rolling Update Strategy

This is the foundation. A Deployment's default strategy is already RollingUpdate, but its default parameters allow capacity to drop during an update — and if strategy: Recreate is set (kill all old pods, then start new ones), downtime is guaranteed.

yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # Allow 1 extra pod during update
      maxUnavailable: 0    # Never reduce capacity below desired count

Understanding the parameters:

  • maxUnavailable: 0 — This is the key setting. It means Kubernetes will not terminate an old pod until a new pod is Ready, so you never drop below your desired replicas count during the update.
  • maxSurge: 1 — Allows creating one extra pod during the update. The update process becomes: start new pod → wait for it to be Ready → terminate old pod → repeat.

The default is maxUnavailable: 25%, rounded down to a whole number of pods. On a four-replica Deployment that works out to 1 — Kubernetes may terminate an old pod before its replacement is Ready, leaving you one pod short during the update.

For production, always set maxUnavailable: 0.
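To see how these two settings bound pod counts during a rollout, here is a back-of-the-envelope sketch. `rollout_bounds` is purely illustrative (nothing kubectl provides); the rounding directions — percentage maxSurge rounds up, percentage maxUnavailable rounds down — follow the Deployment API reference.

```python
import math

def rollout_bounds(replicas, max_surge, max_unavailable):
    """Pod-count window Kubernetes maintains during a rolling update.
    Percentages follow the documented rounding: maxSurge rounds up,
    maxUnavailable rounds down."""
    def resolve(value, round_up):
        if isinstance(value, str) and value.endswith("%"):
            pct = int(value.rstrip("%")) / 100
            scaled = replicas * pct
            return math.ceil(scaled) if round_up else math.floor(scaled)
        return int(value)

    surge = resolve(max_surge, round_up=True)
    unavailable = resolve(max_unavailable, round_up=False)
    return replicas - unavailable, replicas + surge  # (min pods, max pods)

print(rollout_bounds(3, 1, 0))           # (3, 4) — never below desired count
print(rollout_bounds(10, "25%", "25%"))  # (8, 13) — defaults allow a capacity dip
```

Note that with the 25% defaults on ten replicas, two pods can be missing at any moment of the rollout — exactly the dip that maxUnavailable: 0 eliminates.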


Step 2: Add Readiness and Liveness Probes

A readiness probe tells Kubernetes when a pod is actually ready to receive traffic. Until the readiness probe passes, the pod is excluded from Service endpoints — no traffic reaches it.

A liveness probe tells Kubernetes if the pod is alive. If it fails repeatedly, the pod is restarted.

These are different and both matter:

yaml
containers:
  - name: app
    image: myapp:v2.0
    ports:
      - containerPort: 8080
 
    # Readiness probe — controls traffic routing
    readinessProbe:
      httpGet:
        path: /ready          # Should return 200 only when app is fully initialized
        port: 8080
      initialDelaySeconds: 10  # Wait 10s before first check (startup time)
      periodSeconds: 5         # Check every 5 seconds
      failureThreshold: 3      # Mark unready after 3 consecutive failures
      successThreshold: 1      # Mark ready after 1 success
 
    # Liveness probe — controls pod restart
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 30   # Give app time to start before liveness kicks in
      periodSeconds: 10
      failureThreshold: 3

Critical: Your /ready endpoint should return a non-200 status code if your app isn't fully ready to serve traffic — database connection not established, cache not warmed, etc. Many apps just return 200 immediately, which defeats the purpose.

Common mistake: Setting initialDelaySeconds too low. If your app takes 20 seconds to initialize and the liveness probe starts checking after 5, the pod can be restarted before it ever finishes starting. A too-early readiness probe is less destructive — the pod just stays out of the endpoints while it fails — but it slows every rollout.
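What a "real" readiness endpoint looks like can be shown with nothing but the standard library. This is a minimal sketch of the pattern, not a production server; `app_ready` stands in for whatever signals that your startup work (database connect, cache warm) has finished.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Flipped to True once startup work completes.
app_ready = threading.Event()

class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/ready":
            # 503 until initialization completes, 200 afterwards —
            # this is what keeps the pod out of Service endpoints.
            code = 200 if app_ready.is_set() else 503
        elif self.path == "/health":
            code = 200  # liveness: the process is up, even if not ready
        else:
            code = 404
        self.send_response(code)
        self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

def start_probe_server(port=8080):
    server = HTTPServer(("127.0.0.1", port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The key point: call `app_ready.set()` only after the last piece of initialization succeeds, never unconditionally at process start.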

For apps with variable startup times, use a startup probe:

yaml
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30      # Allow up to 30 * 10s = 5 minutes to start
  periodSeconds: 10

Liveness and readiness probes don't start until the startup probe has succeeded once. This prevents Kubernetes from killing slow-starting pods.
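The arithmetic behind these thresholds is worth making explicit. A throwaway helper (not part of any Kubernetes API) shows when a probe first fires and the worst-case time before it gives up:

```python
def probe_timings(initial_delay, period, failure_threshold):
    """Rough probe timeline: (first check, worst-case time until the
    probe is considered failed)."""
    first_check = initial_delay
    gives_up_after = initial_delay + failure_threshold * period
    return first_check, gives_up_after

# Startup probe above: no initial delay, 30 failures x 10s
print(probe_timings(0, 10, 30))   # (0, 300) — up to 5 minutes to start
# Liveness probe from Step 2: a hung pod is restarted within ~60s
print(probe_timings(30, 10, 3))   # (30, 60)
```

If `gives_up_after` for your liveness probe is shorter than your app's worst-case startup, you'll get restart loops — which is exactly the case the startup probe exists to cover.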


Step 3: Handle SIGTERM Gracefully in Your Application

When Kubernetes wants to terminate a pod, it sends SIGTERM to the main process. Your application needs to:

  1. Stop accepting new connections
  2. Finish processing in-flight requests
  3. Close database connections cleanly
  4. Exit with code 0

If you don't handle SIGTERM, your application will continue accepting requests until Kubernetes force-kills it after terminationGracePeriodSeconds (default: 30s). Any requests in progress at that moment are dropped.

Node.js example:

javascript
const server = app.listen(8080)
 
process.on('SIGTERM', () => {
  console.log('SIGTERM received — shutting down gracefully')
  server.close(() => {
    console.log('All connections closed — exiting')
    process.exit(0)
  })
  // Force exit if graceful shutdown takes too long
  setTimeout(() => process.exit(1), 25000)
})

Python (FastAPI/uvicorn) example:

python
from contextlib import asynccontextmanager

from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup work (connect pools, warm caches) goes before the yield
    yield
    # Shutdown — uvicorn runs this after receiving SIGTERM
    await database.disconnect()
    await redis_client.close()

app = FastAPI(lifespan=lifespan)

Go example:

go
// after srv (an *http.Server) has been started in a goroutine:
quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)
<-quit // block until Kubernetes sends SIGTERM

// give in-flight requests up to 25s to finish, then force-close
ctx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
defer cancel()

if err := srv.Shutdown(ctx); err != nil {
    log.Fatal("Server forced to shutdown:", err)
}
log.Println("Server exiting")

Step 4: Add a preStop Hook to Handle the Race Condition

Here's a subtle but important problem: when Kubernetes terminates a pod, it does two things simultaneously:

  1. Sends SIGTERM to the pod
  2. Removes the pod from Service endpoints

The endpoint update has to propagate to kube-proxy on every node, which takes a few seconds. During those seconds, traffic is still being routed to the pod — but the pod has already started shutting down.

The fix is a preStop hook that adds a small sleep before SIGTERM is processed:

yaml
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 5"]

What this does:

  1. Kubernetes calls the preStop hook (sleep 5)
  2. During those 5 seconds, the pod is still running and serving traffic
  3. Kubernetes simultaneously removes the pod from endpoints — propagation completes in ~2-3 seconds
  4. After sleep 5, Kubernetes sends SIGTERM
  5. Your app shuts down gracefully, with no more new traffic arriving

This 5-second sleep is a small but critical piece. Without it, you'll see occasional 502 errors during rolling deployments even with everything else configured correctly.

Make sure terminationGracePeriodSeconds is long enough to accommodate the preStop hook + your app's shutdown time:

yaml
spec:
  terminationGracePeriodSeconds: 60   # preStop(5s) + app shutdown(up to 55s)
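The budget in that comment generalizes to a one-line check — illustrative arithmetic, not a Kubernetes API:

```python
def required_grace_period(pre_stop_sleep, shutdown_timeout, headroom=5):
    """terminationGracePeriodSeconds must cover the preStop hook plus
    the app's own shutdown timeout, with a little headroom."""
    return pre_stop_sleep + shutdown_timeout + headroom

# preStop sleep of 5s + the 25s in-app shutdown timeout used in the
# Node and Go examples above fits comfortably inside 60s:
print(required_grace_period(5, 25))  # 35
```

If the sum exceeds the configured grace period, Kubernetes will SIGKILL the container mid-shutdown — the exact failure mode you're trying to avoid.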

Step 5: Configure a PodDisruptionBudget

A PodDisruptionBudget (PDB) protects your service during voluntary disruptions — node drains, cluster upgrades, maintenance operations.

Without a PDB, if you drain a node for maintenance and all your pods happen to be on that node, they can all be terminated simultaneously.

yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
  namespace: production
spec:
  minAvailable: 2       # Always keep at least 2 pods running
  selector:
    matchLabels:
      app: my-app

Or expressed as a percentage:

yaml
spec:
  minAvailable: "50%"   # Keep at least 50% of pods running at all times

With this PDB in place, kubectl drain evicts pods through the Eviction API, which refuses any eviction that would take you below minAvailable — the drain blocks and retries until replacement pods are Running and Ready elsewhere.

Important: PDBs only protect against voluntary disruptions (drains, upgrades, evictions). They don't help with involuntary ones like node crashes — for those, spread replicas across nodes with pod anti-affinity or topology spread constraints.
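The PDB's bookkeeping is simple enough to sketch for an integer minAvailable. This mirrors the `disruptionsAllowed` figure the controller maintains in the PDB's status:

```python
def disruptions_allowed(healthy_pods, min_available):
    """How many voluntary evictions the PDB permits right now.
    The Eviction API starts refusing once this reaches 0."""
    return max(healthy_pods - min_available, 0)

# 3 healthy replicas, minAvailable: 2 — a drain may evict one pod at a time:
print(disruptions_allowed(3, 2))  # 1
# If one pod is already unhealthy, voluntary evictions are blocked entirely:
print(disruptions_allowed(2, 2))  # 0
```

That second case is worth internalizing: an unhealthy pod doesn't just reduce capacity, it can freeze node drains until it recovers.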


The Complete Deployment Manifest

Putting it all together — this is what a zero-downtime production deployment looks like:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0       # ← Never reduce below desired count
  template:
    metadata:
      labels:
        app: my-app
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: app
          image: myorg/my-app:v2.0
          ports:
            - containerPort: 8080
 
          # Resources (required for proper scheduling)
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
 
          # Readiness — controls traffic routing
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 3
 
          # Liveness — controls pod restart
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 3
 
          # Startup probe for slow-starting apps
          startupProbe:
            httpGet:
              path: /health
              port: 8080
            failureThreshold: 30
            periodSeconds: 10
 
          # preStop hook — handles endpoint propagation delay
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 5"]
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app

Validating Your Zero-Downtime Setup

After deploying this configuration, test it properly:

bash
# Terminal 1: Continuously hit your service during a deployment
while true; do
  status=$(curl -s -o /dev/null -w "%{http_code}" https://your-service/health)
  echo "$(date +%H:%M:%S) — $status"
  sleep 0.5
done
 
# Terminal 2: Trigger a rolling update
kubectl set image deployment/my-app app=myorg/my-app:v2.1 -n production
kubectl rollout status deployment/my-app -n production

You should see 200s throughout the entire rollout. If you see any 502s or 503s, check the component that's failing:

  • 502 at start of update → preStop hook missing or too short
  • 502 during update → maxUnavailable is too high, or readiness probe is wrong
  • 502 at end of update → graceful shutdown not handling SIGTERM properly
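The same watch loop can be written as a short Python script that tallies status codes, which makes it easier to quantify a rollout after the fact. `check` and `poll` are throwaway names for this sketch; point `poll` at your own service URL.

```python
import time
import urllib.request
import urllib.error
from collections import Counter

def check(url, timeout=2):
    """Return the HTTP status for one request (0 = connection failed)."""
    try:
        return urllib.request.urlopen(url, timeout=timeout).status
    except urllib.error.HTTPError as e:
        return e.code            # server answered with a 4xx/5xx
    except (urllib.error.URLError, OSError):
        return 0                 # connection refused / reset / timed out

def poll(url, n=20, interval=0.5):
    """Hit the service n times and tally the status codes seen."""
    counts = Counter()
    for _ in range(n):
        counts[check(url)] += 1
        time.sleep(interval)
    return counts
```

Run `poll("https://your-service/health", n=120)` while triggering the rollout in another terminal: anything other than a pure `{200: 120}` tally tells you which failure mode above to chase.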

Beyond Rolling Updates: Blue-Green and Canary

Rolling updates work well for most cases. But sometimes you need more control.

Blue-Green Deployment: Run two identical environments (blue = current, green = new). Switch traffic instantly by updating the Service selector.

bash
# Blue is live, green is the new version
kubectl patch service my-app-svc \
  -p '{"spec":{"selector":{"version":"green"}}}'

Canary Deployment: Send a percentage of traffic to the new version. Using Nginx Ingress:

yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"   # 10% to canary
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app-canary-svc
                port:
                  number: 80

For more advanced progressive delivery (automatic promotion/rollback based on metrics), look at Argo Rollouts — it handles blue-green and canary deployments with automatic analysis gates.


Checklist: Zero-Downtime Deployment

Before deploying a service to production, verify:

  • strategy.type: RollingUpdate with maxUnavailable: 0
  • Readiness probe configured with correct path and timing
  • Liveness probe configured (separate from readiness)
  • Startup probe for apps with long initialization
  • Application handles SIGTERM gracefully (finishes in-flight requests)
  • lifecycle.preStop sleep of 5 seconds
  • terminationGracePeriodSeconds ≥ preStop time + app shutdown time
  • PodDisruptionBudget configured
  • At least 2 replicas (with one pod there's no redundancy — any involuntary disruption is an outage, and a PDB has nothing to protect)
  • Tested with live traffic during a deploy

Keep Learning

Zero-downtime deployments are a prerequisite for any serious Kubernetes production deployment. If you want to go deeper into Kubernetes production patterns:

  • KodeKloud — CKA and CKAD courses cover rolling updates, disruption budgets, and production deployment patterns with hands-on labs
  • DigitalOcean Managed Kubernetes — A solid choice for hosting production K8s clusters without the overhead of managing the control plane yourself

Zero downtime in Kubernetes isn't magic — it's configuration. Every piece serves a specific purpose. Rolling update strategy prevents capacity loss. Readiness probes prevent premature traffic. The preStop hook closes the race condition. Graceful shutdown protects in-flight requests. PodDisruptionBudgets protect against maintenance disasters. Together, they give you true zero-downtime deployments.
