What Is a Canary Deployment? Explained Simply
Canary deployments let you test a new version on a small slice of real traffic before going all-in. Here's what it actually means, how it differs from blue-green, and a simple example.
The name comes from "canary in a coal mine" — miners brought caged canaries underground because the birds were more sensitive to toxic gas, and if the canary showed distress, miners knew to get out before it became dangerous for them too. A canary deployment applies the same idea to software: send a small amount of real traffic to the new version first, and if something's wrong, you find out before it affects everyone.
The Basic Idea
100% traffic → v1 (stable)
Deploy v2 as a canary:
95% traffic → v1 (stable)
5% traffic → v2 (canary)
Monitor v2's error rate, latency, and key business metrics. If healthy:
50% traffic → v1
50% traffic → v2
Continue increasing until:
100% traffic → v2 (v1 retired)
If at any point v2 shows elevated errors, you route all traffic back to v1. Because v2 only ever served part of the traffic, and only a small part during the early stages when most problems surface, the blast radius of a bad deploy stays small.
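One way to picture the traffic split above is a router that hashes each user into a bucket from 0 to 99 and sends the lowest buckets to v2. This is only a sketch: the article does not say whether the split is per request or per user, and `pick_version` and `canary_share` are hypothetical names, not part of any real load balancer or mesh API.

```python
import hashlib

def pick_version(user_id: str, canary_percent: int) -> str:
    """Return 'v2' if the user's stable hash bucket falls inside the canary slice."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return "v2" if bucket < canary_percent else "v1"

def canary_share(user_ids, canary_percent: int) -> float:
    """Fraction of the given users that would be routed to v2."""
    hits = sum(1 for u in user_ids if pick_version(u, canary_percent) == "v2")
    return hits / len(user_ids)

if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    for percent in (0, 5, 50, 100):
        print(percent, round(canary_share(users, percent), 3))
```

Because the bucket depends only on the user ID, a user who lands on v2 at 5% stays on v2 when the weight rises to 50%, and setting the weight to 0 sends everyone back to v1.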
How This Differs From Blue-Green Deployment
This is the comparison people mix up most often.
Blue-green runs two complete environments. You switch 100% of traffic from the old environment to the new one in a single cutover. If something's wrong, you switch back instantly. There's no gradual traffic split — it's all-or-nothing at the moment of cutover, just with instant rollback available.
Canary gradually shifts a percentage of traffic over time, observing real production behavior at each step before increasing further. The new version is exposed to real risk, but in small, controlled doses.
Blue-Green: [v1: 100%] → instant cutover → [v2: 100%]
            ↑ rollback = instant switch back
Canary:     [v1: 95%, v2: 5%] → [v1: 50%, v2: 50%] → [v1: 0%, v2: 100%]
            ↑ rollback at any stage = route back to 100% v1
Canary catches problems that only show up under real production load and real user diversity — blue-green's instant full cutover doesn't give you that gradual exposure.
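To put rough numbers on that difference, the arithmetic below counts how many requests reach a broken v2 before someone rolls back. The traffic figure and detection time are made-up assumptions, and `requests_hitting_v2` is a hypothetical helper. It also assumes detection takes the same time in both cases, which may not hold when the canary slice is small.

```python
def requests_hitting_v2(requests_per_minute: int, v2_percent: int, minutes_to_detect: int) -> int:
    """Requests served by v2 between the deploy and the rollback."""
    return requests_per_minute * minutes_to_detect * v2_percent // 100

if __name__ == "__main__":
    rpm = 10_000      # assumed traffic
    detect_after = 3  # assumed minutes until the problem is noticed
    print("blue-green:", requests_hitting_v2(rpm, 100, detect_after))
    print("canary 5%: ", requests_hitting_v2(rpm, 5, detect_after))
```

With these assumed inputs, a blue-green cutover exposes 30,000 requests to the bad version, while a 5% canary exposes 1,500.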
A Simple Kubernetes Example with Flagger
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: checkout-service
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-service
  service:
    port: 8080
  analysis:
    interval: 1m
    threshold: 5      # number of failed checks before automatic rollback
    maxWeight: 50     # cap canary traffic at 50% during analysis
    stepWeight: 5     # increase by 5% each interval
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99     # rollback if success rate drops below 99%
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500    # rollback if p99 latency exceeds 500ms
        interval: 1m

Flagger automates the entire process. When you deploy a new version, it starts routing a small percentage of traffic to it, checks the metrics you defined at each interval, and either keeps increasing traffic or rolls back automatically once the failed-check threshold is reached. No human needs to watch a dashboard and decide manually.
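The control loop behind that automation can be approximated in a few lines. This is a simplified simulation, not Flagger's actual implementation: `Analysis`, `run_canary`, and the `sample_metrics` callback are hypothetical names, and the field defaults simply mirror the values in the config above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Analysis:
    step_weight: int = 5            # mirrors stepWeight
    max_weight: int = 50            # mirrors maxWeight
    threshold: int = 5              # failed checks tolerated before rollback
    min_success_rate: float = 99.0  # percent
    max_latency_ms: float = 500.0

def run_canary(
    analysis: Analysis,
    sample_metrics: Callable[[int], Tuple[float, float]],
    max_checks: int = 1000,
) -> Tuple[str, List[int]]:
    """Step the canary weight up while checks pass; roll back after too many failures.

    sample_metrics(weight) returns (success_rate_percent, latency_ms).
    """
    weight = analysis.step_weight
    failed_checks = 0
    history = [weight]
    for _ in range(max_checks):
        success_rate, latency_ms = sample_metrics(weight)
        healthy = (
            success_rate >= analysis.min_success_rate
            and latency_ms <= analysis.max_latency_ms
        )
        if not healthy:
            failed_checks += 1
            if failed_checks >= analysis.threshold:
                return "rolled_back", history
            continue  # hold the current weight and check again next interval
        if weight >= analysis.max_weight:
            return "promoted", history
        weight = min(weight + analysis.step_weight, analysis.max_weight)
        history.append(weight)
    return "timed_out", history

if __name__ == "__main__":
    print(run_canary(Analysis(), lambda w: (99.9, 120.0)))
    # A version that only degrades once it serves 20% or more of traffic.
    print(run_canary(Analysis(), lambda w: (99.9, 120.0) if w < 20 else (97.0, 120.0)))
```

In the first run every check passes, so the weight climbs from 5 to 50 in steps of 5 and the canary is promoted. In the second, the weight stalls at 20 and the rollout ends in a rollback after five failed checks.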
What You're Actually Measuring During a Canary
The metrics that matter depend on what could break, but the common baseline:
- Error rate (4xx/5xx responses) — is the new version actually breaking requests?
- Latency (p50, p95, p99) — is it slower under real load patterns?
- Business metrics — checkout completion rate, signup conversion — sometimes a
deploy is "technically healthy" (no errors, normal latency) but quietly breaks
a business flow in a way pure infra metrics won't catch
That last category is why canary analysis configs increasingly include business KPIs, not just infrastructure health — a bug that doesn't throw errors but breaks a checkout button still needs to trigger a rollback.
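As a rough illustration of those three categories, the sketch below summarizes a batch of canary requests and flags a business regression against the stable version. The `Request` record, the nearest-rank percentile, and the 10% relative-drop tolerance are all assumptions chosen for the example, not values from any particular tool.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Request:
    status: int
    latency_ms: float
    checkout_completed: bool

def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def summarize(requests: List[Request]) -> Dict[str, float]:
    """Error rate, latency percentiles, and checkout completion rate for one version."""
    total = len(requests)
    latencies = [r.latency_ms for r in requests]
    return {
        # Counts 4xx and 5xx, following the baseline list above.
        "error_rate": sum(1 for r in requests if r.status >= 400) / total,
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "checkout_rate": sum(1 for r in requests if r.checkout_completed) / total,
    }

def business_regression(baseline_rate: float, canary_rate: float,
                        max_relative_drop: float = 0.10) -> bool:
    """True if the canary's rate fell more than the tolerated fraction below baseline."""
    if baseline_rate <= 0:
        return False
    return (baseline_rate - canary_rate) / baseline_rate > max_relative_drop
```

A canary can return no errors and normal latency from `summarize` while `business_regression` still returns `True`, which is exactly the "technically healthy" failure described above.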
When Canary Deployments Are Worth the Setup Effort
Worth it for: high-traffic services where a bad deploy that hits 100% of users at once is genuinely costly, and where traffic volume is high enough that a 5% slice still gives a statistically meaningful signal within minutes.
Not worth it for: low-traffic internal tools where 5% of traffic might be one request every few hours — there's no meaningful signal at that volume, and blue-green's simpler instant-rollback model serves you just as well with less operational complexity.
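A quick back-of-the-envelope check makes the traffic argument concrete. The helper below estimates how long a canary slice needs to collect a given number of requests. `minutes_until_signal` is a hypothetical name, and the 1,000-request target is an arbitrary assumption standing in for "enough data to trust the metrics".

```python
import math

def minutes_until_signal(requests_per_minute: float, canary_percent: float,
                         min_samples: int) -> float:
    """Minutes for the canary slice to receive min_samples requests."""
    canary_rpm = requests_per_minute * canary_percent / 100
    if canary_rpm <= 0:
        return math.inf
    return min_samples / canary_rpm

if __name__ == "__main__":
    # Assumed high-traffic service: 12,000 requests per minute.
    print(minutes_until_signal(12_000, 5, 1_000))
    # Assumed internal tool: 20 requests every 3 hours, so the 5% slice
    # sees about one request every 3 hours.
    print(minutes_until_signal(20 / 180, 5, 1_000) / (60 * 24), "days")
```

Under these assumptions the busy service reaches 1,000 canary requests in under two minutes, while the internal tool would need about 125 days.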
Set up automated canary rollouts: How to Implement Canary Deployments with Flagger