
How to Implement Canary Deployments with Flagger on Kubernetes (2026)

Flagger automates canary deployments on Kubernetes — progressively shifting traffic to new versions and rolling back automatically if metrics degrade. This step-by-step guide shows you how to set it up with Nginx Ingress.

DevOpsBoys · Mar 16, 2026 · 7 min read

Deploying a new version of your service to 100% of production traffic at once is risky. Even with good testing, you don't know how new code behaves under real production traffic until it's live.

Canary deployments solve this: you shift a small percentage of traffic (say, 5%) to the new version first. If it behaves well — no increase in errors, no latency regression — you gradually increase to 10%, 20%, 50%, then 100%. If anything goes wrong, you roll back to the old version automatically.

Flagger is the Kubernetes tool that automates this entire process. It watches your metrics and handles the traffic shifting, promotion, and rollback — without human intervention.


What Is Flagger?

Flagger is a CNCF project that automates progressive delivery on Kubernetes. It supports:

  • Canary releases: gradual traffic shifting
  • A/B testing: traffic routing based on HTTP headers or cookies
  • Blue/Green: traffic switching with testing

Flagger integrates with:

  • Traffic routing: Nginx Ingress, Istio, Linkerd, Contour, Traefik
  • Metrics: Prometheus, Datadog, New Relic, CloudWatch
  • Notifications: Slack, Teams, Discord, generic webhooks

This guide uses Nginx Ingress + Prometheus — the most common setup.


How Flagger Canary Works

When you create a Canary resource, Flagger:

  1. Creates a primary deployment (the current stable version)
  2. Creates a canary deployment (the new version under test)
  3. Adjusts Ingress rules to split traffic: e.g., 95% → primary, 5% → canary
  4. Checks your Prometheus metrics every analysis interval
  5. If metrics are healthy: increase canary weight (5% → 10% → 20%... → 100%)
  6. If metrics degrade: automatically roll back to 100% primary
  7. If promotion succeeds: primary becomes the new version, canary is cleaned up
Deploy new version
        │
        ▼
[Canary: 5% traffic] ──── check metrics ──── OK ──▶ [10%] ──▶ [20%] ──▶ [100%] ──▶ PROMOTED
                                              │
                                          DEGRADED
                                              │
                                              ▼
                                        ROLLBACK (0% canary)
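The flow above boils down to a small control loop. Here's an illustrative bash sketch of that logic (not Flagger's actual implementation; `step_weight`, `max_weight`, and `threshold` mirror the Canary fields stepWeight, maxWeight, and threshold you'll configure later):

```shell
#!/usr/bin/env bash
# Illustrative sketch of Flagger's analysis loop -- NOT the real implementation.
step_weight=10; max_weight=50; threshold=3

# run_analysis consumes one metric-check result ("ok" or "fail") per interval
# and prints the outcome of the canary run.
run_analysis() {
  local weight=0 failed=0
  for check in "$@"; do
    if [ "$check" = ok ]; then
      failed=0
      weight=$((weight + step_weight))
      if [ "$weight" -ge "$max_weight" ]; then echo promoted; return; fi
    else
      failed=$((failed + 1))
      if [ "$failed" -ge "$threshold" ]; then echo rolled-back; return; fi
    fi
  done
  echo in-progress
}

run_analysis ok ok ok ok ok      # promoted
run_analysis ok fail fail fail   # rolled-back
```

Each loop iteration corresponds to one analysis interval; the real decision is driven by your Prometheus metric thresholds rather than a precomputed list of results.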

Prerequisites

  • Kubernetes cluster (1.24+)
  • Nginx Ingress Controller installed
  • Prometheus installed (for metrics analysis)
  • Helm installed

Step 1: Install Flagger

bash
helm repo add flagger https://flagger.app
helm repo update
 
# Install Flagger with Nginx Ingress provider
helm install flagger flagger/flagger \
  --namespace flagger-system \
  --create-namespace \
  --set meshProvider=nginx \
  --set metricsServer=http://prometheus.monitoring.svc.cluster.local:9090

If you don't have Prometheus yet, the Flagger chart can install a bundled one alongside it:

bash
helm upgrade -i flagger flagger/flagger \
  --namespace flagger-system \
  --set meshProvider=nginx \
  --set prometheus.install=true

Verify Flagger is running:

bash
kubectl get pods -n flagger-system
# NAME                         READY   STATUS    RESTARTS
# flagger-xxx                  1/1     Running   0
# flagger-prometheus-xxx       1/1     Running   0

Step 2: Set Up Your Deployment

Your app needs a standard Kubernetes Deployment and Service. Flagger will manage the canary variants automatically.

yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: production
  labels:
    app: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: app
          image: my-app:1.0.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: production
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
yaml
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: production
spec:
  ingressClassName: nginx
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80

Apply them:

bash
kubectl apply -f deployment.yaml
kubectl apply -f ingress.yaml

Step 3: Create the Flagger Canary Resource

This is where you define the canary strategy:

yaml
# canary.yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
  namespace: production
spec:
  # The deployment Flagger will manage
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
 
  # The Ingress to modify for traffic splitting
  ingressRef:
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    name: my-app
 
  # Canary analysis settings
  analysis:
    # Check metrics every 60 seconds
    interval: 1m
 
    # Max number of failed metric checks before rollback
    threshold: 3
 
    # Traffic routing
    maxWeight: 50         # cap canary at 50%
    stepWeight: 10        # increase by 10% each step
 
    # Prometheus metrics to check
    metrics:
      - name: request-success-rate
        # Must stay above 99% success rate
        thresholdRange:
          min: 99
        interval: 1m
 
      - name: request-duration
        # P99 latency must stay below 500ms
        thresholdRange:
          max: 500
        interval: 1m
 
    # Slack alerts (see the Slack Notifications section below)
    alerts:
      - name: slack
        severity: info
        providerRef:
          name: slack
          namespace: flagger-system

Apply the Canary:

bash
kubectl apply -f canary.yaml

Flagger immediately takes over your Deployment. Check what it created:

bash
kubectl get canary -n production
# NAME     STATUS      WEIGHT   LASTTRANSITIONTIME
# my-app   Initialized 0        2026-03-16T10:00:00Z
 
kubectl get deployments -n production
# NAME             READY   UP-TO-DATE
# my-app           0/0     0          ← your deployment (scaled to 0; it becomes the canary)
# my-app-primary   2/2     2          ← Flagger-created copy serving all traffic

Flagger also creates my-app-primary and my-app-canary Services so it can route traffic between the two versions.

Step 4: Trigger a Canary Release

To trigger the canary process, update your Deployment's image:

bash
kubectl set image deployment/my-app app=my-app:2.0.0 -n production

Or update via your CI/CD pipeline. Flagger detects the image change and starts the analysis.
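In a pipeline you usually want to block until Flagger reaches a verdict. A minimal sketch: Succeeded and Failed are Flagger's terminal status.phase values, and in a real pipeline the argument would come from polling the cluster rather than being hard-coded.

```shell
# Map a Flagger canary phase to a CI verdict.
# In a real pipeline the phase comes from:
#   kubectl get canary my-app -n production -o jsonpath='{.status.phase}'
verdict() {
  case "$1" in
    Succeeded) echo promoted ;;
    Failed)    echo rolled-back ;;
    *)         echo in-progress ;;   # e.g. Progressing, Promoting
  esac
}

verdict Succeeded   # promoted
verdict Failed      # rolled-back
```

Wrap the call in a `while` loop with a `sleep` to gate a deploy job on the canary's outcome.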

Watch the progression:

bash
kubectl describe canary my-app -n production

You'll see events like:

Events:
  Normal  Synced  1m   flagger  New revision detected! Scaling up my-app.production
  Normal  Synced  2m   flagger  Starting canary analysis for my-app.production
  Normal  Synced  3m   flagger  Advance my-app.production canary weight 10
  Normal  Synced  4m   flagger  Advance my-app.production canary weight 20
  Normal  Synced  5m   flagger  Advance my-app.production canary weight 30
  Normal  Synced  6m   flagger  Advance my-app.production canary weight 40
  Normal  Synced  7m   flagger  Advance my-app.production canary weight 50
  Normal  Synced  8m   flagger  Copying my-app.production template spec to my-app-primary.production
  Normal  Synced  9m   flagger  Routing all traffic to primary
  Normal  Synced  10m  flagger  Promotion completed! Scaling down my-app.production

The canary was promoted. Your app is now running 2.0.0 as the primary.


Step 5: Observe a Rollback

If your new version is bad, Flagger rolls back automatically. To simulate this, deploy a broken version:

bash
kubectl set image deployment/my-app app=my-app:broken -n production

Flagger starts the canary. When the error rate exceeds your threshold, you'll see:

Events:
  Normal   Synced  1m  flagger  New revision detected! Scaling up my-app.production
  Normal   Synced  2m  flagger  Starting canary analysis for my-app.production
  Warning  Synced  3m  flagger  Halt my-app.production advancement success rate 87.50% < 99%
  Warning  Synced  4m  flagger  Halt my-app.production advancement success rate 82.30% < 99%
  Warning  Synced  5m  flagger  Halt my-app.production advancement success rate 79.10% < 99%
  Warning  Synced  6m  flagger  Rolling back my-app.production failed checks threshold reached 3
  Warning  Synced  7m  flagger  Canary failed! Scaling down my-app.production

All traffic returns to the stable primary. No human intervention needed.
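One practical caveat: the metric checks need traffic. If the service is quiet during analysis, the Prometheus queries return no data and the canary can't advance. Flagger ships an optional load tester (installable with helm install flagger-loadtester flagger/loadtester) that generates synthetic traffic through a rollout webhook. A sketch, assuming the load tester runs in flagger-system:

```yaml
analysis:
  webhooks:
    - name: load-test
      type: rollout
      url: http://flagger-loadtester.flagger-system/
      metadata:
        # hey sends 10 req/s for 1 minute to the canary service
        cmd: "hey -z 1m -q 10 -c 2 http://my-app-canary.production/"
```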


Custom Prometheus Metrics

Define custom metric checks based on your application's Prometheus data:

yaml
analysis:
  metrics:
    # Built-in success rate metric
    - name: request-success-rate
      thresholdRange:
        min: 99
      interval: 1m
 
    # Custom metric: database query latency
    - name: db-query-latency
      templateRef:
        name: db-latency
        namespace: flagger-system
      thresholdRange:
        max: 100    # max 100ms P99 DB query time
      interval: 1m

Create the metric template:

yaml
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: db-latency
  namespace: flagger-system
spec:
  provider:
    type: prometheus
    address: http://prometheus.monitoring.svc.cluster.local:9090
  query: |
    histogram_quantile(0.99,
      sum(
        rate(
          db_query_duration_seconds_bucket{
            app="{{ target }}",
            namespace="{{ namespace }}"
          }[{{ interval }}]
        )
      ) by (le)
    ) * 1000

Integration with ArgoCD (GitOps)

If you're using ArgoCD, Flagger integrates naturally. Your Git repo contains:

k8s/
  production/
    deployment.yaml
    service.yaml
    ingress.yaml
    canary.yaml       # Flagger Canary resource

ArgoCD syncs these to the cluster. When your CI pipeline builds a new image and updates deployment.yaml with the new tag, ArgoCD syncs the change. Flagger detects the image update and runs the canary analysis automatically.
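With Kustomize, for example, the CI pipeline only has to bump an image tag. A hypothetical kustomization.yaml for the layout above:

```yaml
# k8s/production/kustomization.yaml (hypothetical)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
  - ingress.yaml
  - canary.yaml
images:
  - name: my-app
    newTag: "2.0.0"   # the only line CI touches
```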

No manual steps. No human traffic management. Full GitOps with automated progressive delivery.


Slack Notifications

Flagger can alert you on every canary event. The simplest setup is global, at install time:

bash
helm upgrade -i flagger flagger/flagger \
  --namespace flagger-system \
  --reuse-values \
  --set slack.url=https://hooks.slack.com/services/YOUR/WEBHOOK/URL \
  --set slack.channel=deployments \
  --set slack.user=flagger

For per-canary alerts, define an AlertProvider and reference it from the analysis section:

yaml
apiVersion: flagger.app/v1beta1
kind: AlertProvider
metadata:
  name: slack
  namespace: flagger-system
spec:
  type: slack
  channel: deployments
  username: flagger
  address: https://hooks.slack.com/services/YOUR/WEBHOOK/URL
yaml
analysis:
  alerts:
    - name: slack
      severity: info        # info | warn | error
      providerRef:
        name: slack
        namespace: flagger-system

You'll get Slack messages for:

  • Canary started (new revision detected)
  • Each traffic weight increase
  • Successful promotion
  • Rollback triggered

Quick Reference

bash
# Watch canary status
kubectl get canary -n production -w
 
# See events
kubectl describe canary my-app -n production
 
# Skip the analysis for the next rollout (promotes immediately)
kubectl patch canary/my-app -n production --type merge \
  -p '{"spec":{"skipAnalysis":true}}'
 
# Manually roll back by reverting to the previous image
kubectl set image deployment/my-app app=my-app:1.0.0 -n production
 
# Delete a canary (restores original deployment)
kubectl delete canary my-app -n production

Learn More

Want to learn progressive delivery, GitOps, and Kubernetes deployment patterns with hands-on labs? KodeKloud's Kubernetes courses cover Flagger, ArgoCD, and the full GitOps workflow with real cluster environments.


Summary

Flagger gives you production-grade canary deployments without writing a single traffic management script:

  1. Install Flagger with Nginx Ingress + Prometheus
  2. Deploy your app with standard K8s Deployment + Service + Ingress
  3. Create a Canary resource defining analysis interval, traffic step, and metric thresholds
  4. Push a new image — Flagger handles the rest
  5. Auto-promotion if metrics stay healthy, auto-rollback if they degrade

The result: every production deployment is automatically a canary deployment. Bad code gets caught before it reaches 100% of users. Good code gets promoted without human intervention.

That's how elite teams deploy confidently at high frequency.
