
KEDA Complete Guide: Event-Driven Autoscaling for Kubernetes in 2026

KEDA lets you scale Kubernetes workloads based on Kafka lag, SQS queue depth, Redis lists, HTTP traffic, and 60+ other event sources. This guide covers everything from installation to production patterns.

DevOpsBoys · Mar 11, 2026 · 8 min read

The Horizontal Pod Autoscaler (HPA) that ships with Kubernetes can scale based on CPU and memory. That's useful — but most real production workloads don't scale on CPU alone.

You want to scale your order processing service based on how many unprocessed messages are sitting in your Kafka topic. You want to scale your email sender based on SQS queue depth. You want to scale an ML inference service based on HTTP request rate. You want to scale a data pipeline to zero when there's nothing to process and back up to 50 replicas when work arrives.

That's what KEDA (Kubernetes Event-Driven Autoscaler) is for.

KEDA is now a CNCF graduated project, runs in production at thousands of companies, and supports over 60 event sources out of the box. This guide covers everything you need to go from installation to production-grade event-driven scaling.


What KEDA Does (and How It Works)

KEDA extends Kubernetes with two things:

1. Scalers — Connectors that talk to your external system (Kafka, SQS, Redis, etc.) and read the current "scale metric": how many messages are waiting, how deep the queue is, what the HTTP request rate is.

2. A controller — The keda-operator watches your ScaledObjects and acts on the metric: when it exceeds your threshold, the controller scales your Deployment or Job up, and when the metric drops to zero it can scale your workload all the way down to zero (which the regular HPA cannot do).

Under the hood, KEDA creates a standard Kubernetes HPA and feeds it external metrics. This means it's compatible with all existing Kubernetes tooling — you can still use kubectl get hpa to see what's happening.

The core resources KEDA introduces are:

  • ScaledObject — For scaling Deployments/StatefulSets based on a metric
  • ScaledJob — For running Kubernetes Jobs in response to events (process N items, then terminate)
  • TriggerAuthentication — Stores credentials for connecting to event sources

Installation

KEDA is installed via Helm (recommended) or YAML manifests.

bash
# Add the KEDA Helm repo
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
 
# Install KEDA into its own namespace
helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace \
  --version 2.13.0

After installation, verify the components are running:

bash
kubectl get pods -n keda
# NAME                                      READY   STATUS    RESTARTS
# keda-operator-xxx                         1/1     Running   0
# keda-operator-metrics-apiserver-xxx       1/1     Running   0
# keda-admission-webhooks-xxx               1/1     Running   0

KEDA runs three components:

  • keda-operator — Watches ScaledObjects and updates replica counts
  • keda-operator-metrics-apiserver — Exposes external metrics to the Kubernetes metrics API
  • keda-admission-webhooks — Validates KEDA resources on creation

Core Concept: ScaledObject

A ScaledObject binds a deployment to a trigger (event source). Here's the basic structure:

yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: my-app          # The Deployment to scale
  minReplicaCount: 1      # Minimum replicas (use 0 to scale to zero)
  maxReplicaCount: 50     # Maximum replicas
  pollingInterval: 15     # Check trigger every N seconds
  cooldownPeriod: 300     # Wait N seconds after the last event before scaling to zero
  triggers:
    - type: <trigger-type>
      metadata:
        <trigger-specific-config>

The triggers section is where you define your event source. Let's look at the most important ones.


Trigger 1: Kafka (Scale on Consumer Group Lag)

This is one of the most common KEDA use cases. Scale your consumer service based on how far behind it is in processing messages.

yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: order-processor    # Your Kafka consumer deployment
  minReplicaCount: 1
  maxReplicaCount: 30
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: "kafka-broker:9092"
        consumerGroup: "order-processor-group"
        topic: "orders"
        lagThreshold: "100"        # Target lag per replica (not per partition)
        offsetResetPolicy: latest

What this does: For every 100 messages of consumer group lag, KEDA targets one replica — a total lag of 1,500 works out to 15 replicas (1500 / 100). One caveat: by default the Kafka scaler never scales beyond the topic's partition count, since a consumer group can't usefully run more consumers than there are partitions. With a 3-partition topic, KEDA would cap at 3 replicas unless you set allowIdleConsumers: "true".

For authentication (SASL/TLS):

yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: kafka-auth
  namespace: production
spec:
  secretTargetRef:
    - parameter: sasl
      name: kafka-secret
      key: sasl
    - parameter: username
      name: kafka-secret
      key: username
    - parameter: password
      name: kafka-secret
      key: password
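The kafka-secret referenced above isn't shown in the snippet; a minimal sketch of what it might contain (all values are placeholders — the sasl key holds the SASL mechanism name, e.g. plaintext or scram_sha256):

yaml
apiVersion: v1
kind: Secret
metadata:
  name: kafka-secret
  namespace: production
type: Opaque
stringData:
  sasl: "plaintext"            # SASL mechanism: plaintext, scram_sha256, scram_sha512, ...
  username: "order-processor"  # placeholder
  password: "changeme"         # placeholder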

Then reference it in your ScaledObject:

yaml
triggers:
  - type: kafka
    authenticationRef:
      name: kafka-auth
    metadata:
      bootstrapServers: "kafka-broker:9092"
      # ...

Trigger 2: AWS SQS Queue

Scale based on the number of messages in an SQS queue:

yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-worker-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: email-sender
  minReplicaCount: 0          # Scale to zero when queue is empty
  maxReplicaCount: 20
  cooldownPeriod: 60
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: aws-auth
      metadata:
        queueURL: "https://sqs.us-east-1.amazonaws.com/123456789/email-queue"
        queueLength: "5"       # Target: 5 messages per replica
        awsRegion: "us-east-1"
        scaleOnInFlight: "true"  # Include in-flight messages in count

TriggerAuthentication for AWS (using IAM role with Pod Identity is best practice):

yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: aws-auth
spec:
  podIdentity:
    provider: aws-eks    # Uses IRSA (IAM Roles for Service Accounts)

Or with access keys (less secure, use IRSA in production):

yaml
apiVersion: v1
kind: Secret
metadata:
  name: aws-credentials
stringData:
  AWS_ACCESS_KEY_ID: "your-key"
  AWS_SECRET_ACCESS_KEY: "your-secret"
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: aws-auth
spec:
  secretTargetRef:
    - parameter: awsAccessKeyID
      name: aws-credentials
      key: AWS_ACCESS_KEY_ID
    - parameter: awsSecretAccessKey
      name: aws-credentials
      key: AWS_SECRET_ACCESS_KEY

Trigger 3: Redis List (Scale on Queue Length)

yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: redis-worker-scaler
spec:
  scaleTargetRef:
    name: background-worker
  minReplicaCount: 0
  maxReplicaCount: 15
  triggers:
    - type: redis
      authenticationRef:
        name: redis-auth
      metadata:
        address: "redis:6379"
        listName: "job-queue"      # Redis list to monitor
        listLength: "10"           # Replicas = list_length / listLength
        enableTLS: "false"
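The redis-auth reference above needs a matching TriggerAuthentication. A minimal sketch, assuming the Redis password lives in a Secret named redis-secret (names and values are placeholders):

yaml
apiVersion: v1
kind: Secret
metadata:
  name: redis-secret
stringData:
  password: "changeme"   # placeholder
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: redis-auth
spec:
  secretTargetRef:
    - parameter: password
      name: redis-secret
      key: password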

Trigger 4: HTTP (Scale on Request Rate)

KEDA's HTTP scaler (shipped as a separate HTTP add-on) is useful for services that should scale based on incoming request rate — including scaling to zero during off-hours. Note that traffic must be routed through the add-on's interceptor proxy so KEDA can see (and buffer) requests.

bash
# Install the HTTP add-on
helm install http-add-on kedacore/keda-add-ons-http \
  --namespace keda

yaml
apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
  name: api-http-scaler
spec:
  hosts:
    - api.example.com
  targetPendingRequests: 100    # Scale up when 100+ requests are pending
  scaleTargetRef:
    deployment: api-service
    service: api-service-svc
    port: 80
  replicas:
    min: 0
    max: 30
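For the interceptor to count pending requests, external traffic has to reach it rather than your Service directly. A sketch of an Ingress doing that — the interceptor Service name and port below are typical chart defaults and may differ in your install (check kubectl get svc -n keda); also note an Ingress backend must live in the same namespace as the Ingress, so in practice you either create the Ingress in the keda namespace or point at the interceptor via an ExternalName Service:

yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  namespace: keda
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: keda-add-ons-http-interceptor-proxy  # name may vary by chart version
                port:
                  number: 8080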

Trigger 5: Prometheus Metrics

Scale based on any Prometheus metric — this is the most flexible trigger:

yaml
triggers:
  - type: prometheus
    metadata:
      serverAddress: "http://prometheus.monitoring.svc:9090"
      metricName: http_requests_total
      query: |
        sum(rate(http_requests_total{service="my-api"}[1m]))
      threshold: "100"    # Scale up when requests/sec > 100

This lets you scale on literally any metric you collect — custom business metrics, queue depths from your own systems, latency percentiles, anything.
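Dropped into a full ScaledObject, the Prometheus trigger looks like any other (deployment and namespace names here are hypothetical):

yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-api-scaler
spec:
  scaleTargetRef:
    name: my-api
  minReplicaCount: 2
  maxReplicaCount: 40
  triggers:
    - type: prometheus
      metadata:
        serverAddress: "http://prometheus.monitoring.svc:9090"
        query: |
          sum(rate(http_requests_total{service="my-api"}[1m]))
        threshold: "100"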


ScaledJob: Processing Events as Kubernetes Jobs

Sometimes you don't want a long-running Deployment — you want a Job that starts, processes N items, and exits. That's ScaledJob:

yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: report-generator
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: generator
            image: myorg/report-generator:v1.2
            env:
              - name: SQS_QUEUE_URL
                value: "https://sqs.us-east-1.amazonaws.com/..."
        restartPolicy: OnFailure
  maxReplicaCount: 10
  scalingStrategy:
    strategy: "accurate"    # accurate | default | custom
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: aws-auth
      metadata:
        queueURL: "https://sqs.us-east-1.amazonaws.com/123456789/reports"
        queueLength: "1"    # One job per message
        awsRegion: "us-east-1"

KEDA creates one Job per message in the queue. Each Job processes one report request and exits. This pattern is perfect for:

  • Batch data processing
  • Report generation
  • Image/video processing pipelines
  • ML batch inference

Scaling to Zero: The Killer Feature

Regular Kubernetes HPA won't go below 1 replica (scale-to-zero for HPA is still gated behind the alpha HPAScaleToZero feature). KEDA can scale to zero out of the box.

Why does this matter? For workloads that are idle most of the time — nightly batch jobs, development environments, event-driven consumers — scaling to zero means zero compute cost during idle periods.

yaml
spec:
  minReplicaCount: 0      # ← This is the key
  maxReplicaCount: 20
  cooldownPeriod: 300     # Wait 5 minutes of zero events before scaling to 0

When an event arrives, KEDA scales from 0 to 1 at its next polling cycle (pollingInterval, 30 seconds by default). On top of that comes the cold-start latency — the time it takes Kubernetes to schedule and start the pod, typically 5-30 seconds for most workloads.

For use cases where a few seconds of cold-start latency is acceptable (batch processing, async workers), this delivers significant cost savings.


Production Best Practices

1. Set an appropriate cooldownPeriod. Don't drop to zero too aggressively. A cooldownPeriod of 300 seconds (5 minutes) is a good default for most workloads; too short leads to thrashing (scale up, scale down, scale up again). Note that cooldownPeriod only governs the final scale-to-zero step — scale-down between non-zero replica counts follows the underlying HPA's behavior.

2. Use TriggerAuthentication with Pod Identity (IRSA/Workload Identity). Never hardcode credentials in ScaledObjects. Use AWS IRSA, GCP Workload Identity, or Azure Workload Identity to bind your KEDA scaler to a cloud IAM role.

3. Monitor KEDA's own metrics. KEDA exposes Prometheus metrics. Watch keda_scaler_errors_total to catch authentication or connectivity issues with your event sources.

4. Test scale-to-zero carefully. When scaling to zero, make sure your upstream services handle the brief unavailability gracefully. For HTTP workloads, the HTTP add-on provides buffering — requests are held while the pod starts up.

5. Use ScaledJob for batch processing instead of ScaledObject. If your workload processes a fixed batch and exits, a ScaledJob is more appropriate than a long-running Deployment. It's cheaper and simpler.
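For finer control over scale-down than cooldownPeriod alone gives you, KEDA lets you pass HPA behavior settings through the ScaledObject's advanced section. A sketch adding a 5-minute scale-down stabilization window (values are illustrative, not recommendations):

yaml
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
            - type: Percent
              value: 50
              periodSeconds: 60   # remove at most 50% of replicas per minute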


KEDA vs HPA: When to Use Each

  • Scale on CPU/memory → HPA (built-in)
  • Scale on Kafka lag → KEDA
  • Scale on SQS depth → KEDA
  • Scale to zero → KEDA
  • Scale on custom Prometheus metric → KEDA
  • Scale on HTTP request rate → KEDA + HTTP add-on
  • Batch job processing → KEDA ScaledJob

In practice, most production Kubernetes workloads benefit from KEDA for at least some of their services — especially any async workers or event-driven components.


Learning More

KEDA has excellent official documentation at keda.sh. For hands-on practice with Kubernetes autoscaling and event-driven architectures:

  • KodeKloud — Their Kubernetes courses cover HPA, VPA, and increasingly KEDA in their CKA/CKAD content
  • DigitalOcean Kubernetes — A great place to run a test KEDA cluster without the complexity of AWS/GCP for learning

Event-driven scaling is the natural model for most modern microservices. Your scaling decisions should be driven by your actual workload signal — messages waiting, jobs queued, requests arriving — not a proxy metric like CPU. KEDA makes that straightforward in Kubernetes.
