KEDA Complete Guide: Event-Driven Autoscaling for Kubernetes in 2026
KEDA lets you scale Kubernetes workloads based on Kafka lag, SQS queue depth, Redis lists, HTTP traffic, and 60+ other event sources. This guide covers everything from installation to production patterns.
The Horizontal Pod Autoscaler (HPA) that ships with Kubernetes can scale based on CPU and memory. That's useful — but most real production workloads don't scale on CPU alone.
You want to scale your order processing service based on how many unprocessed messages are sitting in your Kafka topic. You want to scale your email sender based on SQS queue depth. You want to scale an ML inference service based on HTTP request rate. You want to scale a data pipeline to zero when there's nothing to process and back up to 50 replicas when work arrives.
That's what KEDA (Kubernetes Event-Driven Autoscaler) is for.
KEDA is now a CNCF graduated project, runs in production at thousands of companies, and supports over 60 event sources out of the box. This guide covers everything you need to go from installation to production-grade event-driven scaling.
What KEDA Does (and How It Works)
KEDA extends Kubernetes with two things:
1. An event source agent — A component that connects to your external system (Kafka, SQS, Redis, etc.) and reads the current "scale metric" — how many messages are waiting, how deep the queue is, what the HTTP request rate is.
2. A controller — When the metric exceeds your threshold, the controller updates the replica count on your Deployment or Job. When the metric drops to zero, it can scale your workload all the way to zero (which regular HPA cannot do).
Under the hood, KEDA creates a standard Kubernetes HPA and feeds it external metrics. This means it's compatible with all existing Kubernetes tooling — you can still use kubectl get hpa to see what's happening.
The core resources KEDA introduces are:
- ScaledObject — For scaling Deployments/StatefulSets based on a metric
- ScaledJob — For running Kubernetes Jobs in response to events (process N items, then terminate)
- TriggerAuthentication — Stores credentials for connecting to event sources
Installation
KEDA is installed via Helm (recommended) or YAML manifests.
```bash
# Add the KEDA Helm repo
helm repo add kedacore https://kedacore.github.io/charts
helm repo update

# Install KEDA into its own namespace
helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace \
  --version 2.13.0
```
After installation, verify the components are running:
```bash
kubectl get pods -n keda
# NAME                                  READY   STATUS    RESTARTS
# keda-operator-xxx                     1/1     Running   0
# keda-operator-metrics-apiserver-xxx   1/1     Running   0
# keda-admission-webhooks-xxx           1/1     Running   0
```
KEDA runs three components:
- keda-operator — Watches ScaledObjects and updates replica counts
- keda-operator-metrics-apiserver — Exposes external metrics to the Kubernetes metrics API
- keda-admission-webhooks — Validates KEDA resources on creation
Core Concept: ScaledObject
A ScaledObject binds a deployment to a trigger (event source). Here's the basic structure:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: my-app          # The Deployment to scale
  minReplicaCount: 1      # Minimum replicas (use 0 to scale to zero)
  maxReplicaCount: 50     # Maximum replicas
  pollingInterval: 15     # Check the trigger every N seconds
  cooldownPeriod: 300     # Wait N seconds after the last event before scaling to zero
  triggers:
    - type: <trigger-type>
      metadata:
        <trigger-specific-config>
```
The triggers section is where you define your event source. Let's look at the most important ones.
Trigger 1: Kafka (Scale on Consumer Group Lag)
This is one of the most common KEDA use cases. Scale your consumer service based on how far behind it is in processing messages.
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: order-processor   # Your Kafka consumer deployment
  minReplicaCount: 1
  maxReplicaCount: 30
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: "kafka-broker:9092"
        consumerGroup: "order-processor-group"
        topic: "orders"
        lagThreshold: "100"   # Target lag per replica
        offsetResetPolicy: latest
```
What this does: for roughly every 100 messages of lag, KEDA adds one replica. If the orders topic has 3 partitions and each has 500 messages of lag (total lag 1500), KEDA targets 15 replicas (1500 / 100 = 15).
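The scaling math can be sketched in a few lines of Python. This is a hypothetical helper for illustration, not KEDA's actual code; note also that by default the Kafka scaler caps replicas at the topic's partition count, which this sketch omits.

```python
import math

def desired_replicas(total_lag: int, lag_threshold: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Target replicas: one per lagThreshold messages of lag, clamped
    to the configured min/max replica counts."""
    target = math.ceil(total_lag / lag_threshold)
    return max(min_replicas, min(target, max_replicas))

# 3 partitions x 500 messages of lag = 1500 total; lagThreshold 100
print(desired_replicas(1500, 100, 1, 30))  # 15
```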
For authentication (SASL/TLS):
```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: kafka-auth
  namespace: production
spec:
  secretTargetRef:
    - parameter: sasl
      name: kafka-secret
      key: sasl
    - parameter: username
      name: kafka-secret
      key: username
    - parameter: password
      name: kafka-secret
      key: password
```
Then reference it in your ScaledObject:
```yaml
triggers:
  - type: kafka
    authenticationRef:
      name: kafka-auth
    metadata:
      bootstrapServers: "kafka-broker:9092"
      # ...
```
Trigger 2: AWS SQS Queue
Scale based on the number of messages in an SQS queue:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-worker-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: email-sender
  minReplicaCount: 0   # Scale to zero when the queue is empty
  maxReplicaCount: 20
  cooldownPeriod: 60
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: aws-auth
      metadata:
        queueURL: "https://sqs.us-east-1.amazonaws.com/123456789/email-queue"
        queueLength: "5"          # Target: 5 messages per replica
        awsRegion: "us-east-1"
        scaleOnInFlight: "true"   # Include in-flight messages in the count
```
TriggerAuthentication for AWS (using an IAM role with Pod Identity is best practice):
```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: aws-auth
spec:
  podIdentity:
    provider: aws-eks   # Uses IRSA (IAM Roles for Service Accounts)
```
Or with access keys (less secure; prefer IRSA in production):
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: aws-credentials
stringData:
  AWS_ACCESS_KEY_ID: "your-key"
  AWS_SECRET_ACCESS_KEY: "your-secret"
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: aws-auth
spec:
  secretTargetRef:
    - parameter: awsAccessKeyID
      name: aws-credentials
      key: AWS_ACCESS_KEY_ID
    - parameter: awsSecretAccessKey
      name: aws-credentials
      key: AWS_SECRET_ACCESS_KEY
```
Trigger 3: Redis List (Scale on Queue Length)
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: redis-worker-scaler
spec:
  scaleTargetRef:
    name: background-worker
  minReplicaCount: 0
  maxReplicaCount: 15
  triggers:
    - type: redis
      authenticationRef:
        name: redis-auth
      metadata:
        address: "redis:6379"
        listName: "job-queue"   # Redis list to monitor
        listLength: "10"        # Target list entries per replica
        enableTLS: "false"
```
Trigger 4: HTTP (Scale on Request Rate)
KEDA's HTTP scaler, shipped as a separate HTTP add-on, is useful for services that should scale based on incoming request traffic, including scaling to zero during off-hours.
```bash
# Install the HTTP add-on
helm install http-add-on kedacore/keda-add-ons-http \
  --namespace keda
```

```yaml
apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
  name: api-http-scaler
spec:
  hosts:
    - api.example.com
  targetPendingRequests: 100   # Scale up when 100+ requests are pending
  scaleTargetRef:
    deployment: api-service
    service: api-service-svc
    port: 80
  replicas:
    min: 0
    max: 30
```
Trigger 5: Prometheus Metrics
Scale based on any Prometheus metric — this is the most flexible trigger:
```yaml
triggers:
  - type: prometheus
    metadata:
      serverAddress: "http://prometheus.monitoring.svc:9090"
      metricName: http_requests_total
      query: |
        sum(rate(http_requests_total{service="my-api"}[1m]))
      threshold: "100"   # Scale up when requests/sec > 100
```
This lets you scale on literally any metric you collect: custom business metrics, queue depths from your own systems, latency percentiles, anything.
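For example, here is a sketch of scaling on a latency percentile instead of request rate. The metric name and labels are illustrative; adapt them to your own instrumentation.

```yaml
triggers:
  - type: prometheus
    metadata:
      serverAddress: "http://prometheus.monitoring.svc:9090"
      query: |
        histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{service="my-api"}[5m])) by (le))
      threshold: "0.5"   # Scale up when p95 latency exceeds 500ms
```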
ScaledJob: Processing Events as Kubernetes Jobs
Sometimes you don't want a long-running Deployment — you want a Job that starts, processes N items, and exits. That's ScaledJob:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: report-generator
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: generator
            image: myorg/report-generator:v1.2
            env:
              - name: SQS_QUEUE_URL
                value: "https://sqs.us-east-1.amazonaws.com/..."
        restartPolicy: OnFailure
  maxReplicaCount: 10
  scalingStrategy:
    strategy: "accurate"   # accurate | default | custom
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: aws-auth
      metadata:
        queueURL: "https://sqs.us-east-1.amazonaws.com/123456789/reports"
        queueLength: "1"   # One Job per message
        awsRegion: "us-east-1"
```
KEDA creates one Job per message in the queue. Each Job processes one report request and exits. This pattern is perfect for:
- Batch data processing
- Report generation
- Image/video processing pipelines
- ML batch inference
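A minimal sketch of what such a Job's entrypoint might look like. Everything here is hypothetical (generate_report, the message shape); a real worker would fetch the message from SQS, e.g. with boto3, which is stubbed out to keep the sketch self-contained.

```python
import json
import sys

def generate_report(request: dict) -> str:
    """Illustrative work function: derive an output name from the request."""
    return f"report-{request['report_id']}.pdf"

def main(raw_message: str) -> int:
    # In a real worker this message body would come from SQS; it is
    # passed in directly here to keep the sketch self-contained.
    request = json.loads(raw_message)
    print(f"generated {generate_report(request)}")
    return 0  # Exit cleanly so the Job completes instead of restarting

if __name__ == "__main__":
    sys.exit(main('{"report_id": "1234"}'))
```

The key design point is the clean exit: a ScaledJob worker should process its batch and terminate with status 0, letting KEDA decide whether more Jobs are needed.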
Scaling to Zero: The Killer Feature
The built-in Kubernetes HPA cannot scale below one replica. KEDA can scale to zero.
Why does this matter? For workloads that are idle most of the time — nightly batch jobs, development environments, event-driven consumers — scaling to zero means zero compute cost during idle periods.
```yaml
spec:
  minReplicaCount: 0   # ← This is the key
  maxReplicaCount: 20
  cooldownPeriod: 300  # Wait 5 minutes of no events before scaling to 0
```
When an event arrives, KEDA scales from 0 to 1 on its next polling cycle. The cold-start latency is the time it takes Kubernetes to schedule and start the pod, typically 5-30 seconds for most workloads.
For use cases where a few seconds of cold-start latency is acceptable (batch processing, async workers), this delivers significant cost savings.
Production Best Practices
1. Set an appropriate cooldownPeriod
Don't scale down too aggressively. A cooldownPeriod of 300 seconds (5 minutes) is a good default for most workloads. Too short leads to thrashing (scale up, scale down, scale up again).
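Note that cooldownPeriod only governs the final scale-to-zero step; scaling between N and 1 replicas follows normal HPA behavior. For finer control there, KEDA exposes the HPA's scaling behavior through its advanced section. A sketch, with illustrative values:

```yaml
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300   # Look back 5 minutes before scaling down
          policies:
            - type: Percent
              value: 50           # Remove at most 50% of replicas...
              periodSeconds: 60   # ...per minute
```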
2. Use TriggerAuthentication with Pod Identity (IRSA/Workload Identity)
Never hardcode credentials in ScaledObjects. Use AWS IRSA, GCP Workload Identity, or Azure Workload Identity to bind your KEDA scaler to a cloud IAM role.
3. Monitor KEDA's own metrics
KEDA exposes Prometheus metrics. Watch for keda_scaler_errors_total to catch authentication or connectivity issues with your event sources.
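As a starting point, here is a sketch of a Prometheus alert rule on that metric. Label names and exact metric names can vary by KEDA version, so check your operator's /metrics endpoint before relying on this.

```yaml
groups:
  - name: keda-health
    rules:
      - alert: KedaScalerErrors
        expr: increase(keda_scaler_errors_total[5m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "KEDA scaler errors detected; check event source connectivity and auth"
```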
4. Test scale-to-zero carefully
When scaling to zero, make sure your upstream services handle the brief unavailability gracefully. For HTTP workloads, the HTTP add-on provides buffering: requests are held while the pod starts up.
5. Use ScaledJob for batch processing instead of ScaledObject
If your workload processes a fixed batch and exits, ScaledJob is more appropriate than a long-running Deployment. It's cheaper and simpler.
KEDA vs HPA: When to Use Each
| Situation | Use |
|---|---|
| Scale on CPU/memory | HPA (built-in) |
| Scale on Kafka lag | KEDA |
| Scale on SQS depth | KEDA |
| Scale to zero | KEDA |
| Scale on custom Prometheus metric | KEDA |
| Scale on HTTP request rate | KEDA + HTTP add-on |
| Batch job processing | KEDA ScaledJob |
In practice, most production Kubernetes workloads benefit from KEDA for at least some of their services — especially any async workers or event-driven components.
Learning More
KEDA has excellent official documentation at keda.sh. For hands-on practice with Kubernetes autoscaling and event-driven architectures:
- KodeKloud — Their Kubernetes courses cover HPA, VPA, and increasingly KEDA in their CKA/CKAD content
- DigitalOcean Kubernetes — A great place to run a test KEDA cluster without the complexity of AWS/GCP for learning
Event-driven scaling is the natural model for most modern microservices. Your scaling decisions should be driven by your actual workload signal — messages waiting, jobs queued, requests arriving — not a proxy metric like CPU. KEDA makes that straightforward in Kubernetes.