What is Rate Limiting? A Clear Explanation for DevOps Engineers
Rate limiting protects your APIs and infrastructure from overload and abuse. Here's what it is, how it works, and how to implement it in Nginx, Kubernetes, and code.
Rate limiting is one of the most fundamental protection mechanisms for web services. Without it, a single misbehaving client can take down your API for everyone.
What is Rate Limiting?
Rate limiting restricts how many requests a client can make in a given time window.
Client makes 1000 requests/second
Rate limit: 100 requests/second per IP
→ First 100: allowed
→ Requests 101-1000: rejected with HTTP 429 (Too Many Requests)
It protects against:
- DDoS / Brute force attacks — automated tools sending thousands of requests
- API abuse — scrapers hitting your API without restraint
- Runaway clients — a bug in a client causing infinite retry loops
- Uneven load — one customer consuming all capacity
Rate Limiting Algorithms
Token Bucket (Most Common)
Imagine a bucket that fills with tokens at a fixed rate. Each request costs one token. If the bucket is empty, the request is rejected.
Bucket capacity: 100 tokens
Fill rate: 10 tokens/second
Client sends 100 requests instantly → all allowed (uses all 100 tokens)
Client sends 1 more request → rejected (bucket empty)
10 seconds later → 100 tokens refilled, client can make 100 more requests
This allows short bursts (emptying the bucket) while maintaining a steady average rate.
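To make the refill behavior concrete, here is a minimal single-process sketch of a token bucket using the same numbers as above (capacity 100, fill rate 10 tokens/second). The `TokenBucket` and `FakeClock` names are made up for this illustration, and the fake clock is only there so the example runs without real waiting.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens, refilled continuously at `fill_rate` per second."""

    def __init__(self, capacity: float, fill_rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.fill_rate = fill_rate
        self.clock = clock            # injectable so the demo can control time
        self.tokens = capacity        # start with a full bucket
        self.last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.fill_rate)
        self.last_refill = now
        # Spend tokens if there are enough, otherwise reject.
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class FakeClock:
    """Hypothetical test helper: a clock that only moves when told to."""

    def __init__(self):
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


clock = FakeClock()
bucket = TokenBucket(capacity=100, fill_rate=10, clock=clock)

burst = sum(bucket.allow() for _ in range(100))            # drains the full bucket
extra = bucket.allow()                                     # bucket is empty now
clock.advance(1)                                           # one second earns 10 tokens
after_one_second = sum(bucket.allow() for _ in range(20))  # only 10 of 20 pass
```

In a real service you would keep one bucket per client (per API key or IP) and leave the default `time.monotonic` clock in place.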
Fixed Window Counter
Count requests in a fixed time window (e.g., per minute):
Window: 0-60 seconds → limit 100 requests
Window: 60-120 seconds → limit 100 requests (counter resets)
Simple but has a problem: a client can make 100 requests at 59 seconds and 100 more at 61 seconds — effectively 200 in 2 seconds.
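The boundary problem is easy to reproduce. Below is a rough in-memory sketch of a fixed window counter; `FixedWindowCounter` is a name invented for this example, and timestamps are passed in explicitly (seconds since some arbitrary start) so the behavior is deterministic.

```python
from collections import defaultdict


class FixedWindowCounter:
    """Counts requests per client in fixed, non-overlapping windows."""

    def __init__(self, limit: int, window_seconds: int = 60):
        self.limit = limit
        self.window_seconds = window_seconds
        # (client_id, window_index) -> request count.
        # Old windows are never cleaned up here; a real limiter would expire them.
        self.counts = defaultdict(int)

    def allow(self, client_id: str, now: float) -> bool:
        window_index = int(now // self.window_seconds)
        key = (client_id, window_index)
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True


fixed = FixedWindowCounter(limit=100)

# 100 requests at t=59s land in window 0, 100 more at t=61s land in window 1.
late = sum(fixed.allow("client-a", now=59.0) for _ in range(100))
early = sum(fixed.allow("client-a", now=61.0) for _ in range(100))
over = fixed.allow("client-a", now=61.5)   # window 1 is now full

accepted_in_two_seconds = late + early
```

All 200 requests are accepted within about two seconds even though the limit is 100 per minute, which is exactly the weakness described above.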
Sliding Window
Counts requests in the last N seconds, not in a fixed window. More accurate:
Request at 14:00:59 → count requests from 14:00:00 to 14:00:59 → 95 (allowed)
Request at 14:01:01 → count requests from 14:00:02 to 14:01:01 → 98 (allowed)
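One common way to implement this is a sliding window log: keep the timestamps of accepted requests and count only those inside the last N seconds. The sketch below is an in-memory illustration with an invented `SlidingWindowLog` class and explicit timestamps, not a production implementation.

```python
from collections import defaultdict, deque


class SlidingWindowLog:
    """Allows at most `limit` requests per client in any trailing window."""

    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window_seconds = window_seconds
        self.log = defaultdict(deque)   # client_id -> timestamps of accepted requests

    def allow(self, client_id: str, now: float) -> bool:
        timestamps = self.log[client_id]
        # Drop timestamps that have slid out of the trailing window.
        while timestamps and timestamps[0] <= now - self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True


sliding = SlidingWindowLog(limit=100)

# Same traffic pattern that slipped past the fixed window counter.
first = sum(sliding.allow("client-a", now=59.0) for _ in range(100))
second = sum(sliding.allow("client-a", now=61.0) for _ in range(100))
later = sliding.allow("client-a", now=119.5)   # the t=59s entries have expired
```

Here the second batch at t=61s is rejected in full, because the 100 requests from t=59s are still inside the trailing 60 seconds. The trade-off is memory: the log stores one timestamp per accepted request.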
Rate Limiting in Nginx Ingress
# nginx.conf
http {
    # Define a zone: 10MB storage for IPs, 10 req/second limit
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
    server {
        location /api/ {
            # Allow burst of 20 requests, then enforce rate
            limit_req zone=api burst=20 nodelay;
            limit_req_status 429;
            proxy_pass http://backend;
        }
    }
}
On Kubernetes with Nginx Ingress Controller:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-api
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "10"              # requests per second
    nginx.ingress.kubernetes.io/limit-rpm: "300"             # requests per minute
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "3"  # allow burst 3x
    nginx.ingress.kubernetes.io/limit-connections: "20"      # concurrent connections
spec:
  rules:
    - host: api.myapp.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-api
                port:
                  number: 80
Rate Limiting in Traefik
# Traefik middleware
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: rate-limit
  namespace: default
spec:
  rateLimit:
    average: 100   # requests per period (average)
    burst: 200     # max burst
    period: 1s     # measurement period
    sourceCriterion:
      ipStrategy:
        depth: 1   # use real client IP (behind load balancer)
---
# Apply to IngressRoute
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: my-api
spec:
  routes:
    - kind: Rule
      match: Host(`api.myapp.com`)
      middlewares:
        - name: rate-limit
      services:
        - name: my-api
          port: 80
Rate Limiting in FastAPI / Python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
import time
import uuid
import redis
from collections import defaultdict

app = FastAPI()

# Simple in-memory rate limiter (for single instance)
class InMemoryRateLimiter:
    def __init__(self, requests_per_minute: int = 60):
        self.requests_per_minute = requests_per_minute
        self.requests = defaultdict(list)

    def is_allowed(self, client_id: str) -> bool:
        now = time.time()
        window_start = now - 60  # 1 minute window
        # Clean up old requests
        self.requests[client_id] = [
            req_time for req_time in self.requests[client_id]
            if req_time > window_start
        ]
        if len(self.requests[client_id]) >= self.requests_per_minute:
            return False
        self.requests[client_id].append(now)
        return True

# Redis-backed rate limiter (for multi-instance)
class RedisRateLimiter:
    def __init__(self, redis_client: redis.Redis, requests_per_minute: int = 60):
        self.redis = redis_client
        self.limit = requests_per_minute

    def is_allowed(self, client_id: str) -> tuple[bool, int]:
        key = f"rate_limit:{client_id}"
        now = time.time()
        window_start = now - 60
        pipe = self.redis.pipeline()
        # Drop entries older than the 60-second window
        pipe.zremrangebyscore(key, 0, window_start)
        # Count requests still in the window (before adding this one)
        pipe.zcard(key)
        # Unique member per request, so requests with the same timestamp
        # are counted separately instead of overwriting each other
        pipe.zadd(key, {f"{now}:{uuid.uuid4().hex}": now})
        pipe.expire(key, 61)
        results = pipe.execute()
        current_count = results[1]
        # Rejected requests are recorded too, so a client that keeps
        # hammering the API stays limited until it backs off
        allowed = current_count < self.limit
        remaining = max(0, self.limit - current_count - 1)
        return allowed, remaining

RATE_LIMIT = 100
limiter = RedisRateLimiter(redis.Redis(host="redis"), requests_per_minute=RATE_LIMIT)

@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
    # Get client identifier (use API key or IP)
    client_ip = request.client.host if request.client else "unknown"
    client_id = request.headers.get("X-API-Key") or client_ip
    # Note: redis.Redis is a blocking client; redis.asyncio.Redis avoids
    # blocking the event loop under load
    allowed, remaining = limiter.is_allowed(client_id)
    if not allowed:
        return JSONResponse(
            status_code=429,
            content={"error": "Rate limit exceeded. Try again in 60 seconds."},
            headers={
                "X-RateLimit-Limit": str(RATE_LIMIT),
                "X-RateLimit-Remaining": "0",
                "Retry-After": "60",
            }
        )
    response = await call_next(request)
    response.headers["X-RateLimit-Limit"] = str(RATE_LIMIT)
    response.headers["X-RateLimit-Remaining"] = str(remaining)
    return response
Rate Limit Headers
Good APIs tell clients about their rate limits:
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 73
X-RateLimit-Reset: 1750000000 # Unix timestamp when limit resets
When rate limited:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
Different Limits for Different Users
Good rate limiting is tiered:
RATE_LIMITS = {
    "free": 60,          # 60 req/min
    "pro": 600,          # 600 req/min
    "enterprise": 6000,  # 6000 req/min
}
def get_rate_limit(api_key: str) -> int:
    tier = lookup_api_key_tier(api_key)
    return RATE_LIMITS.get(tier, 60)
Key Points to Remember
- Rate limit by API key, not IP when possible — users behind NAT share an IP
- Return a proper 429 status with a Retry-After header so clients know when to retry
- Use Redis for rate limiting in multi-instance deployments (in-memory state isn't shared between instances)
- Allow bursts — token bucket lets clients batch requests without constant rejection
- Rate limit at the gateway level (Nginx, Traefik) to protect before requests hit your app
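The Retry-After point only pays off if clients actually honor it. Here is a small client-side sketch of what that could look like. `retry_delay` and `call_with_retry` are names invented for this example, `send` stands in for whatever HTTP call your client makes (assumed here to return a `(status, headers, body)` tuple), and the exponential backoff fallback is an assumption for responses that carry no Retry-After header.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay(headers: dict, attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retrying after a 429 response."""
    value = headers.get("Retry-After")
    if value is not None:
        if value.strip().isdigit():
            return float(value)                      # seconds form, e.g. "30"
        try:
            when = parsedate_to_datetime(value)      # HTTP-date form
            return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())
        except (TypeError, ValueError):
            pass                                     # unparseable: fall back below
    # No usable header: exponential backoff, capped.
    return min(cap, base * 2 ** attempt)


def call_with_retry(send, max_attempts: int = 5, sleep=time.sleep):
    """Call `send()` until it returns something other than 429 or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(retry_delay(headers, attempt))
    return status, body


# Demo with canned responses and a recording "sleep" so nothing actually waits.
responses = iter([
    (429, {"Retry-After": "30"}, ""),
    (429, {}, ""),
    (200, {}, "ok"),
])
waits = []
result = call_with_retry(lambda: next(responses), sleep=waits.append)
```

In the demo, the first wait comes from the server's Retry-After value and the second from the backoff fallback. A client like this also avoids the runaway retry loops mentioned at the start of the article.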
Rate limiting is one of those things that seems optional until you get your first DDoS or runaway client. Add it before you need it.
Tools: Nginx rate limiting | Redis | FastAPI