Redis 'max number of clients reached' Error Fix
Fix the ERR max number of clients reached error in Redis. Learn how to diagnose connection pool exhaustion, tune maxclients, fix connection leaks in Python and Node.js, and right-size your pool.
Your app starts throwing ERR max number of clients reached at 2 AM and pods begin crashing. This is one of the most common Redis production errors — and almost always caused by connection pool misconfiguration, not Redis itself. Here is how to diagnose and fix it permanently.
Step 1: Check How Many Clients Are Connected
SSH into your app server or run from any node that can reach Redis:
```bash
redis-cli -h <redis-host> -p 6379 INFO clients
```
Sample output:
```
# Clients
connected_clients:512
cluster_connections:0
maxclients:512
client_recent_max_input_buffer:20480
blocked_clients:0
tracking_clients:0
```
If connected_clients equals maxclients, you have hit the ceiling. Redis is refusing new connections.
Also check:
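```bash
redis-cli -h <redis-host> INFO stats | grep rejected_connections
```
rejected_connections is a running total, since the server started, of connections refused because of the maxclients limit. If it is non-zero and still climbing between runs, Redis is actively turning clients away.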
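You may want to automate these two checks, for example in an alerting script. The following is a minimal Python sketch that parses the plain-text output of INFO clients and INFO stats. The function names are illustrative and are not part of any Redis client library. The sketch assumes you capture the text yourself, for example from redis-cli. Because rejected_connections is cumulative, the sketch compares two samples rather than reading one value.

```python
def parse_info(text: str) -> dict:
    """Parse redis-cli INFO output ("key:value" lines) into a dict of strings.

    Section headers such as "# Clients" and blank lines are skipped.
    """
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def client_utilization(info: dict) -> float:
    """Fraction of maxclients in use; 1.0 means Redis is at the ceiling."""
    return int(info["connected_clients"]) / int(info["maxclients"])


def new_rejections(before: dict, after: dict) -> int:
    """rejected_connections is a running total, so diff two samples taken apart."""
    return int(after["rejected_connections"]) - int(before["rejected_connections"])


if __name__ == "__main__":
    # Text captured from `redis-cli INFO clients` (the sample output above).
    sample = "# Clients\nconnected_clients:512\nmaxclients:512\n"
    info = parse_info(sample)
    print(f"{client_utilization(info):.0%} of maxclients in use")  # 100% of maxclients in use
```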
Step 2: Understand Why Pools Get Exhausted
Connection exhaustion happens for three reasons:
1. No connection pooling at all — every request opens a new connection and never closes it cleanly. This is the most common bug in Python apps that instantiate Redis() inside a function.
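2. Pool size not scaled to instance count: if you run 20 pods each with a pool of 30, that is 600 potential connections, already more than the 512 allowed in the sample output above. The Redis default maxclients is 10,000, and Redis lowers it automatically if the process's open-file limit is too small. Managed services such as ElastiCache and Upstash enforce their own limits, which depend on the provider and plan.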
3. Connection leaks — code opens a connection, hits an exception, never returns it to the pool. Over hours the pool fills up.
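Here is a toy model that makes reason 3 concrete. ToyPool and flaky_work are hypothetical stand-ins, not redis-py's API. The model shows how an exception between acquire and release strands a slot, and how a try/finally, wrapped here in a context manager, prevents it.

```python
from contextlib import contextmanager


class ToyPool:
    """A deliberately tiny stand-in for a connection pool (NOT redis-py's API)."""

    def __init__(self, max_connections: int) -> None:
        self.max_connections = max_connections
        self.in_use = 0

    def acquire(self) -> None:
        if self.in_use >= self.max_connections:
            raise RuntimeError("pool exhausted")  # what callers see once slots leak
        self.in_use += 1

    def release(self) -> None:
        self.in_use -= 1

    @contextmanager
    def connection(self):
        """Hand out a slot and always give it back, even if the body raises."""
        self.acquire()
        try:
            yield
        finally:
            self.release()


def flaky_work() -> None:
    """Hypothetical operation that fails, e.g. a timeout mid-request."""
    raise ValueError("simulated Redis error")


def leaky_call(pool: ToyPool) -> None:
    pool.acquire()
    flaky_work()
    pool.release()  # never reached when flaky_work() raises: the slot leaks


def safe_call(pool: ToyPool) -> None:
    with pool.connection():  # release happens in the finally block
        flaky_work()


if __name__ == "__main__":
    for call in (leaky_call, safe_call):
        pool = ToyPool(max_connections=3)
        for _ in range(3):
            try:
                call(pool)
            except ValueError:
                pass
        print(call.__name__, pool.in_use)  # leaky_call 3, then safe_call 0
```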
Step 3: Fix in Python (redis-py)
Wrong (creates a new connection per call):
```python
import redis

def get_user(user_id):
    r = redis.Redis(host="redis", port=6379)  # BAD: new connection every call
    return r.get(f"user:{user_id}")
```
Correct (shared connection pool):
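```python
import redis

# Create pool once at module level
pool = redis.ConnectionPool(
    host="redis",
    port=6379,
    db=0,
    max_connections=20,  # tune per pod
    socket_timeout=5,
    socket_connect_timeout=5,
    retry_on_timeout=True,
)

def get_redis():
    return redis.Redis(connection_pool=pool)

def get_user(user_id):
    r = get_redis()
    return r.get(f"user:{user_id}")
```
The pool is a module-level singleton. All calls reuse it, and each connection goes back to the pool after every command. Once max_connections connections are in use, redis.ConnectionPool raises a ConnectionError ("Too many connections") instead of waiting. If you would rather have callers wait for a free connection, use redis.BlockingConnectionPool, which accepts the same arguments plus a timeout.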
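Creating the pool at import time is sometimes awkward, for example when configuration isn't loaded yet. A lazily built, thread-safe singleton gives the same one-pool-per-process guarantee. The sketch below assumes a hypothetical LazySingleton helper (not part of redis-py). A counting stand-in replaces the real redis.Redis factory so the example runs without a server.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazySingleton(Generic[T]):
    """Build one shared object on first use, even if many threads ask at once."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._instance: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        if self._instance is None:          # fast path skips the lock once built
            with self._lock:
                if self._instance is None:  # re-check: another thread may have built it
                    self._instance = self._factory()
        return self._instance


if __name__ == "__main__":
    calls = []

    def make_client() -> object:
        # Stand-in for e.g. redis.Redis(connection_pool=redis.ConnectionPool(...))
        calls.append(1)
        return object()

    shared = LazySingleton(make_client)
    threads = [threading.Thread(target=shared.get) for _ in range(50)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(calls))  # 1: the factory ran once
```

The double-checked lock means the factory runs exactly once, however many request threads race for the client on startup.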
Step 4: Fix in Node.js (ioredis)
Wrong:
```javascript
const Redis = require("ioredis");

async function getValue(key) {
  const client = new Redis({ host: "redis", port: 6379 }); // BAD: new connection every call
  return client.get(key);
}
```
Correct:
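```javascript
const Redis = require("ioredis");

// Create one client per process and reuse it everywhere
const redis = new Redis({
  host: "redis",
  port: 6379,
  maxRetriesPerRequest: 3,
  enableOfflineQueue: false, // reject commands while disconnected instead of queueing them
  connectTimeout: 5000,
});

async function getValue(key) {
  return redis.get(key);
}
```
With enableOfflineQueue: false, commands sent before the client is ready are rejected rather than queued. For that reason, don't combine it with lazyConnect: true, because the first command would always fail. Wait for the client's ready event before serving traffic. For Redis Cluster, use Redis.Cluster and set maxRetriesPerRequest inside its redisOptions so that retries stay bounded during an outage.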
Step 5: Temporarily Raise maxclients (Not a Fix, But Buys Time)
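```bash
redis-cli CONFIG SET maxclients 2000
```
This lasts only until Redis restarts. To make it permanent, set it in redis.conf, or run CONFIG REWRITE to write the running configuration back to that file:
```
maxclients 2000
```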
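Whichever way you set it, maxclients only takes effect if the Redis process can open enough file descriptors. When the open-file limit is too low, Redis lowers maxclients at startup and logs a warning, and CONFIG SET may refuse a value the limit cannot support. The sketch below does the arithmetic. The 32-descriptor reserve is an assumption taken from the Redis source, and the function names are hypothetical. On Linux, compare the result against the "Max open files" line in /proc/<redis-pid>/limits.

```python
# Assumption: Redis keeps 32 descriptors for internal use (CONFIG_MIN_RESERVED_FDS in
# its source); check your version's startup log if the numbers don't line up.
RESERVED_FDS = 32


def required_nofile(maxclients: int) -> int:
    """Open-file limit the Redis process needs to honour maxclients."""
    return maxclients + RESERVED_FDS


def effective_maxclients(maxclients: int, nofile_limit: int) -> int:
    """Approximate client ceiling when the process is stuck at nofile_limit."""
    return max(0, min(maxclients, nofile_limit - RESERVED_FDS))


if __name__ == "__main__":
    # nofile_limit: the "Max open files" value from /proc/<redis-pid>/limits on Linux
    print(required_nofile(2000))              # 2032
    print(effective_maxclients(10000, 1024))  # 992
```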
In Kubernetes, if you are running Redis via Helm (Bitnami):
```yaml
# values.yaml
master:
  extraFlags:
    - "--maxclients 2000"
```
Step 6: Find Connection Leaks
Run this every 30 seconds during an incident:
```bash
redis-cli CLIENT LIST | awk -F'[= ]' '{for(i=1;i<=NF;i++) if($i=="cmd") print $(i+1)}' | sort | uniq -c | sort -rn | head -20
```
This counts open connections by the last command each one ran. Hundreds of connections whose last command was subscribe or blpop, and that never go away, point to a leak in your Pub/Sub or blocking-pop code.
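The awk one-liner above and the idle check that follows work well during an incident. If you want the same checks in a script or cron job, the minimal Python sketch below parses CLIENT LIST text, captured for example with redis-cli CLIENT LIST. The sample lines are made up and trimmed to a few fields. The assumptions are that a P in the flags field marks a Pub/Sub subscriber and that a 1000-second cutoff matches the idle grep pattern. The function names are illustrative.

```python
from collections import Counter

# Made-up sample, trimmed to a few fields; real CLIENT LIST lines carry many more.
SAMPLE = """\
id=7 addr=10.0.0.5:51234 name= age=4000 idle=3600 flags=N db=0 cmd=get
id=8 addr=10.0.0.6:40110 name= age=5000 idle=4800 flags=P db=0 cmd=subscribe
id=9 addr=10.0.0.7:40111 name= age=12 idle=0 flags=N db=0 cmd=client|list
"""


def parse_client_list(text: str) -> list:
    """One dict per client; each line is space-separated key=value pairs."""
    clients = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = {}
        for pair in line.split():
            key, _, value = pair.partition("=")
            fields[key] = value
        clients.append(fields)
    return clients


def count_by_last_command(clients: list) -> Counter:
    """Same idea as the awk pipeline: how many connections last ran each command."""
    return Counter(c.get("cmd", "?") for c in clients)


def idle_client_ids(clients: list, min_idle_seconds: int = 1000,
                    skip_pubsub: bool = True) -> list:
    """IDs of connections idle at least min_idle_seconds (candidates for CLIENT KILL ID)."""
    ids = []
    for c in clients:
        if int(c.get("idle", "0")) < min_idle_seconds:
            continue
        if skip_pubsub and "P" in c.get("flags", ""):
            continue  # Pub/Sub subscribers are idle by design
        ids.append(c["id"])
    return ids


if __name__ == "__main__":
    clients = parse_client_list(SAMPLE)
    print(count_by_last_command(clients).most_common())
    print(idle_client_ids(clients))  # ['7']
```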
Also check idle time:
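```bash
redis-cli CLIENT LIST | grep "idle=[0-9][0-9][0-9][0-9]"
```
This pattern matches connections that have been idle for 1,000 seconds or more. Long-idle connections are likely leaked. Pub/Sub subscribers (P in the flags field) sit idle by design, so check flags before you kill anything. To kill a leaked connection:
```bash
redis-cli CLIENT KILL ID <client-id>
```
Pool Sizing Formula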
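```
max_connections_per_pod = peak_requests_per_second * avg_redis_calls_per_request * avg_redis_latency_ms / 1000
total_connections = max_connections_per_pod * pod_count
```
This is Little's law: the number of connections in use at any moment equals the rate of Redis commands multiplied by how long each command holds a connection. The formula assumes each call is a separate round trip, not pipelined.
Example: 500 RPS per pod, 3 Redis calls per request, 2ms average Redis latency:
```
= (500 * 3) * (2 / 1000)
= 1500 * 0.002
= 3 connections active at any time
```
With a safety factor of 10x for bursts and latency spikes, a pool of about 30 per pod is plenty. Most teams over-provision.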
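The sketch below turns the sizing arithmetic into a helper based on Little's law (busy connections = Redis commands per second × seconds each one holds a connection), so you can plug in your own numbers and check the fleet total against maxclients. The 10x default mirrors the safety factor mentioned above. The function names and the reserved parameter are illustrative assumptions; reserved covers connections held by other clients such as monitoring tools or replicas.

```python
import math


def pool_size_per_pod(peak_rps: float, calls_per_request: float,
                      avg_latency_ms: float, safety_factor: float = 10.0) -> int:
    """Little's law: busy connections = command rate x time each command holds a connection."""
    busy = peak_rps * calls_per_request * avg_latency_ms / 1000.0
    return max(1, math.ceil(busy * safety_factor))


def fits_under_maxclients(pool_per_pod: int, pod_count: int, maxclients: int,
                          reserved: int = 0) -> bool:
    """True if every pod's full pool, plus `reserved` connections for other clients, fits."""
    return pool_per_pod * pod_count + reserved <= maxclients


if __name__ == "__main__":
    per_pod = pool_size_per_pod(peak_rps=200, calls_per_request=5, avg_latency_ms=1)
    print(per_pod)                                  # 10
    print(fits_under_maxclients(per_pod, 20, 512))  # True: 200 <= 512
```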
Step 7: Use a Proxy for Many Clients
If you have 100+ pods all hitting Redis, a proxy layer reduces connection count dramatically.
Twemproxy (nutcracker):
```yaml
# nutcracker.yml
redis_pool:
  listen: 0.0.0.0:6380
  redis: true
  servers:
    - redis:6379:1
  server_connections: 50  # max connections this proxy opens to each Redis server
  backlog: 1024
```
KeyDB is a multithreaded drop-in Redis replacement that handles more concurrent connections and supports active replication.
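In Kubernetes, run Twemproxy as a sidecar or as a separate Deployment behind a ClusterIP Service. As a sidecar, each pod connects to localhost:6380. With a Deployment, all pods connect to the Service (for example, redis-proxy:6380) instead of connecting to Redis directly. Twemproxy does not support every Redis command; Pub/Sub and MULTI/EXEC transactions, for example, are not supported. Check its command-support notes before routing all traffic through it.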
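The saving is easy to estimate. The sketch below assumes a single Redis server behind the proxy. It treats Twemproxy's server_connections setting as the maximum number of connections each proxy instance opens to that server. The 100 pods, pool of 20, and 3 proxy replicas are example numbers, not recommendations. In the sidecar layout, proxy_instances equals your pod count, so the reduction comes only from server_connections being smaller than the application's own pool.

```python
def direct_connections(pod_count: int, pool_per_pod: int) -> int:
    """Pods connect straight to Redis: Redis sees every pod's pool."""
    return pod_count * pool_per_pod


def proxied_connections(proxy_instances: int, server_connections: int) -> int:
    """Pods connect to the proxy: Redis sees only each proxy's server-side connections."""
    return proxy_instances * server_connections


if __name__ == "__main__":
    print(direct_connections(pod_count=100, pool_per_pod=20))             # 2000
    print(proxied_connections(proxy_instances=3, server_connections=50))  # 150
```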
Quick Checklist
- Run INFO clients: is connected_clients near maxclients?
- Is your Redis client a singleton or created per-request?
- Is max_connections in your pool configured explicitly?
- Run CLIENT LIST | grep idle: are there leaked idle connections?
- Is socket_timeout set? Without it, broken connections hang forever.
- Are you using a proxy if pod count > 50?
Connection pool exhaustion is almost always a code bug, not a Redis capacity problem. Fix the pooling pattern, set explicit limits, and add a socket_timeout. Those steps alone resolve most of these incidents.