AWS RDS High CPU from Slow Queries: How to Find and Fix Them
RDS CPU suddenly at 90%+? Learn how to use Performance Insights, pg_stat_statements, EXPLAIN ANALYZE, and connection pooling with RDS Proxy to find and fix slow queries fast.
Your RDS CPU is at 90%. Your app is crawling. The database is the bottleneck — but which query is doing it?
This guide walks through exactly how to diagnose and fix RDS high CPU caused by slow queries, for both PostgreSQL and MySQL.
Step 1: Enable Performance Insights (Free for 7-Day Retention)
Performance Insights is the fastest way to see what is killing your database right now.
Go to RDS Console → Your instance → Modify → Performance Insights → Enable. The free tier gives you 7 days of history and costs nothing for most instance types.
Once enabled, open the Performance Insights dashboard. You will see a chart of DB Load broken into wait events and SQL queries. The top SQL tab shows the exact queries consuming the most database time — sorted by average active sessions.
Look at the top query. If one query accounts for 80% of the load, you found your culprit.
Step 2: Enable pg_stat_statements on PostgreSQL
For deeper query-level stats on PostgreSQL, enable the pg_stat_statements extension:
-- Run as superuser (postgres)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;Then in your rds.conf parameter group, set:
shared_preload_libraries = pg_stat_statements
pg_stat_statements.track = all
Reboot is required after changing shared_preload_libraries. After that:
-- Top 10 slowest queries by total time
SELECT
query,
calls,
total_exec_time / 1000 AS total_sec,
mean_exec_time / 1000 AS mean_sec,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;This gives you calls count, mean time per call, and total time — everything you need to prioritize.
Step 3: Enable Slow Query Log on MySQL
For MySQL RDS, enable slow query logging via a parameter group:
slow_query_log = 1
long_query_time = 1
log_output = FILE
Apply to your DB parameter group and reboot. Logs appear in CloudWatch Logs under /aws/rds/instance/<name>/slowquery.
Filter in CloudWatch Insights:
fields @timestamp, @message
| filter @message like /Query_time/
| sort @timestamp desc
| limit 50
Step 4: Use EXPLAIN ANALYZE to Find Missing Indexes
Once you have the slow query, run EXPLAIN ANALYZE on PostgreSQL:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.id, u.email, o.total
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.status = 'pending'
AND o.created_at > NOW() - INTERVAL '7 days';Warning signs in the output:
Seq Scanon a large table — you need an indexHash Joinwith many rows — may need composite indexRows Removed by Filter: 50000— the query scans 50k rows to return 10
On MySQL:
EXPLAIN SELECT u.id, u.email, o.total
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.status = 'pending';Look for type: ALL — that is a full table scan. You need an index.
Step 5: Add Indexes Without Locking (PostgreSQL)
The safe way to add an index on a live production table in PostgreSQL:
-- This does NOT lock the table for reads or writes
CREATE INDEX CONCURRENTLY idx_orders_status_created
ON orders (status, created_at DESC)
WHERE status = 'pending';CONCURRENTLY builds the index in the background. It takes longer but does not block your application. Never run a plain CREATE INDEX on a busy production table — it will lock writes for the entire duration.
On MySQL, use ALTER TABLE ... ALGORITHM=INPLACE, LOCK=NONE:
ALTER TABLE orders
ADD INDEX idx_status_created (status, created_at),
ALGORITHM=INPLACE,
LOCK=NONE;Step 6: Add Connection Pooling with RDS Proxy
High CPU is sometimes not slow queries — it is too many connections. Each RDS connection consumes memory and CPU. If your Lambda functions or pods each open their own connection, you can hit hundreds of connections instantly.
Enable RDS Proxy:
- Go to RDS Console → Proxies → Create proxy
- Select your database, IAM role, and Secrets Manager secret (stores DB credentials)
- Update your app's connection string to point to the Proxy endpoint instead of the RDS endpoint
RDS Proxy multiplexes thousands of application connections into a small pool of actual database connections. This alone can drop CPU by 20-30% on apps with many short-lived connections.
Step 7: Scale vs Optimize — When to Do What
| Situation | Action |
|---|---|
| One bad query identified | Fix the query first, do not scale |
| Missing index found | Add index with CONCURRENTLY |
| CPU high even after index | Check connection count, add RDS Proxy |
| Query is optimal, data just grew | Partition the table or archive old data |
| Read traffic is the problem | Add a Read Replica and route SELECT queries there |
| Nothing works, instance is undersized | Scale up instance class (db.t3 → db.r6g) |
Scaling an RDS instance takes 3-5 minutes with a brief failover. It does not require downtime if you have Multi-AZ enabled. But always optimize before scaling — a bad query on a bigger instance is still a bad query.
Quick Reference Commands
# Check current connections on PostgreSQL
SELECT count(*) FROM pg_stat_activity;
# Kill a long-running query (PostgreSQL)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE query_start < NOW() - INTERVAL '5 minutes'
AND state = 'active'
AND pid <> pg_backend_pid();
# Check table size vs index size
SELECT
relname AS table,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
pg_size_pretty(pg_indexes_size(relid)) AS index_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;Performance Insights + EXPLAIN ANALYZE + CREATE INDEX CONCURRENTLY will solve 90% of RDS CPU problems. Start there before touching your instance size.
Today I Fixed
Short real fixes from production — posted daily
Stay ahead of the curve
Get the latest DevOps, Kubernetes, AWS, and AI/ML guides delivered straight to your inbox. No spam — just practical engineering content.
Related Articles
AWS ALB 504 Gateway Timeout — Every Cause and Fix (2026)
Your ALB returns 504 Gateway Timeout but the app seems fine. Here's every reason this happens — backend timeouts, keepalive mismatches, health check failures — and exactly how to fix each one.
AWS ALB Target Group Unhealthy — Every Cause and Fix
Your ALB shows targets as unhealthy and traffic isn't reaching your app. Here's every reason target health checks fail and exactly how to fix each one.
AWS ALB Showing Unhealthy Targets — How to Fix It
Fix AWS Application Load Balancer unhealthy targets. Covers health check misconfigurations, security group issues, target group problems, and EKS-specific ALB controller debugging.