AWS RDS High CPU from Slow Queries: How to Find and Fix Them

RDS CPU suddenly at 90%+? Learn how to use Performance Insights, pg_stat_statements, EXPLAIN ANALYZE, and connection pooling with RDS Proxy to find and fix slow queries fast.

Your RDS CPU is at 90%. Your app is crawling. The database is the bottleneck — but which query is doing it?

This guide walks through exactly how to diagnose and fix RDS high CPU caused by slow queries, for both PostgreSQL and MySQL.

Step 1: Enable Performance Insights (Free for 7-Day Retention)

Performance Insights is the fastest way to see what is killing your database right now.

Go to RDS Console → Your instance → Modify → Performance Insights → Enable. The free tier gives you 7 days of history and costs nothing for most instance types.

Once enabled, open the Performance Insights dashboard. You will see a chart of DB Load broken into wait events and SQL queries. The top SQL tab shows the exact queries consuming the most database time — sorted by average active sessions.

Look at the top query. If one query accounts for 80% of the load, you found your culprit.

Step 2: Enable pg_stat_statements on PostgreSQL

For deeper query-level stats on PostgreSQL, enable the pg_stat_statements extension:

sql

-- Run as superuser (postgres)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Then in your rds.conf parameter group, set:

shared_preload_libraries = pg_stat_statements
pg_stat_statements.track = all

Reboot is required after changing shared_preload_libraries. After that:

sql

-- Top 10 slowest queries by total time
SELECT
  query,
  calls,
  total_exec_time / 1000 AS total_sec,
  mean_exec_time / 1000 AS mean_sec,
  rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

This gives you calls count, mean time per call, and total time — everything you need to prioritize.

Step 3: Enable Slow Query Log on MySQL

For MySQL RDS, enable slow query logging via a parameter group:

slow_query_log = 1
long_query_time = 1
log_output = FILE

Apply to your DB parameter group and reboot. Logs appear in CloudWatch Logs under /aws/rds/instance/<name>/slowquery.

Filter in CloudWatch Insights:

fields @timestamp, @message
| filter @message like /Query_time/
| sort @timestamp desc
| limit 50

Step 4: Use EXPLAIN ANALYZE to Find Missing Indexes

Once you have the slow query, run EXPLAIN ANALYZE on PostgreSQL:

sql

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.id, u.email, o.total
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.status = 'pending'
AND o.created_at > NOW() - INTERVAL '7 days';

Warning signs in the output:

Seq Scan on a large table — you need an index
Hash Join with many rows — may need composite index
Rows Removed by Filter: 50000 — the query scans 50k rows to return 10

On MySQL:

sql

EXPLAIN SELECT u.id, u.email, o.total
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.status = 'pending';

Look for type: ALL — that is a full table scan. You need an index.

Step 5: Add Indexes Without Locking (PostgreSQL)

The safe way to add an index on a live production table in PostgreSQL:

sql

-- This does NOT lock the table for reads or writes
CREATE INDEX CONCURRENTLY idx_orders_status_created
ON orders (status, created_at DESC)
WHERE status = 'pending';

CONCURRENTLY builds the index in the background. It takes longer but does not block your application. Never run a plain CREATE INDEX on a busy production table — it will lock writes for the entire duration.

On MySQL, use ALTER TABLE ... ALGORITHM=INPLACE, LOCK=NONE:

sql

ALTER TABLE orders
ADD INDEX idx_status_created (status, created_at),
ALGORITHM=INPLACE,
LOCK=NONE;

Step 6: Add Connection Pooling with RDS Proxy

High CPU is sometimes not slow queries — it is too many connections. Each RDS connection consumes memory and CPU. If your Lambda functions or pods each open their own connection, you can hit hundreds of connections instantly.

Enable RDS Proxy:

Go to RDS Console → Proxies → Create proxy
Select your database, IAM role, and Secrets Manager secret (stores DB credentials)
Update your app's connection string to point to the Proxy endpoint instead of the RDS endpoint

RDS Proxy multiplexes thousands of application connections into a small pool of actual database connections. This alone can drop CPU by 20-30% on apps with many short-lived connections.

Step 7: Scale vs Optimize — When to Do What

Situation	Action
One bad query identified	Fix the query first, do not scale
Missing index found	Add index with CONCURRENTLY
CPU high even after index	Check connection count, add RDS Proxy
Query is optimal, data just grew	Partition the table or archive old data
Read traffic is the problem	Add a Read Replica and route SELECT queries there
Nothing works, instance is undersized	Scale up instance class (db.t3 → db.r6g)

Scaling an RDS instance takes 3-5 minutes with a brief failover. It does not require downtime if you have Multi-AZ enabled. But always optimize before scaling — a bad query on a bigger instance is still a bad query.

Quick Reference Commands

bash

# Check current connections on PostgreSQL
SELECT count(*) FROM pg_stat_activity;
 
# Kill a long-running query (PostgreSQL)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE query_start < NOW() - INTERVAL '5 minutes'
AND state = 'active'
AND pid <> pg_backend_pid();
 
# Check table size vs index size
SELECT
  relname AS table,
  pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
  pg_size_pretty(pg_indexes_size(relid)) AS index_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;

Performance Insights + EXPLAIN ANALYZE + CREATE INDEX CONCURRENTLY will solve 90% of RDS CPU problems. Start there before touching your instance size.

AWS RDS High CPU from Slow Queries: How to Find and Fix Them

Step 1: Enable Performance Insights (Free for 7-Day Retention)

Step 2: Enable pg_stat_statements on PostgreSQL

Step 3: Enable Slow Query Log on MySQL

Step 4: Use EXPLAIN ANALYZE to Find Missing Indexes

Step 5: Add Indexes Without Locking (PostgreSQL)

Step 6: Add Connection Pooling with RDS Proxy

Step 7: Scale vs Optimize — When to Do What

Quick Reference Commands

Stay ahead of the curve

Related Articles

AWS ALB 504 Gateway Timeout — Every Cause and Fix (2026)

AWS ALB Target Group Unhealthy — Every Cause and Fix

AWS ALB Showing Unhealthy Targets — How to Fix It

Comments