AWS ALB 504 Gateway Timeout — Every Cause and Fix (2026)
Your ALB returns 504 Gateway Timeout but the app seems fine. Here's every reason this happens — backend timeouts, keepalive mismatches, health check failures — and exactly how to fix each one.
Your Application Load Balancer returns 504 Gateway Timeout to clients. Your EC2 or ECS container looks healthy. Logs show requests arriving — but no response. Users are angry.
Here's every reason this happens and exactly how to fix it.
What 504 Means at the ALB Layer
An ALB 504 means the load balancer forwarded the request to a target but didn't receive a response within the timeout window. The ALB gave up waiting.
This is different from:
- 502 — target returned an invalid/empty response
- 503 — no healthy targets registered
- 504 — target is alive but too slow to respond
The ALB has an idle timeout (default: 60 seconds). If your backend doesn't respond within that window, you get 504.
Case 1: Backend Processing Time Exceeds ALB Idle Timeout
The most common cause. Your app takes longer than 60 seconds to process a request (large file upload, slow DB query, heavy computation).
Check it:
```shell
# Check current ALB idle timeout
aws elbv2 describe-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:region:account:loadbalancer/app/my-alb/xxx \
  --query 'Attributes[?Key==`idle_timeout.timeout_seconds`]'
```
Fix — increase the ALB idle timeout:
```shell
aws elbv2 modify-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:region:account:loadbalancer/app/my-alb/xxx \
  --attributes Key=idle_timeout.timeout_seconds,Value=120
```
Or in Terraform:
resource "aws_lb" "main" {
name = "my-alb"
internal = false
load_balancer_type = "application"
idle_timeout = 120 # increase from default 60
# ... other config
}Also fix your backend — don't just raise the timeout as a band-aid. Optimize slow queries, add pagination, use async processing for heavy jobs.
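To take heavy work out of the request path, one common shape is a queue plus a background worker, with the handler returning a job id immediately (the HTTP 202 pattern). A minimal sketch with Python's standard library; the job store and worker here are illustrative, not tied to any AWS API:

```python
import queue
import threading
import uuid

jobs = queue.Queue()   # pending work
results = {}           # job_id -> result, filled in by the worker

def worker():
    """Drain the queue so HTTP handlers never block on slow work."""
    while True:
        job_id, payload = jobs.get()
        # ... slow computation that would otherwise trip the ALB timeout ...
        results[job_id] = f"processed:{payload}"
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def submit(payload):
    """Handler side: enqueue and return a job id right away (respond with 202)."""
    job_id = str(uuid.uuid4())
    jobs.put((job_id, payload))
    return job_id

job_id = submit("big-report")
jobs.join()            # block only for demonstration; a real handler returns immediately
print(results[job_id]) # → processed:big-report
```

The client then polls a status endpoint (or receives a callback) for the result, so no single HTTP request ever approaches the idle timeout.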
Case 2: Target Group Health Check Failing Silently
Targets appear registered but are failing health checks. When no healthy target remains, the ALB fails open and routes requests to the unhealthy ones anyway; those requests hang and time out.
Check it:
```shell
# Check health of all targets in target group
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/my-tg/xxx
```
Look for `State: unhealthy` or `State: draining`. Also check:
```shell
# What does the health check look like?
aws elbv2 describe-target-groups \
  --target-group-arns arn:aws:elasticloadbalancing:region:account:targetgroup/my-tg/xxx \
  --query 'TargetGroups[0].{Path:HealthCheckPath,Port:HealthCheckPort,Protocol:HealthCheckProtocol,Threshold:HealthyThresholdCount}'
```
Common issues:
- Health check path returns 404 (the app has no `/health` endpoint)
- Health check hits the wrong port
- App not ready at startup but ALB sends traffic immediately
Fix:
resource "aws_lb_target_group" "app" {
name = "app-tg"
port = 8080
protocol = "HTTP"
vpc_id = var.vpc_id
health_check {
enabled = true
path = "/health" # must return 200
port = "traffic-port"
protocol = "HTTP"
healthy_threshold = 2
unhealthy_threshold = 3
timeout = 5
interval = 10
matcher = "200"
}
}Case 3: Keepalive Mismatch Between ALB and Backend
This is a subtle one. ALB keeps HTTP connections alive to reuse them. If your backend closes connections faster than the ALB expects, the ALB sends a request on a dead connection and waits for a response that never comes.
The rule: your backend's keepalive timeout must be longer than ALB's idle timeout.
If the ALB idle timeout is 60s but nginx's keepalive is 30s, nginx closes the connection at 30s. When the ALB tries to reuse it at second 45, it gets nothing back and returns 504. (Depending on timing, the same mismatch can also surface as a 502.)
Fix for Nginx:
```nginx
# nginx.conf
http {
    keepalive_timeout  75s; # must be > ALB idle timeout (60s)
    keepalive_requests 1000;
}
```
Fix for Node.js:
```javascript
const server = app.listen(8080);
server.keepAliveTimeout = 75000; // 75 seconds in ms
server.headersTimeout = 76000;   // must be > keepAliveTimeout
```
Fix for Python (gunicorn):
```shell
gunicorn app:app \
  --bind 0.0.0.0:8080 \
  --keepalive 75 \
  --timeout 120
```
Case 4: Security Group Blocking Return Traffic
Traffic between the ALB and your app is blocked by a security group rule, so the ALB never gets a response and times out. Security groups are stateful, so in practice this means the target's inbound rule doesn't allow the ALB; once a connection is allowed in, its return traffic is permitted automatically.
Check it:
```shell
# Check security group on targets — must allow traffic FROM ALB SG
aws ec2 describe-security-groups --group-ids sg-xxxxxx \
  --query 'SecurityGroups[0].IpPermissions'
```
The target security group must allow inbound from the ALB security group on the app port — not just a CIDR range.
Correct pattern:
```hcl
# ALB security group
resource "aws_security_group" "alb" {
  name   = "alb-sg"
  vpc_id = var.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# App/EC2 security group
resource "aws_security_group" "app" {
  name   = "app-sg"
  vpc_id = var.vpc_id

  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id] # ALB SG, not CIDR
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```
Case 5: Target Group Deregistration Delay Too High
When deploying new versions (rolling update), old targets enter the draining state. Draining exists to let in-flight requests finish before the target is removed. If your app has already shut down while the deregistration delay is still counting down from its default of 300s, those in-flight requests hang against a dead process and the ALB keeps waiting for up to 5 minutes, returning timeouts.
Check it:
```shell
aws elbv2 describe-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/my-tg/xxx \
  --query 'Attributes[?Key==`deregistration_delay.timeout_seconds`]'
```
Fix:
resource "aws_lb_target_group" "app" {
# ...
deregistration_delay = 30 # reduce from default 300s if app shuts down faster
}Also add graceful shutdown in your app — handle SIGTERM, finish in-flight requests, then exit cleanly.
Case 6: ECS Task Stopping Mid-Request
In ECS, if a task is stopped (scale-in, deployment, OOM kill) while handling a request, the ALB is left waiting on a connection that will never answer and returns a 504 (or a 502 if the connection is reset outright).
Fix — ECS task graceful shutdown:
In your task_definition, add a stopTimeout:
resource "aws_ecs_task_definition" "app" {
family = "app"
container_definitions = jsonencode([{
name = "app"
image = "myapp:v1.0"
portMappings = [{
containerPort = 8080
protocol = "tcp"
}]
stopTimeout = 30 # give container 30s to finish requests before SIGKILL
}])
}Your app should also handle SIGTERM:
```python
import signal
import sys

def graceful_shutdown(signum, frame):
    print("Received SIGTERM, finishing in-flight requests...")
    # stop accepting new requests
    # wait for active requests to complete
    sys.exit(0)

signal.signal(signal.SIGTERM, graceful_shutdown)
```
Quick Diagnosis Checklist
When you see ALB 504, check in this order:
| Check | Command |
|---|---|
| ALB idle timeout | `aws elbv2 describe-load-balancer-attributes` |
| Target health | `aws elbv2 describe-target-health` |
| Backend response time | CloudWatch → ALB → `TargetResponseTime` metric |
| Security group rules | Check app SG allows ALB SG |
| Backend keepalive config | Check nginx/node/gunicorn settings |
| Deregistration delay | Check target group attributes |
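The target-health check is easy to script. A sketch that takes the JSON printed by `aws elbv2 describe-target-health` and flags every target that isn't healthy (the sample document below is synthetic but follows the CLI's response shape):

```python
import json

def unhealthy_targets(describe_target_health_output):
    """Return (target_id, state, reason) for every target not in 'healthy' state.
    Expects the parsed JSON from `aws elbv2 describe-target-health`."""
    problems = []
    for desc in describe_target_health_output.get("TargetHealthDescriptions", []):
        health = desc.get("TargetHealth", {})
        if health.get("State") != "healthy":
            problems.append((desc.get("Target", {}).get("Id"),
                             health.get("State"),
                             health.get("Reason", "")))
    return problems

# Synthetic example of the CLI's response shape:
sample = json.loads("""{
  "TargetHealthDescriptions": [
    {"Target": {"Id": "i-0abc", "Port": 8080},
     "TargetHealth": {"State": "healthy"}},
    {"Target": {"Id": "i-0def", "Port": 8080},
     "TargetHealth": {"State": "draining", "Reason": "Target.DeregistrationInProgress"}}
  ]
}""")
print(unhealthy_targets(sample))
# → [('i-0def', 'draining', 'Target.DeregistrationInProgress')]
```

Pipe the real CLI output into this (for example via `aws ... | python check_health.py`, where `check_health.py` is a hypothetical wrapper reading stdin) to get an at-a-glance list of problem targets.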
Reading ALB Access Logs
Enable access logs to see the actual error details:
resource "aws_lb" "main" {
access_logs {
bucket = aws_s3_bucket.alb_logs.bucket
prefix = "alb"
enabled = true
}
}In the logs, 504s appear as:
- - - [timestamp] "GET /api/slow HTTP/1.1" 504 - "-" "-" "0.000" "-" "-"
The TargetProcessingTime field tells you exactly how long the backend took before ALB gave up.
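Log entries are space separated with quoted request and user-agent fields, so `shlex` can tokenize them. A small parser for the fields most relevant to 504 debugging; the log line below is synthetic, constructed to follow the documented field order:

```python
import shlex

def parse_alb_log_line(line):
    """Extract the fields most useful for 504 debugging from an ALB access log line.
    Documented field order: type, time, elb, client:port, target:port,
    request_processing_time, target_processing_time, response_processing_time,
    elb_status_code, target_status_code, received_bytes, sent_bytes, request, ..."""
    fields = shlex.split(line)  # shlex handles the quoted request/user-agent fields
    return {
        "time": fields[1],
        "target": fields[4],
        "target_processing_time": fields[6],  # -1: target never responded
        "elb_status_code": fields[8],
        "request": fields[12],
    }

# Synthetic 504 entry following the documented format:
line = ('https 2026-01-15T10:00:00.000000Z app/my-alb/xxx 203.0.113.10:55432 '
        '10.0.1.5:8080 0.001 -1 -1 504 - 120 0 "GET /api/slow HTTP/1.1" '
        '"curl/8.0" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2')
info = parse_alb_log_line(line)
print(info["elb_status_code"], info["target_processing_time"])
# → 504 -1
```

Run this over a day's worth of logs and group by `target` to see whether the timeouts cluster on one instance (a sick target) or spread evenly (a systemic cause like the idle timeout).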
Summary
| Cause | Fix |
|---|---|
| Backend too slow | Increase ALB idle timeout + optimize app |
| Unhealthy targets | Fix health check path/port |
| Keepalive mismatch | Set backend keepalive > ALB idle timeout |
| Security group blocking | Allow ALB SG → App SG |
| High deregistration delay | Reduce to 30s + add graceful shutdown |
| ECS task stopping mid-request | Set stopTimeout + handle SIGTERM |
ALB 504s almost always come down to one of these six causes. Work through the checklist, check CloudWatch metrics for TargetResponseTime, and you'll find the root cause within 10 minutes.
Want to go deeper? Check our AWS VPC Networking Complete Guide and AWS CloudWatch Monitoring Guide.
Related Articles
AWS ALB Showing Unhealthy Targets — How to Fix It
Fix AWS Application Load Balancer unhealthy targets. Covers health check misconfigurations, security group issues, target group problems, and EKS-specific ALB controller debugging.
AWS ECS Task Keeps Stopping — How to Fix It (2026)
Your ECS task starts and then immediately stops or keeps restarting. Here's every reason this happens and how to debug and fix it.
AWS EKS Pods Stuck in Pending State: Causes and Fixes
Pods stuck in Pending on EKS are caused by a handful of known issues — insufficient node capacity, taint mismatches, PVC problems, and more. Here's how to diagnose and fix each one.