AWS ALB 504 Gateway Timeout — Every Cause and Fix (2026)

Your ALB returns 504 Gateway Timeout but the app seems fine. Here's every reason this happens — backend timeouts, keepalive mismatches, health check failures — and exactly how to fix each one.

DevOpsBoys · Apr 9, 2026 · 6 min read

Your Application Load Balancer returns 504 Gateway Timeout to clients. Your EC2 or ECS container looks healthy. Logs show requests arriving — but no response. Users are angry.

Here's every reason this happens and exactly how to fix it.


What 504 Means at the ALB Layer

An ALB 504 means the load balancer forwarded the request to a target but didn't receive a response within the timeout window. The ALB gave up waiting.

This is different from:

  • 502 — target returned an invalid/empty response
  • 503 — no healthy targets registered
  • 504 — target is alive but too slow to respond

The ALB has an idle timeout (default: 60 seconds). If your backend doesn't respond within that window, you get 504.


Case 1: Backend Processing Time Exceeds ALB Idle Timeout

The most common cause. Your app takes longer than 60 seconds to process a request (large file upload, slow DB query, heavy computation).

Check it:

bash
# Check current ALB idle timeout
aws elbv2 describe-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:region:account:loadbalancer/app/my-alb/xxx \
  --query 'Attributes[?Key==`idle_timeout.timeout_seconds`]'

Fix — increase ALB idle timeout:

bash
aws elbv2 modify-load-balancer-attributes \
  --load-balancer-arn arn:aws:elasticloadbalancing:region:account:loadbalancer/app/my-alb/xxx \
  --attributes Key=idle_timeout.timeout_seconds,Value=120

Or in Terraform:

hcl
resource "aws_lb" "main" {
  name               = "my-alb"
  internal           = false
  load_balancer_type = "application"
 
  idle_timeout = 120  # increase from default 60
 
  # ... other config
}

Also fix your backend — don't just raise the timeout as a band-aid. Optimize slow queries, add pagination, use async processing for heavy jobs.
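For the async route, the accept-then-poll pattern is a common shape: return a job id immediately (typically with HTTP 202) and run the heavy work in the background. A minimal sketch; the in-memory jobs store and submit helper are illustrative only, and in production you'd hand the work to SQS, Celery, or similar:

```python
import threading
import uuid

# Hypothetical in-memory job store -- in production use SQS, Celery, etc.
jobs = {}

def submit(task, *args):
    """Accept the work, return a job id immediately (HTTP 202 pattern),
    and run the task on a background thread."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "pending", "result": None}

    def run():
        jobs[job_id]["result"] = task(*args)
        jobs[job_id]["status"] = "done"

    threading.Thread(target=run, daemon=True).start()
    return job_id
```

The request handler calls submit() and returns the job id right away; a separate status endpoint reads the store, so no single request ever approaches the ALB idle timeout.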


Case 2: Target Group Health Check Failing Silently

Targets appear registered but fail health checks. When every target in the group is unhealthy, the ALB fails open and routes requests to all of them anyway, then times out.

Check it:

bash
# Check health of all targets in target group
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/my-tg/xxx

Look for State: unhealthy or State: draining. Also check:

bash
# What does the health check look like?
aws elbv2 describe-target-groups \
  --target-group-arns arn:aws:elasticloadbalancing:region:account:targetgroup/my-tg/xxx \
  --query 'TargetGroups[0].{Path:HealthCheckPath,Port:HealthCheckPort,Protocol:HealthCheckProtocol,Threshold:HealthyThresholdCount}'
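If you have many targets, eyeballing that JSON gets tedious. A small sketch that pulls out the non-healthy targets along with the ALB's stated reason (it assumes the standard describe-target-health response shape):

```python
import json

def unhealthy_targets(health_json):
    """List targets that aren't healthy, with the ALB's stated reason.
    Expects the JSON printed by `aws elbv2 describe-target-health`."""
    doc = json.loads(health_json)
    bad = []
    for desc in doc.get("TargetHealthDescriptions", []):
        health = desc.get("TargetHealth", {})
        if health.get("State") != "healthy":
            bad.append({
                "target": desc.get("Target", {}).get("Id"),
                "state": health.get("State"),
                "reason": health.get("Reason"),
            })
    return bad
```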

Common issues:

  • Health check path returns 404 (app doesn't have /health endpoint)
  • Health check hits wrong port
  • App not ready at startup but ALB sends traffic immediately

Fix:

hcl
resource "aws_lb_target_group" "app" {
  name     = "app-tg"
  port     = 8080
  protocol = "HTTP"
  vpc_id   = var.vpc_id
 
  health_check {
    enabled             = true
    path                = "/health"    # must return 200
    port                = "traffic-port"
    protocol            = "HTTP"
    healthy_threshold   = 2
    unhealthy_threshold = 3
    timeout             = 5
    interval            = 10
    matcher             = "200"
  }
}
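The /health endpoint itself can be trivial. A sketch using Python's standard library; your framework almost certainly has an equivalent one-liner:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            # Keep this cheap: no DB calls, no external dependencies,
            # or one slow dependency marks the whole target unhealthy.
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging

def serve(port=8080):
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Keep the handler dependency-free; a health check should answer "is this process alive and able to serve", nothing more.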

Case 3: Keepalive Mismatch Between ALB and Backend

This is a subtle one. ALB keeps HTTP connections alive to reuse them. If your backend closes connections faster than the ALB expects, the ALB sends a request on a dead connection and waits for a response that never comes.

The rule: your backend's keepalive timeout must be longer than ALB's idle timeout.

If ALB idle timeout = 60s but nginx keepalive = 30s, nginx closes the connection at 30s. ALB tries to reuse it at second 45 — gets nothing back — returns 504.

Fix for Nginx:

nginx
# nginx.conf
http {
  keepalive_timeout 75s;  # must be > ALB idle timeout (60s)
  keepalive_requests 1000;
}

Fix for Node.js:

javascript
const server = app.listen(8080);
server.keepAliveTimeout = 75000;  // 75 seconds in ms
server.headersTimeout = 76000;    // must be > keepAliveTimeout

Fix for Python (gunicorn):

bash
gunicorn app:app \
  --bind 0.0.0.0:8080 \
  --keepalive 75 \
  --timeout 120
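If you're not sure what your backend's effective keepalive timeout actually is, you can measure it. A sketch that makes one request on a persistent connection and times how long the server holds the idle socket open (measure_server_keepalive is a hypothetical helper, not an AWS tool):

```python
import socket
import time

def measure_server_keepalive(host, port, path="/", max_wait=120):
    """Make one keep-alive request, then time how long the server
    holds the idle connection open before closing it."""
    s = socket.create_connection((host, port), timeout=5)
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: keep-alive\r\n\r\n"
    ).encode()
    s.sendall(request)
    s.recv(65536)  # naively drain the response (fine for small bodies)
    start = time.monotonic()
    s.settimeout(max_wait)
    try:
        # recv() returning b"" means the server closed the connection
        while s.recv(4096):
            pass
    except socket.timeout:
        return None  # server held the connection longer than max_wait
    finally:
        s.close()
    return time.monotonic() - start
```

If the result is below your ALB idle timeout (60s by default), you've found the mismatch.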

Case 4: Security Group Blocking Return Traffic

Your app receives the request but the response packet is blocked by a security group rule. ALB never gets the response.

Check it:

bash
# Check security group on targets — must allow traffic FROM ALB SG
aws ec2 describe-security-groups --group-ids sg-xxxxxx \
  --query 'SecurityGroups[0].IpPermissions'

The target security group must allow inbound from the ALB security group on the app port — not just a CIDR range.

Correct pattern:

hcl
# ALB security group
resource "aws_security_group" "alb" {
  name   = "alb-sg"
  vpc_id = var.vpc_id
 
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
 
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
 
# App/EC2 security group
resource "aws_security_group" "app" {
  name   = "app-sg"
  vpc_id = var.vpc_id
 
  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]  # ALB SG, not CIDR
  }
 
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
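To verify the rule programmatically, a sketch that checks the JSON from aws ec2 describe-security-groups for SG-based ingress on the app port (allows_ingress_from_sg is an illustrative helper):

```python
import json

def allows_ingress_from_sg(sg_json, source_sg_id, port):
    """Check whether a security group (as printed by
    `aws ec2 describe-security-groups`) allows TCP ingress on `port`
    from another security group rather than a CIDR."""
    doc = json.loads(sg_json)
    for perm in doc["SecurityGroups"][0].get("IpPermissions", []):
        if perm.get("IpProtocol") not in ("tcp", "-1"):
            continue
        # "-1" (all traffic) rules carry no FromPort/ToPort
        from_p = perm.get("FromPort", 0)
        to_p = perm.get("ToPort", 65535)
        if not (from_p <= port <= to_p):
            continue
        for pair in perm.get("UserIdGroupPairs", []):
            if pair.get("GroupId") == source_sg_id:
                return True
    return False
```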

Case 5: Target Group Deregistration Delay Too High

When deploying new versions (rolling update), old targets enter draining state. The ALB stops sending new requests to a draining target but keeps it registered until in-flight requests complete or the deregistration delay expires (default: 300s). If your app has already shut down, those in-flight requests hang until they time out, and every deploy stalls for up to 5 minutes waiting out the drain.

Check it:

bash
aws elbv2 describe-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/my-tg/xxx \
  --query 'Attributes[?Key==`deregistration_delay.timeout_seconds`]'

Fix:

hcl
resource "aws_lb_target_group" "app" {
  # ...
 
  deregistration_delay = 30  # reduce from default 300s if app shuts down faster
}

Also add graceful shutdown in your app — handle SIGTERM, finish in-flight requests, then exit cleanly.


Case 6: ECS Task Stopping Mid-Request

In ECS, if a task is stopped (scale-in, deployment, OOM kill) while handling a request, ALB gets a broken connection — 504.

Fix — ECS task graceful shutdown:

In your task_definition, add a stopTimeout:

hcl
resource "aws_ecs_task_definition" "app" {
  family = "app"
 
  container_definitions = jsonencode([{
    name  = "app"
    image = "myapp:v1.0"
    portMappings = [{
      containerPort = 8080
      protocol      = "tcp"
    }]
    stopTimeout = 30  # give container 30s to finish requests before SIGKILL
  }])
}

Your app should also handle SIGTERM:

python
import signal
import sys
 
def graceful_shutdown(signum, frame):
    print("Received SIGTERM, finishing in-flight requests...")
    # stop accepting new requests
    # wait for active requests to complete
    sys.exit(0)
 
signal.signal(signal.SIGTERM, graceful_shutdown)

Quick Diagnosis Checklist

When you see ALB 504, check in this order:

  • ALB idle timeout: aws elbv2 describe-load-balancer-attributes
  • Target health: aws elbv2 describe-target-health
  • Backend response time: CloudWatch → ALB → TargetResponseTime metric
  • Security group rules: check app SG allows ALB SG
  • Backend keepalive config: check nginx/node/gunicorn settings
  • Deregistration delay: check target group attributes

Reading ALB Access Logs

Enable access logs to see the actual error details:

hcl
resource "aws_lb" "main" {
  access_logs {
    bucket  = aws_s3_bucket.alb_logs.bucket
    prefix  = "alb"
    enabled = true
  }
}

In the logs, 504s appear with an elb_status_code of 504 and a target_status_code of "-" (the target never completed a response). A simplified excerpt:

http 2026-04-09T12:00:00.000000Z app/my-alb/xxx 203.0.113.10:54321 10.0.1.5:8080 0.001 -1 -1 504 - 112 288 "GET /api/slow HTTP/1.1" ...

The target_processing_time field (the first -1 above) tells you how long the backend took to start responding; -1 means the ALB gave up before the target sent anything back.
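To scan a log file for these entries in bulk, a small parser helps. A sketch; the field positions follow the documented ALB access log format, so adjust the indices if your lines differ:

```python
import shlex

def find_504s(log_lines):
    """Pull the 504 entries out of ALB access log lines and report
    the timing fields. Positions per the ALB access log format:
    target:port is field 5, target_processing_time field 7,
    elb_status_code field 9, "request" field 13 (1-indexed)."""
    hits = []
    for line in log_lines:
        fields = shlex.split(line)  # handles the quoted "GET / HTTP/1.1" field
        if len(fields) > 12 and fields[8] == "504":
            hits.append({
                "target": fields[4],
                "target_processing_time": fields[6],
                "request": fields[12],
            })
    return hits
```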


Summary

  • Backend too slow: increase ALB idle timeout + optimize app
  • Unhealthy targets: fix health check path/port
  • Keepalive mismatch: set backend keepalive > ALB idle timeout
  • Security group blocking: allow ALB SG → app SG
  • High deregistration delay: reduce to 30s + add graceful shutdown
  • ECS task stopping mid-request: set stopTimeout + handle SIGTERM

ALB 504s almost always come down to one of these six causes. Work through the checklist, check CloudWatch metrics for TargetResponseTime, and you'll find the root cause within 10 minutes.


Want to go deeper? Check our AWS VPC Networking Complete Guide and AWS CloudWatch Monitoring Guide.
