
What is a Load Balancer? Types and How They Work (2026)

Load balancers are everywhere in DevOps — but most beginners don't fully understand how they work. Here's a clear, simple explanation with real examples.

DevOpsBoys · Apr 7, 2026 · 3 min read

Nearly every production application sits behind a load balancer. But many DevOps beginners treat it as a black box — traffic goes in, gets distributed somehow, magic happens.

Here's what's actually happening.


What Problem Does a Load Balancer Solve?

Your app is handling 10,000 requests per second. One server can handle 2,000. You need 5 servers to keep up.

But your users only know one address (api.myapp.com). How does traffic reach all 5 servers?

A load balancer sits in front of your servers. It receives all incoming traffic, then distributes it across your servers based on rules. Users talk to one address — the load balancer handles the rest.

User → api.myapp.com → Load Balancer → Server 1
                                      → Server 2
                                      → Server 3

What Load Balancers Actually Do

  1. Traffic distribution — route requests across multiple servers
  2. Health checking — stop sending traffic to unhealthy servers
  3. SSL/TLS termination — decrypt HTTPS at the load balancer, send plain HTTP to the servers
  4. Session persistence — send the same user to the same server (sticky sessions)
  5. Connection draining — gracefully remove a server from rotation during updates

Load Balancing Algorithms

Round Robin: Request 1 → Server 1, Request 2 → Server 2, Request 3 → Server 3, Request 4 → Server 1... Equal distribution, ignores server load.

Least Connections: Route to whichever server has the fewest active connections. Better for long-running requests (file uploads, WebSockets).

IP Hash: Same client IP always goes to the same server. Useful for stateful apps that don't support distributed sessions.

Weighted Round Robin: Server A gets 70% of traffic, Server B gets 30%. Useful during canary deployments.
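The four algorithms above are simple enough to sketch in a few lines each. This is an illustrative toy, not a real balancer — the server names and connection counts are made up:

```python
import itertools

# Hypothetical backend pool
servers = ["server-1", "server-2", "server-3"]

# Round robin: cycle through servers in order, ignoring load
rr = itertools.cycle(servers)
def round_robin():
    return next(rr)

# Least connections: pick the server with the fewest active connections
active = {"server-1": 12, "server-2": 3, "server-3": 7}
def least_connections():
    return min(active, key=active.get)

# IP hash: the same client IP always maps to the same server
def ip_hash(client_ip):
    return servers[hash(client_ip) % len(servers)]

# Weighted round robin: server-1 gets ~70% of traffic, server-2 ~30%
weighted_pool = ["server-1"] * 7 + ["server-2"] * 3
wrr = itertools.cycle(weighted_pool)
def weighted_round_robin():
    return next(wrr)
```

Notice that round robin wraps back to server-1 on the fourth request, and least connections picks server-2 (3 active connections) regardless of order.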


Layer 4 vs Layer 7 Load Balancers

Layer 4 (Transport Layer): Operates on IP and TCP/UDP. Fast — doesn't look inside packets. Routes based on IP address and port. Can't make routing decisions based on URL path or HTTP headers.

Layer 7 (Application Layer): Understands HTTP. Can route based on URL path, headers, cookies, query strings. Slower than L4 but much more powerful.

Example of what only L7 can do:

/api/*       → API servers
/images/*    → Image servers (CDN origin)
/admin/*     → Admin server (different auth)
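At its core, L7 path-based routing is a longest-prefix match from URL path to backend pool. A minimal sketch, with hypothetical pool names:

```python
# Hypothetical backend pools keyed by URL prefix
routes = {
    "/api/":    ["api-1", "api-2"],
    "/images/": ["img-1"],
    "/admin/":  ["admin-1"],
}

def route(path, default_pool=("web-1", "web-2")):
    """Return the backend pool for a request path (longest prefix match)."""
    best = ""
    for prefix in routes:
        # Prefer the longest matching prefix, so /api/v2/ could
        # override /api/ if both were configured
        if path.startswith(prefix) and len(prefix) > len(best):
            best = prefix
    return routes.get(best, list(default_pool))
```

An L4 load balancer can't do this at all — by the time you know the path, you've already parsed HTTP.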

AWS Load Balancers

AWS offers three main load balancers:

ALB (Application Load Balancer): Layer 7, HTTP/HTTPS. Supports path-based routing, host-based routing, WebSockets, gRPC. The right choice for most web applications and microservices.

NLB (Network Load Balancer): Layer 4, TCP/UDP. Extremely fast, handles millions of requests per second, static IP address, preserves client IP. Use for gaming, IoT, real-time apps, or when you need a static IP.

CLB (Classic Load Balancer): Legacy, avoid for new projects.

|                    | ALB            | NLB            |
|--------------------|----------------|----------------|
| Layer              | 7 (HTTP)       | 4 (TCP/UDP)    |
| Path-based routing | Yes            | No             |
| Static IP          | No             | Yes            |
| Performance        | High           | Extremely high |
| WebSocket          | Yes            | Yes            |
| Use case           | Web apps, APIs | Gaming, IoT, VoIP |

Kubernetes Ingress vs Load Balancer

In Kubernetes, a Service of type LoadBalancer provisions a cloud load balancer operating at L4 (on AWS, an NLB when the AWS Load Balancer Controller is installed; the legacy in-tree provider creates a CLB).

An Ingress + Ingress Controller (nginx, Traefik) is a Layer 7 load balancer that routes HTTP traffic to services based on path and hostname.

# Service of type LoadBalancer — L4
spec:
  type: LoadBalancer  # cloud provider creates an external LB

# Ingress — L7 (networking.k8s.io/v1)
spec:
  rules:
    - host: api.myapp.com
      http:
        paths:
          - path: /v1
            pathType: Prefix
            backend:
              service:
                name: api-v1-service
                port:
                  number: 80
          - path: /v2
            pathType: Prefix
            backend:
              service:
                name: api-v2-service
                port:
                  number: 80

Most Kubernetes setups: one NLB (from cloud) → Nginx Ingress Controller → routes to services.


Health Checks — The Most Important Feature

A load balancer is only useful if it stops sending traffic to unhealthy servers. Health checks probe each server periodically.

On AWS ALB, target group health check settings:

  • Path: /health (your app returns 200 OK when healthy)
  • Interval: 30 seconds
  • Threshold: 2 consecutive failures = unhealthy
  • Timeout: 5 seconds

If a server fails 2 health checks in a row, ALB removes it from rotation. When it passes 2 again, it's added back.

This is why your app needs a /health endpoint. Without it, dead servers keep receiving traffic.
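The threshold logic above is easy to get wrong (a single blip shouldn't eject a server). A sketch of the consecutive-failure state machine an ALB target group applies, with defaults of 2 matching the settings listed above:

```python
class HealthTracker:
    """Track one target's health from periodic probe results."""

    def __init__(self, unhealthy_threshold=2, healthy_threshold=2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.failures = 0   # consecutive failed probes
        self.passes = 0     # consecutive passed probes
        self.healthy = True # targets start in rotation

    def record(self, probe_ok):
        if probe_ok:
            self.passes += 1
            self.failures = 0  # any pass resets the failure streak
            if not self.healthy and self.passes >= self.healthy_threshold:
                self.healthy = True  # add back to rotation
        else:
            self.failures += 1
            self.passes = 0
            if self.healthy and self.failures >= self.unhealthy_threshold:
                self.healthy = False  # remove from rotation
        return self.healthy
```

One failed probe leaves the target in rotation; the second consecutive failure removes it, and it takes two consecutive passes to come back.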

