Why Load Balancing?
A single server has finite CPU, memory, and network capacity. Once traffic exceeds what one machine can handle, you need to distribute requests across multiple backend servers. A load balancer sits between clients and your server pool, routing each incoming request to a healthy backend based on a configurable algorithm.
Load balancing provides three critical capabilities: scalability (add more backends to handle more traffic), availability (if one backend dies, traffic routes to others), and maintainability (roll out deploys one server at a time without downtime).
flowchart TD
C1[Client A] --> LB[Load Balancer]
C2[Client B] --> LB
C3[Client C] --> LB
LB -->|Request 1| S1["Backend 1<br/>healthy ✓"]
LB -->|Request 2| S2["Backend 2<br/>healthy ✓"]
LB -->|Request 3| S3["Backend 3<br/>healthy ✓"]
HC[Health Checker] -.->|GET /health| S1
HC -.->|GET /health| S2
HC -.->|GET /health| S3
HC -.->|status report| LB
Forward vs Reverse Proxy
A forward proxy sits in front of clients — it makes requests on their behalf (e.g., corporate proxy, VPN, Tor exit node). The server sees the proxy's IP, not the client's. A reverse proxy sits in front of servers — the client doesn't know which backend handled the request. Load balancers are reverse proxies with traffic distribution logic.
| Aspect | Forward Proxy | Reverse Proxy |
|---|---|---|
| Sits in front of | Clients | Servers |
| Who configures it | Client-side (browser, OS) | Server-side (infrastructure) |
| Use cases | Anonymity, caching, filtering | Load balancing, SSL termination, caching |
| Server sees | Proxy's IP | Load balancer's IP |
| Examples | Squid, corporate proxy | NGINX, HAProxy, AWS ALB |
L4 vs L7 Load Balancing
The "layer" in L4/L7 refers to the OSI model layer at which the load balancer inspects traffic to make routing decisions.
| Feature | L4 (Transport) | L7 (Application) |
|---|---|---|
| Inspects | TCP/UDP headers (IP, port) | HTTP headers, URL path, cookies, body |
| Routing decisions | Source/dest IP and port only | URL path, Host header, cookies, query params |
| Performance | Very fast — minimal parsing | Slower — must parse HTTP |
| SSL termination | Pass-through (backend handles TLS) | Terminates TLS at LB, forwards HTTP to backends |
| Connection handling | Forwards raw TCP streams | Establishes separate connections to backends |
| Content-based routing | No | Yes — route /api to API servers, /static to CDN |
| WebSocket support | Native (just TCP) | Requires upgrade handling |
| Use cases | Database proxying, gaming, raw TCP services | HTTP APIs, web apps, microservices |
| Examples | HAProxy TCP mode, AWS NLB, IPVS | NGINX, HAProxy HTTP mode, AWS ALB, Envoy |
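Because an L7 load balancer opens its own connections to the backends, the backend sees the load balancer's IP as the socket's remote address. The original client IP is passed along in the X-Forwarded-For and X-Real-IP headers. Your backend application must read these headers (not the socket's remote address) for accurate client identification, rate limiting, and geo-routing, and should trust them only on requests that arrive from your own load balancer, since clients can set them too.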
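As a rough illustration of reading the forwarded headers safely, the sketch below picks a client IP from X-Forwarded-For only when the socket peer is a known proxy. The `TRUSTED_PROXIES` range and the function names are assumptions made for this example, not part of any framework.

```python
import ipaddress
from typing import Optional

# Hypothetical: the network your own load balancers live in. Adjust for your setup.
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8")]


def is_trusted(addr: str) -> bool:
    """True if addr belongs to one of our own proxies (raises ValueError if malformed)."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in TRUSTED_PROXIES)


def client_ip(remote_addr: str, x_forwarded_for: Optional[str]) -> str:
    """Return the best-guess client IP for a request."""
    # If the socket peer is not one of our proxies, the header cannot be trusted.
    if not x_forwarded_for or not is_trusted(remote_addr):
        return remote_addr
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    if not hops:
        return remote_addr
    # Walk right to left: the rightmost entries were appended by our own proxies,
    # so the first untrusted address we meet is the client as our proxy saw it.
    for hop in reversed(hops):
        try:
            if not is_trusted(hop):
                return hop
        except ValueError:
            # Malformed entry: fall back to the socket address.
            return remote_addr
    # Every hop was internal, so the leftmost one is the original sender.
    return hops[0]
```

Walking the list from the right matters because a client can send its own X-Forwarded-For value, which ends up on the left of whatever the load balancer appends.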
Load Balancing Algorithms
| Algorithm | How It Works | Pros | Cons |
|---|---|---|---|
| Round Robin | Rotate sequentially through backends | Simple, even distribution when backends are equal | Ignores server capacity and current load |
| Weighted Round Robin | Assign weights — higher-capacity servers get more requests | Handles heterogeneous backends | Weights are static — doesn't adapt to real-time load |
| Least Connections | Route to the backend with fewest active connections | Adapts to real load — slow requests don't pile up | Doesn't account for connection cost variance |
| IP Hash | Hash the client IP → always routes to same backend | Session affinity without cookies | Uneven distribution if IP space is skewed |
| Random | Pick a random backend | No state needed, simple to implement | Can cause bursts on one server |
| Consistent Hashing | Hash ring — only affected keys remap when backends change | Minimal redistribution on scale events | Can be uneven without virtual nodes |
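Three of the algorithms in the table can be sketched in a few lines each. This is a minimal illustration, not how any particular load balancer implements them; the backend names, the `vnodes=100` default, and the choice of SHA-256 are assumptions for the example.

```python
import bisect
import hashlib
import itertools

BACKENDS = ["app1", "app2", "app3"]  # hypothetical backend names


def round_robin(backends):
    """Return an iterator that yields backends in a repeating fixed order."""
    return itertools.cycle(backends)


def least_connections(active):
    """Pick the backend with the fewest active connections.

    `active` maps backend name -> current connection count.
    """
    return min(active, key=active.get)


def _hash(key: str) -> int:
    # 64-bit integer derived from SHA-256; any stable hash would do.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, backends, vnodes=100):
        # Each backend is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (_hash(f"{b}#{i}"), b) for b in backends for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    def lookup(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

If `app3` is removed and the ring rebuilt from the other two, only keys that previously mapped to `app3` move; keys on `app1` and `app2` keep their backend, which is the "minimal redistribution" property listed in the table.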
NGINX as a Reverse Proxy
NGINX is one of the most widely deployed reverse proxies and L7 load balancers. Its upstream block defines the server pool, and proxy_pass forwards requests to that pool.
# /etc/nginx/conf.d/app-lb.conf
# NGINX L7 reverse proxy with upstream pool
upstream backend_pool {
    # Load balancing algorithm (default: round-robin)
    least_conn;
    # Backend servers with optional weights.
    # Passive health check: after max_fails failed attempts within
    # fail_timeout, the server is skipped for fail_timeout.
    server 10.0.1.10:8080 weight=3 max_fails=3 fail_timeout=30s; # 3x the share of weight=1
    server 10.0.1.11:8080 weight=2 max_fails=3 fail_timeout=30s;
    server 10.0.1.12:8080 weight=1 max_fails=3 fail_timeout=30s;
    # Backup server: only used when all primary servers are unavailable
    server 10.0.1.99:8080 backup;
    # Keep idle connections open to backends (connection pooling)
    keepalive 32;
}
server {
    listen 80;
    server_name app.example.com;
    location / {
        proxy_pass http://backend_pool;
        # Preserve original client information
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        # Connection pooling to upstream (requires keepalive in upstream)
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        # Timeouts
        proxy_connect_timeout 5s;
        proxy_read_timeout 60s;
        proxy_send_timeout 30s;
        # Retry on failure: try the next server
        proxy_next_upstream error timeout http_502 http_503;
        proxy_next_upstream_tries 2;
    }
    # Route /api separately (content-based routing).
    # It targets the same pool here; point it at a dedicated upstream in practice.
    location /api/ {
        proxy_pass http://backend_pool;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
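To see what the `weight=3`, `weight=2`, `weight=1` values mean in terms of request order, here is a minimal sketch of smooth weighted round-robin, the idea behind NGINX's default balancing method. The config above selects `least_conn`, which weighs connection counts instead, so treat this as an illustration of weights under round-robin rather than a model of that exact config. The function name is made up for the example.

```python
def smooth_wrr(weights, n):
    """Return the first n picks of a smooth weighted round-robin.

    `weights` maps backend -> integer weight. Each round, every backend's
    running score grows by its weight, the highest score wins, and the
    winner's score is reduced by the sum of all weights.
    """
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        for name, weight in weights.items():
            current[name] += weight
        best = max(current, key=current.get)  # ties go to the earliest backend
        current[best] -= total
        picks.append(best)
    return picks


pool = {"10.0.1.10": 3, "10.0.1.11": 2, "10.0.1.12": 1}
print(smooth_wrr(pool, 6))
```

Over every six requests the three servers receive three, two, and one request respectively, and the heavier server's turns are interleaved rather than sent back to back.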
# Test NGINX configuration syntax
sudo nginx -t
# Reload NGINX without dropping connections (graceful)
sudo nginx -s reload
# View basic connection counters (requires a stub_status location at /nginx_status;
# it reports connection totals, not per-upstream state)
curl http://localhost/nginx_status 2>/dev/null
# Test load balancing: send 10 requests, observe status codes and latency
for i in $(seq 1 10); do
    curl -s http://app.example.com/ -o /dev/null -w "Request $i: %{http_code} (%{time_total}s)\n"
done
HAProxy Configuration
HAProxy supports both L4 (TCP mode) and L7 (HTTP mode) load balancing. Its configuration uses frontend (client-facing listener), backend (server pool), and optional listen (combined frontend+backend) sections.
# /etc/haproxy/haproxy.cfg
# HAProxy L7 load balancer configuration
global
    log /dev/log local0
    maxconn 50000
    user haproxy
    group haproxy
    daemon
    stats socket /var/run/haproxy.sock mode 660 level admin
defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5s
    timeout client 30s
    timeout server 30s
    retries 3
    option redispatch # Retry on a different server on failure
frontend http_front
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/app.pem
    # Content-based routing using ACLs
    acl is_api path_beg /api
    acl is_static path_beg /static /images /css /js
    use_backend api_servers if is_api
    use_backend static_servers if is_static
    default_backend app_servers
backend app_servers
    balance leastconn
    # HTTP health check with a Host header (http-check send needs HAProxy 2.2+;
    # older versions use: option httpchk GET /health HTTP/1.1\r\nHost:\ localhost)
    option httpchk
    http-check send meth GET uri /health ver HTTP/1.1 hdr Host localhost
    # Session persistence via cookie
    cookie SERVERID insert indirect nocache
    server app1 10.0.1.10:8080 check cookie app1 weight 3
    server app2 10.0.1.11:8080 check cookie app2 weight 2
    server app3 10.0.1.12:8080 check cookie app3 weight 1
backend api_servers
    balance roundrobin
    option httpchk GET /api/health
    # Check every 5s; mark down after 3 failures, up again after 2 successes
    server api1 10.0.2.10:9090 check inter 5s fall 3 rise 2
    server api2 10.0.2.11:9090 check inter 5s fall 3 rise 2
backend static_servers
    balance uri # Hash URI for cache efficiency
    server cdn1 10.0.3.10:80 check
    server cdn2 10.0.3.11:80 check
# Stats page (monitoring dashboard)
listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 5s
    stats auth admin:secretpass
# Validate HAProxy configuration
haproxy -c -f /etc/haproxy/haproxy.cfg
# Reload HAProxy without dropping connections
sudo systemctl reload haproxy
# View HAProxy stats via socket (real-time)
echo "show stat" | sudo socat /var/run/haproxy.sock stdio | cut -d',' -f1,2,18 | column -t -s','
# View backend server health
echo "show servers state" | sudo socat /var/run/haproxy.sock stdio
# Drain a server (stop sending new connections, finish existing)
echo "set server app_servers/app1 state drain" | sudo socat /var/run/haproxy.sock stdio
# Take a server offline for maintenance
echo "set server app_servers/app1 state maint" | sudo socat /var/run/haproxy.sock stdio
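The `show stat` output is CSV whose header line begins with `# pxname,svname,...`, so it is easy to post-process. The sketch below parses that text and lists servers that are not UP. The sample is a trimmed, made-up excerpt with only three columns (real output has many more), and the function names are hypothetical.

```python
import csv
import io

# Illustrative sample only: real "show stat" output has many more columns.
SAMPLE = """# pxname,svname,status
app_servers,app1,UP
app_servers,app2,DOWN
app_servers,app3,MAINT
app_servers,BACKEND,UP
"""


def parse_show_stat(text):
    """Return one dict per row, keyed by the column names in the header."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Drop the leading "# " so the first line is a plain CSV header.
    lines[0] = lines[0].lstrip("# ")
    return list(csv.DictReader(io.StringIO("\n".join(lines))))


def servers_not_up(rows):
    """List proxy/server names whose status does not start with UP."""
    return [
        f"{r['pxname']}/{r['svname']}"
        for r in rows
        # FRONTEND and BACKEND rows are aggregates, not individual servers.
        if r["svname"] not in ("FRONTEND", "BACKEND")
        and not r["status"].startswith("UP")
    ]


print(servers_not_up(parse_show_stat(SAMPLE)))
```

In practice the text would come from the admin socket (for example the output of the `socat` command above) instead of the `SAMPLE` string.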
Health Checks & Graceful Shutdown
Health checks ensure traffic only routes to backends that can serve requests. Without them, the load balancer blindly sends traffic to crashed or overloaded servers.
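Health checkers usually avoid flapping by requiring several consecutive results before changing a backend's state, which is what the `fall 3 rise 2` parameters in the HAProxy config express. A minimal sketch of that bookkeeping, with a hypothetical class name and no claim to match HAProxy's internals:

```python
class HealthState:
    """Tracks one backend's health using fall/rise thresholds."""

    def __init__(self, fall=3, rise=2):
        self.fall = fall      # consecutive failures needed to mark it down
        self.rise = rise      # consecutive successes needed to mark it up
        self.healthy = True
        self._streak = 0      # consecutive results contradicting current state

    def record(self, check_passed: bool) -> bool:
        """Record one check result and return the resulting health state."""
        if check_passed == self.healthy:
            # Result agrees with the current state, so any streak is broken.
            self._streak = 0
        else:
            self._streak += 1
            threshold = self.fall if self.healthy else self.rise
            if self._streak >= threshold:
                self.healthy = not self.healthy
                self._streak = 0
        return self.healthy


state = HealthState(fall=3, rise=2)
for result in [False, False, True, False, False, False, True, True]:
    print(result, "->", state.record(result))
```

Two failures followed by a success do not take the backend out of rotation; it goes down only after three failures in a row and comes back after two successes in a row.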
# Passive health checks in open-source NGINX use max_fails + fail_timeout.
# Active checks (periodic probes) require NGINX Plus or nginx_upstream_check_module.
upstream backend_pool {
    zone backend_zone 64k; # Shared memory so all workers share upstream state
    server 10.0.1.10:8080 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:8080 max_fails=3 fail_timeout=30s;
    server 10.0.1.12:8080 max_fails=3 fail_timeout=30s;
    # max_fails=3 → mark unavailable after 3 failed attempts within fail_timeout
    # fail_timeout=30s → window for counting failures, and how long the
    # server stays unavailable before NGINX tries it again
    keepalive 16;
}
# A simple /health endpoint on your backend (Node.js example)
# app.get('/health', (req, res) => {
#   if (db.isConnected() && cache.isReady()) {
#     res.status(200).json({ status: 'healthy' });
#   } else {
#     res.status(503).json({ status: 'unhealthy' });
#   }
# });
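Connection draining removes a backend from rotation without cutting off in-flight requests. In HAProxy: set server backend/server1 state drain. In NGINX: mark the server down in config and reload. In Kubernetes: when a pod enters the Terminating state it is removed from the Service endpoints, the preStop hook runs, the container receives SIGTERM, and terminationGracePeriodSeconds bounds how long it has to finish in-flight work before being killed.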
# Graceful shutdown pattern for a backend application
# 1. Stop accepting new connections
# 2. Finish processing in-flight requests
# 3. Close database/cache connections
# 4. Exit
# Example: handling SIGTERM in a bash wrapper
#!/bin/bash
APP_PID=""
start_app() {
    /usr/bin/my-app --port 8080 &
    APP_PID=$!
}
graceful_shutdown() {
    echo "Received shutdown signal, draining connections..."
    # Ask the app to start failing health checks so the load balancer
    # stops sending it new requests (assumes the app exposes this endpoint)
    curl -s -X POST http://localhost:8080/admin/drain
    # Give the load balancer time to notice and in-flight requests time to finish
    sleep 30
    # Send SIGTERM to the app and wait for it to exit
    kill -TERM "$APP_PID"
    wait "$APP_PID"
    echo "Shutdown complete"
    exit 0
}
trap graceful_shutdown SIGTERM SIGINT
start_app
wait "$APP_PID"
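Inside the application itself, steps 1 and 2 of the pattern above (stop accepting, finish in-flight work) come down to counting active requests and refusing new ones once draining starts. A minimal sketch using only the standard library; the class and method names are hypothetical and not tied to any web framework.

```python
import threading


class RequestGate:
    """Tracks in-flight requests and refuses new ones once draining starts."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._draining = False

    def try_enter(self) -> bool:
        """Call at the start of a request; False means reject it (e.g. with 503)."""
        with self._cond:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def leave(self) -> None:
        """Call when a request finishes, whether it succeeded or failed."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self, timeout: float) -> bool:
        """Stop admitting requests, then wait for in-flight work to finish.

        Returns True if everything finished within `timeout` seconds.
        """
        with self._cond:
            self._draining = True
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)
```

A shutdown path would call `drain(30)`, then close database and cache connections, then exit. The timeout keeps one stuck request from blocking shutdown indefinitely.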
How Kubernetes Services Load Balance
Kubernetes Services provide built-in L4 load balancing via kube-proxy, which runs on every node and programs traffic rules. In iptables mode (the default on Linux), kube-proxy creates iptables rules that pick a pod endpoint at random for each new connection; packets on an established connection keep going to the same endpoint. kube-proxy itself runs no health checks: readiness probes decide which pods are listed as Service endpoints, and liveness probes restart unhealthy containers. In IPVS mode, kube-proxy uses Linux IPVS (kernel-level L4 load balancing), which supports round-robin, least connections, and weighted algorithms, and uses hash-table lookups that scale better than long iptables chains for large endpoint lists.
Service types: ClusterIP (internal only), NodePort (exposed on every node's IP), LoadBalancer (provisions a cloud load balancer, e.g., an AWS NLB or a GCP load balancer). For L7 routing (path-based, host-based), use an Ingress controller (NGINX Ingress, Traefik, Envoy/Istio) which acts as an L7 reverse proxy inside the cluster.
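The random selection in iptables mode is built from a chain of rules evaluated in order, where each rule matches with some probability and the last one always matches. For the endpoints to be equally likely, the probabilities have to be 1/n, 1/(n-1), and so on down to 1. The sketch below checks that arithmetic; the function names are made up for this example.

```python
from fractions import Fraction


def rule_probabilities(n):
    """Match probability for each of n sequential rules (the last always matches)."""
    return [Fraction(1, n - i) for i in range(n)]


def effective_shares(probs):
    """Overall share of connections each rule ends up receiving."""
    shares = []
    remaining = Fraction(1)  # fraction of traffic that reached this rule
    for p in probs:
        shares.append(remaining * p)
        remaining *= 1 - p
    return shares


probs = rule_probabilities(3)
print([str(p) for p in probs])                    # per-rule probabilities
print([str(s) for s in effective_shares(probs)])  # resulting traffic shares
```

With three endpoints the rules use 1/3, 1/2, and 1, and each endpoint ends up with exactly one third of new connections.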
Exercises
# Exercise 1: Set up NGINX as a local load balancer
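# Create a distinct index page per server, then start 3 simple HTTP servers
mkdir -p /tmp/s1 /tmp/s2 /tmp/s3
for n in 1 2 3; do echo "s$n" > "/tmp/s$n/index.html"; done
python3 -m http.server 8001 --directory /tmp/s1 &
python3 -m http.server 8002 --directory /tmp/s2 &
python3 -m http.server 8003 --directory /tmp/s3 &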
# Create NGINX upstream config pointing to all 3, test with curl
# Exercise 2: Observe round-robin distribution
for i in $(seq 1 9); do
    curl -s http://localhost/ -H "Host: app.local" | grep -o "s[0-9]"
done
# Should see s1, s2, s3, s1, s2, s3 ...
# Exercise 3: Simulate a backend failure
# Kill one of the python servers, then send requests
# Verify NGINX routes around the dead backend
# Exercise 4: Check HAProxy stats (if installed)
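# (quote the URL so the shell does not treat ';' as a command separator;
# credentials must match "stats auth" in haproxy.cfg)
curl -u admin:secretpass "http://localhost:8404/stats;csv"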
# Exercise 5: View active connections per backend
ss -tnp | grep ':800[1-3]' | awk '{print $5}' | sort | uniq -c | sort -rn
Conclusion & Next Steps
Load balancers are the gateway to horizontal scaling: they distribute traffic across multiple backends, route around failures, and enable zero-downtime deployments via connection draining. L4 load balancing is fast and simple (TCP forwarding), while L7 gives you content-based routing, header manipulation, and SSL termination. NGINX and HAProxy are workhorses of production infrastructure, and fluency with their configuration carries over to most load-balancing setups. Health checks, session persistence, and graceful shutdown are the operational details that separate "works in dev" from "survives in production."