
Part 15: Load Balancing & Reverse Proxies

May 13, 2026 | Wasil Zafar | 18 min read

How load balancers distribute traffic across backend servers — L4 vs L7 trade-offs, NGINX and HAProxy configuration, health checks, session persistence, and the architecture behind every production service.

Table of Contents

  1. Why Load Balancing?
  2. L4 vs L7 Load Balancing
  3. Load Balancing Algorithms
  4. NGINX as a Reverse Proxy
  5. HAProxy Configuration
  6. Health Checks & Graceful Shutdown
  7. Exercises
  8. Conclusion

Why Load Balancing?

A single server has finite CPU, memory, and network capacity. Once traffic exceeds what one machine can handle, you need to distribute requests across multiple backend servers. A load balancer sits between clients and your server pool, routing each incoming request to a healthy backend based on a configurable algorithm.

Load balancing provides three critical capabilities: scalability (add more backends to handle more traffic), availability (if one backend dies, traffic routes to others), and maintainability (roll out deploys one server at a time without downtime).

Load Balancer Architecture with Health Checks
flowchart TD
    C1[Client A] --> LB[Load Balancer]
    C2[Client B] --> LB
    C3[Client C] --> LB

    LB -->|Request 1| S1["Backend 1<br/>healthy ✓"]
    LB -->|Request 2| S2["Backend 2<br/>healthy ✓"]
    LB -->|Request 3| S3["Backend 3<br/>healthy ✓"]

    HC[Health Checker] -.->|GET /health| S1
    HC -.->|GET /health| S2
    HC -.->|GET /health| S3
    HC -.->|status report| LB

Forward vs Reverse Proxy

A forward proxy sits in front of clients — it makes requests on their behalf (e.g., corporate proxy, VPN, Tor exit node). The server sees the proxy's IP, not the client's. A reverse proxy sits in front of servers — the client doesn't know which backend handled the request. Load balancers are reverse proxies with traffic distribution logic.

| Aspect | Forward Proxy | Reverse Proxy |
|---|---|---|
| Sits in front of | Clients | Servers |
| Who configures it | Client-side (browser, OS) | Server-side (infrastructure) |
| Use cases | Anonymity, caching, filtering | Load balancing, SSL termination, caching |
| Server sees | Proxy's IP | Load balancer's IP |
| Examples | Squid, corporate proxy | NGINX, HAProxy, AWS ALB |
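
To make the reverse-proxy column concrete, here is a minimal sketch using only the Python standard library. It is a toy under stated assumptions, not how NGINX or HAProxy work internally: the helper names (`make_backend`, `make_reverse_proxy`, `fetch`) are invented for illustration, only GET is handled, and backend errors are not handled. The proxy rotates through the backends and passes the client's address along in `X-Forwarded-For`.

```python
import itertools
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Ignore any system-wide HTTP proxy settings so requests go straight to localhost.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def make_backend(name):
    """Start a toy backend on a free local port; returns (server, base_url)."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Report which backend answered and which client IP the proxy passed on.
            seen = self.headers.get("X-Forwarded-For", "-")
            body = f"{name} client={seen}".encode()
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):  # keep the demo quiet
            pass

    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}"


def make_reverse_proxy(backend_urls):
    """Start a round-robin reverse proxy in front of backend_urls."""
    rotation = itertools.cycle(backend_urls)
    lock = threading.Lock()

    class Proxy(BaseHTTPRequestHandler):
        def do_GET(self):
            with lock:  # one thread at a time advances the rotation
                target = next(rotation)
            upstream = urllib.request.Request(
                target + self.path,
                headers={"X-Forwarded-For": self.client_address[0]},
            )
            with _opener.open(upstream, timeout=5) as resp:
                status, body = resp.status, resp.read()
            self.send_response(status)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass

    server = ThreadingHTTPServer(("127.0.0.1", 0), Proxy)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}"


def fetch(url):
    """GET a URL and return the response body as text."""
    with _opener.open(url, timeout=5) as resp:
        return resp.read().decode()
```

Starting two backends with `make_backend`, passing their URLs to `make_reverse_proxy`, and calling `fetch` on the proxy URL several times returns responses that alternate between the backends. The client only ever talks to the proxy's address.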

L4 vs L7 Load Balancing

The "layer" in L4/L7 refers to the OSI model layer at which the load balancer inspects traffic to make routing decisions.

| Feature | L4 (Transport) | L7 (Application) |
|---|---|---|
| Inspects | TCP/UDP headers (IP, port) | HTTP headers, URL path, cookies, body |
| Routing decisions | Source/dest IP and port only | URL path, Host header, cookies, query params |
| Performance | Very fast, minimal parsing | Slower, must parse HTTP |
| SSL termination | Pass-through (backend handles TLS) | Terminates TLS at LB, forwards HTTP to backends |
| Connection handling | Forwards raw TCP streams | Establishes separate connections to backends |
| Content-based routing | No | Yes: route /api to API servers, /static to CDN |
| WebSocket support | Native (just TCP) | Requires upgrade handling |
| Use cases | Database proxying, gaming, raw TCP services | HTTP APIs, web apps, microservices |
| Examples | HAProxy TCP mode, AWS NLB, IPVS | NGINX, HAProxy HTTP mode, AWS ALB, Envoy |

Key Insight: An L7 load balancer terminates the client's TCP connection and opens a new connection to the backend, so the backend sees the load balancer's IP as the source rather than the client's. To preserve the original client IP, L7 load balancers add X-Forwarded-For and X-Real-IP headers. Your backend application must read these headers (not the socket's remote address) for accurate client identification, rate limiting, and geo-routing. It should trust them only when the request arrives from a known load balancer, because any client can send forged values.
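
As a rough illustration of reading `X-Forwarded-For` on the backend, the sketch below picks the client address out of the header. The `TRUSTED_PROXIES` range and the function names are assumptions made for this example; substitute the networks your own load balancers actually use. The header is only consulted when the direct peer is a trusted proxy, and the list is walked right to left because entries on the left can be supplied by the client.

```python
import ipaddress

# Assumption for this sketch: our load balancers live in 10.0.0.0/8.
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8")]


def is_trusted(addr):
    """True if addr belongs to one of our own proxy networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in TRUSTED_PROXIES)


def client_ip(remote_addr, x_forwarded_for):
    """Best-effort client IP from the socket peer and the X-Forwarded-For header."""
    if not is_trusted(remote_addr):
        # The peer is not one of our proxies, so the header could be forged.
        return remote_addr
    hops = [h.strip() for h in (x_forwarded_for or "").split(",") if h.strip()]
    # Walk right to left, skipping our own proxies; the first address that is
    # not ours is the closest we can get to the real client.
    for hop in reversed(hops):
        try:
            if not is_trusted(hop):
                return hop
        except ValueError:
            break  # malformed entry: stop trusting the rest of the header
    return remote_addr
```

For example, `client_ip("10.0.0.5", "6.6.6.6, 198.51.100.23, 10.0.0.9")` returns `198.51.100.23`: the rightmost trusted hop is skipped and the client-supplied `6.6.6.6` is never reached.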

Load Balancing Algorithms

| Algorithm | How It Works | Pros | Cons |
|---|---|---|---|
| Round Robin | Rotate sequentially through backends | Simple, even distribution when backends are equal | Ignores server capacity and current load |
| Weighted Round Robin | Assign weights; higher-capacity servers get more requests | Handles heterogeneous backends | Weights are static, so it doesn't adapt to real-time load |
| Least Connections | Route to the backend with fewest active connections | Adapts to real load, so slow requests don't pile up | Doesn't account for connection cost variance |
| IP Hash | Hash the client IP → always routes to same backend | Session affinity without cookies | Uneven distribution if IP space is skewed |
| Random | Pick a random backend | No state needed, simple to implement | Can cause bursts on one server |
| Consistent Hashing | Hash ring; only affected keys remap when backends change | Minimal redistribution on scale events | Can be uneven without virtual nodes |
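
The first four rows of the table can be sketched in a few lines each. This is an illustrative model rather than any particular load balancer's implementation: the `Backend` class and function names are made up for the example, and the weighted variant uses the "smooth" scheme, which interleaves heavy and light servers instead of sending bursts to one server.

```python
import hashlib
import itertools
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    weight: int = 1
    active: int = 0    # open connections, used by least-connections
    current: int = 0   # running score, used by smooth weighted round robin


def round_robin(backends):
    """Endless iterator that rotates through the backends in order."""
    return itertools.cycle(backends)


def weighted_round_robin(backends):
    """Smooth weighted round robin: each backend is picked in proportion to its weight."""
    total = sum(b.weight for b in backends)
    while True:
        for b in backends:
            b.current += b.weight          # everyone earns credit each round
        chosen = max(backends, key=lambda b: b.current)
        chosen.current -= total            # the winner pays for being picked
        yield chosen


def least_connections(backends):
    """Pick the backend currently holding the fewest active connections."""
    return min(backends, key=lambda b: b.active)


def ip_hash(backends, client_ip):
    """Map a client IP to a fixed backend (as long as the pool does not change)."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]
```

With weights 3, 2, and 1 for backends `a`, `b`, and `c`, six consecutive picks from `weighted_round_robin` come out as a, b, a, c, b, a: the 3:2:1 ratio holds, and `a` never receives three requests in a row.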

NGINX as a Reverse Proxy

NGINX is one of the most widely deployed reverse proxies and L7 load balancers. Its upstream block defines the server pool, and proxy_pass forwards requests to that pool.

# /etc/nginx/conf.d/app-lb.conf
# NGINX L7 reverse proxy with upstream pool

upstream backend_pool {
    # Load balancing algorithm (default: round-robin)
    least_conn;

    # Backend servers with optional weights.
    # max_fails + fail_timeout enable passive health checking: after 3 failed
    # attempts within 30s, the server is skipped for the next 30s.
    server 10.0.1.10:8080 weight=3 max_fails=3 fail_timeout=30s;   # 3x the share of 10.0.1.12
    server 10.0.1.11:8080 weight=2 max_fails=3 fail_timeout=30s;
    server 10.0.1.12:8080 weight=1 max_fails=3 fail_timeout=30s;

    # Backup server: only used when all primary servers are unavailable
    server 10.0.1.99:8080 backup;

    # Keep connections alive to backends (connection pooling)
    keepalive 32;
}

server {
    listen 80;
    server_name app.example.com;

    location / {
        proxy_pass http://backend_pool;

        # Preserve original client information
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Connection pooling to upstream (requires keepalive in upstream)
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Timeouts
        proxy_connect_timeout 5s;
        proxy_read_timeout 60s;
        proxy_send_timeout 30s;

        # Retry on failure — try next server
        proxy_next_upstream error timeout http_502 http_503;
        proxy_next_upstream_tries 2;
    }

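    # Route /api separately (content-based routing).
    # Both locations share backend_pool here; point this one at a dedicated
    # upstream if API traffic has its own servers.
    location /api/ {
        proxy_pass http://backend_pool;
        proxy_http_version 1.1;            # needed for upstream keepalive
        proxy_set_header Connection "";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }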
}

# Test NGINX configuration syntax
sudo nginx -t

# Reload NGINX without dropping connections (graceful)
sudo nginx -s reload

# View basic connection counters (requires a location with stub_status
# enabled at /nginx_status; open-source stub_status does not report
# per-upstream state)
curl -s http://localhost/nginx_status

# Test load balancing — send 10 requests, observe distribution
for i in $(seq 1 10); do
    curl -s http://app.example.com/ -o /dev/null -w "Request $i: %{http_code} (%{time_total}s)\n"
done

HAProxy Configuration

HAProxy supports both L4 (TCP mode) and L7 (HTTP mode) load balancing. Its configuration uses frontend (client-facing listener), backend (server pool), and optional listen (combined frontend+backend) sections.

# /etc/haproxy/haproxy.cfg
# HAProxy L7 load balancer configuration

global
    log /dev/log local0
    maxconn 50000
    user haproxy
    group haproxy
    daemon
    stats socket /var/run/haproxy.sock mode 660 level admin

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    timeout connect 5s
    timeout client  30s
    timeout server  30s
    retries 3
    option  redispatch          # Retry on a different server on failure

frontend http_front
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/app.pem

    # Content-based routing using ACLs
    acl is_api path_beg /api
    acl is_static path_beg /static /images /css /js

    use_backend api_servers if is_api
    use_backend static_servers if is_static
    default_backend app_servers

backend app_servers
    balance leastconn
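    # HTTP health check (HAProxy 2.2+ syntax; older releases use the
    # single-line form: option httpchk GET /health HTTP/1.1\r\nHost:\ localhost)
    option httpchk
    http-check send meth GET uri /health ver HTTP/1.1 hdr Host localhost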

    # Session persistence via cookie
    cookie SERVERID insert indirect nocache

    server app1 10.0.1.10:8080 check cookie app1 weight 3
    server app2 10.0.1.11:8080 check cookie app2 weight 2
    server app3 10.0.1.12:8080 check cookie app3 weight 1

backend api_servers
    balance roundrobin
    option httpchk GET /api/health

    server api1 10.0.2.10:9090 check inter 5s fall 3 rise 2
    server api2 10.0.2.11:9090 check inter 5s fall 3 rise 2

backend static_servers
    balance uri                  # Hash URI for cache efficiency
    server cdn1 10.0.3.10:80 check
    server cdn2 10.0.3.11:80 check

# Stats page (monitoring dashboard)
listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 5s
    stats auth admin:secretpass    # placeholder credentials; change before exposing

# Validate HAProxy configuration
haproxy -c -f /etc/haproxy/haproxy.cfg

# Reload HAProxy without dropping connections
sudo systemctl reload haproxy

# View HAProxy stats via socket (real-time)
echo "show stat" | sudo socat /var/run/haproxy.sock stdio | cut -d',' -f1,2,18 | column -t -s','

# View backend server health
echo "show servers state" | sudo socat /var/run/haproxy.sock stdio

# Drain a server (stop sending new connections, finish existing)
echo "set server app_servers/app1 state drain" | sudo socat /var/run/haproxy.sock stdio

# Take a server offline for maintenance
echo "set server app_servers/app1 state maint" | sudo socat /var/run/haproxy.sock stdio
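
The static_servers backend above hashes the request URI so that the same object tends to land on the same cache node. With plain modulo hashing, adding or removing a server remaps most keys; the consistent hashing row in the algorithms table avoids that, and HAProxy exposes it through `hash-type consistent`. The sketch below shows the idea with a hash ring and virtual nodes. The `HashRing` class is invented for this example and is not HAProxy's implementation; `cdn3` is a hypothetical third node.

```python
import bisect
import hashlib


def _hash(key):
    """Stable 64-bit hash of a string."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring; each node is placed at `vnodes` points on the ring."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def lookup(self, key):
        """Return the node owning the first ring point at or after the key's hash."""
        idx = bisect.bisect_right(self._ring, (_hash(key), "")) % len(self._ring)
        return self._ring[idx][1]
```

Removing `cdn3` from a ring of `cdn1`, `cdn2`, `cdn3` only moves the keys that `cdn3` owned; every key that mapped to `cdn1` or `cdn2` stays where it was, which keeps those caches warm.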

Health Checks & Graceful Shutdown

Health checks ensure traffic only routes to backends that can serve requests. Without them, the load balancer blindly sends traffic to crashed or overloaded servers.
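
The `check inter 5s fall 3 rise 2` parameters in the HAProxy configuration describe a small state machine: a backend is marked down after `fall` consecutive failed probes and marked up again after `rise` consecutive successes. A minimal sketch of that logic follows; the `HealthTracker` class and `probe` helper are illustrative names, and a real checker would call `probe` every `inter` seconds.

```python
import urllib.request


def probe(url, timeout=2.0):
    """Return True if the health endpoint answers 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # Connection errors, timeouts, non-2xx responses, or a bad URL.
        return False


class HealthTracker:
    """DOWN after `fall` consecutive failures, UP after `rise` consecutive successes."""

    def __init__(self, fall=3, rise=2):
        self.fall, self.rise = fall, rise
        self.healthy = True
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, check_passed):
        """Feed one probe result; returns the (possibly updated) health state."""
        if check_passed == self.healthy:
            self._streak = 0
        else:
            self._streak += 1
            threshold = self.fall if self.healthy else self.rise
            if self._streak >= threshold:
                self.healthy = not self.healthy
                self._streak = 0
        return self.healthy
```

Requiring several results in a row before flipping state keeps a single slow response or one lucky success from causing the backend to flap in and out of rotation.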

# Active health checks in NGINX require NGINX Plus or the third-party
# nginx_upstream_check_module. Open-source NGINX supports passive checks
# via max_fails + fail_timeout, shown here:

upstream backend_pool {
    zone backend_zone 64k;         # Shared memory for health state

    server 10.0.1.10:8080 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:8080 max_fails=3 fail_timeout=30s;
    server 10.0.1.12:8080 max_fails=3 fail_timeout=30s;

    # max_fails=3      → mark the server unavailable after 3 failed attempts
    #                     within the fail_timeout window
    # fail_timeout=30s → length of that counting window, and how long the
    #                     server then stays unavailable before being retried

    keepalive 16;
}

# A simple /health endpoint on your backend (Node.js example)
# app.get('/health', (req, res) => {
#     if (db.isConnected() && cache.isReady()) {
#         res.status(200).json({ status: 'healthy' });
#     } else {
#         res.status(503).json({ status: 'unhealthy' });
#     }
# });
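
The passive max_fails / fail_timeout behaviour described in the upstream comments above can be modelled as a sliding window of failed attempts. This is a simplified model for intuition only; NGINX's real bookkeeping differs in detail, and the `PassiveHealth` class is a name made up for the example. The clock is injectable so the behaviour can be exercised without waiting.

```python
import time


class PassiveHealth:
    """Mark a server unavailable after max_fails failures within fail_timeout seconds."""

    def __init__(self, max_fails=3, fail_timeout=30.0, clock=time.monotonic):
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self._clock = clock
        self._failures = []      # timestamps of recent failed attempts
        self._down_until = 0.0   # server is skipped until this time

    def available(self):
        """True if the server may receive traffic right now."""
        return self._clock() >= self._down_until

    def record_failure(self):
        """Call when a proxied request to this server fails."""
        now = self._clock()
        # Forget failures that have fallen out of the counting window.
        self._failures = [t for t in self._failures if now - t < self.fail_timeout]
        self._failures.append(now)
        if len(self._failures) >= self.max_fails:
            self._down_until = now + self.fail_timeout
            self._failures.clear()
```

Unlike an active checker, nothing here sends probes: the state only changes when real client requests fail, which is why passive checks cost some failed requests before a dead server is noticed.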
Connection Draining During Deploys: When deploying a new version, don't just kill the old process. Instead:

  1. Signal the load balancer to stop sending new requests to the backend (drain state).
  2. Wait for in-flight requests to complete (drain timeout, typically 30-60s).
  3. Then stop the process.

Without connection draining, clients mid-request get 502 errors. In HAProxy: set server backend/server1 state drain. In NGINX: mark the server down in config and reload. In Kubernetes: the pod enters the Terminating state and is removed from the Service endpoints; a preStop hook plus terminationGracePeriodSeconds give in-flight requests time to finish, provided the application also handles SIGTERM gracefully.

# Graceful shutdown pattern for a backend application
# 1. Stop accepting new connections
# 2. Finish processing in-flight requests
# 3. Close database/cache connections
# 4. Exit

# Example: handling SIGTERM in a bash wrapper
#!/bin/bash
APP_PID=""

start_app() {
    /usr/bin/my-app --port 8080 &
    APP_PID=$!
}

graceful_shutdown() {
    echo "Received SIGTERM — draining connections..."
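    # Ask the app to start failing its health check so the load balancer
    # stops routing to it (assumes the app exposes an /admin/drain endpoint)
    curl -s -X POST http://localhost:8080/admin/drain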
    # Wait for in-flight requests (max 30s)
    sleep 30
    # Send SIGTERM to app
    kill -TERM "$APP_PID"
    wait "$APP_PID"
    echo "Shutdown complete"
    exit 0
}

trap graceful_shutdown SIGTERM SIGINT

start_app
wait "$APP_PID"
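
The wrapper above waits a fixed 30 seconds. Inside the application itself, the same pattern can wait only as long as requests are actually in flight. The sketch below tracks in-flight requests with a counter. `DrainableServer` and its method names are assumptions for illustration and not part of any framework.

```python
import threading


class DrainableServer:
    """Tracks in-flight requests so shutdown can wait for them to finish."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self.draining = False

    def health_status(self):
        """Status code for GET /health: failing tells the LB to stop routing here."""
        return 503 if self.draining else 200

    def try_begin_request(self):
        """Call before handling a request; returns False once draining has started."""
        with self._cond:
            if self.draining:
                return False
            self._in_flight += 1
            return True

    def end_request(self):
        """Call when a request has been fully answered."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self, timeout=30.0):
        """Stop accepting work and wait for in-flight requests.

        Returns True if everything finished before the timeout.
        """
        with self._cond:
            self.draining = True
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout=timeout)
```

In a real service, a SIGTERM handler would call `drain()`, close database and cache connections, and then exit. The timeout bounds how long a stuck request can delay the deploy.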

How Kubernetes Services Load Balance

Kubernetes Services provide built-in L4 load balancing via kube-proxy, which runs on every node and programs traffic rules.

In iptables mode (the default), kube-proxy creates iptables rules that pick a pod endpoint at random for each new connection. The choice ignores backend load, and kube-proxy itself runs no health checks: readiness probes decide which pods are listed as endpoints. In IPVS mode, kube-proxy uses Linux IPVS (kernel-level L4 load balancing), which supports round-robin, least-connections, and weighted algorithms and relies on hash-table lookups that stay fast for large endpoint lists.

Service types: ClusterIP (internal only), NodePort (exposed on every node's IP), and LoadBalancer (provisions a cloud load balancer, e.g., an AWS NLB or a GCP load balancer). For L7 routing (path-based, host-based), use an Ingress controller (NGINX Ingress, Traefik, Envoy/Istio), which acts as an L7 reverse proxy inside the cluster.
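
The random selection in iptables mode is commonly implemented as a chain of rules evaluated in order, where rule i matches with probability 1/(n-i) and the last rule always matches. The arithmetic below checks that this cascade gives every endpoint an equal share. It is a worked calculation under that assumption about the rule layout, not kube-proxy code, and the function names are invented for the example.

```python
from fractions import Fraction


def rule_probabilities(n):
    """Per-rule match probability for n endpoints: 1/n, 1/(n-1), ..., 1/1."""
    return [Fraction(1, n - i) for i in range(n)]


def endpoint_shares(n):
    """Overall probability that each endpoint is chosen for a new connection."""
    shares = []
    remaining = Fraction(1)  # probability that no earlier rule has matched
    for p in rule_probabilities(n):
        shares.append(remaining * p)
        remaining *= 1 - p
    return shares
```

For three endpoints the rule probabilities are 1/3, 1/2, and 1, and `endpoint_shares(3)` works out to 1/3 each: the second rule is only reached 2/3 of the time, and 2/3 × 1/2 = 1/3.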


Exercises

# Exercise 1: Set up NGINX as a local load balancer
# Start 3 simple HTTP servers on different ports, each serving a page
# that names the server (s1, s2, s3)
for n in 1 2 3; do
    mkdir -p "/tmp/s$n" && echo "s$n" > "/tmp/s$n/index.html"
    python3 -m http.server "800$n" --directory "/tmp/s$n" &
done
# Create NGINX upstream config pointing to all 3, test with curl

# Exercise 2: Observe round-robin distribution
for i in $(seq 1 9); do
    curl -s http://localhost/ -H "Host: app.local" | grep -o "s[0-9]"
done
# Should see s1, s2, s3, s1, s2, s3 ...

# Exercise 3: Simulate a backend failure
# Kill one of the python servers, then send requests
# Verify NGINX routes around the dead backend

# Exercise 4: Check HAProxy stats as CSV (if installed; credentials must
# match the "stats auth" line in haproxy.cfg)
curl -s -u admin:secretpass "http://localhost:8404/stats;csv"

# Exercise 5: View active connections per backend (ports 8001-8003 from Exercise 1)
ss -tnp | grep ':800[1-3]' | awk '{print $5}' | sort | uniq -c | sort -rn

Conclusion & Next Steps

Load balancers are the gateway to horizontal scaling: they distribute traffic across multiple backends, route around failures, and enable zero-downtime deployments via connection draining. L4 load balancing is fast and simple (TCP forwarding), while L7 gives you content-based routing, header manipulation, and SSL termination. NGINX and HAProxy are workhorses of production infrastructure, and understanding their configuration is a solid foundation for operating services at scale. Health checks, session persistence, and graceful shutdown are the operational details that separate "works in dev" from "survives in production."