Part 15: Load Balancing & Reverse Proxies

Why Load Balancing?

A single server has finite CPU, memory, and network capacity. Once traffic exceeds what one machine can handle, you need to distribute requests across multiple backend servers. A load balancer sits between clients and your server pool, routing each incoming request to a healthy backend based on a configurable algorithm.

Load balancing provides three critical capabilities: scalability (add more backends to handle more traffic), availability (if one backend dies, traffic routes to others), and maintainability (roll out deploys one server at a time without downtime).

Load Balancer Architecture with Health Checks

flowchart TD
    C1[Client A] --> LB[Load Balancer]
    C2[Client B] --> LB
    C3[Client C] --> LB

    LB -->|Request 1| S1["Backend 1<br/>healthy ✓"]
    LB -->|Request 2| S2["Backend 2<br/>healthy ✓"]
    LB -->|Request 3| S3["Backend 3<br/>healthy ✓"]

    HC[Health Checker] -.->|GET /health| S1
    HC -.->|GET /health| S2
    HC -.->|GET /health| S3
    HC -.->|status report| LB

Forward vs Reverse Proxy

A forward proxy sits in front of clients and makes requests on their behalf (e.g., a corporate proxy; VPNs and Tor exit nodes have a similar effect on what the server sees). The destination server sees the proxy's IP, not the client's. A reverse proxy sits in front of servers: clients connect to the proxy's address and do not know which backend handled the request. Most load balancers are reverse proxies with traffic distribution logic.

Aspect	Forward Proxy	Reverse Proxy
Sits in front of	Clients	Servers
Who configures it	Client-side (browser, OS)	Server-side (infrastructure)
Use cases	Anonymity, caching, filtering	Load balancing, SSL termination, caching
Server sees	Proxy's IP	Load balancer's IP
Examples	Squid, corporate proxy	NGINX, HAProxy, AWS ALB

L4 vs L7 Load Balancing

The "layer" in L4/L7 refers to the OSI model layer at which the load balancer inspects traffic to make routing decisions.

Feature	L4 (Transport)	L7 (Application)
Inspects	TCP/UDP headers (IP, port)	HTTP headers, URL path, cookies, body
Routing decisions	Source/dest IP and port only	URL path, Host header, cookies, query params
Performance	Very fast — minimal parsing	Slower — must parse HTTP
TLS (SSL) termination	Typically pass-through (backend handles TLS)	Terminates TLS at the LB, then forwards HTTP (or re-encrypts) to backends
Connection handling	Forwards raw TCP streams	Establishes separate connections to backends
Content-based routing	No	Yes — route /api to API servers, /static to CDN
WebSocket support	Native (just TCP)	Requires upgrade handling
Use cases	Database proxying, gaming, raw TCP services	HTTP APIs, web apps, microservices
Examples	HAProxy TCP mode, AWS NLB, IPVS	NGINX, HAProxy HTTP mode, AWS ALB, Envoy
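
To make the difference in the table concrete, here is a minimal sketch of the two decision functions. It is an illustration only: the pool contents and the function names (l4_pick, l7_pick, stable_index) are hypothetical, and real load balancers use more elaborate selection logic. The point is what each layer is allowed to look at.

```python
import hashlib

# Hypothetical pools for illustration; the addresses are made up.
APP_POOL = ["10.0.1.10:8080", "10.0.1.11:8080", "10.0.1.12:8080"]
API_POOL = ["10.0.2.10:9090", "10.0.2.11:9090"]
STATIC_POOL = ["10.0.3.10:80", "10.0.3.11:80"]


def stable_index(key: str, size: int) -> int:
    """Map a string to a stable index in range(size) using SHA-256."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % size


def l4_pick(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> str:
    """L4 view: only the connection tuple is visible, never the payload."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}"
    return APP_POOL[stable_index(key, len(APP_POOL))]


def l7_pick(path: str, client_ip: str) -> str:
    """L7 view: the HTTP request has been parsed, so the path selects a pool."""
    if path.startswith("/api"):
        pool = API_POOL
    elif path.startswith("/static"):
        pool = STATIC_POOL
    else:
        pool = APP_POOL
    return pool[stable_index(client_ip, len(pool))]
```

Note that l4_pick has no path argument at all: a request for /api/users and a request for /static/logo.png on the same connection tuple necessarily land on the same backend.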

            
Key Insight: An L7 load balancer terminates the client's TCP connection and opens a new connection to the backend, so the backend sees the load balancer's IP as the source, not the client's. To preserve the original client IP, L7 load balancers add headers such as X-Forwarded-For (and, by NGINX convention, X-Real-IP). Your backend application must read these headers instead of the socket's remote address for accurate client identification, rate limiting, and geo-routing. It should trust them only on requests that arrive from your own proxies, because any client can send these headers itself.
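
One way a backend might read the forwarded client IP safely is sketched below. This is a sketch under stated assumptions, not a definitive implementation: the trusted range 10.0.0.0/8 and the names TRUSTED_PROXIES, is_trusted, and client_ip are made up for illustration, and your framework may already provide an equivalent setting.

```python
import ipaddress
from typing import Optional

# Assumption: every proxy we operate lives in this range (made-up value).
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8")]


def is_trusted(ip: str) -> bool:
    """True if the address belongs to one of our own proxies."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in TRUSTED_PROXIES)


def client_ip(remote_addr: str, x_forwarded_for: Optional[str]) -> str:
    """Return the best-guess client IP for a request.

    remote_addr is the socket peer; x_forwarded_for is the raw header value.
    """
    # Ignore the header entirely unless the direct peer is one of our proxies.
    if not x_forwarded_for or not is_trusted(remote_addr):
        return remote_addr

    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    # Walk from the right (the hop closest to us). Entries added by our own
    # proxies are skipped; the first untrusted entry is treated as the client.
    for hop in reversed(hops):
        try:
            trusted = is_trusted(hop)
        except ValueError:
            return remote_addr  # malformed entry: fall back to the socket peer
        if not trusted:
            return hop
    # Every hop was one of ours, so the request originated internally.
    return hops[0] if hops else remote_addr
```

Reading the list from the right matters: entries on the left can be supplied by the client, while the rightmost entries were appended by proxies you control.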
        

Load Balancing Algorithms

Algorithm	How It Works	Pros	Cons
Round Robin	Rotate sequentially through backends	Simple, even distribution when backends are equal	Ignores server capacity and current load
Weighted Round Robin	Assign weights — higher-capacity servers get more requests	Handles heterogeneous backends	Weights are static — doesn't adapt to real-time load
Least Connections	Route to the backend with fewest active connections	Adapts to real load — slow requests don't pile up	Doesn't account for connection cost variance
IP Hash	Hash the client IP → always routes to same backend	Session affinity without cookies	Uneven distribution if IP space is skewed
Random	Pick a random backend	No state needed, simple to implement	Can cause bursts on one server
Consistent Hashing	Hash ring — only affected keys remap when backends change	Minimal redistribution on scale events	Can be uneven without virtual nodes
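
Three of the algorithms in the table can be sketched in a few lines of Python. These are simplified illustrations rather than what NGINX or HAProxy run internally; the helper names (stable_hash, round_robin, least_connections, ConsistentHashRing) and the choice of 100 virtual nodes are assumptions made for the example.

```python
import bisect
import hashlib
import itertools


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def round_robin(backends):
    """Return an iterator that cycles through the backends in order."""
    return itertools.cycle(backends)


def least_connections(active):
    """Pick the backend with the fewest active connections.

    `active` maps backend name -> current connection count.
    Ties go to whichever backend appears first in the dict.
    """
    return min(active, key=active.get)


class ConsistentHashRing:
    """Hash ring with virtual nodes to smooth out the distribution."""

    def __init__(self, backends, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (point, backend)
        for backend in backends:
            self.add(backend)

    def add(self, backend):
        """Place `vnodes` points for this backend on the ring."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (stable_hash(f"{backend}#{i}"), backend))

    def remove(self, backend):
        """Drop all of this backend's points; other points stay where they are."""
        self._ring = [(p, b) for p, b in self._ring if b != backend]

    def lookup(self, key):
        """Return the backend owning the first point clockwise from the key."""
        if not self._ring:
            raise LookupError("no backends in ring")
        idx = bisect.bisect_right(self._ring, (stable_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

The property that makes consistent hashing attractive is visible in remove(): keys that mapped to a surviving backend keep their mapping, and only the removed backend's keys are redistributed.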

NGINX as a Reverse Proxy

NGINX is one of the most widely deployed reverse proxies and L7 load balancers. Its upstream block defines the server pool, and proxy_pass forwards requests to that pool.

# /etc/nginx/conf.d/app-lb.conf
# NGINX L7 reverse proxy with upstream pool

upstream backend_pool {
    # Load balancing algorithm (default: round-robin)
    least_conn;

    # Backend servers with optional weights.
    # max_fails + fail_timeout enable passive health checking:
    # after 3 failed attempts within 30s, skip the server for 30s.
    server 10.0.1.10:8080 weight=3 max_fails=3 fail_timeout=30s;   # 3x the share of a weight=1 server
    server 10.0.1.11:8080 weight=2 max_fails=3 fail_timeout=30s;
    server 10.0.1.12:8080 weight=1 max_fails=3 fail_timeout=30s;

    # Backup server, used only when all primary servers are unavailable
    server 10.0.1.99:8080 backup;

    # Keep connections alive to backends (connection pooling)
    keepalive 32;
}

server {
    listen 80;
    server_name app.example.com;

    location / {
        proxy_pass http://backend_pool;

        # Preserve original client information
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Connection pooling to upstream (requires keepalive in upstream)
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Timeouts
        proxy_connect_timeout 5s;
        proxy_read_timeout 60s;
        proxy_send_timeout 30s;

        # Retry on failure — try next server
        proxy_next_upstream error timeout http_502 http_503;
        proxy_next_upstream_tries 2;
    }

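    # Content-based routing: /api/ gets its own location block.
    # It reuses backend_pool here; to send API traffic to different
    # servers, define a second upstream and point proxy_pass at it.
    location /api/ {
        proxy_pass http://backend_pool;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Settings are not shared between sibling locations; repeat for keepalive
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }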
}

# Test NGINX configuration syntax
sudo nginx -t

# Reload NGINX without dropping connections (graceful)
sudo nginx -s reload

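# View basic connection counters (active, reading, writing, waiting)
# Requires a location with the stub_status directive enabled (here /nginx_status);
# stub_status does not report per-upstream state
curl -s http://localhost/nginx_status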

# Test load balancing — send 10 requests, observe distribution
for i in $(seq 1 10); do
    curl -s http://app.example.com/ -o /dev/null -w "Request $i: %{http_code} (%{time_total}s)\n"
done

HAProxy Configuration

HAProxy supports both L4 (TCP mode) and L7 (HTTP mode) load balancing. Its configuration uses frontend (client-facing listener), backend (server pool), and optional listen (combined frontend+backend) sections.

# /etc/haproxy/haproxy.cfg
# HAProxy L7 load balancer configuration

global
    log /dev/log local0
    maxconn 50000
    user haproxy
    group haproxy
    daemon
    stats socket /var/run/haproxy.sock mode 660 level admin

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    timeout connect 5s
    timeout client  30s
    timeout server  30s
    retries 3
    option  redispatch          # Retry on a different server on failure

frontend http_front
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/app.pem

    # Content-based routing using ACLs
    acl is_api path_beg /api
    acl is_static path_beg /static /images /css /js

    use_backend api_servers if is_api
    use_backend static_servers if is_static
    default_backend app_servers

backend app_servers
    balance leastconn
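    # HAProxy 2.2+ syntax; older versions embed the Host header in "option httpchk"
    option httpchk
    http-check send meth GET uri /health ver HTTP/1.1 hdr Host localhost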

    # Session persistence via cookie
    cookie SERVERID insert indirect nocache

    server app1 10.0.1.10:8080 check cookie app1 weight 3
    server app2 10.0.1.11:8080 check cookie app2 weight 2
    server app3 10.0.1.12:8080 check cookie app3 weight 1

backend api_servers
    balance roundrobin
    option httpchk GET /api/health

    server api1 10.0.2.10:9090 check inter 5s fall 3 rise 2
    server api2 10.0.2.11:9090 check inter 5s fall 3 rise 2

backend static_servers
    balance uri                  # Hash URI for cache efficiency
    server cdn1 10.0.3.10:80 check
    server cdn2 10.0.3.11:80 check

# Stats page (monitoring dashboard)
listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 5s
    stats auth admin:secretpass    # Example credentials only; replace before exposing

# Validate HAProxy configuration
haproxy -c -f /etc/haproxy/haproxy.cfg

# Reload HAProxy without dropping connections
sudo systemctl reload haproxy

# View HAProxy stats via socket (real-time)
echo "show stat" | sudo socat /var/run/haproxy.sock stdio | cut -d',' -f1,2,18 | column -t -s','

# View backend server health
echo "show servers state" | sudo socat /var/run/haproxy.sock stdio

# Drain a server (stop sending new connections, finish existing)
echo "set server app_servers/app1 state drain" | sudo socat /var/run/haproxy.sock stdio

# Take a server offline for maintenance
echo "set server app_servers/app1 state maint" | sudo socat /var/run/haproxy.sock stdio
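
The CSV that "show stat" returns is easy to consume from a script. The sketch below parses it by column name rather than by position, so it does not depend on field numbers. The SAMPLE text is a trimmed, made-up example (real output has many more columns), and the function names parse_show_stat and servers_not_up are hypothetical.

```python
import csv
import io

# Trimmed, made-up sample; real "show stat" output has many more columns.
SAMPLE = """# pxname,svname,scur,status,weight,
app_servers,app1,1,UP,3,
app_servers,app2,0,DOWN,2,
app_servers,app3,0,MAINT,1,
app_servers,BACKEND,1,UP,5,
"""


def parse_show_stat(raw: str) -> dict:
    """Map (proxy, server) -> status from HAProxy 'show stat' CSV text."""
    text = raw.lstrip()
    # The header line starts with "# "; strip it so the names are clean.
    if text.startswith("# "):
        text = text[2:]
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        if not row.get("pxname"):
            continue  # skip blank lines
        result[(row["pxname"], row["svname"])] = row["status"]
    return result


def servers_not_up(raw: str) -> list:
    """List 'proxy/server' names whose status does not start with UP."""
    bad = []
    for (proxy, server), status in parse_show_stat(raw).items():
        if server in ("FRONTEND", "BACKEND"):
            continue  # aggregate rows, not individual servers
        if not status.startswith("UP"):
            bad.append(f"{proxy}/{server}")
    return sorted(bad)
```

In practice the raw text would come from the socat command shown above, for example by capturing its output with subprocess.run.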

Health Checks & Graceful Shutdown

Health checks ensure traffic only routes to backends that can serve requests. Without them, the load balancer blindly sends traffic to crashed or overloaded servers.
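
Active health checkers usually avoid flapping by requiring several consecutive results before changing a server's state, which is what the "fall 3 rise 2" parameters in the HAProxy configuration above express. A minimal sketch of that state machine follows; the class name HealthTracker is hypothetical and the logic is a simplified model, not HAProxy's implementation.

```python
class HealthTracker:
    """Track one backend's health using rise/fall thresholds."""

    def __init__(self, fall=3, rise=2, healthy=True):
        self.fall = fall        # consecutive failures needed to mark DOWN
        self.rise = rise        # consecutive successes needed to mark UP
        self.healthy = healthy
        self._streak = 0        # consecutive results contradicting current state

    def record(self, check_passed: bool) -> bool:
        """Feed in one check result and return the resulting health state."""
        if check_passed == self.healthy:
            # Result agrees with the current state: reset the streak.
            self._streak = 0
        else:
            self._streak += 1
            threshold = self.fall if self.healthy else self.rise
            if self._streak >= threshold:
                self.healthy = not self.healthy
                self._streak = 0
        return self.healthy
```

With fall=3, a single failed probe between successes never removes a server from rotation; three in a row do.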

# Open-source NGINX supports passive health checks via max_fails + fail_timeout.
# Active (probe-based) checks require NGINX Plus or the third-party
# nginx_upstream_check_module.

upstream backend_pool {
    zone backend_zone 64k;         # Shared memory so all workers share upstream state

    server 10.0.1.10:8080 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:8080 max_fails=3 fail_timeout=30s;
    server 10.0.1.12:8080 max_fails=3 fail_timeout=30s;

    # max_fails=3      → mark unavailable after 3 failed attempts within fail_timeout
    # fail_timeout=30s → window for counting failures, and how long the server
    #                    is then considered unavailable before it is tried again

    keepalive 16;
}

# A simple /health endpoint on your backend (Node.js example)
# app.get('/health', (req, res) => {
#     if (db.isConnected() && cache.isReady()) {
#         res.status(200).json({ status: 'healthy' });
#     } else {
#         res.status(503).json({ status: 'unhealthy' });
#     }
# });
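
The interaction between max_fails and fail_timeout in the NGINX comments above can be modelled in a few lines. This is a deliberately simplified model written for illustration, not NGINX's actual algorithm; the class name PassiveFailureTracker and the injectable clock parameter are assumptions.

```python
import time


class PassiveFailureTracker:
    """Simplified model of passive health checking for one backend."""

    def __init__(self, max_fails=3, fail_timeout=30.0, clock=time.monotonic):
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self._clock = clock                  # injectable so tests can fake time
        self._failures = []                  # timestamps of recent failures
        self._down_until = float("-inf")     # backend is skipped until this time

    def record_failure(self):
        """Call when a proxied request to this backend fails."""
        now = self._clock()
        # Only failures inside the fail_timeout window count.
        self._failures = [t for t in self._failures if now - t < self.fail_timeout]
        self._failures.append(now)
        if len(self._failures) >= self.max_fails:
            # Too many failures in the window: skip the backend for fail_timeout.
            self._down_until = now + self.fail_timeout
            self._failures.clear()

    def available(self) -> bool:
        """True if the backend may receive traffic right now."""
        return self._clock() >= self._down_until
```

The model shows why passive checks are slower to react than active ones: real client requests have to fail before the backend is taken out of rotation.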

            
Connection Draining During Deploys: When deploying a new version, don't just kill the old process. Instead: (1) Signal the load balancer to stop sending new requests to the backend (drain state), (2) Wait for in-flight requests to complete (drain timeout, typically 30-60s), (3) Then stop the process. Without connection draining, clients mid-request can see connection resets or 502 errors. In HAProxy: set server backend/server1 state drain. In NGINX: mark the server down in config and reload. In Kubernetes: when a pod enters the Terminating state it is removed from the Service's endpoints, and a preStop hook (for example, a short sleep) together with terminationGracePeriodSeconds gives in-flight requests time to finish before the container is killed.
        

# Graceful shutdown pattern for a backend application
# 1. Stop accepting new connections
# 2. Finish processing in-flight requests
# 3. Close database/cache connections
# 4. Exit

#!/bin/bash
# Example: handling SIGTERM in a bash wrapper
APP_PID=""

start_app() {
    /usr/bin/my-app --port 8080 &
    APP_PID=$!
}

graceful_shutdown() {
    echo "Received SIGTERM, draining connections..."
    # Ask the app to start failing its health check so the load balancer
    # stops routing to it (assumes the app exposes an /admin/drain endpoint)
    curl -s -X POST http://localhost:8080/admin/drain
    # Wait for in-flight requests (fixed 30s)
    sleep 30
    # Send SIGTERM to app
    kill -TERM "$APP_PID"
    wait "$APP_PID"
    echo "Shutdown complete"
    exit 0
}

trap graceful_shutdown SIGTERM SIGINT

start_app
wait "$APP_PID"
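
The wrapper above waits a fixed 30 seconds. Inside the application itself, the same pattern can wait only as long as requests are actually in flight. The sketch below shows one possible shape for that bookkeeping; the class name DrainGate and its methods are hypothetical, and wiring it into a real server (request middleware, a SIGTERM handler, the /health route) is left out.

```python
import threading


class DrainGate:
    """Track in-flight requests and refuse new ones once draining starts."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._draining = False

    def try_begin(self) -> bool:
        """Call at the start of a request; False means reject it (e.g. with 503)."""
        with self._cond:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def end(self):
        """Call when a request finishes, whether it succeeded or failed."""
        with self._cond:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._cond.notify_all()

    def start_draining(self):
        """Call from the SIGTERM handler (step 1 of the shutdown pattern)."""
        with self._cond:
            self._draining = True

    def healthy(self) -> bool:
        """What a /health endpoint could report: unhealthy while draining."""
        with self._cond:
            return not self._draining

    def wait_idle(self, timeout: float) -> bool:
        """Block until no requests are in flight; False if the timeout expired."""
        with self._cond:
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout=timeout)
```

A shutdown handler would call start_draining(), then wait_idle(30), then close database and cache connections and exit, so an idle server stops immediately instead of sleeping for the full timeout.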

How Kubernetes Services Load Balance

Kubernetes Services provide built-in L4 load balancing via kube-proxy, which runs on every node and programs traffic rules. In iptables mode (the default on Linux), kube-proxy creates iptables rules that pick a pod endpoint at random for each new connection; connection tracking then keeps every packet of that connection on the same pod. kube-proxy itself performs no health checks: readiness probes decide which pods are listed as Service endpoints, and liveness probes restart unhealthy containers. In IPVS mode, kube-proxy uses Linux IPVS (kernel-level L4 LB), which supports round-robin, least connections, and weighted algorithms, and whose hash-table lookups scale better than sequential iptables rules for large endpoint lists.

Service types: ClusterIP (internal only), NodePort (exposed on every node's IP), LoadBalancer (provisions a cloud LB, e.g., AWS NLB or GCP LB). For L7 routing (path-based, host-based), use an Ingress controller (NGINX Ingress, Traefik, Envoy/Istio), which acts as an L7 reverse proxy inside the cluster.
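
The random selection in iptables mode relies on a small piece of arithmetic. As far as I understand it, the rules for a Service are evaluated in order, one per endpoint, with match probabilities 1/n, 1/(n-1), ..., 1, so that every endpoint ends up equally likely. The sketch below checks that arithmetic with exact fractions; the function names are hypothetical.

```python
from fractions import Fraction


def rule_probabilities(n: int) -> list:
    """Match probability attached to each of n sequential rules: 1/n, 1/(n-1), ..., 1."""
    return [Fraction(1, n - i) for i in range(n)]


def effective_probabilities(rule_probs: list) -> list:
    """Overall chance that a new connection is handled by each rule."""
    result = []
    reach = Fraction(1)  # probability that evaluation gets as far as this rule
    for p in rule_probs:
        result.append(reach * p)
        reach *= 1 - p   # connections not matched fall through to the next rule
    return result
```

For four endpoints the per-rule probabilities are 1/4, 1/3, 1/2, and 1, and effective_probabilities returns 1/4 for each of them.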


Exercises

# Exercise 1: Set up NGINX as a local load balancer
# Give each server a distinct index page so responses are distinguishable
for n in 1 2 3; do mkdir -p /tmp/s$n && echo "s$n" > /tmp/s$n/index.html; done
# Start 3 simple HTTP servers on different ports
python3 -m http.server 8001 --directory /tmp/s1 &
python3 -m http.server 8002 --directory /tmp/s2 &
python3 -m http.server 8003 --directory /tmp/s3 &
# Create NGINX upstream config pointing to all 3, test with curl

# Exercise 2: Observe round-robin distribution
for i in $(seq 1 9); do
    curl -s http://localhost/ -H "Host: app.local" | grep -o "s[0-9]"
done
# Should see s1, s2, s3, s1, s2, s3 ...

# Exercise 3: Simulate a backend failure
# Kill one of the python servers, then send requests
# Verify NGINX routes around the dead backend

# Exercise 4: Check HAProxy stats (if installed)
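# Quote the URL: an unquoted ";" would end the shell command.
# Credentials match the "stats auth" line in the HAProxy config above.
curl -u admin:secretpass 'http://localhost:8404/stats;csv'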

# Exercise 5: View active connections per backend (ports from Exercise 1)
ss -tn | grep ':800[1-3]' | awk '{print $5}' | sort | uniq -c | sort -rn

Conclusion & Next Steps

Load balancers are the gateway to horizontal scaling: they distribute traffic across multiple backends, route around failures, and enable zero-downtime deployments via connection draining. L4 load balancing is fast and simple (TCP forwarding), while L7 gives you content-based routing, header manipulation, and TLS termination. NGINX and HAProxy are workhorses of production infrastructure, and learning their configuration is a solid foundation for operating services at scale. Health checks, session persistence, and graceful shutdown are the operational details that separate "works in dev" from "survives in production."

Previous: Part 14: Routing, NAT & Firewalls | Next: Part 16: Cryptography & TLS
