
Complete Database Mastery Part 11: Scaling & Distributed Systems

January 31, 2026 Wasil Zafar 45 min read

Master database scaling and distributed systems. Learn replication architectures, sharding strategies, CAP theorem, consistent hashing, distributed consensus algorithms, and horizontal vs vertical scaling techniques.

Table of Contents

  1. Introduction
  2. Scaling Approaches
  3. Replication Architectures
  4. Sharding Strategies
  5. CAP & PACELC
  6. Distributed Consensus
  7. Case Studies
  8. Conclusion & Next Steps

Introduction: Why Scaling Matters

Database scaling enables systems to handle growing workloads. As data volumes and user counts increase, understanding horizontal and vertical scaling, replication, and sharding becomes essential for system architects.

Series Context: This is Part 11 of 15 in the Complete Database Mastery series. We're tackling the architecture challenges of large-scale systems.

Think of scaling like upgrading a restaurant: Vertical scaling = get a bigger kitchen. Horizontal scaling = open more locations. Both have tradeoffs.

Scaling Approaches

Vertical Scaling (Scale Up)

Add more power to a single machine: more CPU, RAM, faster storage.

Vertical Scaling Characteristics

| Pros | Cons |
| --- | --- |
| Simple: no code changes | Hardware limits (can't scale forever) |
| No distributed complexity | Single point of failure |
| ACID transactions work normally | Expensive at the high end |
| Lower latency (no network hops) | Downtime during upgrades |

Horizontal Scaling (Scale Out)

Add more machines to distribute the load. More complex but virtually unlimited.

-- Example: the application routes queries based on user ID
-- Shard 1: users 1-1,000,000
-- Shard 2: users 1,000,001-2,000,000
-- etc.

-- Application logic determines routing (pseudocode, integer division)
-- user_id = 1500000
-- shard = ((user_id - 1) / 1000000) + 1   -- Shard 2

When to Scale Horizontally: When you've maximized vertical scaling, need geographic distribution, require fault tolerance, or have predictable sharding keys.
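The routing rule above can be written as a small function. This is an illustrative sketch; `range_shard` is a hypothetical helper name, not part of any framework:

```python
def range_shard(user_id: int, users_per_shard: int = 1_000_000) -> int:
    """Map a user ID to its shard number.

    Shard 1 holds users 1..1,000,000, shard 2 holds 1,000,001..2,000,000, etc.
    """
    return (user_id - 1) // users_per_shard + 1

print(range_shard(1_500_000))  # 2
```

Note the `user_id - 1` offset: without it, user 1,000,000 (the last user of shard 1) would incorrectly land on shard 2.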

Replication Architectures

Replication copies data across multiple servers for redundancy and read scaling.

Primary-Replica (Master-Slave)

One server handles writes; replicas handle reads. The most common pattern.

# PostgreSQL Streaming Replication Setup

# Primary (postgresql.conf)
wal_level = replica
max_wal_senders = 10
synchronous_standby_names = 'replica1'   # omit for asynchronous replication

# Create replication user (run in psql on the primary)
CREATE USER replicator REPLICATION LOGIN PASSWORD 'secret';

# Replica: take a base backup from the primary (shell)
# -R writes primary_conninfo and creates standby.signal automatically
pg_basebackup -h primary -D /var/lib/postgresql/data -U replicator -P -R

# Replica (postgresql.conf), set manually if -R was not used
primary_conninfo = 'host=primary port=5432 user=replicator password=secret'

Read/Write Splitting

# Application-level read/write routing
import random

class DatabaseRouter:
    def __init__(self, primary_connection, replica_connections):
        self.primary = primary_connection
        self.replicas = replica_connections

    def get_connection(self, query_type):
        if query_type == 'write':
            return self.primary
        # Randomly pick a replica to spread read load
        return random.choice(self.replicas)

# Or use a proxy such as PgBouncer, ProxySQL, or HAProxy

Multi-Master

Multiple servers accept writes. Complex but enables geographic distribution.

# MySQL Group Replication (Multi-Primary Mode)
# Each node can accept writes; conflicts resolved automatically

SET GLOBAL group_replication_single_primary_mode = OFF;
SET GLOBAL group_replication_enforce_update_everywhere_checks = ON;
START GROUP_REPLICATION;

Multi-Master Challenges: Conflict resolution (last-write-wins vs merge), increased latency for consistency, complexity in debugging. Use only when truly needed.
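A minimal sketch of last-write-wins conflict resolution; `last_write_wins` is a hypothetical helper, and real systems must also contend with clock skew between nodes:

```python
def last_write_wins(local: dict, remote: dict) -> dict:
    """Resolve a multi-master conflict by keeping the version
    with the newest timestamp (last-write-wins)."""
    return local if local["ts"] >= remote["ts"] else remote

# Two replicas updated the same row concurrently
a = {"value": "alice@old.com", "ts": 1700000000.0}
b = {"value": "alice@new.com", "ts": 1700000005.0}
winner = last_write_wins(a, b)  # b wins: newer timestamp
```

Merge-based resolution (e.g. CRDTs) avoids silently discarding one write, at the cost of more complex data types.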

Synchronous vs Asynchronous Replication

Replication Modes

| Mode | Behavior | Tradeoff |
| --- | --- | --- |
| Synchronous | Wait for replica acknowledgment | Zero data loss, higher latency |
| Asynchronous | Return immediately after primary write | Lower latency, potential data loss |
| Semi-synchronous | Wait for at least one replica | Balance of both |
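The three modes can be sketched as a toy write path; `Replica` and `write` are illustrative names for this sketch, not a real driver API:

```python
class Replica:
    """Toy replica that applies writes to an in-memory log."""
    def __init__(self):
        self.log = []

    def apply(self, value) -> bool:
        self.log.append(value)
        return True  # acknowledgment

def write(value, replicas, mode="async"):
    # Primary applies the write first (not shown), then replicates
    if mode == "sync":
        # Block until every replica acks: zero data loss, highest latency
        assert all(r.apply(value) for r in replicas)
    elif mode == "semi-sync":
        # Block for one ack; the rest catch up later (not modeled here)
        assert replicas[0].apply(value)
    # mode == "async": return immediately; replicas apply later (not modeled)
    return "ok"
```

The caller's latency grows with the number of acknowledgments it waits for, which is exactly the tradeoff in the table above.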

Sharding Strategies

Sharding partitions data across multiple databases. Each shard holds a subset of the data.

Range-Based Sharding

Partition by value ranges. Good for sequential access patterns.

-- Range sharding by date
-- Shard 2023: orders from 2023
-- Shard 2024: orders from 2024

-- Routing logic (pseudocode)
IF order_date >= '2024-01-01' THEN
    USE shard_2024;
ELSE
    USE shard_2023;
END IF;

Hotspot Risk: If most queries hit recent data, the newest shard becomes overloaded. Consider time-based archiving.

Hash-Based Sharding

Hash the sharding key for even distribution. Best for random access patterns.

# Hash-based shard routing
# Note: Python's built-in hash() is stable for ints but randomized per
# process for strings; use hashlib for a stable general-purpose hash
def get_shard(user_id, num_shards=4):
    return hash(user_id) % num_shards

# user_id=12345 → shard 1  (12345 % 4 == 1)
# user_id=67890 → shard 2
# Even distribution regardless of user_id patterns

Consistent Hashing

Minimizes data movement when adding/removing shards. Essential for dynamic scaling.

# Consistent hashing concept
# Nodes are placed on a hash ring (e.g. positions 0 to 2^32 - 1)
# Keys hash to a ring position and route to the nearest clockwise node

# Without consistent hashing (shard = hash % N):
# Adding a shard remaps most keys to different shards

# With consistent hashing:
# Adding a shard moves only ~1/N of the data
# (where N = resulting number of shards)
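The ring described above can be sketched in a few dozen lines; `ConsistentHashRing` is an illustrative implementation, not a library API. It uses "virtual nodes" (many ring positions per physical shard), a standard trick to smooth out the key distribution:

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    # Stable hash onto the ring (0 .. 2^32 - 1); MD5 used only for placement
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)

class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes        # virtual nodes per physical node
        self.ring = {}              # ring position -> node name
        self.positions = []         # sorted ring positions
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str):
        for i in range(self.vnodes):
            pos = ring_hash(f"{node}:{i}")
            self.ring[pos] = node
            bisect.insort(self.positions, pos)

    def get_node(self, key: str) -> str:
        # Route to the nearest clockwise virtual node, wrapping at the top
        pos = ring_hash(key)
        idx = bisect.bisect(self.positions, pos) % len(self.positions)
        return self.ring[self.positions[idx]]

ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
shard = ring.get_node("user:12345")  # always maps to the same shard
```

Calling `add_node("shard-d")` afterwards reassigns only the keys whose ring positions now fall closest to the new node's virtual nodes; the rest stay put.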

Sharding Key Selection

| Good Keys | Bad Keys |
| --- | --- |
| user_id (for user-centric apps) | Auto-increment ID (sequential hotspots) |
| tenant_id (for SaaS) | Date alone (recent-data hotspot) |
| geographic_region | Low-cardinality fields (status, type) |

CAP Theorem & PACELC

CAP Theorem

In a distributed system, you can only guarantee two of three properties:

CAP Properties

  • Consistency (C): All nodes see the same data at the same time
  • Availability (A): Every request receives a response (success or failure)
  • Partition Tolerance (P): System continues despite network failures

Since network partitions are inevitable, the real choice is:

During a network partition:

CP (Consistency + Partition Tolerance):
- Refuse requests if consistency can't be guaranteed
- Examples: PostgreSQL, MongoDB (with majority writes)
- Use case: Financial transactions, inventory systems

AP (Availability + Partition Tolerance):
- Serve requests even with potentially stale data
- Examples: Cassandra, DynamoDB (eventual consistency)
- Use case: Social media feeds, analytics

PACELC Extension

CAP only describes partition behavior. PACELC adds: "Else (no partition), what's the Latency vs Consistency tradeoff?"

PACELC: If Partition → choose A or C
        Else → choose Latency or Consistency

Examples:
- DynamoDB: PA/EL (Available during partition, low latency normally)
- PostgreSQL: PC/EC (Consistent always, but slower)
- Cassandra: PA/EL (Available, low latency, eventual consistency)

Distributed Consensus

How do distributed nodes agree on the state of data? Consensus algorithms solve this.

Common Consensus Algorithms

| Algorithm | Used By | Key Idea |
| --- | --- | --- |
| Paxos | Google Spanner, Chubby | Classic, complex, proven |
| Raft | etcd, CockroachDB, Consul | Understandable Paxos alternative |
| Zab | ZooKeeper | Optimized for primary-backup |

Raft Consensus (Simplified):
1. Leader Election: Nodes vote for a leader
2. Log Replication: Leader appends entries, replicates to followers
3. Commit: Entry committed when majority acknowledges
4. Leader fails → New election → New leader

Quorum = floor(N/2) + 1
- 3 nodes: need 2 to agree
- 5 nodes: need 3 to agree
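The quorum rule translates directly into code (illustrative helper names):

```python
def quorum(n: int) -> int:
    """Minimum number of nodes required for majority agreement: floor(n/2) + 1."""
    return n // 2 + 1

def can_commit(acks: int, cluster_size: int) -> bool:
    # A Raft entry commits once a majority of the cluster has acknowledged it
    return acks >= quorum(cluster_size)

print(quorum(3))         # 2
print(quorum(5))         # 3
print(can_commit(2, 5))  # False: only 2 of the required 3 acks
```

Note that growing a cluster from 3 to 4 nodes raises the quorum from 2 to 3 without improving fault tolerance, which is why consensus clusters usually have an odd number of nodes.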

Case Studies

Database Architecture Examples

| System | Architecture | CAP Choice |
| --- | --- | --- |
| Google Spanner | Globally distributed, TrueTime | CP (strong consistency) |
| Amazon DynamoDB | Multi-region, eventual consistency | AP (configurable) |
| CockroachDB | Raft consensus, serializable | CP (SQL + distributed) |
| Cassandra | Masterless, tunable consistency | AP (default) |

Modern Trend: NewSQL databases like CockroachDB, Spanner, and YugabyteDB offer the best of both worlds: SQL semantics with horizontal scaling.

Conclusion & Next Steps

Distributed databases are the backbone of internet-scale applications. Understanding CAP tradeoffs, replication topologies, and sharding strategies is essential for architects building resilient systems.

Next in the Series

In Part 12: Cloud Databases & Managed Services, we'll explore AWS, Azure, and GCP database offerings that implement these distributed concepts.
