CAP Theorem
Part 8 of a 15-part System Design series:

1. Introduction to System Design: fundamentals, why it matters, key concepts
2. Scalability Fundamentals: horizontal vs vertical scaling, stateless design
3. Load Balancing & Caching: algorithms, Redis, CDN patterns
4. Database Design & Sharding: SQL vs NoSQL, replication, partitioning
5. Microservices Architecture: service decomposition, API gateways, sagas
6. API Design & REST/GraphQL: RESTful principles, GraphQL, gRPC
7. Message Queues & Event-Driven: Kafka, RabbitMQ, event sourcing
8. CAP Theorem & Consistency: distributed trade-offs, eventual consistency (you are here)
9. Rate Limiting & Security: throttling algorithms, DDoS protection
10. Monitoring & Observability: logging, metrics, distributed tracing
11. Real-World Case Studies: URL shortener, chat, feed, video streaming
12. Low-Level Design Patterns: SOLID, OOP patterns, data modeling
13. Distributed Systems Deep Dive: consensus, Paxos, Raft, coordination
14. Authentication & Security: OAuth, JWT, zero trust, compliance
15. Interview Preparation: 4-step framework, estimation, strategies
The CAP theorem states that a distributed system cannot simultaneously guarantee all three of Consistency, Availability, and Partition tolerance; it can provide at most two at once. Understanding these trade-offs is fundamental to designing reliable distributed systems.
Key Insight: In practice, partition tolerance is non-negotiable in distributed systems. The real choice is between consistency and availability during network partitions.
The Three Guarantees
- Consistency (C): Every read receives the most recent write or an error. All nodes see the same data at the same time.
- Availability (A): Every request receives a non-error response, without guarantee it's the most recent write.
- Partition Tolerance (P): System continues operating despite arbitrary message loss or failure between nodes.
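To make the trade-off concrete, here is a minimal sketch (illustrative only; the `Replica` class and `peer_reachable` flag are invented for this example, not a real database API) of a single-key replica that can be configured as CP or AP. During a simulated partition, the CP replica refuses requests it cannot replicate, while the AP replica keeps answering with possibly stale data:

```python
class Replica:
    """Toy single-key store. The peer is reachable only when
    peer_reachable is True; False simulates a network partition."""
    def __init__(self, mode):
        self.mode = mode            # "CP" or "AP"
        self.value = 0
        self.peer_reachable = True

    def write(self, value):
        if self.mode == "CP" and not self.peer_reachable:
            # Choose Consistency: refuse rather than diverge
            raise RuntimeError("unavailable: cannot replicate")
        self.value = value          # AP: accept locally, reconcile later
        return "ok"

    def read(self):
        if self.mode == "CP" and not self.peer_reachable:
            raise RuntimeError("unavailable: value may be stale")
        return self.value           # AP: possibly stale

cp, ap = Replica("CP"), Replica("AP")
cp.peer_reachable = ap.peer_reachable = False   # partition begins
try:
    cp.write(1)
except RuntimeError:
    print("CP: write rejected")     # consistency kept, availability lost
print("AP:", ap.write(1))           # availability kept, replicas may diverge
```

The point of the sketch is that both behaviors are deliberate design choices triggered by the same event: losing contact with the peer.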
CAP Visualization
# CAP Theorem - you can only pick 2 out of 3

            Consistency (C)
                 /\
                /  \
               / CA \          # CA: Consistent + Available
              /______\         #     (not partition tolerant)
             /\      /\
            /  \    /  \
           / CP \  / AP \
          /______\/______\
    Partition            Availability
    Tolerance (P)            (A)

# CP Systems: Consistent + Partition Tolerant
# - May become unavailable during partitions
# - Examples: MongoDB, HBase, Redis Cluster

# AP Systems: Available + Partition Tolerant
# - May return stale data during partitions
# - Examples: Cassandra, DynamoDB, CouchDB

# CA Systems: Consistent + Available
# - Cannot tolerate network partitions
# - Examples: traditional RDBMS (single node)
CAP Trade-offs
CP Systems (Consistency + Partition Tolerance)
Prioritize consistency over availability. During network partitions, the system may reject requests to maintain consistency.
# CP System Behavior During Partition

# Normal operation:
Client --> Node A --> Node B --> Node C
          (Write)  (Replicate) (Replicate)
# All nodes have consistent data

# During a network partition:
Client --> Node A  [PARTITION]  Node B, Node C
# Node A cannot confirm replication to B and C

# A CP system's options:
# 1. Reject the write (preserve consistency)
# 2. Accept the write but return an error until nodes resync
# 3. Block until the partition heals
# Example: MongoDB with majority write concern
from pymongo import MongoClient, WriteConcern

client = MongoClient('mongodb://localhost:27017')
db = client.mydb

# Require a majority of replica-set members to acknowledge each write
collection = db.get_collection(
    'orders',
    write_concern=WriteConcern(w='majority', wtimeout=5000)
)

try:
    result = collection.insert_one({'order_id': '123', 'status': 'new'})
    print("Write acknowledged by majority")
except Exception as e:
    print(f"Write failed - partition or timeout: {e}")
    # The system chose Consistency over Availability
Use when: financial transactions, inventory systems, any data where correctness is critical. Examples: MongoDB, HBase, ZooKeeper.
AP Systems (Availability + Partition Tolerance)
Prioritize availability over consistency. During partitions, nodes continue serving requests with potentially stale data.
# AP System Behavior During Partition

# Normal operation:
Client --> Node A --> Node B --> Node C
          (Write)    (Async)    (Async)
# Writes replicate asynchronously

# During a network partition:
Client A --> Node A  [PARTITION]  Node B <-- Client B
            (Write X=1)          (Read X=?)

# AP system response:
# - Node A accepts the write (X=1)
# - Node B serves the read with the old value (X=0)
# - Both clients get responses (available!)
# - Data is temporarily inconsistent
# - After the partition heals: conflict resolution
# Example: Cassandra with eventual consistency
# (the Python driver sets consistency on the statement, not on execute())
from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

cluster = Cluster(['node1', 'node2', 'node3'])
session = cluster.connect('mydb')

# Write at consistency ONE (highly available: one replica suffices)
insert = SimpleStatement(
    "INSERT INTO orders (id, status) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.ONE
)
session.execute(insert, ('123', 'new'))

# Read at consistency ONE (fast, but may return stale data)
select = SimpleStatement(
    "SELECT * FROM orders WHERE id = %s",
    consistency_level=ConsistencyLevel.ONE
)
result = session.execute(select, ('123',))
Use when: social media feeds, user sessions, analytics, shopping carts. Examples: Cassandra, DynamoDB, CouchDB.
CAP in Practice: System Comparison
| Database | Default | Configurable | Notes |
|---|---|---|---|
| MongoDB | CP | Yes | Write concern adjustable |
| Cassandra | AP | Yes | Tunable consistency per query |
| DynamoDB | AP | Yes | Strong consistency option available |
| Redis Cluster | CP | Limited | Async replication with CP failover |
| CockroachDB | CP | No | Serializable isolation always |
| PostgreSQL | CA | N/A | Single node, no partition tolerance |
Consistency Models
Consistency models define the guarantees a distributed system provides about the order and visibility of operations across nodes.
Strong Consistency
Linearizability
The strongest consistency model. Operations appear instantaneous and in a global order that respects real-time.
# Linearizability: operations have a single, global, real-time order
# Once a write completes, all subsequent reads see it

Timeline:
Client A:  |--Write(X=1)--|
Client B:                    |--Read(X)--|   # Must see X=1
Client C:       |--Read(X)--|                # May see X=0 or X=1
                                             # (read started before the
                                             #  write completed)

# Linearizable systems guarantee:
# 1. After a write returns, all reads see the new value
# 2. If a read returns the new value, all subsequent reads do too
# 3. Operations are totally ordered

# Implementation typically requires:
# - Consensus protocol (Raft, Paxos)
# - Quorum reads and writes
# - Coordination overhead (higher latency)
Sequential Consistency
Operations appear in a global order consistent with each client's program order, but not necessarily real-time order.
# Sequential Consistency Example
# All clients agree on one total order that respects each client's
# program order, but that order need not match wall-clock time

# Client A operations (both complete first in real time):
A1: Write(X=1)
A2: Write(Y=1)

# Client B operations (issued after A's writes, in real time):
B1: Read(X) -> 0   # VALID under sequential consistency!
B2: Read(Y) -> 0

# Valid global ordering: B1, B2, A1, A2
# - Respects A's order (A1 before A2)
# - Respects B's order (B1 before B2)
# - But ignores real time: A's writes completed before B's reads began

# By contrast, B1: Read(Y) -> 1 then B2: Read(X) -> 0 would NOT be
# sequentially consistent: reading Y=1 places A2 (and therefore A1)
# before B1, so the later read of X would have to return 1

# Sequential consistency is weaker than linearizability
# because it does not preserve real-time ordering
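Whether an observed outcome is sequentially consistent can be checked mechanically. The sketch below (illustrative brute force; real systems do not verify histories this way, and all names are invented for this example) enumerates every interleaving that respects each client's program order and replays it against a simple memory model with X and Y initially 0:

```python
from itertools import permutations

# Observed history: (client, kind, var, value); reads carry the observed value
ops = [
    ("A", "w", "X", 1), ("A", "w", "Y", 1),
    ("B", "r", "X", 0), ("B", "r", "Y", 0),
]

def respects_program_order(order, ops):
    """Each client's operations must appear in their issued order."""
    for client in {op[0] for op in ops}:
        client_ops = [op for op in ops if op[0] == client]
        positions = [order.index(op) for op in client_ops]
        if positions != sorted(positions):
            return False
    return True

def replay_matches(order):
    """Replay the interleaving; every read must see the replayed value."""
    mem = {"X": 0, "Y": 0}
    for client, kind, var, value in order:
        if kind == "w":
            mem[var] = value
        elif mem[var] != value:
            return False
    return True

def sequentially_consistent(ops):
    return any(
        respects_program_order(order, ops) and replay_matches(order)
        for order in permutations(ops)
    )

print(sequentially_consistent(ops))  # True: e.g. the order B1, B2, A1, A2
```

Running the same checker with B reading Y=1 followed by X=0 returns False: no interleaving satisfies both program orders.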
Eventual Consistency
If no new updates are made, eventually all replicas will converge to the same value. The "eventually" can range from milliseconds to hours.
Eventual Consistency in Action
# Eventual Consistency Example: DNS
# When you update a DNS record, it propagates gradually
# T=0: Update DNS record
update_dns("example.com", "1.2.3.4", "5.6.7.8")
# T=1s: Server 1 sees new IP (5.6.7.8)
# T=5s: Server 2 still has old IP (1.2.3.4)
# T=30s: Server 3 gets update (5.6.7.8)
# T=60s: All servers consistent (5.6.7.8)
# Another example: shopping cart in DynamoDB (illustrative sketch)
import time

class ShoppingCart:
    def __init__(self, db):
        self.db = db  # DynamoDB client

    def add_item(self, user_id, item):
        # Write to the nearest region (low latency)
        self.db.put_item(
            TableName='carts',
            Item={
                'user_id': user_id,
                'items': self.get_items(user_id) + [item],
                'updated_at': time.time()
            }
        )
        # Replication to other regions happens asynchronously;
        # a user in another region may briefly see a stale cart

    def get_items(self, user_id):
        # Eventually consistent read (fast, cheap)
        response = self.db.get_item(
            TableName='carts',
            Key={'user_id': user_id},
            ConsistentRead=False  # eventually consistent
        )
        return response.get('Item', {}).get('items', [])
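When the partition heals, diverged replicas must be reconciled. Dynamo-style stores typically use vector clocks or last-write-wins; the sketch below is a minimal last-write-wins merge over the `updated_at` timestamp used in the cart example (illustrative only, not a DynamoDB API):

```python
def merge_lww(version_a, version_b):
    """Last-write-wins: keep the replica version with the newer timestamp.
    Note: LWW silently discards the losing replica's updates."""
    if version_a["updated_at"] >= version_b["updated_at"]:
        return version_a
    return version_b

# Two replicas of the same cart diverged during a partition:
a = {"items": ["book"], "updated_at": 100.0}
b = {"items": ["book", "pen"], "updated_at": 105.0}
merged = merge_lww(a, b)
print(merged["items"])  # ['book', 'pen'] - b wrote last, so b wins
```

The trade-off is visible here: if `a` had held an item `b` lacked, that item would be lost, which is why some systems instead keep both versions and let the application merge them.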
Tunable Consistency (Cassandra/DynamoDB)
# Cassandra Tunable Consistency
# N = Total replicas, W = Write acknowledgments, R = Read acknowledgments
# Strong consistency: W + R > N
# Example: N=3, W=2, R=2 (quorum)
# At least one node in read overlaps with write
# Eventual consistency: W + R <= N
# Example: N=3, W=1, R=1
# Reads may miss recent writes
# Common configurations:
# QUORUM: W=N/2+1, R=N/2+1 (strong, balanced)
# ONE/ONE: W=1, R=1 (fast, eventual)
# ALL/ONE: W=N, R=1 (write slow, read fast, strong)
# ONE/ALL: W=1, R=N (write fast, read slow, strong)
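The W + R > N rule above is easy to encode. This small helper (an illustrative function, not part of any driver API) classifies an (N, W, R) configuration by whether every read quorum must overlap every write quorum:

```python
def is_strongly_consistent(n, w, r):
    """True when any read quorum of size r must intersect any write
    quorum of size w among n replicas, so a read always contacts at
    least one replica holding the latest acknowledged write."""
    return w + r > n

# The configurations listed above, for N = 3 replicas:
print(is_strongly_consistent(3, 2, 2))  # True  - QUORUM/QUORUM
print(is_strongly_consistent(3, 1, 1))  # False - ONE/ONE, eventual
print(is_strongly_consistent(3, 3, 1))  # True  - ALL/ONE
print(is_strongly_consistent(3, 1, 3))  # True  - ONE/ALL
```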
PACELC Extension: if there is a Partition, choose between Availability and Consistency; Else, choose between Latency and Consistency. In other words, even without partitions there is a latency-consistency trade-off.
Consensus Algorithms
Consensus algorithms enable distributed nodes to agree on a single value even when some nodes fail. They are the foundation of consistent distributed systems.
Raft Consensus
Raft is designed to be understandable. It uses leader election and log replication.
# Raft Consensus Overview (simplified sketch; peers are RPC stubs)
import random

# Node states
class NodeState:
    FOLLOWER = "follower"    # Default state, follows the leader
    CANDIDATE = "candidate"  # Seeking to become leader
    LEADER = "leader"        # Accepts writes, replicates to followers

# Key components:
# 1. Leader election
# 2. Log replication
# 3. Safety

class RaftNode:
    def __init__(self, node_id, peers):
        self.node_id = node_id
        self.peers = peers               # Stubs for the other nodes
        self.state = NodeState.FOLLOWER
        self.current_term = 0
        self.voted_for = None
        self.log = []                    # List of (term, command) entries
        self.commit_index = 0
        self.last_applied = 0
        self.election_timeout = random.uniform(150, 300)  # ms, randomized

    def start_election(self):
        """Follower timeout -> become candidate and request votes."""
        self.state = NodeState.CANDIDATE
        self.current_term += 1
        self.voted_for = self.node_id
        votes = 1  # Vote for self
        for peer in self.peers:
            response = peer.request_vote(
                term=self.current_term,
                candidate_id=self.node_id,
                last_log_index=len(self.log) - 1,
                last_log_term=self.log[-1][0] if self.log else 0
            )
            if response.vote_granted:
                votes += 1
        # Strict majority of the full cluster (self + peers) wins
        if votes > (len(self.peers) + 1) // 2:
            self.become_leader()

    def become_leader(self):
        """Won the election -> start leading."""
        self.state = NodeState.LEADER
        # Send heartbeats to prevent new elections
        self.send_heartbeats()

    def append_entries(self, command):
        """Leader receives a client command."""
        if self.state != NodeState.LEADER:
            return self.redirect_to_leader()
        # Append to the local log
        self.log.append((self.current_term, command))
        # Replicate to followers
        success_count = 1  # The leader counts itself
        for peer in self.peers:
            if peer.replicate_log(self.log):
                success_count += 1
        # Commit once a majority of the cluster has the entry
        if success_count > (len(self.peers) + 1) // 2:
            self.commit_index = len(self.log) - 1
            self.apply_command(command)
            return True
        return False
Used by: etcd, Consul, CockroachDB.
Paxos Consensus
Paxos is the original consensus algorithm; it is more complex than Raft but more flexible.
# Paxos Basic Protocol (Single-Decree, simplified sketch)
# Three roles: Proposers, Acceptors, Learners

class Proposer:
    def __init__(self, acceptors):
        self.acceptors = acceptors

    # Phase 1: Prepare
    def prepare(self, proposal_number):
        """Send prepare requests to the acceptors."""
        promises = []
        for acceptor in self.acceptors:
            promise = acceptor.receive_prepare(proposal_number)
            if promise:
                promises.append(promise)
        # Need a majority of promises to proceed
        if len(promises) > len(self.acceptors) // 2:
            return self.phase2(proposal_number, promises)

    # Phase 2: Accept
    def phase2(self, proposal_number, promises):
        """Send accept requests carrying the chosen value."""
        # Use the highest-numbered accepted value, or propose our own
        value = self.choose_value(promises)
        accepts = 0
        for acceptor in self.acceptors:
            if acceptor.receive_accept(proposal_number, value):
                accepts += 1
        if accepts > len(self.acceptors) // 2:
            # Value is chosen! Notify the learners
            self.notify_learners(value)

class Acceptor:
    def __init__(self):
        self.promised_number = 0
        self.accepted_number = 0
        self.accepted_value = None

    def receive_prepare(self, proposal_number):
        """Promise not to accept lower-numbered proposals."""
        if proposal_number > self.promised_number:
            self.promised_number = proposal_number
            return Promise(
                accepted_number=self.accepted_number,
                accepted_value=self.accepted_value
            )
        return None  # Reject

    def receive_accept(self, proposal_number, value):
        """Accept unless we have promised a higher-numbered proposal."""
        if proposal_number >= self.promised_number:
            self.promised_number = proposal_number
            self.accepted_number = proposal_number
            self.accepted_value = value
            return True
        return False
Used by: Chubby, Spanner.
Consensus Algorithm Comparison
| Algorithm | Complexity | Performance | Used By |
|---|---|---|---|
| Raft | Understandable | Good | etcd, Consul, TiKV |
| Paxos | Complex | Good | Chubby, Spanner |
| ZAB | Moderate | Good | ZooKeeper |
| PBFT | High | Lower | Blockchain (Hyperledger) |
Next Steps

Continue the series:
- Part 7: Message Queues & Event-Driven: review message queues and event-driven architecture patterns.
- Part 9: Rate Limiting & Security: learn rate limiting algorithms and security best practices.
- Part 10: Monitoring & Observability: master monitoring, logging, and distributed tracing.