Event-Driven & Data Architecture — Systems Thinking & Architecture Mastery Part 7

Module 10: Event-Driven Architecture

In a synchronous world, services call each other directly: Service A sends a request to Service B and waits for a response. This creates temporal coupling (both must be alive at the same time) and behavioral coupling (A must know B's API). Event-driven architecture removes the temporal coupling and reduces the behavioral coupling to a shared event schema.

Instead of "Service A calls Service B," the model becomes: "Service A publishes an event. Anyone interested can react." The producer doesn't know — or care — who's listening. This is the fundamental shift that enables truly decoupled, scalable systems.

Event Fundamentals: Producers, Consumers, Brokers

An event-driven system has three core participants:

Producers — Emit events when something interesting happens ("OrderPlaced", "PaymentProcessed", "InventoryLow")
Consumers — Subscribe to events they care about and react accordingly
Brokers — The intermediary that receives events from producers, stores them durably, and delivers them to consumers

                            
                            Key Insight: Events are facts — immutable records of something that happened. They use past tense ("OrderPlaced" not "PlaceOrder") because they describe history, not commands. This distinction matters: commands can be rejected; events cannot.
                        

Here's a well-structured event schema that captures intent clearly:

{
  "eventId": "evt_a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "eventType": "OrderPlaced",
  "aggregateId": "order_12345",
  "aggregateType": "Order",
  "timestamp": "2026-05-15T14:30:00.000Z",
  "version": 1,
  "metadata": {
    "correlationId": "req_xyz789",
    "causationId": "cmd_abc123",
    "userId": "user_456",
    "source": "order-service"
  },
  "payload": {
    "orderId": "order_12345",
    "customerId": "cust_789",
    "items": [
      { "productId": "prod_001", "quantity": 2, "unitPrice": 29.99 }
    ],
    "totalAmount": 59.98,
    "currency": "USD"
  }
}

Notice the correlationId and causationId — these enable distributed tracing across async boundaries. Every event carries the context of why it happened, not just what happened.
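
As a rough illustration of how that context can travel, the sketch below builds a follow-up event from a parent event. The helper name `make_follow_up_event` and the "payment-service" source are hypothetical; the convention assumed here is that correlationId is copied unchanged through the whole flow, while causationId points at the event that directly triggered the new one.

```python
import uuid
from datetime import datetime, timezone

def make_follow_up_event(parent, event_type, source, payload):
    """Build an event caused by `parent`, keeping the trace context intact."""
    return {
        "eventId": f"evt_{uuid.uuid4()}",
        "eventType": event_type,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "version": 1,
        "metadata": {
            # Same correlationId for every event in one business flow.
            "correlationId": parent["metadata"]["correlationId"],
            # causationId names the direct trigger (here: the parent event).
            "causationId": parent["eventId"],
            "source": source,
        },
        "payload": payload,
    }

order_placed = {
    "eventId": "evt_a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "eventType": "OrderPlaced",
    "metadata": {"correlationId": "req_xyz789", "causationId": "cmd_abc123"},
    "payload": {"orderId": "order_12345"},
}
payment_processed = make_follow_up_event(
    order_placed, "PaymentProcessed", "payment-service", {"orderId": "order_12345"}
)
```

Filtering logs by correlationId then returns the whole flow, and following causationId links reconstructs the chain of cause and effect.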

Event Sourcing: Storing Events, Not State

Traditional systems store current state — the latest snapshot of an entity. Event sourcing inverts this: you store the sequence of events that led to the current state. The current state is a derived view, computed by replaying events.

Event Sourcing — State Derived from Event Stream

flowchart LR
    subgraph EVENTS["Event Store (Immutable Log)"]
        E1["OrderCreated<br/>$0.00"]
        E2["ItemAdded<br/>+$29.99"]
        E3["ItemAdded<br/>+$19.99"]
        E4["DiscountApplied<br/>-$5.00"]
        E5["OrderConfirmed<br/>$44.98"]
    end

    subgraph STATE["Current State (Derived)"]
        S1["Order #12345<br/>Status: Confirmed<br/>Total: $44.98<br/>Items: 2"]
    end

    E1 --> E2 --> E3 --> E4 --> E5
    E5 -->|"Replay"| S1

    style E1 fill:#e8f4f4,stroke:#3B9797,color:#132440
    style E2 fill:#e8f4f4,stroke:#3B9797,color:#132440
    style E3 fill:#e8f4f4,stroke:#3B9797,color:#132440
    style E4 fill:#e8f4f4,stroke:#3B9797,color:#132440
    style E5 fill:#e8f4f4,stroke:#3B9797,color:#132440
    style S1 fill:#f0f4f8,stroke:#16476A,color:#132440

Why store events instead of state?

Complete audit trail — You can answer "what happened and when" for any entity at any point in time
Temporal queries — "What was the order state at 2:30 PM?" Replay events up to that timestamp
Event replay — Fix a bug in your projection logic, replay all events, get corrected state
Decoupled projections — Different services can build different read models from the same event stream
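
A minimal sketch of replay and temporal queries, using the order from the diagram above. The event field names (`type`, `amount`, `at`) and the timestamps are assumptions made for this example, not a standard event-store format.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass
class OrderState:
    status: str = "None"
    total: Decimal = Decimal("0.00")
    items: int = 0

def apply(state, event):
    """Pure function: previous state + one event -> next state."""
    kind, amount = event["type"], Decimal(event.get("amount", "0"))
    if kind == "OrderCreated":
        return OrderState("Created", Decimal("0.00"), 0)
    if kind == "ItemAdded":
        return OrderState(state.status, state.total + amount, state.items + 1)
    if kind == "DiscountApplied":
        return OrderState(state.status, state.total - amount, state.items)
    if kind == "OrderConfirmed":
        return OrderState("Confirmed", state.total, state.items)
    return state  # unknown event types are ignored

def replay(events, until=None):
    """Fold events in order; stop at `until` for a point-in-time view."""
    state = OrderState()
    for event in events:
        if until is not None and event["at"] > until:
            break
        state = apply(state, event)
    return state

EVENTS = [
    {"type": "OrderCreated", "at": "14:00"},
    {"type": "ItemAdded", "amount": "29.99", "at": "14:10"},
    {"type": "ItemAdded", "amount": "19.99", "at": "14:20"},
    {"type": "DiscountApplied", "amount": "5.00", "at": "14:40"},
    {"type": "OrderConfirmed", "at": "14:45"},
]
current = replay(EVENTS)                  # Confirmed, total 44.98, 2 items
at_1430 = replay(EVENTS, until="14:30")   # Created, total 49.98, 2 items
```

Because `apply` is a pure function, fixing a bug in it and calling `replay` again is all it takes to rebuild corrected state.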

The tradeoffs:

Event schemas become your API contract — changing them requires careful versioning
Replaying millions of events is slow — you need snapshots (periodic state captures) for performance
Querying is complex — you can't "SELECT * WHERE status = 'confirmed'" against an event store
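
The snapshot idea can be sketched as follows. This is an in-memory toy under stated assumptions: the `EventStore` class, the `delta` field, and the interval of 100 events are all hypothetical, and a real store would persist snapshots separately from the log.

```python
SNAPSHOT_EVERY = 100  # assumed interval; tune to your replay cost

def apply(balance, event):
    return balance + event["delta"]

class EventStore:
    def __init__(self):
        self.events = []
        self.snapshot = (0, 0)  # (number of events folded in, state)

    def append(self, event):
        self.events.append(event)
        if len(self.events) % SNAPSHOT_EVERY == 0:
            # load() replays only the events since the previous snapshot
            self.snapshot = (len(self.events), self.load()[0])

    def load(self):
        """Return (state, number_of_events_replayed)."""
        position, state = self.snapshot
        tail = self.events[position:]
        for event in tail:
            state = apply(state, event)
        return state, len(tail)

store = EventStore()
for _ in range(250):
    store.append({"delta": 1})
state, replayed = store.load()  # state 250, only 50 events replayed
```

The snapshot is a cache, not a source of truth: it can be discarded and rebuilt from the log at any time.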

CQRS: Separating Reads from Writes

Command Query Responsibility Segregation (CQRS) separates the write model (commands that change state) from the read model (queries that return data). This isn't just architectural tidiness — it solves a fundamental scaling mismatch.

CQRS — Independent Read and Write Paths

flowchart TD
    CLIENT["Client Application"]

    subgraph WRITE["Write Side (Commands)"]
        CMD["Command Handler"]
        AGG["Domain Aggregate"]
        ES["Event Store"]
    end

    subgraph READ["Read Side (Queries)"]
        PROJ["Projection Engine"]
        RM1["Read Model A<br/>(SQL - List View)"]
        RM2["Read Model B<br/>(Elasticsearch - Search)"]
        RM3["Read Model C<br/>(Redis - Dashboard)"]
    end

    CLIENT -->|"PlaceOrder"| CMD
    CMD --> AGG
    AGG -->|"OrderPlaced"| ES
    ES -->|"Publish"| PROJ
    PROJ --> RM1
    PROJ --> RM2
    PROJ --> RM3
    CLIENT -->|"GetOrders"| RM1
    CLIENT -->|"SearchOrders"| RM2
    CLIENT -->|"GetMetrics"| RM3

    style CMD fill:#e8f4f4,stroke:#3B9797,color:#132440
    style AGG fill:#e8f4f4,stroke:#3B9797,color:#132440
    style ES fill:#e8f4f4,stroke:#3B9797,color:#132440
    style PROJ fill:#f0f4f8,stroke:#16476A,color:#132440
    style RM1 fill:#f0f4f8,stroke:#16476A,color:#132440
    style RM2 fill:#f0f4f8,stroke:#16476A,color:#132440
    style RM3 fill:#f0f4f8,stroke:#16476A,color:#132440

In read-heavy systems, reads can outnumber writes by 100:1 or more. CQRS lets you:

Scale reads and writes independently — 50 read replicas, one write master
Optimize each side differently — Normalize writes for consistency, denormalize reads for speed
Use different storage engines — Writes to PostgreSQL, reads from Elasticsearch and Redis
Evolve read models without touching business logic — Add a new read model by simply processing existing events
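
The write path and read path can be sketched in a few lines. Everything here is an in-memory stand-in chosen for illustration: a list plays the event store, two dictionaries play the read models, and the function names are hypothetical.

```python
from collections import defaultdict

event_store = []                              # write side: append-only log
orders_by_id = {}                             # read model A: lookup / list view
order_count_by_customer = defaultdict(int)    # read model B: dashboard counter

def project(event):
    """Projection: fold one event into every read model."""
    if event["eventType"] == "OrderPlaced":
        p = event["payload"]
        orders_by_id[p["orderId"]] = {"customerId": p["customerId"], "status": "placed"}
        order_count_by_customer[p["customerId"]] += 1

def handle_place_order(order_id, customer_id):
    """Command handler: validate, then record the fact as an event."""
    if any(e["payload"]["orderId"] == order_id for e in event_store):
        raise ValueError(f"order {order_id} already exists")
    event = {"eventType": "OrderPlaced",
             "payload": {"orderId": order_id, "customerId": customer_id}}
    event_store.append(event)
    return event

def catch_up(position):
    """Apply events the projection has not seen yet; return the new position."""
    for event in event_store[position:]:
        project(event)
    return len(event_store)

handle_place_order("order_1", "cust_789")
handle_place_order("order_2", "cust_789")
stale_view = dict(orders_by_id)   # projection has not run yet, so this is empty
position = catch_up(0)            # read models now reflect both orders
```

The gap between the command succeeding and `catch_up` running is the eventual-consistency window. Adding a third read model only requires a new branch in `project` and a replay from position 0.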

                            
                            Critical Tradeoff: CQRS introduces eventual consistency between write and read sides. After a command succeeds, the read model may take milliseconds to seconds to update. Your UI must handle "I just placed an order but I don't see it yet" scenarios gracefully.
                        

Apache Kafka: The Distributed Event Platform

Kafka isn't just a message queue; it's a distributed commit log. Understanding its architecture explains how it scales to millions of events per second while preserving ordering within each partition.

Kafka Partition Architecture — Parallelism Through Partitioning

flowchart TD
    subgraph TOPIC["Topic: order-events"]
        subgraph P0["Partition 0"]
            P0E1["Offset 0: OrderPlaced"]
            P0E2["Offset 1: OrderConfirmed"]
            P0E3["Offset 2: OrderShipped"]
        end
        subgraph P1["Partition 1"]
            P1E1["Offset 0: OrderPlaced"]
            P1E2["Offset 1: OrderCancelled"]
        end
        subgraph P2["Partition 2"]
            P2E1["Offset 0: OrderPlaced"]
            P2E2["Offset 1: OrderPlaced"]
            P2E3["Offset 2: OrderConfirmed"]
        end
    end

    subgraph CG["Consumer Group: fulfillment"]
        C1["Consumer 1"]
        C2["Consumer 2"]
        C3["Consumer 3"]
    end

    P0 --> C1
    P1 --> C2
    P2 --> C3

    style P0E1 fill:#e8f4f4,stroke:#3B9797,color:#132440
    style P0E2 fill:#e8f4f4,stroke:#3B9797,color:#132440
    style P0E3 fill:#e8f4f4,stroke:#3B9797,color:#132440
    style P1E1 fill:#e8f4f4,stroke:#3B9797,color:#132440
    style P1E2 fill:#e8f4f4,stroke:#3B9797,color:#132440
    style P2E1 fill:#e8f4f4,stroke:#3B9797,color:#132440
    style P2E2 fill:#e8f4f4,stroke:#3B9797,color:#132440
    style P2E3 fill:#e8f4f4,stroke:#3B9797,color:#132440
    style C1 fill:#f0f4f8,stroke:#16476A,color:#132440
    style C2 fill:#f0f4f8,stroke:#16476A,color:#132440
    style C3 fill:#f0f4f8,stroke:#16476A,color:#132440

Core Kafka concepts:

Topics — Named categories of events (like database tables for events)
Partitions — Each topic splits into partitions for parallelism. Events with the same key always go to the same partition (guaranteeing order for that key)
Consumer Groups — Multiple consumers share partitions. Each partition is read by exactly one consumer in a group — this enables horizontal scaling
Offsets — Each event in a partition has a sequential offset. Consumers track their position — they can replay from any offset
Retention — Kafka retains events for a configurable period (days, weeks, forever). Unlike traditional queues, reading doesn't delete the message
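
To make the partition and consumer-group mechanics concrete, here is a simplified model. It is not Kafka's implementation: CRC32 stands in for whatever hash a given client library uses, and the round-robin `assign` function is a hypothetical stand-in for the group coordinator's assignment strategy.

```python
import zlib

NUM_PARTITIONS = 12

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministic key -> partition mapping (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def assign(partitions, consumers):
    """Round-robin partitions over consumers; each partition gets one owner."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# The same key always maps to the same partition, so its events stay ordered.
same_key = {partition_for("order_12345") for _ in range(1000)}

# 12 partitions over 3 consumers: 4 partitions each.
three = assign(list(range(NUM_PARTITIONS)), ["c1", "c2", "c3"])

# 2 partitions over 3 consumers: the third consumer has nothing to read.
idle = assign(list(range(2)), ["c1", "c2", "c3"])
```

The last case shows why the partition count caps parallelism within a group: consumers beyond the number of partitions sit idle.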

A typical Kafka topic configuration:

# Kafka topic configuration for order events
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: order-events
  labels:
    strimzi.io/cluster: production-cluster
spec:
  partitions: 12
  replicas: 3
  config:
    retention.ms: 604800000        # 7 days retention
    cleanup.policy: delete
    min.insync.replicas: 2         # At least 2 replicas must ACK
    compression.type: lz4          # Compress for network efficiency
    max.message.bytes: 1048576     # 1MB max event size
    segment.bytes: 1073741824      # 1GB log segment files

And a Python producer that publishes order events and reports delivery results through a callback:

from confluent_kafka import Producer
import json
import uuid
from datetime import datetime, timezone

# Configure Kafka producer with reliability settings
producer_config = {
    'bootstrap.servers': 'kafka-broker-1:9092,kafka-broker-2:9092',
    'acks': 'all',                    # Wait for all in-sync replicas to ACK
    'retries': 3,                     # Retry transient failures
    'retry.backoff.ms': 100,
    'enable.idempotence': True,       # No duplicates from producer retries (not end-to-end exactly-once)
    'max.in.flight.requests.per.connection': 5,  # Must be <= 5 with idempotence
    'compression.type': 'lz4',
    'linger.ms': 5,                   # Batch small messages
}

producer = Producer(producer_config)

def publish_order_event(order_id, customer_id, items, total):
    """Publish an OrderPlaced event to Kafka with idempotent delivery."""
    event = {
        'eventId': str(uuid.uuid4()),
        'eventType': 'OrderPlaced',
        'aggregateId': order_id,
        'timestamp': datetime.now(timezone.utc).isoformat(),
        'payload': {
            'orderId': order_id,
            'customerId': customer_id,
            'items': items,
            'totalAmount': total,
        }
    }

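    # Use order_id as key: all events for the same order go to the
    # same partition, which preserves ordering per order
    try:
        producer.produce(
            topic='order-events',
            key=order_id.encode('utf-8'),
            value=json.dumps(event).encode('utf-8'),
            callback=delivery_callback
        )
    except BufferError:
        # Local queue is full: serve callbacks to free space, then let the caller retry
        producer.poll(1)
        raise
    # Serve delivery callbacks without blocking. Calling flush() after every
    # message would defeat the batching configured with linger.ms.
    producer.poll(0)

def delivery_callback(err, msg):
    """Called once per message with the final delivery result."""
    if err:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()}[{msg.partition()}] @ offset {msg.offset()}")

# Example usage
publish_order_event(
    order_id='order_12345',
    customer_id='cust_789',
    items=[{'productId': 'prod_001', 'quantity': 2, 'unitPrice': 29.99}],
    total=59.98
)

# Before shutdown, block until queued messages are delivered or fail.
# flush() returns the number of messages still waiting in the queue.
remaining = producer.flush(10)
if remaining:
    print(f"{remaining} message(s) not delivered within timeout")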

Key Kafka commands for operations:

# Create a topic with 12 partitions and replication factor 3
kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic order-events \
  --partitions 12 --replication-factor 3

# Describe topic configuration and partition assignments
kafka-topics.sh --bootstrap-server localhost:9092 \
  --describe --topic order-events

# Check consumer group lag (how far behind consumers are)
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group fulfillment-service --describe

# Read events from beginning (useful for debugging)
kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic order-events --from-beginning --max-messages 10

# Reset consumer group offset to replay events
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group fulfillment-service --topic order-events \
  --reset-offsets --to-earliest --execute

RabbitMQ & NATS: Alternative Broker Patterns

Kafka excels at high-throughput event streaming with durable storage. But not every system needs a distributed commit log. RabbitMQ and NATS serve different use cases:

Broker Comparison

When to Use Each Broker

Criteria	Kafka	RabbitMQ	NATS
Model	Distributed log	Smart broker / dumb consumer	At-most-once pub/sub
Ordering	Per-partition	Per-queue	Per-publisher (core NATS); per-stream with JetStream
Throughput	Millions/sec	Tens of thousands/sec	Millions/sec
Retention	Configurable (days–forever)	Until consumed	JetStream for persistence
Best For	Event sourcing, analytics pipelines	Task queues, routing, RPC	IoT, edge computing, microservices


RabbitMQ Exchange Types provide sophisticated routing:

Direct — Route by exact routing key match (e.g., "payment.success" → payment queue)
Fanout — Broadcast to all bound queues (e.g., order event → notification + analytics + audit)
Topic — Pattern matching with wildcards (e.g., "order.*.failed" matches "order.payment.failed")
Headers — Route by message header attributes (rarely used)
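
The topic-exchange wildcard rules can be sketched as a small matcher. This is an illustrative reimplementation, not RabbitMQ's code; it assumes the AMQP conventions that `*` matches exactly one dot-separated word and `#` matches zero or more words.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if `routing_key` matches a topic-exchange binding pattern."""
    def match(p, k):
        if not p:
            return not k  # pattern exhausted: match only if the key is too
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may absorb zero or more words
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if head == "*" or head == k[0]:
            return match(rest, k[1:])
        return False
    return match(pattern.split("."), routing_key.split("."))

topic_matches("order.*.failed", "order.payment.failed")       # True
topic_matches("order.*.failed", "order.payment.card.failed")  # False: '*' is one word
topic_matches("order.#", "order.payment.card.failed")         # True: '#' spans many
```

A direct exchange is the special case with no wildcards, and a fanout exchange ignores the routing key entirely.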

Message Ordering & Idempotency

Two problems that plague every event-driven system:

Ordering: In Kafka, order is guaranteed within a partition. If you need total ordering for an entity (all events for order #123 in sequence), use the entity ID as the partition key. Across partitions, there's no ordering guarantee — design for this.

Idempotency: Messages can be delivered more than once (network retries, consumer crashes before committing offset). Every consumer must handle duplicates safely:

import redis

# Redis-based idempotency check for event consumers
redis_client = redis.Redis(host='localhost', port=6379, db=0)

def process_event_idempotently(event):
    """Skip events that were already processed, using a Redis dedup key."""
    event_id = event['eventId']
    dedup_key = f"processed:{event_id}"

    # Atomically claim the event: SET NX succeeds only for the first caller
    claimed = redis_client.set(
        dedup_key, '1', nx=True, ex=86400  # 24h TTL
    )

    if not claimed:
        print(f"Skipping duplicate event: {event_id}")
        return

    try:
        # Process the event (business logic here)
        if event['eventType'] == 'OrderPlaced':
            order = event['payload']
            print(f"Processing order {order['orderId']} "
                  f"for customer {order['customerId']}")
            # ... fulfill order logic ...
    except Exception:
        # Release the claim so a redelivery can retry. Without this, a failed
        # event would be skipped as a "duplicate" and silently lost.
        # (A crash between claim and completion still needs separate handling.)
        redis_client.delete(dedup_key)
        raise

    print(f"Successfully processed event: {event_id}")

# Example: simulate processing with deduplication
sample_event = {
    'eventId': 'evt_abc123',
    'eventType': 'OrderPlaced',
    'payload': {'orderId': 'order_001', 'customerId': 'cust_456'}
}
process_event_idempotently(sample_event)  # Processes
process_event_idempotently(sample_event)  # Skips (duplicate)

                            
                            Design Rule: Always design consumers to be idempotent. Use the event ID (or a hash of the event content) as a deduplication key. Store processed IDs with a TTL that exceeds your maximum replay window.
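
When a producer cannot guarantee a stable event ID, the content-hash variant of the rule above might look like this. The sketch assumes that `eventId` and `timestamp` are the only fields that change between redeliveries, and it uses an in-memory set as a stand-in for Redis; both are assumptions for illustration.

```python
import hashlib
import json

def content_dedup_key(event: dict) -> str:
    """Hash the business content, ignoring fields that vary per delivery."""
    stable = {k: v for k, v in event.items() if k not in ("eventId", "timestamp")}
    # sort_keys gives a canonical form, so key order does not change the hash
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return "processed:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

seen = set()  # in-memory stand-in for the Redis dedup store

def is_duplicate(event: dict) -> bool:
    key = content_dedup_key(event)
    if key in seen:
        return True
    seen.add(key)
    return False

first = {"eventId": "evt_1", "timestamp": "t1", "eventType": "OrderPlaced",
         "payload": {"orderId": "order_001", "customerId": "cust_456"}}
resent = {"payload": {"customerId": "cust_456", "orderId": "order_001"},
          "eventType": "OrderPlaced", "eventId": "evt_2", "timestamp": "t2"}
other = {"eventId": "evt_3", "timestamp": "t3", "eventType": "OrderPlaced",
         "payload": {"orderId": "order_002", "customerId": "cust_456"}}

results = [is_duplicate(first), is_duplicate(resent), is_duplicate(other)]
```

The caveat is that two legitimately distinct events with identical content would collide, so content hashing only suits payloads that carry their own unique business identifier, such as an order ID.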
                        

Module 11: Data Architecture

Once you move beyond a single database server, every operation involves tradeoffs. You can't have instant consistency, high availability, and network partition tolerance simultaneously. Understanding these tradeoffs — and knowing which ones your system can tolerate — is the core skill of data architecture.

Replication Strategies

Replication serves two purposes: fault tolerance (survive node failures) and read scaling (spread read load across replicas). The strategy you choose determines your consistency guarantees.

Leader-Follower (Single-Leader) Replication:

One node accepts writes (leader); followers replicate asynchronously or synchronously
Synchronous: follower confirms before client gets ACK — strong consistency but higher latency
Asynchronous: leader ACKs immediately, followers catch up eventually — lower latency but risk of data loss if leader crashes
Used by: PostgreSQL, MySQL, MongoDB (primary/secondary)

Multi-Leader Replication:

Multiple nodes accept writes — useful for multi-datacenter deployments
Conflict resolution required: last-write-wins (LWW), merge, or custom resolution
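Used by: CouchDB, PostgreSQL with BDR, MySQL with Tungsten Replicator. (CockroachDB and Google Spanner accept writes on many nodes but elect a single leader per data range through consensus, so they avoid write conflicts rather than resolving them.)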

Leaderless Replication:

Any node accepts reads and writes. Quorum-based: W + R > N ensures consistency
Write to W of N nodes, read from R of N nodes — if W + R > N, at least one node has the latest value
Used by: Cassandra, Riak, and Amazon's original Dynamo design (the managed DynamoDB service elects a leader per partition instead)

                            
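Quorum Math: With N=3 replicas, W=2 (write to 2), R=2 (read from 2): W + R = 4 > 3 = N. Every read set shares at least one node with every write set, so a read contacts at least one replica holding the latest acknowledged write, and version numbers tell the reader which response is newest. Trade: higher latency per operation. Overlap alone is not full linearizability: concurrent writes and sloppy quorums can still produce stale or conflicting reads.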
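
The overlap claim can be checked by brute force for small clusters. The function below is only a verification sketch: it enumerates every possible write set and read set and confirms that they intersect exactly when W + R > N.

```python
from itertools import combinations

def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """True if every W-node write set shares a node with every R-node read set."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )

quorums_always_overlap(3, 2, 2)  # True:  2 + 2 > 3
quorums_always_overlap(3, 1, 2)  # False: write to {0}, read from {1, 2}
quorums_always_overlap(5, 3, 3)  # True:  3 + 3 > 5

# Exhaustive check for small clusters: overlap holds exactly when W + R > N.
rule_holds = all(
    quorums_always_overlap(n, w, r) == (w + r > n)
    for n in range(1, 6)
    for w in range(1, n + 1)
    for r in range(1, n + 1)
)
```

Lowering R speeds up reads but forces W up to keep the inequality, which is the tuning knob that systems such as Cassandra expose per query.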
                        

Sharding & Partitioning

When a single node can't hold all your data or handle all your traffic, you partition (shard) data across multiple nodes. Each shard holds a subset of the data. The sharding strategy determines data distribution and query efficiency.

Range-Based Sharding:

Assign key ranges to shards (e.g., users A–M → shard 1, N–Z → shard 2)
Pro: Range queries are efficient (find all users between "Smith" and "Taylor")
Con: Hot spots if keys aren't uniformly distributed (e.g., far more surnames start with "S" than with "X")
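
A small sketch of range-based routing, using the A–M / N–Z split from the example above. The split-point list and the sample surnames are made up for illustration.

```python
import bisect
from collections import Counter

# Keys below "n" go to shard 1, everything from "n" upward goes to shard 2.
SPLIT_POINTS = ["n"]

def shard_for(key: str) -> int:
    """Binary-search the split points to find the 1-based shard number."""
    return bisect.bisect_right(SPLIT_POINTS, key.lower()) + 1

def shards_for_range(low: str, high: str) -> list:
    """A range query touches only the shards between its two endpoints."""
    return list(range(shard_for(low), shard_for(high) + 1))

surnames = ["Adams", "Baker", "Smith", "Stone", "Sanders", "Taylor", "Scott"]
load = Counter(shard_for(name) for name in surnames)

shards_for_range("Smith", "Taylor")  # [2]: a single shard serves the query
load                                 # shard 2 holds 5 of the 7 keys: a hot spot
```

Rebalancing means moving a split point (or adding one), which is cheap to express but requires migrating the keys in the affected range.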

Hash-Based Sharding:

Hash the key, mod by number of shards: shard = hash(key) % num_shards
Pro: Even distribution regardless of key patterns
Con: Range queries require scatter-gather across all shards
Con: Adding shards requires rehashing (consistent hashing mitigates this)
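
The rehashing cost, and how consistent hashing reduces it, can be measured directly. The sketch below is a simplified ring with virtual nodes, not any particular database's implementation; the shard names, the 100 virtual nodes per shard, and SHA-256 as the hash are all assumptions.

```python
import bisect
import hashlib

def h(value: str) -> int:
    """Stable hash (Python's built-in hash() is randomized per process)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

def mod_shard(key: str, num_shards: int) -> int:
    return h(key) % num_shards

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each node owns many points on the ring to even out its share.
        self.ring = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.points = [point for point, _ in self.ring]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        i = bisect.bisect(self.points, h(key)) % len(self.ring)
        return self.ring[i][1]

KEYS = [f"user_{i}" for i in range(10_000)]

# Going from 4 to 5 shards with hash-mod: most keys change shard.
moved_mod = sum(mod_shard(k, 4) != mod_shard(k, 5) for k in KEYS)

# Same change on a ring: only keys claimed by the new node move.
old = HashRing(["s1", "s2", "s3", "s4"])
new = HashRing(["s1", "s2", "s3", "s4", "s5"])
moved_keys = [k for k in KEYS if old.node_for(k) != new.node_for(k)]
moved_ring = len(moved_keys)
```

With hash-mod, roughly four out of five keys relocate; on the ring, roughly one in five does, and every relocated key lands on the new shard.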

Directory-Based Sharding:

A lookup service maps keys to shards. Most flexible but adds a single point of failure and latency
Used when sharding logic is complex or data needs manual placement (e.g., geographic compliance)

CAP Theorem: The Fundamental Tradeoff

The CAP theorem (Brewer, 2000) states: in a distributed system experiencing a network partition, you must choose between Consistency (every read returns the most recent write) and Availability (every request receives a response). You cannot have both during a partition.

CAP Theorem: Consistency or Availability During a Network Partition

flowchart TD
    CAP["CAP Theorem"]

    C["Consistency<br/>Every read sees latest write"]
    A["Availability<br/>Every request gets a response"]
    P["Partition Tolerance<br/>System works despite network splits"]

    CAP --> C
    CAP --> A
    CAP --> P

    CP["CP Systems<br/>etcd, ZooKeeper, HBase<br/>MongoDB (default)"]
    AP["AP Systems<br/>Cassandra, DynamoDB<br/>CouchDB, Riak"]

    C ---|"+ P"| CP
    A ---|"+ P"| AP

    style C fill:#e8f4f4,stroke:#3B9797,color:#132440
    style A fill:#f0f4f8,stroke:#16476A,color:#132440
    style P fill:#fdf0f0,stroke:#BF092F,color:#132440
    style CP fill:#e8f4f4,stroke:#3B9797,color:#132440
    style AP fill:#f0f4f8,stroke:#16476A,color:#132440

Key clarification: CAP isn't "pick 2 of 3" — it's "during a partition (P is forced), choose C or A." When there's no partition, you can have both consistency and availability. The question is: what happens when the network splits?

Real-World Examples

CP vs AP in Production

CP (Consistency + Partition Tolerance):

etcd / ZooKeeper: Used for service discovery and leader election. During a partition, minority nodes stop serving reads/writes rather than risk serving stale data. Correctness > availability.
Google Spanner: Globally consistent via TrueTime. Transactions may block during partitions. Chosen for financial systems where inconsistency = money loss.

AP (Availability + Partition Tolerance):

DynamoDB: Designed so that requests keep getting responses, even during partitions. Reads are eventually consistent by default and may return stale data. Chosen for shopping carts where "add to cart always works" beats "cart is perfectly consistent."
Cassandra: Tunable consistency (ONE, QUORUM, ALL). At QUORUM, it's CP-ish; at ONE, it's fully AP. Flexibility per query.


PACELC: Beyond CAP

CAP only describes behavior during partitions. PACELC extends this: "If there's a Partition, choose Availability or Consistency. Else (normal operation), choose Latency or Consistency."

This captures the latency-consistency tradeoff that exists even when the network is healthy:

PA/EL (DynamoDB, Cassandra at ONE): During partition → available. Normal operation → low latency (async replication)
PC/EC (etcd, ZooKeeper): During partition → consistent. Normal operation → consistent (synchronous replication, higher latency)
PA/EC (rare): Available during partitions, but consistent during normal ops. Possible with sync replication + failover-to-AP
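
The "else" half of PACELC comes down to simple arithmetic on acknowledgement latency. The model below is a deliberate simplification with made-up numbers: one leader, three followers at different network distances, and a write that returns once the required number of followers have confirmed.

```python
def write_latency_ms(leader_ms, follower_rtts_ms, sync_acks):
    """Time until the client is ACKed.

    sync_acks: how many followers must confirm before the ACK
    (0 means fully asynchronous replication).
    """
    if sync_acks == 0:
        return leader_ms
    # Replication runs in parallel, so the leader waits only for the
    # slowest of the fastest `sync_acks` followers.
    return leader_ms + sorted(follower_rtts_ms)[sync_acks - 1]

LEADER_MS = 2
FOLLOWER_RTTS_MS = [1, 40, 85]  # same rack, other region, other continent

async_latency = write_latency_ms(LEADER_MS, FOLLOWER_RTTS_MS, 0)     # 2 ms  (EL)
one_sync_latency = write_latency_ms(LEADER_MS, FOLLOWER_RTTS_MS, 1)  # 3 ms
all_sync_latency = write_latency_ms(LEADER_MS, FOLLOWER_RTTS_MS, 3)  # 87 ms (EC)
```

Even with a perfectly healthy network, waiting for every replica costs the round trip to the farthest one, which is why latency-sensitive paths often accept asynchronous replication.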

                            
                            Architecture Decision: Most real systems use different consistency levels for different operations. A banking app might use PC/EC for balance transfers but PA/EL for transaction history reads. Map your data paths to their consistency requirements individually.
                        

Case Studies

LinkedIn: The Birth of Kafka

Case Study 2011 – Present

LinkedIn's Data Pipeline Problem

By 2010, LinkedIn had dozens of services that needed to share data: the activity feed consumed events from connections, messaging, job applications, and endorsements. They tried point-to-point integrations, with each service calling every other service, and ended up with an O(N²) coupling nightmare.

The solution: Jay Kreps, Neha Narkhede, and Jun Rao built Kafka as an internal platform, open-sourced in 2011. Instead of N² connections, every service publishes to Kafka and consumes from Kafka. The coupling became O(N): each service has one integration point.

Scale: LinkedIn has publicly reported processing over 7 trillion messages per day through Kafka. Every action (profile view, connection request, message sent) becomes an event that feeds dozens of downstream systems: search indexing, recommendation engines, analytics, compliance, and the activity feed itself.

Key architectural decisions:

Append-only log (not destructive read) — multiple consumers read the same stream independently
Partition-based parallelism — scale consumers by adding partitions
Retention-based storage — keep events for days/weeks, enabling replay and reprocessing


DynamoDB: Choosing Availability Over Consistency

Case Study 2007 – Present

Amazon's Dynamo Paper (2007)

During the 2004 holiday season, Amazon's shopping cart service experienced availability issues. The engineering team asked: "Is it worse if a customer sees a stale cart (missing a recently added item) or if the cart is completely unavailable?" The answer was clear — an available but slightly stale cart is better than a 500 error.

Design choices that followed:

AP over CP: The system always accepts writes, even during network partitions. Conflicting writes are resolved later using vector clocks and application-level merging.
Consistent hashing: Data distributed across a ring of nodes. Adding/removing nodes moves minimal data.
Sloppy quorums: If a designated node is unreachable, the write goes to another healthy node together with a hint naming the intended recipient (hinted handoff). The stand-in node forwards the data when the original node recovers.
Anti-entropy with Merkle trees: Background process compares data between replicas using hash trees, identifying and repairing inconsistencies.
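
Vector clocks are what let the system tell "newer version" apart from "conflicting version". The sketch below shows the comparison rule and a union-style cart merge; the node names and cart contents are illustrative, and the merge function is a simplified stand-in for application-level reconciliation.

```python
def compare(a: dict, b: dict) -> str:
    """Compare two vector clocks (maps of node name -> counter)."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # b descends from a: b safely replaces a
    if b_le_a:
        return "after"
    return "concurrent"      # neither saw the other: a real conflict

def merge_carts(cart_a, cart_b):
    """Application-level merge: keep every item from both versions."""
    return sorted(set(cart_a) | set(cart_b))

v1 = {"A": 1}            # first write, handled by node A
v2 = {"A": 2}            # later write on A that had seen v1
v3 = {"A": 1, "B": 1}    # write on B that had seen v1 but not v2

compare(v1, v2)  # "before"
compare(v2, v3)  # "concurrent"
merged = merge_carts(["book", "pen"], ["book", "lamp"])  # ['book', 'lamp', 'pen']
```

Merging by union never loses an added item, at the cost that a deleted item can reappear after a conflict.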

Impact: The Dynamo paper influenced Cassandra (Facebook), Riak (Basho), and Voldemort (LinkedIn). DynamoDB (the managed AWS service) builds on these principles, serving single-digit millisecond latency at any scale.


Conclusion & Next Steps

Modules 10 and 11 covered the two pillars of distributed data systems: how data moves (event-driven architecture) and how data persists (distributed data architecture).

The key takeaways:

Event sourcing stores history, not snapshots. You gain auditability, temporal queries, and replay capability — at the cost of query complexity and schema evolution challenges.
CQRS separates concerns that have different scaling profiles. Writes need consistency; reads need speed. Optimize each independently, but accept eventual consistency between them.
Kafka is a distributed commit log, not a message queue. Its partition model enables both ordering guarantees (per key) and horizontal scaling (more partitions = more consumers).
Idempotency is non-negotiable. In any system where messages can be delivered more than once (which is every distributed system), consumers must safely handle duplicates.
CAP theorem forces a choice during partitions. Many systems choose AP (availability) for user-facing paths and CP (consistency) for coordination/metadata paths.
PACELC reveals the latency-consistency tradeoff that exists even during normal operation. Synchronous replication = consistent + slow. Asynchronous = fast + eventually consistent.

Next in the Series

In Part 8: API & Cloud-Native Architecture, we'll explore API design styles (REST, gRPC, GraphQL), gateway patterns, and the principles of cloud-native architecture — immutability, elasticity, and the Twelve-Factor methodology that underpins modern deployments.
