Part 6: Software Architecture & Design Patterns

Introduction

Software architecture is the skeleton of a system. Just as a building's architecture determines whether it can support ten floors or fifty, software architecture determines whether a system can serve ten users or ten million — whether a team of five can evolve it or whether it will require two hundred engineers to keep it alive.

Ralph Johnson described architecture as the shared understanding that the expert developers on a project have of the system design. Martin Fowler, building on Johnson's view, characterised it as the decisions that are hard to change. Both definitions point to the same truth: architecture is about the significant decisions, the ones that constrain everything else.

In this article, we explore what software architecture actually is, how it differs from design, and then dive deep into the major architectural patterns you will encounter in real systems. We finish with a practical tool — the Architectural Decision Record — that helps teams document and communicate these critical choices.

                            
Key Insight: Architecture is not about making every decision upfront. It is about making the right decisions at the right time and ensuring those decisions are reversible where possible. The best architects defer decisions until the last responsible moment, when they have the most information.
                        

Why Architecture Matters for Delivery

Architecture directly determines three delivery properties:

Deployment units: How many independent pieces can be deployed separately? A monolith is one deployment unit. A microservices system may have hundreds.
Team boundaries: Conway's Law states that organisations design systems that mirror their communication structures. Architecture defines where teams can work independently without blocking each other.
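Scalability constraints: A stateful single-process monolith scales mainly vertically (bigger machines), although a stateless monolith can also be replicated behind a load balancer. A distributed system can scale individual components horizontally (more machines). Architecture determines which scaling paths are available.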

Poor architecture choices early in a project create accidental complexity that compounds over time. Every feature takes longer, every change risks breaking something else, and eventually the system becomes so rigid that a rewrite is the only option.

Architecture vs Design

These terms are often used interchangeably, but they describe different levels of abstraction:

Dimension	Architecture	Design
Scope	System-wide structure and boundaries	Within a single component or module
Concerns	Quality attributes, deployment topology, communication patterns	Classes, interfaces, algorithms, data structures
Change Cost	High — affects multiple teams, services, infrastructure	Lower — contained within a module
Decided By	Senior architects, tech leads, cross-team decisions	Individual developers within their component
Examples	"We use event-driven communication between services"	"This class uses the Strategy pattern for payment processing"
Documentation	Architecture Decision Records, C4 diagrams	Code itself, UML class diagrams, inline comments

What Makes Something Architectural?

A decision is architectural if it satisfies one or more of these criteria:

It is hard to reverse — Changing it would require significant rework across the system
It constrains other decisions — Once chosen, it limits what designs are possible within components
It affects multiple stakeholders — Development teams, operations, security, and business all have opinions
It involves tradeoffs between quality attributes — You cannot optimise for everything simultaneously

                            
Grady Booch's Definition: "Architecture represents the significant design decisions that shape a system, where significance is measured by cost of change." This is why architecture reviews exist: to catch expensive mistakes before they are built.
                        

The Architecture Pattern Landscape

Most introductions to software architecture present a flat list of pattern names — Client-Server, Microservices, Event-Driven — without explaining what kind of problem each one solves. This is confusing because the list mixes architectural styles, deployment models, communication patterns, and domain organisation strategies.

A more useful mental model groups patterns by the dimension of the problem they address. The table below organises the full landscape into seven categories:

                            
The Seven Dimensions of Architecture: Structure (how code is organised within a deployment unit) → Communication (how components exchange messages) → Deployment (how units are shipped and scaled) → Data (how state is stored and queried) → Domain (how business logic is bounded) → Integration (how distributed services coordinate) → Resilience (how failures are contained). Every major pattern addresses one or more of these dimensions.
                        

Dimension	Primary Question	Key Patterns
Structure / Code Organisation	How is the codebase internally organised?	Layered, Hexagonal, Clean Architecture, Onion, MVC/MVP/MVVM
Communication / Interaction	How do components exchange messages?	Client-Server, Event-Driven, Pipe-and-Filter, Pub/Sub, Broker, Message Bus
Deployment / Topology	What are the physical deployment units?	Monolith, Modular Monolith, Microservices, SOA, Serverless, Cell-Based
Data / State Management	How is persistent state stored and accessed?	Primary-Replica, Sharding, CQRS, Event Sourcing, Lambda Architecture
Domain Organisation	How are business rules bounded and expressed in code?	DDD Bounded Contexts, Hexagonal, Onion, Clean Architecture, Saga
Integration / Distributed Coordination	How do distributed services coordinate?	API Gateway, BFF, Saga, Strangler Fig, Service Mesh, Sidecar
Resilience / Fault Tolerance	How does the system behave when dependencies fail?	Circuit Breaker, Bulkhead, Retry/Timeout, Cache-Aside

                            
The 20 Patterns Every Architect Should Know: Layered, Client-Server, MVC, Event-Driven, Pipe-and-Filter, Monolith, Modular Monolith, Microservices, SOA, Hexagonal, Clean Architecture, Onion, CQRS, Event Sourcing, Saga, API Gateway, Primary-Replica, Sharding, Pub/Sub, and Serverless. Together these cover the large majority of modern architecture discussions.
                        

Foundational Architectural Patterns

An architectural pattern is a reusable solution to a commonly occurring problem in system structure. Patterns are not prescriptions — they are options. The skill of architecture lies in knowing which pattern fits which context. The patterns below are the foundations every architect builds on.

Client-Server

The most fundamental distributed architecture pattern. A client sends requests; a server processes them and returns responses. The entire web is built on this pattern.

Client-Server Architecture

flowchart LR
    subgraph Clients
        A[Web Browser]
        B[Mobile App]
        C[CLI Tool]
    end
    subgraph Server
        D[Load Balancer]
        E[Application Server]
        F[Database]
    end
    A -->|HTTP Request| D
    B -->|REST API| D
    C -->|gRPC| D
    D --> E
    E --> F
    E -->|Response| D
    D -->|HTTP Response| A

Thin vs Thick Clients:

Thin client: Minimal logic in the client. The server does most processing. Example: traditional server-rendered web apps (Rails, Django).
Thick client: Significant logic in the client. The server provides APIs. Example: Single Page Applications (React, Angular), mobile apps.

When to use: Almost every web application, mobile backend, API service. It is the default starting point for most systems.

Tradeoffs: Simple to understand and deploy. Single point of failure at the server. Server must scale to handle all client load. Network latency affects every interaction.

Layered (N-Tier) Architecture

The layered pattern organises code into horizontal layers, each with a specific responsibility. Each layer only communicates with the layer directly below it (strict layering) or any layer below it (relaxed layering).

Layered (N-Tier) Architecture

flowchart TD
    A[Presentation Layer<br/>UI, Controllers, Views] --> B[Business Logic Layer<br/>Services, Domain Objects, Rules]
    B --> C[Data Access Layer<br/>Repositories, ORMs, Queries]
    C --> D[Database Layer<br/>PostgreSQL, MongoDB, Redis]

    style A fill:#3B9797,color:#fff
    style B fill:#16476A,color:#fff
    style C fill:#132440,color:#fff
    style D fill:#BF092F,color:#fff

Strict vs Relaxed Layering:

Strict: Layer N can only call Layer N-1. Forces all requests through every layer. Maximum separation but can create "pass-through" layers that add no value.
Relaxed: Layer N can call any layer below it. More flexible but creates hidden dependencies that make refactoring harder.

Common Layer Configurations:

3-tier: Presentation → Business → Data (most web applications)
4-tier: Presentation → Application → Domain → Infrastructure (Domain-Driven Design)
2-tier: Client → Database (simple desktop applications)

When to use: Business applications with clear separation of concerns. Teams that want predictable structure. Codebases where multiple developers work on different layers simultaneously.

Tradeoffs: Easy to understand and implement. Can become monolithic if all layers deploy together. Performance overhead from layer-to-layer calls. Risk of "architecture sinkhole" where layers just pass data through without transformation.
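
Strict layering can be sketched in a few lines. The example below is a minimal illustration, not a reference implementation: the class names (`UserRepository`, `UserService`, `UserController`) are hypothetical, and an in-memory dictionary stands in for the database. Each layer holds a reference only to the layer directly below it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    id: int
    name: str
    active: bool


class UserRepository:
    """Data access layer: the only layer that touches storage."""

    def __init__(self) -> None:
        # In-memory stand-in for a real database table (assumption for this sketch).
        self._rows = {1: User(1, "Ada", True), 2: User(2, "Bob", False)}

    def find_by_id(self, user_id: int) -> Optional[User]:
        return self._rows.get(user_id)


class UserService:
    """Business logic layer: applies rules and knows nothing about HTTP."""

    def __init__(self, repo: UserRepository) -> None:
        self._repo = repo

    def get_active_user(self, user_id: int) -> User:
        user = self._repo.find_by_id(user_id)
        if user is None:
            raise LookupError(f"user {user_id} not found")
        if not user.active:
            raise PermissionError(f"user {user_id} is deactivated")
        return user


class UserController:
    """Presentation layer: translates outcomes into response shapes."""

    def __init__(self, service: UserService) -> None:
        self._service = service

    def show(self, user_id: int) -> dict:
        try:
            user = self._service.get_active_user(user_id)
        except LookupError:
            return {"status": 404}
        except PermissionError:
            return {"status": 403}
        return {"status": 200, "body": {"id": user.id, "name": user.name}}


controller = UserController(UserService(UserRepository()))
print(controller.show(1))  # {'status': 200, 'body': {'id': 1, 'name': 'Ada'}}
print(controller.show(2))  # {'status': 403}
print(controller.show(3))  # {'status': 404}
```

Here the service layer earns its place because it enforces a rule (deactivated users are rejected). If it only forwarded `find_by_id`, it would be a pass-through layer of the kind the sinkhole anti-pattern describes.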

Case Study

The Architecture Sinkhole Anti-Pattern

A team building a financial reporting system implemented strict 4-tier architecture. They discovered that 80% of their requests simply passed data from the database through the Data Access Layer → Business Layer → Application Layer → Presentation Layer without any transformation. The "Business Logic" layer was just calling repository.findById(id) and returning the result unchanged. The solution: bypass layers when they add no value. Allow the Presentation layer to call the Data Access layer directly for simple read operations. This is the pragmatic reality of relaxed layering.

Tags: Anti-Pattern, Layered, Pragmatism

Pipe-and-Filter

Data flows through a chain of processing stages (filters), connected by channels (pipes). Each filter is independent — it receives input, transforms it, and produces output. The Unix command line is the canonical example: cat file.log | grep ERROR | sort | uniq -c | sort -rn.

Pipe-and-Filter Architecture — ETL Pipeline

flowchart LR
    A[Data Source<br/>CSV Files] --> B[Extract<br/>Parse & Validate]
    B --> C[Transform<br/>Clean & Enrich]
    C --> D[Aggregate<br/>Group & Sum]
    D --> E[Format<br/>JSON Output]
    E --> F[Load<br/>Data Warehouse]

    style A fill:#132440,color:#fff
    style B fill:#3B9797,color:#fff
    style C fill:#3B9797,color:#fff
    style D fill:#3B9797,color:#fff
    style E fill:#3B9797,color:#fff
    style F fill:#BF092F,color:#fff

Key Properties:

Composability: Filters can be rearranged, added, or removed without changing other filters
Reusability: A "validate email" filter can be used in multiple pipelines
Parallelism: Independent filters can run concurrently on different data chunks
Testability: Each filter can be tested in isolation with known input/output

When to use: Data processing pipelines (ETL), stream processing, compiler stages (lexing → parsing → semantic analysis → code generation), image processing, log analysis.

Tradeoffs: Excellent composability and testability. Not suitable for interactive applications. Overhead from serialisation/deserialisation between stages. Error handling across the pipeline is complex.
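
A small pipeline built from Python generators shows the idea in miniature. This is a sketch under simple assumptions: the filter names (`parse`, `clean`, `aggregate`) and the two-column CSV input are invented for illustration. Each filter takes an iterable and yields records, so stages can be reordered or tested on their own.

```python
from typing import Iterable, Iterator


def parse(lines: Iterable[str]) -> Iterator[dict]:
    """Extract: split each CSV line into a record, skipping malformed rows."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) == 2:
            yield {"region": parts[0], "amount": parts[1]}


def clean(records: Iterable[dict]) -> Iterator[dict]:
    """Transform: normalise the region and convert the amount to a float."""
    for r in records:
        try:
            yield {"region": r["region"].strip().upper(), "amount": float(r["amount"])}
        except ValueError:
            continue  # drop rows whose amount is not numeric


def aggregate(records: Iterable[dict]) -> dict:
    """Aggregate: sum amounts per region (this stage consumes the whole stream)."""
    totals: dict = {}
    for r in records:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals


# Hypothetical input, including one malformed row and one non-numeric amount.
raw = ["eu,10.5", "us,4", "bad-row", "EU,2.5", "us,oops"]
totals = aggregate(clean(parse(raw)))
print(totals)  # {'EU': 13.0, 'US': 4.0}
```

Because the filters are lazy generators, records flow through one at a time rather than being materialised between stages, which mirrors how Unix pipes stream data.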

Event-Driven Architecture

Components communicate by producing and consuming events — records of something that happened. Producers do not know (or care) who consumes their events. Consumers do not know who produced the events. This creates extreme decoupling.

Event-Driven Architecture

flowchart TD
    subgraph Producers
        A[Order Service]
        B[Payment Service]
        C[Inventory Service]
    end
    subgraph Event Bus
        D[Message Broker<br/>Kafka / RabbitMQ]
    end
    subgraph Consumers
        E[Email Service]
        F[Analytics Service]
        G[Audit Log Service]
        H[Shipping Service]
    end
    A -->|OrderPlaced| D
    B -->|PaymentProcessed| D
    C -->|StockUpdated| D
    D -->|OrderPlaced| E
    D -->|OrderPlaced| F
    D -->|PaymentProcessed| G
    D -->|OrderPlaced| H

Event Types:

Domain Events: "OrderPlaced", "UserRegistered" — business-meaningful things that happened
Integration Events: Events published for other services to consume across boundaries
Event Notifications: Thin events that say "something changed" — consumers must query for details
Event-Carried State Transfer: Fat events containing all the data consumers need — no callbacks required

Eventual Consistency: Because events are processed asynchronously, the system is eventually consistent — different services may have different views of the world for brief periods. This is the fundamental tradeoff of event-driven systems.

When to use: Systems requiring high decoupling between services. Scenarios where multiple consumers need to react to the same event. Systems where eventual consistency is acceptable. High-throughput systems (millions of events per second).

Tradeoffs: Maximum decoupling and scalability. Difficult to debug (no single call stack). Eventual consistency complicates business logic. Event schema evolution requires careful versioning.
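
The decoupling between producers and consumers can be illustrated with an in-memory event bus. This is only a sketch: `EventBus` is a hypothetical stand-in for a real broker, and the event shape (`id`, `type`, `order_id`) is an assumption made for the example. It also shows why consumers usually need to be idempotent when a broker delivers an event more than once.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """In-memory stand-in for a broker such as Kafka or RabbitMQ."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: dict) -> None:
        # The producer does not know which handlers, if any, are registered.
        for handler in self._subscribers[event["type"]]:
            handler(event)


class EmailConsumer:
    """Idempotent consumer: remembers processed event ids and skips repeats."""

    def __init__(self) -> None:
        self.sent = []
        self._seen = set()

    def handle(self, event: dict) -> None:
        if event["id"] in self._seen:
            return  # duplicate delivery, already handled
        self._seen.add(event["id"])
        self.sent.append(f"confirmation for order {event['order_id']}")


analytics_count = {"orders": 0}


def count_order(event: dict) -> None:
    """Non-idempotent consumer: counts every delivery, including duplicates."""
    analytics_count["orders"] += 1


bus = EventBus()
email = EmailConsumer()
bus.subscribe("OrderPlaced", email.handle)
bus.subscribe("OrderPlaced", count_order)

event = {"id": "evt-1", "type": "OrderPlaced", "order_id": 42}
bus.publish(event)
bus.publish(event)  # simulate a redelivery of the same event

print(email.sent)       # ['confirmation for order 42']
print(analytics_count)  # {'orders': 2}
```

The email consumer sends one confirmation despite two deliveries, while the naive analytics counter double-counts. A real broker adds durability, ordering guarantees, and asynchronous delivery that this synchronous sketch leaves out.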

                            
Warning: Event-driven architecture is powerful but introduces significant operational complexity. You need: a reliable message broker, dead-letter queues for failed messages, idempotent consumers (events may be delivered more than once), and observability tooling that can trace events across services. Do not adopt event-driven architecture unless you have the operational maturity to support it.
                        

Primary-Replica (Master-Slave)

One node (the primary) handles all write operations. Multiple replicas receive copies of the data and handle read operations. This separates read and write workloads, enabling horizontal scaling of reads.

Use cases:

Database replication: PostgreSQL primary with read replicas for reporting queries
Content distribution: Primary content server replicating to CDN edge nodes
High availability: If the primary fails, a replica is promoted (failover)

Consistency Models:

Synchronous replication: Primary waits for replicas to confirm before acknowledging writes. Strong consistency but higher latency.
Asynchronous replication: Primary acknowledges writes immediately, replicates in the background. Lower latency but risk of data loss on primary failure.
Semi-synchronous: Primary waits for at least one replica to confirm. Balance between consistency and performance.

When to use: Read-heavy workloads (90%+ reads), systems requiring high availability, scenarios where read scaling is more important than write scaling.

Tradeoffs: Excellent read scalability. Write bottleneck at primary. Replication lag creates potential for stale reads. Failover adds operational complexity.
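
Replication lag is easier to reason about with a toy model. The sketch below assumes asynchronous replication and uses hypothetical names (`ReplicatedStore`, `replicate`); real databases ship a write-ahead log over the network rather than copying dictionary entries.

```python
class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data = {}


class ReplicatedStore:
    """Routes writes to the primary and reads to replicas (asynchronous replication)."""

    def __init__(self, replica_count: int) -> None:
        self.primary = Node("primary")
        self.replicas = [Node(f"replica-{i}") for i in range(replica_count)]
        self._pending = []  # replication log not yet applied to replicas
        self._next = 0      # round-robin cursor for reads

    def write(self, key: str, value: str) -> None:
        self.primary.data[key] = value
        self._pending.append((key, value))  # acknowledged before replicas catch up

    def read(self, key: str):
        replica = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return replica.data.get(key)

    def replicate(self) -> None:
        """Apply the pending log to every replica (normally a background process)."""
        for key, value in self._pending:
            for replica in self.replicas:
                replica.data[key] = value
        self._pending.clear()


store = ReplicatedStore(replica_count=2)
store.write("user:1", "Ada")
stale = store.read("user:1")  # replica has not caught up yet
store.replicate()
fresh = store.read("user:1")
print(stale, fresh)  # None Ada
```

The first read returns nothing even though the write was acknowledged. That window is the stale-read risk described above, and it is why read-your-own-writes flows are often routed to the primary.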

Microservices

The system is decomposed into small, independently deployable services, each owning its own data and communicating via well-defined APIs or events. Each service is built, deployed, and scaled independently.

Key Properties:

Single Responsibility: Each service does one business capability well (User Service, Payment Service, Inventory Service)
Independent Deployment: Changing the Payment Service does not require redeploying the User Service
Polyglot Persistence: Each service chooses the best database for its needs (SQL, NoSQL, Graph, Time-series)
API Contracts: Services communicate through versioned APIs — internal implementation is hidden
Fault Isolation: If the Recommendation Service crashes, the core checkout flow still works

When to use: Large organisations (100+ engineers) where team autonomy is critical. Systems requiring different scaling characteristics for different components. When deployment independence is worth the operational overhead.

Tradeoffs: Maximum team autonomy and scaling flexibility. Massive operational complexity (service mesh, distributed tracing, API gateways). Network calls replace function calls (latency). Data consistency across services is fundamentally hard.

Case Study

Amazon's "Two-Pizza Teams" and Service-Oriented Architecture

In 2002, Jeff Bezos issued his famous mandate: all teams must communicate through service interfaces. No direct database access. No shared-memory models. Every team's service must be designed to be exposed externally. This forced Amazon to decompose their monolithic bookstore into hundreds of independent services — each owned by a "two-pizza team" (6-8 people). The result: Amazon could scale both their technology and their organisation. Each team could innovate independently, deploy multiple times per day, and choose their own technology stack. This became the blueprint for what we now call "microservices."

Tags: Microservices, Conway's Law, Team Autonomy

Monolithic Architecture

The entire application is a single deployment unit. All code runs in one process, shares one database, and is deployed together. Despite its reputation, the monolith is often the correct architectural choice — especially for new products, small teams, and systems where simplicity trumps flexibility.

When monoliths are correct:

Team size is small (< 20 engineers)
Domain boundaries are not yet clear (early product)
Deployment simplicity is valued over independent scaling
The system does not have drastically different scaling requirements for different components
You cannot afford the operational overhead of distributed systems (Kubernetes, service mesh, distributed tracing)

The Modular Monolith: A pragmatic middle ground. The codebase is structured into well-defined modules with clear boundaries and interfaces — but deployed as a single unit. Each module could theoretically become a microservice, but you defer that decision until it is actually needed. This preserves simplicity while maintaining clean architecture.

# Modular Monolith Directory Structure
src/
├── modules/
│   ├── users/
│   │   ├── api/          # Public interface (what other modules can call)
│   │   ├── domain/       # Business logic (private to this module)
│   │   ├── persistence/  # Database access (private)
│   │   └── tests/
│   ├── payments/
│   │   ├── api/
│   │   ├── domain/
│   │   ├── persistence/
│   │   └── tests/
│   └── inventory/
│       ├── api/
│       ├── domain/
│       ├── persistence/
│       └── tests/
├── shared/               # Cross-cutting concerns (logging, auth)
└── main.py               # Single entry point
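
The module boundary in that layout can be sketched in code. This example is illustrative only: `UsersApi`, `PaymentsApi`, and `_UserRecord` are hypothetical names, and both "modules" are shown in a single file for brevity. The point is that the payments module depends on the users module's public interface and never on its internals.

```python
class _UserRecord:
    """Private to the users module (would live under users/domain/)."""

    def __init__(self, user_id: int, email: str, password_hash: str) -> None:
        self.user_id = user_id
        self.email = email
        self.password_hash = password_hash


class UsersApi:
    """Public interface of the users module (would live under users/api/)."""

    def __init__(self) -> None:
        # In-memory stand-in for the module's private persistence layer.
        self._users = {1: _UserRecord(1, "ada@example.com", "hash")}

    def get_contact_email(self, user_id: int) -> str:
        # Exposes only what other modules need, never the full record.
        return self._users[user_id].email


class PaymentsApi:
    """Public interface of the payments module; depends only on UsersApi."""

    def __init__(self, users: UsersApi) -> None:
        self._users = users
        self.receipts = []

    def charge(self, user_id: int, amount_cents: int) -> str:
        email = self._users.get_contact_email(user_id)
        receipt = f"charged {amount_cents} cents, receipt sent to {email}"
        self.receipts.append(receipt)
        return receipt


payments = PaymentsApi(UsersApi())
print(payments.charge(1, 1500))
```

If the users module later becomes a separate service, `UsersApi` can be reimplemented as a network client without changing `PaymentsApi`, which is what makes the later extraction cheap.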

                            
Key Insight: As Simon Brown put it, "If you can't build a well-structured monolith, what makes you think microservices is the answer?" Microservices don't fix bad design; they distribute it across the network, where it becomes harder to debug.
                        

Domain-Centric Patterns

Domain-centric patterns place the business logic at the centre of the architecture, with infrastructure (databases, frameworks, HTTP) pushed to the outer edges. This inverts the traditional layered model, where the database is the foundation. The core principle: high-level policy should not depend on low-level detail — details should depend on policies.

Hexagonal Architecture (Ports & Adapters)

Proposed by Alistair Cockburn, Hexagonal Architecture organises a system into three zones: the Application Core (pure domain logic), Ports (interfaces the core exposes or consumes), and Adapters (concrete implementations — HTTP controllers, database repositories, message queue consumers). The core knows nothing about the outside world.

Hexagonal Architecture — Ports & Adapters

flowchart LR
    subgraph "Driving Side"
        A[REST Controller]
        B[CLI Command]
        C[Test Suite]
    end
    subgraph "Application Core"
        D[Driving Port<br/>OrderService Interface]
        E[Domain Logic<br/>Order · Payment · Inventory]
        F[Driven Port<br/>Repository Interface]
    end
    subgraph "Driven Side"
        G[PostgreSQL Adapter]
        H[Kafka Adapter]
        I[Email Adapter]
    end
    A -->|HTTP Adapter| D
    B -->|CLI Adapter| D
    C -->|Test Adapter| D
    D --> E
    E --> F
    F --> G
    F --> H
    F --> I

The key rule: Dependencies always point inward. The REST controller knows the domain interface (Port) but nothing about domain internals. You can swap any adapter — replace PostgreSQL with MongoDB, REST with GraphQL — without touching business logic. This makes the core independently testable: run the full domain test suite without starting a web server or connecting to a database.

Driving vs Driven Ports:

Driving ports (left): Interfaces the outside world calls into your application. A REST controller is an adapter for a driving port.
Driven ports (right): Interfaces your application uses to call external systems. A repository interface is a driven port; PostgreSQL is its adapter.

When to use: Applications where business logic is complex and must be independently testable. Systems supporting multiple delivery mechanisms (REST + CLI + event consumers). Long-lived applications where infrastructure evolves.

Tradeoffs: Excellent testability and maintainability. More initial boilerplate (interfaces, adapters). Can feel over-engineered for simple CRUD. The benefit compounds as domain complexity grows.
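
A compact sketch of ports and adapters, using `typing.Protocol` to express the driven port. All names here (`OrderRepository`, `OrderService`, `InMemoryOrderRepository`, `cli_adapter`) are hypothetical and chosen to echo the diagram above; a production adapter would wrap a real database driver.

```python
from typing import Optional, Protocol


class OrderRepository(Protocol):
    """Driven port: what the core needs from storage."""

    def save(self, order_id: str, total: float) -> None: ...

    def get_total(self, order_id: str) -> Optional[float]: ...


class OrderService:
    """Application core: depends only on the port, not on any database."""

    def __init__(self, repo: OrderRepository) -> None:
        self._repo = repo

    def place_order(self, order_id: str, total: float) -> None:
        if total <= 0:
            raise ValueError("order total must be positive")
        self._repo.save(order_id, total)

    def order_total(self, order_id: str) -> Optional[float]:
        return self._repo.get_total(order_id)


class InMemoryOrderRepository:
    """Driven adapter for tests; a PostgreSQL adapter would offer the same methods."""

    def __init__(self) -> None:
        self._rows = {}

    def save(self, order_id: str, total: float) -> None:
        self._rows[order_id] = total

    def get_total(self, order_id: str) -> Optional[float]:
        return self._rows.get(order_id)


def cli_adapter(service: OrderService, argv: list) -> str:
    """Driving adapter: turns command-line style input into a call on the core."""
    order_id, total = argv
    service.place_order(order_id, float(total))
    return f"order {order_id} placed"


service = OrderService(InMemoryOrderRepository())
print(cli_adapter(service, ["A-1", "99.5"]))  # order A-1 placed
print(service.order_total("A-1"))             # 99.5
```

The core is exercised here without a web server or database, which is the testability benefit the pattern promises. Swapping storage means writing another class with `save` and `get_total`; `OrderService` stays untouched.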

Clean Architecture

Robert C. Martin's Clean Architecture expresses the same dependency-inversion idea as Hexagonal, using concentric circles. From innermost to outermost: Entities (enterprise business rules) → Use Cases (application business rules) → Interface Adapters (controllers, presenters, gateways) → Frameworks & Drivers (web, databases, UI). The Dependency Rule is absolute: source code dependencies must always point inward — nothing in an inner circle can know about anything in an outer circle.

                            
The Litmus Test: If you can replace your web framework (Express → Fastify, Django → FastAPI) or your database (PostgreSQL → MongoDB) without changing a line of business logic, you have achieved Clean Architecture. The full test suite should run without a web server or database connection.
                        

When to use: Large applications with complex business rules that must remain stable across infrastructure changes. Teams practising TDD. Systems with 5–10+ year expected lifespans.

Tradeoffs: Maximum maintainability and testability. Significant discipline required — the dependency rule is easy to violate inadvertently. Not appropriate for CRUD-heavy applications where the business rules are effectively just the data model.

Onion Architecture

Jeffrey Palermo's Onion Architecture (2008) emphasises the Repository pattern as the primary boundary between domain and infrastructure. Layers from inside out: Domain Model (entities, value objects) → Domain Services (business rules) → Application Services (orchestration, use cases) → Infrastructure (databases, external APIs). Crucially, infrastructure depends on the application core, not the other way around.

Hexagonal, Clean, and Onion Architecture are different formulations of the same fundamental idea. In production systems — especially fintech and SaaS — they are commonly combined:

DDD              # Define bounded contexts and ubiquitous language
 + Onion/Hexagonal # Isolate domain from infrastructure
 + CQRS           # Separate read/write models for performance
 + Event Sourcing  # Immutable event log for audit and compliance

CQRS — Command Query Responsibility Segregation

CQRS separates commands (write operations that change state) from queries (read operations that return data) into different models, different code paths, and often different data stores. A command validates business rules and mutates state; a query returns a denormalised view optimised for display — with no side effects.

CQRS — Separate Read and Write Models

flowchart TD
    subgraph "Write Side"
        A[Command: PlaceOrder]
        B[Command Handler]
        C[Domain Aggregate]
        D[(Write Store<br/>Normalised DB)]
    end
    subgraph "Read Side"
        E[Query: GetOrderSummary]
        F[Query Handler]
        G[(Read Store<br/>Denormalised View)]
    end
    H[Event Bus]
    A --> B --> C --> D
    C -->|OrderPlaced Event| H
    H -->|Project Event| G
    E --> F --> G

Why separate reads and writes? In most systems, reads vastly outnumber writes. Write models use normalised schemas for consistency; read models use denormalised views (read replicas, Elasticsearch, Redis) for query performance. You scale and optimise each side independently.

CQRS ≠ Event Sourcing: These pair naturally but are independent. CQRS is about separating read/write models. Event Sourcing is about how write-side state is stored. You can implement CQRS with a traditional relational database on the write side.

When to use: Systems with significantly asymmetric read/write workloads. Applications where the read model requires different data shapes than the write model. Systems already using Event-Driven Architecture.

Tradeoffs: Better read performance and independent scaling. Significant added complexity — two models, two code paths, potential eventual consistency between write and read stores. Overkill for simple CRUD.
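
The split between the two sides can be sketched as follows. This is a simplified illustration with hypothetical names (`OrderWriteModel`, `CustomerSummaryReadModel`): the "event bus" is a direct function call, so the read model updates synchronously, whereas a real system would usually project events asynchronously and accept a short consistency gap.

```python
from collections import defaultdict


class OrderWriteModel:
    """Write side: validates commands, mutates state, and emits events."""

    def __init__(self, publish) -> None:
        self._orders = {}  # normalised write store (in-memory stand-in)
        self._publish = publish

    def place_order(self, order_id: str, customer: str, total: float) -> None:
        if order_id in self._orders:
            raise ValueError("duplicate order id")
        if total <= 0:
            raise ValueError("total must be positive")
        self._orders[order_id] = {"customer": customer, "total": total}
        self._publish({"type": "OrderPlaced", "customer": customer, "total": total})


class CustomerSummaryReadModel:
    """Read side: a denormalised view kept up to date by projecting events."""

    def __init__(self) -> None:
        self._summary = defaultdict(lambda: {"orders": 0, "spent": 0.0})

    def project(self, event: dict) -> None:
        if event["type"] == "OrderPlaced":
            row = self._summary[event["customer"]]
            row["orders"] += 1
            row["spent"] += event["total"]

    def get_summary(self, customer: str) -> dict:
        # Query: returns a copy of the data and never changes state.
        return dict(self._summary.get(customer, {"orders": 0, "spent": 0.0}))


read_model = CustomerSummaryReadModel()
write_model = OrderWriteModel(publish=read_model.project)  # synchronous stand-in for a bus

write_model.place_order("o-1", "ada", 150.0)
write_model.place_order("o-2", "ada", 50.0)
print(read_model.get_summary("ada"))  # {'orders': 2, 'spent': 200.0}
```

The query never touches the write store: it reads a pre-aggregated row, which is why read-heavy workloads benefit. The cost is visible too, since there are now two models to keep in step.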

Event Sourcing

Instead of storing the current state of a record, Event Sourcing stores the complete, immutable sequence of events that produced that state. Current state is derived by replaying those events. Think of it as the difference between a bank account's current balance versus the full transaction ledger — the ledger is the source of truth.

# Traditional: mutate current state
# UPDATE orders SET status='shipped', updated_at=NOW() WHERE id=123

# Event Sourcing: append events, never mutate
events = [
    {"type": "OrderPlaced",     "at": "2026-01-10T09:00", "data": {"total": 150.00}},
    {"type": "PaymentReceived", "at": "2026-01-10T09:02", "data": {"amount": 150.00}},
    {"type": "OrderShipped",    "at": "2026-01-11T08:00", "data": {"tracking": "UPS-9876"}},
]

# Map each event type to the order status it produces
STATUS_BY_EVENT = {
    "OrderPlaced": "placed",
    "PaymentReceived": "paid",
    "OrderShipped": "shipped",
}

# Current state = fold(initial_state, events)
def rehydrate(events):
    state = {}
    for e in events:
        status = STATUS_BY_EVENT.get(e["type"])
        if status is not None:  # ignore event types that do not affect status
            state["status"] = status
    return state

print(rehydrate(events))  # {'status': 'shipped'}

Key benefits:

Immutable audit log: Every state change is recorded — invaluable for compliance, debugging, and fraud detection
Temporal queries: "What was the state of this order at 2pm yesterday?" is trivially answered by replaying events up to that timestamp
Natural CQRS integration: Events from the write side can be projected into multiple read-optimised views simultaneously
Replay for recovery: If a downstream service fails and misses events, it can replay from the log to catch up

When to use: Financial systems, healthcare, insurance — any domain requiring a complete audit trail. Systems where CQRS read models must be built from scratch. Compliance-heavy domains.

Tradeoffs: Excellent auditability. Querying current state requires materialised views. Event schema evolution is hard — events are immutable and old handlers must still process old event shapes. Snapshots needed for performance once event counts grow large.
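
Temporal queries and snapshots can both be expressed as variations on the same replay loop. The sketch below is an assumption-laden illustration: the function names (`apply_event`, `state_at`) and the snapshot shape are invented, and events are assumed to be stored in timestamp order.

```python
from datetime import datetime

events = [
    {"type": "OrderPlaced",     "at": "2026-01-10T09:00", "data": {"total": 150.00}},
    {"type": "PaymentReceived", "at": "2026-01-10T09:02", "data": {"amount": 150.00}},
    {"type": "OrderShipped",    "at": "2026-01-11T08:00", "data": {"tracking": "UPS-9876"}},
]

STATUS_BY_EVENT = {
    "OrderPlaced": "placed",
    "PaymentReceived": "paid",
    "OrderShipped": "shipped",
}


def apply_event(state: dict, event: dict) -> dict:
    """Pure transition function: returns a new state and leaves the old one untouched."""
    new_state = dict(state)
    status = STATUS_BY_EVENT.get(event["type"])
    if status is not None:
        new_state["status"] = status
    return new_state


def state_at(events: list, as_of: str, snapshot=None) -> dict:
    """Replay events up to and including `as_of`, optionally starting from a snapshot."""
    cutoff = datetime.fromisoformat(as_of)
    state = dict(snapshot["state"]) if snapshot else {}
    start = datetime.fromisoformat(snapshot["at"]) if snapshot else None
    for e in events:
        at = datetime.fromisoformat(e["at"])
        if start is not None and at <= start:
            continue  # already folded into the snapshot
        if at > cutoff:
            break  # relies on events being in timestamp order
        state = apply_event(state, e)
    return state


# Temporal query: what was the order's state at noon on 10 January?
print(state_at(events, "2026-01-10T12:00"))  # {'status': 'paid'}

# Snapshot: skip the events already summarised and replay only what follows.
snapshot = {"at": "2026-01-10T09:02", "state": {"status": "paid"}}
print(state_at(events, "2026-01-12T00:00", snapshot))  # {'status': 'shipped'}
```

With three events the snapshot saves nothing, but for an aggregate with many thousands of events it bounds replay cost to the events recorded since the snapshot was taken.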

Distributed System Patterns

These patterns address the specific challenges that arise when multiple independent services must coordinate across network boundaries — where calls can fail, be retried, arrive out of order, or be delayed by seconds.

Saga — Distributed Transactions

A business operation often spans multiple services: placing an order requires the Order Service to create a record, the Payment Service to charge the card, and the Inventory Service to reserve items. Distributed ACID transactions are impractical at scale. The Saga pattern coordinates this as a sequence of local transactions. Each step publishes an event or command to trigger the next. If any step fails, compensating transactions undo the completed steps in reverse order.

Saga Pattern — Compensating Transactions on Failure

flowchart LR
    A[Order Created] -->|Success| B[Payment Charged]
    B -->|Success| C[Inventory Reserved]
    C -->|Success| D[Order Confirmed ✓]
    C -->|Fail| E[Compensate: Refund Payment]
    E --> F[Order Cancelled ✗]
    B -->|Fail| G[Order Cancelled ✗]
    style D fill:#3B9797,color:#fff
    style F fill:#BF092F,color:#fff
    style G fill:#BF092F,color:#fff
    style E fill:#16476A,color:#fff

Choreography vs Orchestration:

Choreography: Each service publishes events; others react. Fully decentralised — no coordinator. Harder to visualise the overall flow.
Orchestration: A central Saga Orchestrator sends commands and handles failures. Easier to reason about but introduces a coordinator as a coupling point.

When to use: Microservices where business transactions span multiple services and you need eventual consistency without distributed locks.

Tradeoffs: Enables cross-service consistency without distributed locks. Compensating transactions are complex to design correctly. The system is eventually consistent, so partial state is visible while a saga is executing. Every step and every compensation must be idempotent, because messages can be redelivered and steps retried.
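
An orchestrated saga can be sketched as a list of (step, compensation) pairs. This is a minimal in-process illustration with hypothetical function names; in a real system each step would be a call or message to a separate service, and the orchestrator would persist its progress so it can resume after a crash.

```python
def run_saga(steps, ctx: dict) -> list:
    """Orchestrator: run each step; on failure, undo completed steps in reverse order."""
    log, completed = [], []
    for name, action, compensate in steps:
        try:
            action(ctx)
        except Exception as exc:
            log.append(f"{name}: failed ({exc})")
            for done_name, undo in reversed(completed):
                undo(ctx)
                log.append(f"{done_name}: compensated")
            return log
        log.append(f"{name}: ok")
        completed.append((name, compensate))
    return log


def create_order(ctx):
    ctx["order"] = "created"


def cancel_order(ctx):
    ctx["order"] = "cancelled"


def charge_payment(ctx):
    ctx["charged"] = ctx["total"]


def refund_payment(ctx):
    ctx["charged"] = 0


def reserve_inventory(ctx):
    if ctx["stock"] < ctx["quantity"]:
        raise RuntimeError("out of stock")
    ctx["stock"] -= ctx["quantity"]


def release_inventory(ctx):
    ctx["stock"] += ctx["quantity"]


steps = [
    ("create_order", create_order, cancel_order),
    ("charge_payment", charge_payment, refund_payment),
    ("reserve_inventory", reserve_inventory, release_inventory),
]

ctx = {"total": 150, "quantity": 1, "stock": 0}  # no stock, so the last step fails
saga_log = run_saga(steps, ctx)
for line in saga_log:
    print(line)
```

With no stock available, the log shows the first two steps succeeding, the reservation failing, and then the refund and order cancellation running in reverse order. The failed step itself is not compensated because it never completed.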

API Gateway & Backend for Frontend (BFF)

An API Gateway is the single entry point for all client requests in a microservices system. Clients call one endpoint; the gateway handles routing, authentication (validate JWT once, not in every service), rate limiting, request transformation, and response aggregation (combine responses from multiple services into one payload).

API Gateway — Single Entry Point for All Clients

flowchart LR
    A[Web Browser]
    B[Mobile App]
    C[Third-party]
    D[API Gateway<br/>Auth · Rate Limit · Route]
    E[Order Service]
    F[User Service]
    G[Product Service]
    H[Payment Service]
    A --> D
    B --> D
    C --> D
    D -->|/orders| E
    D -->|/users| F
    D -->|/products| G
    D -->|/payments| H

The Backend for Frontend (BFF) pattern extends this: instead of one generic gateway, you create a dedicated backend per client type — a mobile BFF returns compressed lightweight payloads, a web BFF returns richer data shapes, an IoT BFF might use binary protocols. Each BFF can evolve independently without breaking other clients.

When to use: An API Gateway is worth adding to almost any microservices system once clients would otherwise need to call several services directly. Add a BFF when different client types have divergent data requirements and the generic gateway is accumulating complex transformation logic.

Tradeoffs: Simplified client experience; centralised cross-cutting concerns. Single point of failure risk (mitigated by horizontal scaling). Risk of the "smart gateway" anti-pattern — business logic migrating into the gateway.
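
The gateway's core responsibilities (authenticate once, rate-limit, route by path prefix) fit in a short sketch. Everything here is hypothetical: the token table stands in for JWT validation, the backends are plain functions rather than network services, and the rate limit is a simple per-user counter with no time window.

```python
from typing import Optional

# Hypothetical token table standing in for real JWT validation.
VALID_TOKENS = {"token-ada": "ada"}


def order_service(path: str, user: str) -> dict:
    return {"service": "orders", "user": user, "path": path}


def product_service(path: str, user: str) -> dict:
    return {"service": "products", "user": user, "path": path}


class ApiGateway:
    """Single entry point: authenticates, rate-limits, then routes by path prefix."""

    def __init__(self, routes: dict, rate_limit: int) -> None:
        self._routes = routes
        self._rate_limit = rate_limit  # max requests per user (no time window here)
        self._counts = {}

    def handle(self, path: str, token: Optional[str]) -> dict:
        user = VALID_TOKENS.get(token)
        if user is None:
            return {"status": 401}
        self._counts[user] = self._counts.get(user, 0) + 1
        if self._counts[user] > self._rate_limit:
            return {"status": 429}
        for prefix, backend in self._routes.items():
            if path == prefix or path.startswith(prefix + "/"):
                return {"status": 200, "body": backend(path, user)}
        return {"status": 404}


gateway = ApiGateway({"/orders": order_service, "/products": product_service}, rate_limit=3)
statuses = [
    gateway.handle("/orders/42", "token-ada")["status"],
    gateway.handle("/orders/42", None)["status"],
    gateway.handle("/unknown", "token-ada")["status"],
    gateway.handle("/products", "token-ada")["status"],
    gateway.handle("/products", "token-ada")["status"],
]
print(statuses)  # [200, 401, 404, 200, 429]
```

Note that the gateway contains no business rules: it checks identity, counts requests, and forwards. Keeping it that way is how the "smart gateway" anti-pattern is avoided.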

Strangler Fig Migration

Named after a vine that grows around a host tree until it replaces it, the Strangler Fig pattern incrementally migrates a legacy monolith to a new architecture without a high-risk "big bang" rewrite. A facade (usually the API Gateway or a reverse proxy) sits in front of both systems; specific traffic routes to new services as they are built, while the monolith continues to handle unextracted features.

Migration sequence: Place reverse proxy in front of monolith → identify a bounded context to extract → build the new service in parallel → redirect that traffic through the proxy → delete the equivalent monolith code → repeat until the monolith is empty (strangled).

When to use: Legacy systems that must be modernised without downtime. The big-bang rewrite has a high failure rate; Strangler Fig enables incremental delivery of value while managing risk.

Tradeoffs: Low-risk migration. Temporary complexity of running two systems simultaneously. Some tightly-coupled monolith features are very difficult to extract cleanly. Requires discipline about which traffic routes where.

UI Architecture Patterns

UI architecture patterns structure the relationship between data (Model), presentation (View), and the layer that mediates between them. The right choice depends on your platform and how much reactive, state-driven behaviour your UI requires.

MVC, MVP, and MVVM

Pattern	Mediator Role	View–Model Relationship	Primary Use Case
MVC (Model-View-Controller)	Controller handles input, updates Model, selects View	View can observe Model directly	Server-rendered web frameworks (Rails, Django, Spring MVC)
MVP (Model-View-Presenter)	Presenter mediates all interaction; View is passive	View has no direct access to Model	Desktop/mobile apps requiring strict testability (Android, WinForms)
MVVM (Model-View-ViewModel)	ViewModel exposes observable state; View data-binds to it	Decoupled via data binding, with no direct reference	Reactive UIs (WPF, Angular, Vue, SwiftUI, Jetpack Compose)

The evolution from MVC to MVVM mirrors the shift from server-rendered pages to client-side reactive applications. In MVC, the server controller receives a request, modifies the model, and returns a rendered page. In MVVM, the ViewModel is an observable snapshot of the View's state — when it changes, the View updates automatically through data binding without requiring a page reload or knowing how the data was fetched.
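
To make the data-binding idea concrete, here is a minimal sketch of an observable ViewModel in plain Python. Real frameworks supply the binding machinery themselves; the `CounterViewModel` class and the `rendered` list (standing in for a View) are illustrative assumptions.

```python
from typing import Callable, List


class CounterViewModel:
    """Holds view state and notifies subscribers whenever it changes."""

    def __init__(self) -> None:
        self._count = 0
        self._subscribers: List[Callable[[int], None]] = []

    def subscribe(self, callback: Callable[[int], None]) -> None:
        self._subscribers.append(callback)
        callback(self._count)  # Push the current state so the View renders at once

    @property
    def count(self) -> int:
        return self._count

    def increment(self) -> None:
        self._count += 1
        for callback in self._subscribers:
            callback(self._count)  # The View updates without asking how state changed


rendered: List[str] = []  # Stand-in for a View: each entry is one "render"
view_model = CounterViewModel()
view_model.subscribe(lambda value: rendered.append(f"Count: {value}"))
view_model.increment()
```

The ViewModel never references the View; it only knows that something subscribed, which is what makes it testable without a UI.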

                            
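                            Flux and Redux take a different route from MVVM's two-way binding: unidirectional data flow for large React component trees. In Flux the cycle is Action → Dispatcher → Store → View → Action; Redux drops the separate Dispatcher and passes every action through pure reducer functions into a single store. This avoids the unpredictability of bidirectional data flow. Redux keeps one immutable state tree, which makes every state change predictable and debuggable via time-travel.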
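
The unidirectional flow can be sketched in a few lines of Python. This is not the Redux API; `reducer`, `Store`, and the action names are hypothetical, and the `history` list is only a simple stand-in for time-travel debugging.

```python
def reducer(state: dict, action: dict) -> dict:
    """Pure function: returns a new state and never mutates the old one."""
    if action["type"] == "todo/added":
        return {**state, "todos": state["todos"] + [action["text"]]}
    if action["type"] == "todo/cleared":
        return {**state, "todos": []}
    return state  # Unknown actions leave the state untouched


class Store:
    """Single state tree; the only way to change it is to dispatch an action."""

    def __init__(self, reducer_fn, initial_state: dict) -> None:
        self._reducer = reducer_fn
        self._state = initial_state
        self.history = [initial_state]  # Every past state is kept for inspection

    @property
    def state(self) -> dict:
        return self._state

    def dispatch(self, action: dict) -> None:
        self._state = self._reducer(self._state, action)
        self.history.append(self._state)


store = Store(reducer, {"todos": []})
store.dispatch({"type": "todo/added", "text": "write ADR"})
store.dispatch({"type": "todo/added", "text": "review ADR"})
```

Because the reducer builds new objects instead of mutating, every entry in `store.history` remains a valid snapshot of an earlier state.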
                        

Resilience Patterns

In distributed systems, failure is not an exception — it is the norm. Services crash, networks partition, databases slow down. Resilience patterns define how a system degrades gracefully under failure rather than cascading to a total outage.

Circuit Breaker

The Circuit Breaker (from Michael Nygard's Release It!) prevents a failing dependency from being repeatedly called, giving it time to recover while shielding the caller from cascading failures. Three states: Closed (normal — requests pass through), Open (failure threshold exceeded — all requests fail immediately without calling downstream), Half-Open (probe — a limited number of test requests determine whether to return to Closed).

import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    CLOSED, OPEN, HALF_OPEN = 'closed', 'open', 'half_open'

    def __init__(self, failure_threshold=5, recovery_timeout=30):
        self.state = self.CLOSED
        self.failure_count = 0                      # Consecutive failures so far
        self.failure_threshold = failure_threshold  # Failures that trip the breaker
        self.recovery_timeout = recovery_timeout    # Seconds to stay OPEN before probing
        self.last_failure_time = None

    def call(self, func, *args, **kwargs):
        if self.state == self.OPEN:
            # time.monotonic() is unaffected by wall-clock adjustments
            if time.monotonic() - self.last_failure_time > self.recovery_timeout:
                self.state = self.HALF_OPEN    # Probe recovery
            else:
                raise CircuitOpenError("Circuit OPEN: service unavailable")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            self.last_failure_time = time.monotonic()
            # A failed probe reopens at once; otherwise trip at the threshold
            if self.state == self.HALF_OPEN or self.failure_count >= self.failure_threshold:
                self.state = self.OPEN         # Trip the breaker
            raise
        # Any success closes the circuit and resets the consecutive-failure count
        self.state = self.CLOSED
        self.failure_count = 0
        return result

When to use: Any inter-service call whose failure could cascade to its callers. In practice, service mesh solutions (Istio, Linkerd) can provide circuit breaking at the infrastructure level, which often reduces the need for an application-level implementation.

Bulkhead

Named after the watertight compartments in a ship's hull that contain flooding to one section, the Bulkhead pattern partitions system resources so that exhaustion in one partition cannot affect others. If your payment service and recommendation service share a single thread pool, a slow recommendation API can starve payment processing — Bulkheads prevent this by assigning each a separate pool with a hard resource cap.

In practice: Separate thread pools per downstream dependency; separate database connection pools per service; separate Kubernetes namespaces with resource quotas; separate circuit breakers per dependency class.
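
One way to sketch a bulkhead is a per-dependency concurrency cap that rejects work once its partition is full. The `Bulkhead` class, the `BulkheadFullError` exception, and the limits below are illustrative assumptions, not taken from any particular library.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when a partition has no free slots."""


class Bulkhead:
    """Caps concurrent calls to one dependency so it cannot exhaust shared resources."""

    def __init__(self, name: str, max_concurrent: int) -> None:
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args, **kwargs):
        # Fail fast instead of queueing: a full partition must not block the caller.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(f"{self.name} bulkhead is full")
        try:
            return func(*args, **kwargs)
        finally:
            self._slots.release()  # Always free the slot, even if func raised


# Separate partitions: a slow recommendation API can only use up its own two slots.
payments = Bulkhead("payments", max_concurrent=10)
recommendations = Bulkhead("recommendations", max_concurrent=2)
```

Rejecting rather than waiting is a design choice in this sketch; a bounded queue with a timeout is a common alternative.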

                            
                            Resilience Patterns Stack Together: Circuit Breaker, Bulkhead, Retry, and Timeout are complementary layers, not alternatives. Timeout prevents a slow call from blocking indefinitely. Retry handles transient failures. Bulkhead limits resource exhaustion. Circuit Breaker stops calling a service that is clearly down. Libraries such as Resilience4j (the commonly recommended successor to Netflix's Hystrix, which is now in maintenance mode) and most service meshes offer all four.
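
As an illustration of the Retry layer, the helper below retries a call with exponential backoff. It is a simplified sketch: the `retry` function and its parameters are hypothetical, and catching every `Exception` stands in for the narrower set of transient errors a real system would retry.

```python
import time


def retry(func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call func up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error to the caller
            sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...


calls = {"count": 0}


def flaky():
    """Fails twice, then succeeds, to imitate a transient fault."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays = []  # Record delays instead of sleeping so the example runs instantly
result = retry(flaky, attempts=3, base_delay=0.1, sleep=delays.append)
```

Retries multiply load on a struggling dependency, which is why they are usually paired with a timeout on each attempt and a circuit breaker around the whole call.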
                        

Architectural Quality Attributes

Architecture decisions are fundamentally about tradeoffs between quality attributes — the non-functional requirements that determine how a system behaves under various conditions.

Attribute	Definition	Measured By	Architecture Impact
Performance	Response time and throughput under load	Latency (p50, p95, p99), requests/second	Caching layers, async processing, database choice
Scalability	Ability to handle increased load	Linear vs sublinear throughput growth	Statelessness, horizontal partitioning, load balancing
Availability	System uptime and fault tolerance	Nines (99.9%, 99.99%), MTTR, MTBF	Redundancy, failover, circuit breakers, health checks
Security	Protection against threats and unauthorised access	Vulnerability count, time-to-patch, compliance	Network segmentation, auth layers, encryption at rest/transit
Maintainability	Ease of modification and evolution	Change lead time, defect rate after changes	Modularity, loose coupling, clear boundaries
Testability	Ease of verifying correctness	Test coverage achievable, test execution time	Dependency injection, interface contracts, isolation

Tradeoff Analysis

You cannot optimise all attributes simultaneously. Architecture is the art of choosing which attributes matter most for your context:

Performance vs Maintainability: Optimised code is often harder to read and change
Availability vs Consistency: The CAP theorem says that during a network partition a distributed system must give up one or the other (AP or CP)
Security vs Usability: More security layers create more friction for users
Scalability vs Simplicity: Distributed systems scale better but are orders of magnitude more complex

                            
                            The Architecture Tradeoff Analysis Method (ATAM): A structured technique developed at the Software Engineering Institute (CMU) for evaluating architecture decisions against quality attribute scenarios. For each proposed architecture, you identify sensitivity points (where a small change has large impact) and tradeoff points (where achieving one attribute compromises another).
                        

Architectural Decision Records (ADRs)

Architectural decisions are some of the most important choices a team makes — yet they are often buried in meeting notes, Slack threads, or (worst of all) a single person's memory. When that person leaves, the why behind the architecture is lost forever.

An Architectural Decision Record (ADR) is a short document that captures one architectural decision — the context, the decision itself, and its consequences. ADRs are stored in the repository alongside the code they govern.

ADR Template

# ADR-NNN: [Short Title of Decision]

## Status
[Proposed | Accepted | Deprecated | Superseded by ADR-XXX]

## Context
[What is the issue we are facing? What forces are at play?
What constraints exist? What options did we consider?]

## Decision
[What is the change we are proposing or have agreed to?
State it clearly and definitively.]

## Consequences
[What becomes easier or harder as a result of this decision?
What are the positive, negative, and neutral consequences?]

## Alternatives Considered
[What other options were evaluated and why were they rejected?]

Sample ADR: Choosing an Event Bus

# ADR-007: Use Apache Kafka as the Event Bus

## Status
Accepted (2026-03-15)

## Context
Our e-commerce platform needs asynchronous communication between
services (Order, Payment, Inventory, Notification). We need:
- At-least-once delivery guarantees
- Message ordering within a partition
- Support for 50,000+ events/second at peak
- Message retention for replay (consumer catch-up)
- Multi-consumer support (same event, multiple subscribers)

Options evaluated: RabbitMQ, Apache Kafka, AWS SQS/SNS, Redis Streams.

## Decision
We will use Apache Kafka (managed via Confluent Cloud) as our
primary event bus for all inter-service communication.

## Consequences
Positive:
- High throughput (100K+ events/sec demonstrated in load testing)
- Built-in partitioning for horizontal scaling
- Log-based retention allows consumer replay
- Strong ecosystem (Schema Registry, Connect, Streams)

Negative:
- Operational complexity higher than RabbitMQ
- At-least-once delivery means duplicates are possible, so consumers must be idempotent
- Team needs Kafka-specific training (partitions, consumer groups)
- Cost: ~$1,200/month for Confluent Cloud at expected throughput

Neutral:
- Requires schema registry for event versioning (additional component)
- Consumer offset management is our responsibility

## Alternatives Considered
- RabbitMQ: Rejected due to lack of built-in log retention and
  replay capability. Better for task queues, not event streaming.
- AWS SQS/SNS: Rejected to avoid AWS vendor lock-in (multi-cloud
  requirement from stakeholders).
- Redis Streams: Rejected due to durability concerns and limited
  ecosystem for schema management.
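
The sample ADR lists idempotent consumers as a consequence of the decision. A minimal sketch of what that means in practice follows; it is independent of any Kafka client library, and the `IdempotentConsumer` class, the `event_id` field, and the in-memory set are illustrative assumptions (a real consumer would keep processed IDs in a durable store).

```python
class IdempotentConsumer:
    """Wraps a handler so that redelivered events are processed only once."""

    def __init__(self, handler) -> None:
        self._handler = handler
        self._processed_ids = set()  # In-memory only for this sketch

    def consume(self, event: dict) -> bool:
        """Return True if the event was handled, False if it was a duplicate."""
        event_id = event["event_id"]
        if event_id in self._processed_ids:
            return False  # Already handled: safe to acknowledge and skip
        self._handler(event)
        # Mark as processed only after the handler succeeds, so failures are retried.
        self._processed_ids.add(event_id)
        return True


charged_orders = []
consumer = IdempotentConsumer(lambda event: charged_orders.append(event["order_id"]))

payment_event = {"event_id": "evt-1", "order_id": 42}
first = consumer.consume(payment_event)
second = consumer.consume(payment_event)  # Redelivery of the same event
```

Without the duplicate check, a redelivered payment event would charge order 42 twice.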

                            
                            Key Insight: The most valuable part of an ADR is the Context and Alternatives Considered sections. The decision itself is obvious once you understand the constraints. Future developers reading the ADR need to understand why — not just what.
                        

Pattern Comparison Table

All major patterns covered in this article, organised by category. Use this as a quick reference when choosing patterns for a new system or evaluating an existing one.

Pattern	Category	Coupling	Complexity	Scalability	Best For
Foundational / Communication
Client-Server	Communication	Medium	Low	Vertical	Web apps, APIs, mobile backends
Layered (N-Tier)	Structure	Medium-High	Low	Vertical	Enterprise CRUD apps, small–medium teams
Pipe-and-Filter	Communication	Very Low	Medium	Horizontal	Data pipelines, ETL, compilers
Event-Driven	Communication	Very Low	High	Horizontal	Reactive systems, IoT, high-throughput
Deployment / Topology
Monolith	Deployment	High	Low	Vertical	Startups, MVPs, teams <20
Modular Monolith	Deployment	Medium	Medium	Vertical	Growing teams, pre-microservices stage
Microservices	Deployment	Very Low	Very High	Horizontal	Large orgs (100+ engineers), independent scaling
Data / State Management
Primary-Replica	Data	Medium	Medium	Read-horizontal	Read-heavy workloads, high availability
CQRS	Data / Domain	Low	High	Asymmetric	Complex read/write asymmetry, analytics dashboards
Event Sourcing	Data / Domain	Low	High	Horizontal	Audit trails, compliance, fintech, SaaS
Domain Organisation
Hexagonal (Ports & Adapters)	Structure / Domain	Very Low	Medium-High	Any	Complex domains, long-lived systems, TDD
Clean Architecture	Structure / Domain	Very Low	Medium-High	Any	Framework-independent business logic
Onion Architecture	Structure / Domain	Very Low	Medium-High	Any	Domain-heavy systems, DDD applications
Distributed Coordination
Saga	Distributed	Low	High	Horizontal	Cross-service business transactions, microservices
API Gateway	Distributed	Low	Medium	Horizontal	Microservices entry point, auth, routing
Strangler Fig	Migration	Medium	Medium	Incremental	Legacy modernisation without big-bang rewrite
UI / Frontend
MVC	UI / Structure	Medium	Low	Server	Server-rendered web apps (Rails, Django, Spring)
MVP	UI / Structure	Low	Medium	Client	Desktop/mobile with strict testability requirements
MVVM	UI / Structure	Very Low	Medium	Client	Reactive UIs, SPAs, mobile (Angular, Vue, SwiftUI)
Resilience
Circuit Breaker	Resilience	Low	Low	N/A	Preventing cascading failures in distributed systems
Bulkhead	Resilience	Low	Low	N/A	Isolating resource exhaustion across dependencies

Exercises

                            
                            Exercise 1 — Pattern Selection: You are building a real-time chat application (like Slack) that needs to support 10,000 concurrent users, deliver messages within 200ms, and allow offline message retrieval. Which architectural pattern(s) would you combine, and why? Write an ADR justifying your choice.
                        

                            
                            Exercise 2 — Layered Architecture Critique: Draw a 4-layer architecture for an online bookstore. Identify which operations would be "architecture sinkholes" (passing through layers without transformation). Propose where relaxed layering would improve the design without sacrificing separation of concerns.
                        

                            
                            Exercise 3 — Event Storming: For an airline booking system, identify 10 domain events (things that happen). Draw an event-driven architecture showing which services produce and consume each event. Identify where eventual consistency could cause user-visible problems.
                        

                            
                            Exercise 4 — Monolith-to-Microservices Decision: Your team has a 3-year-old monolith with 500K lines of code and 12 developers. Deployment takes 45 minutes and fails 20% of the time. Write an ADR evaluating whether to migrate to microservices or invest in a modular monolith. Include at least 3 alternatives considered and their tradeoffs.
                        

Conclusion & Next Steps

Software architecture is not about choosing the "best" pattern — it is about choosing the right pattern for your context. A startup with three engineers choosing microservices is making a different mistake than an enterprise with five hundred engineers staying on a single monolith. Context determines correctness.

The patterns we explored span seven dimensions of architectural thinking: foundational styles (Client-Server, Layered, Event-Driven, Pipe-and-Filter), deployment topology (Monolith, Modular Monolith, Microservices), domain-centric structures (Hexagonal, Clean Architecture, Onion, CQRS, Event Sourcing), distributed coordination (Saga, API Gateway, Strangler Fig), UI organisation (MVC, MVP, MVVM), data management (Primary-Replica, CQRS), and resilience (Circuit Breaker, Bulkhead). The mnemonic to hold them together: Structure → Communication → Deployment → Data → Domain → Resilience.

Real systems combine patterns from multiple dimensions — a microservices deployment with Hexagonal structure within each service, CQRS for data access, Circuit Breakers for resilience, and an API Gateway as the entry point. The ADR is the mechanism for documenting why you reached for a particular combination. Together, the patterns and the ADR practice form the foundation for making and communicating architectural decisions that your future self — and your team — will understand.

In the next article, we go one level deeper — from system-level architecture down to module-level design, exploring the three forces that determine whether software is maintainable: modularity, coupling, and cohesion.

Next in the Series

In Part 7: Modularity, Coupling & Cohesion, we explore the three forces that determine whether your modules are maintainable, testable, and evolvable — from David Parnas's information hiding principle to measuring instability and abstractness.
