Introduction
Software architecture is the skeleton of a system. Just as a building's architecture determines whether it can support ten floors or fifty, software architecture determines whether a system can serve ten users or ten million — whether a team of five can evolve it or whether it will require two hundred engineers to keep it alive.
Martin Fowler, in his essay "Who Needs an Architect?", describes architecture as the decisions that are hard to change. In the same essay he quotes Ralph Johnson's definition: architecture is the shared understanding that the expert developers in a project have of the system design. Both definitions point to the same truth: architecture is about the significant decisions, the ones that constrain everything else.
In this article, we explore what software architecture actually is, how it differs from design, and then dive deep into the major architectural patterns you will encounter in real systems. We finish with a practical tool — the Architectural Decision Record — that helps teams document and communicate these critical choices.
Why Architecture Matters for Delivery
Architecture directly determines three delivery properties:
- Deployment units: How many independent pieces can be deployed separately? A monolith is one deployment unit. A microservices system may have hundreds.
- Team boundaries: Conway's Law states that organisations design systems that mirror their communication structures. Architecture defines where teams can work independently without blocking each other.
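- Scalability constraints: A monolith scales as a single unit, either vertically (bigger machines) or by running identical copies behind a load balancer. A distributed system can scale individual components horizontally (more machines only where the load is). Architecture determines which scaling paths are available.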
Poor architecture choices early in a project create accidental complexity that compounds over time. Every feature takes longer, every change risks breaking something else, and eventually the system becomes so rigid that a rewrite is the only option.
Architecture vs Design
These terms are often used interchangeably, but they describe different levels of abstraction:
| Dimension | Architecture | Design |
|---|---|---|
| Scope | System-wide structure and boundaries | Within a single component or module |
| Concerns | Quality attributes, deployment topology, communication patterns | Classes, interfaces, algorithms, data structures |
| Change Cost | High — affects multiple teams, services, infrastructure | Lower — contained within a module |
| Decided By | Senior architects, tech leads, cross-team decisions | Individual developers within their component |
| Examples | "We use event-driven communication between services" | "This class uses the Strategy pattern for payment processing" |
| Documentation | Architecture Decision Records, C4 diagrams | Code itself, UML class diagrams, inline comments |
What Makes Something Architectural?
A decision is architectural if it satisfies one or more of these criteria:
- It is hard to reverse — Changing it would require significant rework across the system
- It constrains other decisions — Once chosen, it limits what designs are possible within components
- It affects multiple stakeholders — Development teams, operations, security, and business all have opinions
- It involves tradeoffs between quality attributes — You cannot optimise for everything simultaneously
The Architecture Pattern Landscape
Most introductions to software architecture present a flat list of pattern names — Client-Server, Microservices, Event-Driven — without explaining what kind of problem each one solves. This is confusing because the list mixes architectural styles, deployment models, communication patterns, and domain organisation strategies.
A more useful mental model groups patterns by the dimension of the problem they address. The table below organises the full landscape into seven categories:
| Dimension | Primary Question | Key Patterns |
|---|---|---|
| Structure / Code Organisation | How is the codebase internally organised? | Layered, Hexagonal, Clean Architecture, Onion, MVC/MVP/MVVM |
| Communication / Interaction | How do components exchange messages? | Client-Server, Event-Driven, Pipe-and-Filter, Pub/Sub, Broker, Message Bus |
| Deployment / Topology | What are the physical deployment units? | Monolith, Modular Monolith, Microservices, SOA, Serverless, Cell-Based |
| Data / State Management | How is persistent state stored and accessed? | Primary-Replica, Sharding, CQRS, Event Sourcing, Lambda Architecture |
| Domain Organisation | How are business rules bounded and expressed in code? | DDD Bounded Contexts, Hexagonal, Onion, Clean Architecture, Saga |
| Integration / Distributed Coordination | How do distributed services coordinate? | API Gateway, BFF, Saga, Strangler Fig, Service Mesh, Sidecar |
| Resilience / Fault Tolerance | How does the system behave when dependencies fail? | Circuit Breaker, Bulkhead, Retry/Timeout, Cache-Aside |
Foundational Architectural Patterns
An architectural pattern is a reusable solution to a commonly occurring problem in system structure. Patterns are not prescriptions — they are options. The skill of architecture lies in knowing which pattern fits which context. The patterns below are the foundations every architect builds on.
Client-Server
The most fundamental distributed architecture pattern. A client sends requests; a server processes them and returns responses. The entire web is built on this pattern.
flowchart LR
subgraph Clients
A[Web Browser]
B[Mobile App]
C[CLI Tool]
end
subgraph Server
D[Load Balancer]
E[Application Server]
F[Database]
end
A -->|HTTP Request| D
B -->|REST API| D
C -->|gRPC| D
D --> E
E --> F
E -->|Response| D
D -->|HTTP Response| A
Thin vs Thick Clients:
- Thin client: Minimal logic in the client. The server does most processing. Example: traditional server-rendered web apps (Rails, Django).
- Thick client: Significant logic in the client. The server provides APIs. Example: Single Page Applications (React, Angular), mobile apps.
When to use: Almost every web application, mobile backend, API service. It is the default starting point for most systems.
Tradeoffs: Simple to understand and deploy. Single point of failure at the server. Server must scale to handle all client load. Network latency affects every interaction.
Layered (N-Tier) Architecture
The layered pattern organises code into horizontal layers, each with a specific responsibility. Each layer only communicates with the layer directly below it (strict layering) or any layer below it (relaxed layering).
flowchart TD
A[Presentation Layer<br/>UI, Controllers, Views] --> B[Business Logic Layer<br/>Services, Domain Objects, Rules]
B --> C[Data Access Layer<br/>Repositories, ORMs, Queries]
C --> D[Database Layer<br/>PostgreSQL, MongoDB, Redis]
style A fill:#3B9797,color:#fff
style B fill:#16476A,color:#fff
style C fill:#132440,color:#fff
style D fill:#BF092F,color:#fff
Strict vs Relaxed Layering:
- Strict: Layer N can only call Layer N-1. Forces all requests through every layer. Maximum separation but can create "pass-through" layers that add no value.
- Relaxed: Layer N can call any layer below it. More flexible but creates hidden dependencies that make refactoring harder.
Common Layer Configurations:
- 3-tier: Presentation → Business → Data (most web applications)
- 4-tier: Presentation → Application → Domain → Infrastructure (Domain-Driven Design)
- 2-tier: Client → Database (simple desktop applications)
When to use: Business applications with clear separation of concerns. Teams that want predictable structure. Codebases where multiple developers work on different layers simultaneously.
Tradeoffs: Easy to understand and implement. Can become monolithic if all layers deploy together. Performance overhead from layer-to-layer calls. Risk of "architecture sinkhole" where layers just pass data through without transformation.
The Architecture Sinkhole Anti-Pattern
A team building a financial reporting system implemented strict 4-tier architecture. They discovered that 80% of their requests simply passed data from the database through the Data Access Layer → Business Layer → Application Layer → Presentation Layer without any transformation. The "Business Logic" layer was just calling repository.findById(id) and returning the result unchanged. The solution: bypass layers when they add no value. Allow the Presentation layer to call the Data Access layer directly for simple read operations. This is the pragmatic reality of relaxed layering.
Pipe-and-Filter
Data flows through a chain of processing stages (filters), connected by channels (pipes). Each filter is independent — it receives input, transforms it, and produces output. The Unix command line is the canonical example: cat file.log | grep ERROR | sort | uniq -c | sort -rn.
flowchart LR
A[Data Source<br/>CSV Files] --> B[Extract<br/>Parse & Validate]
B --> C[Transform<br/>Clean & Enrich]
C --> D[Aggregate<br/>Group & Sum]
D --> E[Format<br/>JSON Output]
E --> F[Load<br/>Data Warehouse]
style A fill:#132440,color:#fff
style B fill:#3B9797,color:#fff
style C fill:#3B9797,color:#fff
style D fill:#3B9797,color:#fff
style E fill:#3B9797,color:#fff
style F fill:#BF092F,color:#fff
Key Properties:
- Composability: Filters can be rearranged, added, or removed without changing other filters
- Reusability: A "validate email" filter can be used in multiple pipelines
- Parallelism: Independent filters can run concurrently on different data chunks
- Testability: Each filter can be tested in isolation with known input/output
When to use: Data processing pipelines (ETL), stream processing, compiler stages (lexing → parsing → semantic analysis → code generation), image processing, log analysis.
Tradeoffs: Excellent composability and testability. Not suitable for interactive applications. Overhead from serialisation/deserialisation between stages. Error handling across the pipeline is complex.
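A minimal sketch of the pattern in Python, using generators as filters and plain function composition as the pipes. The log format (`LEVEL message`) and the function names are assumptions made for illustration; the point is only that each stage can be tested, reordered, or replaced without touching the others.

```python
from typing import Iterable, Iterator


def parse(lines: Iterable[str]) -> Iterator[dict]:
    """Filter 1: turn 'LEVEL message' lines into records, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            level, _, message = line.partition(" ")
            yield {"level": level, "message": message}


def only_errors(records: Iterable[dict]) -> Iterator[dict]:
    """Filter 2: pass through ERROR records only."""
    return (r for r in records if r["level"] == "ERROR")


def count_by_message(records: Iterable[dict]) -> dict:
    """Sink: aggregate the stream into a count per message."""
    counts: dict = {}
    for r in records:
        counts[r["message"]] = counts.get(r["message"], 0) + 1
    return counts


# Hypothetical input; in a real pipeline this would be a file or a stream.
log_lines = ["ERROR disk full", "INFO started", "", "ERROR disk full", "ERROR timeout"]

# Pipes are just function composition: source -> parse -> filter -> aggregate.
error_counts = count_by_message(only_errors(parse(log_lines)))
print(error_counts)
```

Because the filters are lazy generators, records flow through one at a time rather than being materialised between stages, which mirrors how Unix pipes stream data.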
Event-Driven Architecture
Components communicate by producing and consuming events — records of something that happened. Producers do not know (or care) who consumes their events. Consumers do not know who produced the events. This creates extreme decoupling.
flowchart TD
subgraph Producers
A[Order Service]
B[Payment Service]
C[Inventory Service]
end
subgraph Event Bus
D[Message Broker<br/>Kafka / RabbitMQ]
end
subgraph Consumers
E[Email Service]
F[Analytics Service]
G[Audit Log Service]
H[Shipping Service]
end
A -->|OrderPlaced| D
B -->|PaymentProcessed| D
C -->|StockUpdated| D
D -->|OrderPlaced| E
D -->|OrderPlaced| F
D -->|PaymentProcessed| G
D -->|OrderPlaced| H
Event Types:
- Domain Events: "OrderPlaced", "UserRegistered" — business-meaningful things that happened
- Integration Events: Events published for other services to consume across boundaries
- Event Notifications: Thin events that say "something changed" — consumers must query for details
- Event-Carried State Transfer: Fat events containing all the data consumers need — no callbacks required
Eventual Consistency: Because events are processed asynchronously, the system is eventually consistent — different services may have different views of the world for brief periods. This is the fundamental tradeoff of event-driven systems.
When to use: Systems requiring high decoupling between services. Scenarios where multiple consumers need to react to the same event. Systems where eventual consistency is acceptable. High-throughput systems (millions of events per second).
Tradeoffs: Maximum decoupling and scalability. Difficult to debug (no single call stack). Eventual consistency complicates business logic. Event schema evolution requires careful versioning.
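The producer/consumer decoupling described above can be sketched with a tiny in-process event bus. This is an illustration only: the `EventBus` class and the handler names are hypothetical, and delivery here is synchronous, whereas a real broker such as Kafka or RabbitMQ delivers asynchronously and durably.

```python
from collections import defaultdict


class EventBus:
    """In-memory stand-in for a message broker (illustrative, not durable)."""

    def __init__(self):
        self._handlers = defaultdict(list)  # event type -> list of callbacks

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # The producer does not know who, if anyone, is listening.
        for handler in self._handlers[event_type]:
            handler(payload)


bus = EventBus()
emails_sent = []
shipments = []

# Two independent consumers react to the same event.
bus.subscribe("OrderPlaced", lambda e: emails_sent.append(e["customer"]))
bus.subscribe("OrderPlaced", lambda e: shipments.append(e["order_id"]))

# The producer publishes a fact; it never calls the consumers directly.
bus.publish("OrderPlaced", {"order_id": 42, "customer": "ada@example.com"})
print(emails_sent, shipments)
```

Adding a third consumer (say, analytics) would be one more `subscribe` call, with no change to the publishing code.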
Primary-Replica (Master-Slave)
One node (the primary) handles all write operations. Multiple replicas receive copies of the data and handle read operations. This separates read and write workloads, enabling horizontal scaling of reads.
Use cases:
- Database replication: PostgreSQL primary with read replicas for reporting queries
- Content distribution: Primary content server replicating to CDN edge nodes
- High availability: If the primary fails, a replica is promoted (failover)
Consistency Models:
- Synchronous replication: Primary waits for replicas to confirm before acknowledging writes. Strong consistency but higher latency.
- Asynchronous replication: Primary acknowledges writes immediately, replicates in the background. Lower latency but risk of data loss on primary failure.
- Semi-synchronous: Primary waits for at least one replica to confirm. Balance between consistency and performance.
When to use: Read-heavy workloads (90%+ reads), systems requiring high availability, scenarios where read scaling is more important than write scaling.
Tradeoffs: Excellent read scalability. Write bottleneck at primary. Replication lag creates potential for stale reads. Failover adds operational complexity.
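To make replication lag concrete, here is a toy model of asynchronous replication. `ReplicatedStore` and its methods are hypothetical names, and the dictionaries stand in for database nodes; real replication streams a write-ahead log over the network.

```python
class ReplicatedStore:
    """Toy primary-replica store with asynchronous replication."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._pending = []  # writes acknowledged but not yet applied to replicas
        self._next = 0      # round-robin cursor for reads

    def write(self, key, value):
        # All writes go to the primary and are acknowledged immediately.
        self.primary[key] = value
        self._pending.append((key, value))

    def read(self, key):
        # Reads are spread across replicas, so they may be stale.
        replica = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return replica.get(key)

    def replicate(self):
        # Background step: apply the pending log to every replica.
        for key, value in self._pending:
            for replica in self.replicas:
                replica[key] = value
        self._pending.clear()


store = ReplicatedStore()
store.write("balance", 100)
stale = store.read("balance")   # replica has not caught up yet
store.replicate()
fresh = store.read("balance")   # replica now reflects the write
print(stale, fresh)
```

The gap between `stale` and `fresh` is the replication lag window. Synchronous replication would close it by calling the equivalent of `replicate()` inside `write()`, at the cost of slower writes.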
Microservices
The system is decomposed into small, independently deployable services, each owning its own data and communicating via well-defined APIs or events. Each service is built, deployed, and scaled independently.
Key Properties:
- Single Responsibility: Each service does one business capability well (User Service, Payment Service, Inventory Service)
- Independent Deployment: Changing the Payment Service does not require redeploying the User Service
- Polyglot Persistence: Each service chooses the best database for its needs (SQL, NoSQL, Graph, Time-series)
- API Contracts: Services communicate through versioned APIs — internal implementation is hidden
- Fault Isolation: If the Recommendation Service crashes, the core checkout flow still works
When to use: Large organisations (100+ engineers) where team autonomy is critical. Systems requiring different scaling characteristics for different components. When deployment independence is worth the operational overhead.
Tradeoffs: Maximum team autonomy and scaling flexibility. Massive operational complexity (service mesh, distributed tracing, API gateways). Network calls replace function calls (latency). Data consistency across services is fundamentally hard.
Amazon's "Two-Pizza Teams" and Service-Oriented Architecture
In 2002, Jeff Bezos issued his famous mandate: all teams must communicate through service interfaces. No direct database access. No shared-memory models. Every team's service must be designed to be exposed externally. This forced Amazon to decompose their monolithic bookstore into hundreds of independent services — each owned by a "two-pizza team" (6-8 people). The result: Amazon could scale both their technology and their organisation. Each team could innovate independently, deploy multiple times per day, and choose their own technology stack. This became the blueprint for what we now call "microservices."
Monolithic Architecture
The entire application is a single deployment unit. All code runs in one process, shares one database, and is deployed together. Despite its reputation, the monolith is often the correct architectural choice — especially for new products, small teams, and systems where simplicity trumps flexibility.
When monoliths are correct:
- Team size is small (< 20 engineers)
- Domain boundaries are not yet clear (early product)
- Deployment simplicity is valued over independent scaling
- The system does not have drastically different scaling requirements for different components
- You cannot afford the operational overhead of distributed systems (Kubernetes, service mesh, distributed tracing)
The Modular Monolith: A pragmatic middle ground. The codebase is structured into well-defined modules with clear boundaries and interfaces — but deployed as a single unit. Each module could theoretically become a microservice, but you defer that decision until it is actually needed. This preserves simplicity while maintaining clean architecture.
# Modular Monolith Directory Structure
src/
├── modules/
│   ├── users/
│   │   ├── api/          # Public interface (what other modules can call)
│   │   ├── domain/       # Business logic (private to this module)
│   │   ├── persistence/  # Database access (private)
│   │   └── tests/
│   ├── payments/
│   │   ├── api/
│   │   ├── domain/
│   │   ├── persistence/
│   │   └── tests/
│   └── inventory/
│       ├── api/
│       ├── domain/
│       ├── persistence/
│       └── tests/
├── shared/               # Cross-cutting concerns (logging, auth)
└── main.py               # Single entry point
Domain-Centric Patterns
Domain-centric patterns place the business logic at the centre of the architecture, with infrastructure (databases, frameworks, HTTP) pushed to the outer edges. This inverts the traditional layered model, where the database is the foundation. The core principle: high-level policy should not depend on low-level detail — details should depend on policies.
Hexagonal Architecture (Ports & Adapters)
Proposed by Alistair Cockburn, Hexagonal Architecture organises a system into three zones: the Application Core (pure domain logic), Ports (interfaces the core exposes or consumes), and Adapters (concrete implementations — HTTP controllers, database repositories, message queue consumers). The core knows nothing about the outside world.
flowchart LR
subgraph "Driving Side"
A[REST Controller]
B[CLI Command]
C[Test Suite]
end
subgraph "Application Core"
D[Driving Port<br/>OrderService Interface]
E[Domain Logic<br/>Order · Payment · Inventory]
F[Driven Port<br/>Repository Interface]
end
subgraph "Driven Side"
G[PostgreSQL Adapter]
H[Kafka Adapter]
I[Email Adapter]
end
A -->|HTTP Adapter| D
B -->|CLI Adapter| D
C -->|Test Adapter| D
D --> E
E --> F
F --> G
F --> H
F --> I
The key rule: Dependencies always point inward. The REST controller knows the domain interface (Port) but nothing about domain internals. You can swap any adapter — replace PostgreSQL with MongoDB, REST with GraphQL — without touching business logic. This makes the core independently testable: run the full domain test suite without starting a web server or connecting to a database.
Driving vs Driven Ports:
- Driving ports (left): Interfaces the outside world calls into your application. A REST controller is an adapter for a driving port.
- Driven ports (right): Interfaces your application uses to call external systems. A repository interface is a driven port; PostgreSQL is its adapter.
When to use: Applications where business logic is complex and must be independently testable. Systems supporting multiple delivery mechanisms (REST + CLI + event consumers). Long-lived applications where infrastructure evolves.
Tradeoffs: Excellent testability and maintainability. More initial boilerplate (interfaces, adapters). Can feel over-engineered for simple CRUD. The benefit compounds as domain complexity grows.
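A small sketch of ports and adapters in Python, assuming a hypothetical `OrderRepository` driven port and an in-memory adapter. The names are illustrative; the point is that `OrderService` depends only on the port, so the domain rule can be exercised without a database.

```python
from __future__ import annotations

from typing import Protocol


class OrderRepository(Protocol):
    """Driven port: what the core needs from persistence."""

    def save(self, order_id: str, total: float) -> None: ...
    def get_total(self, order_id: str) -> float: ...


class OrderService:
    """Application core: business rules only, no infrastructure imports."""

    def __init__(self, repo: OrderRepository) -> None:
        self._repo = repo

    def place_order(self, order_id: str, prices: list[float]) -> float:
        if not prices:
            raise ValueError("an order needs at least one item")
        total = round(sum(prices), 2)
        self._repo.save(order_id, total)
        return total


class InMemoryOrderRepository:
    """Adapter: satisfies the port structurally; a PostgreSQL adapter could replace it."""

    def __init__(self) -> None:
        self._rows: dict[str, float] = {}

    def save(self, order_id: str, total: float) -> None:
        self._rows[order_id] = total

    def get_total(self, order_id: str) -> float:
        return self._rows[order_id]


repo = InMemoryOrderRepository()
service = OrderService(repo)
total = service.place_order("o-1", [10.0, 5.5])
print(total, repo.get_total("o-1"))
```

Swapping the adapter means writing another class with the same two methods; `OrderService` does not change.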
Clean Architecture
Robert C. Martin's Clean Architecture expresses the same dependency-inversion idea as Hexagonal, using concentric circles. From innermost to outermost: Entities (enterprise business rules) → Use Cases (application business rules) → Interface Adapters (controllers, presenters, gateways) → Frameworks & Drivers (web, databases, UI). The Dependency Rule is absolute: source code dependencies must always point inward — nothing in an inner circle can know about anything in an outer circle.
When to use: Large applications with complex business rules that must remain stable across infrastructure changes. Teams practising TDD. Systems with 5–10+ year expected lifespans.
Tradeoffs: Maximum maintainability and testability. Significant discipline required — the dependency rule is easy to violate inadvertently. Not appropriate for CRUD-heavy applications where the business rules are effectively just the data model.
Onion Architecture
Jeffrey Palermo's Onion Architecture (2008) emphasises the Repository pattern as the primary boundary between domain and infrastructure. Layers from inside out: Domain Model (entities, value objects) → Domain Services (business rules) → Application Services (orchestration, use cases) → Infrastructure (databases, external APIs). Crucially, infrastructure depends on the application core, not the other way around.
Hexagonal, Clean, and Onion Architecture are different formulations of the same fundamental idea. In production systems — especially fintech and SaaS — they are commonly combined:
DDD # Define bounded contexts and ubiquitous language
+ Onion/Hexagonal # Isolate domain from infrastructure
+ CQRS # Separate read/write models for performance
+ Event Sourcing # Immutable event log for audit and compliance
CQRS — Command Query Responsibility Segregation
CQRS separates commands (write operations that change state) from queries (read operations that return data) into different models, different code paths, and often different data stores. A command validates business rules and mutates state; a query returns a denormalised view optimised for display — with no side effects.
flowchart TD
subgraph "Write Side"
A[Command: PlaceOrder]
B[Command Handler]
C[Domain Aggregate]
D[(Write Store<br/>Normalised DB)]
end
subgraph "Read Side"
E[Query: GetOrderSummary]
F[Query Handler]
G[(Read Store<br/>Denormalised View)]
end
H[Event Bus]
A --> B --> C --> D
C -->|OrderPlaced Event| H
H -->|Project Event| G
E --> F --> G
Why separate reads and writes? In most systems, reads vastly outnumber writes. Write models use normalised schemas for consistency; read models use denormalised views (read replicas, Elasticsearch, Redis) for query performance. You scale and optimise each side independently.
CQRS ≠ Event Sourcing: These pair naturally but are independent. CQRS is about separating read/write models. Event Sourcing is about how write-side state is stored. You can implement CQRS with a traditional relational database on the write side.
When to use: Systems with significantly asymmetric read/write workloads. Applications where the read model requires different data shapes than the write model. Systems already using Event-Driven Architecture.
Tradeoffs: Better read performance and independent scaling. Significant added complexity — two models, two code paths, potential eventual consistency between write and read stores. Overkill for simple CRUD.
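The write-side and read-side split can be sketched in a few lines. This example assumes two in-memory dictionaries in place of real stores, and the function names are hypothetical; in production the projection step would typically run asynchronously off an event bus.

```python
write_store = {}  # normalised: order_id -> order record
read_store = {}   # denormalised: customer -> precomputed summary


def handle_place_order(order_id, customer, items):
    """Command handler: validate rules, mutate the write store, emit an event."""
    if not items:
        raise ValueError("order must contain at least one item")
    if order_id in write_store:
        raise ValueError("duplicate order id")
    write_store[order_id] = {"customer": customer, "items": list(items)}
    return {
        "type": "OrderPlaced",
        "order_id": order_id,
        "customer": customer,
        "total": sum(price for _, price in items),
    }


def project(event):
    """Projection: fold an event into the read-optimised view."""
    if event["type"] == "OrderPlaced":
        summary = read_store.setdefault(event["customer"], {"orders": 0, "spent": 0.0})
        summary["orders"] += 1
        summary["spent"] += event["total"]


def get_customer_summary(customer):
    """Query handler: reads the view only, with no side effects."""
    return read_store.get(customer, {"orders": 0, "spent": 0.0})


project(handle_place_order("o-1", "ada", [("book", 30.0), ("pen", 2.5)]))
project(handle_place_order("o-2", "ada", [("lamp", 45.0)]))
print(get_customer_summary("ada"))
```

The query never touches `write_store`, so the read view can be reshaped, cached, or rebuilt without affecting command handling.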
Event Sourcing
Instead of storing the current state of a record, Event Sourcing stores the complete, immutable sequence of events that produced that state. Current state is derived by replaying those events. Think of it as the difference between a bank account's current balance versus the full transaction ledger — the ledger is the source of truth.
# Traditional: mutate current state
# UPDATE orders SET status='shipped', updated_at=NOW() WHERE id=123

# Event Sourcing: append events, never mutate
events = [
    {"type": "OrderPlaced",     "at": "2026-01-10T09:00", "data": {"total": 150.00}},
    {"type": "PaymentReceived", "at": "2026-01-10T09:02", "data": {"amount": 150.00}},
    {"type": "OrderShipped",    "at": "2026-01-11T08:00", "data": {"tracking": "UPS-9876"}},
]

# Current state = fold(initial_state, events)
def rehydrate(events):
    state = {}
    for e in events:
        if e["type"] == "OrderPlaced":
            state["status"] = "placed"
        elif e["type"] == "PaymentReceived":
            state["status"] = "paid"
        elif e["type"] == "OrderShipped":
            state["status"] = "shipped"
    return state

print(rehydrate(events))  # {'status': 'shipped'}
Key benefits:
- Immutable audit log: Every state change is recorded — invaluable for compliance, debugging, and fraud detection
- Temporal queries: "What was the state of this order at 2pm yesterday?" is trivially answered by replaying events up to that timestamp
- Natural CQRS integration: Events from the write side can be projected into multiple read-optimised views simultaneously
- Replay for recovery: If a downstream service fails and misses events, it can replay from the log to catch up
When to use: Financial systems, healthcare, insurance — any domain requiring a complete audit trail. Systems where CQRS read models must be built from scratch. Compliance-heavy domains.
Tradeoffs: Excellent auditability. Querying current state requires materialised views. Event schema evolution is hard — events are immutable and old handlers must still process old event shapes. Snapshots needed for performance once event counts grow large.
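The temporal-query benefit can be illustrated by replaying events only up to a cutoff. This is a sketch under two assumptions: events carry ISO-8601 timestamps in an `at` field, and the log is already in chronological order. The helper name `state_as_of` is hypothetical.

```python
from datetime import datetime

# Maps each event type to the order status it produces.
STATUS_BY_EVENT = {
    "OrderPlaced": "placed",
    "PaymentReceived": "paid",
    "OrderShipped": "shipped",
}

events = [
    {"type": "OrderPlaced", "at": "2026-01-10T09:00"},
    {"type": "PaymentReceived", "at": "2026-01-10T09:02"},
    {"type": "OrderShipped", "at": "2026-01-11T08:00"},
]


def state_as_of(events, moment):
    """Replay events up to and including `moment` (an ISO-8601 string)."""
    cutoff = datetime.fromisoformat(moment)
    state = {}
    for e in events:
        if datetime.fromisoformat(e["at"]) > cutoff:
            break  # assumes the log is in chronological order
        state["status"] = STATUS_BY_EVENT[e["type"]]
    return state


print(state_as_of(events, "2026-01-10T12:00"))
```

With a mutable `status` column this question cannot be answered at all unless a separate history table was maintained; with an event log it is the same replay with an earlier stopping point.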
Distributed System Patterns
These patterns address the specific challenges that arise when multiple independent services must coordinate across network boundaries — where calls can fail, be retried, arrive out of order, or be delayed by seconds.
Saga — Distributed Transactions
A business operation often spans multiple services: placing an order requires the Order Service to create a record, the Payment Service to charge the card, and the Inventory Service to reserve items. Distributed ACID transactions are impractical at scale. The Saga pattern coordinates this as a sequence of local transactions. Each step publishes an event or command to trigger the next. If any step fails, compensating transactions undo the completed steps in reverse order.
flowchart LR
A[Order Created] -->|Success| B[Payment Charged]
B -->|Success| C[Inventory Reserved]
C -->|Success| D[Order Confirmed ✓]
C -->|Fail| E[Compensate: Refund Payment]
E --> F[Order Cancelled ✗]
B -->|Fail| G[Order Cancelled ✗]
style D fill:#3B9797,color:#fff
style F fill:#BF092F,color:#fff
style G fill:#BF092F,color:#fff
style E fill:#16476A,color:#fff
Choreography vs Orchestration:
- Choreography: Each service publishes events; others react. Fully decentralised — no coordinator. Harder to visualise the overall flow.
- Orchestration: A central Saga Orchestrator sends commands and handles failures. Easier to reason about but introduces a coordinator as a coupling point.
When to use: Microservices where business transactions span multiple services and you need eventual consistency without distributed locks.
Tradeoffs: Enables cross-service consistency. Compensating transactions are complex to design correctly. The system is eventually consistent, so windows of partial state exist during execution. All operations must be idempotent, because messages can be redelivered and steps will be retried.
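An orchestrated saga can be sketched as a list of steps, each paired with its compensating action. The `run_saga` function and the step names are hypothetical, and the actions here are stubs; a real orchestrator would also persist its progress so it can resume after a crash.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    On failure, run the compensations of completed steps in reverse order.
    Returns (succeeded, log).
    """
    completed = []
    log = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"undone:{done_name}")
            return False, log
        log.append(f"done:{name}")
        completed.append((name, compensate))
    return True, log


def reserve_inventory():
    raise RuntimeError("out of stock")  # simulated failure in the last step


steps = [
    ("create_order", lambda: None, lambda: None),
    ("charge_payment", lambda: None, lambda: None),
    ("reserve_inventory", reserve_inventory, lambda: None),
]

ok, log = run_saga(steps)
print(ok, log)
```

The failed step itself is not compensated, since it never completed; only the payment and the order record are rolled back, in that order.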
API Gateway & Backend for Frontend (BFF)
An API Gateway is the single entry point for all client requests in a microservices system. Clients call one endpoint; the gateway handles routing, authentication (validate JWT once, not in every service), rate limiting, request transformation, and response aggregation (combine responses from multiple services into one payload).
flowchart LR
A[Web Browser]
B[Mobile App]
C[Third-party]
D[API Gateway<br/>Auth · Rate Limit · Route]
E[Order Service]
F[User Service]
G[Product Service]
H[Payment Service]
A --> D
B --> D
C --> D
D -->|/orders| E
D -->|/users| F
D -->|/products| G
D -->|/payments| H
The Backend for Frontend (BFF) pattern extends this: instead of one generic gateway, you create a dedicated backend per client type — a mobile BFF returns compressed lightweight payloads, a web BFF returns richer data shapes, an IoT BFF might use binary protocols. Each BFF can evolve independently without breaking other clients.
When to use: Any microservices system with 3+ services (API Gateway is near-mandatory). BFF when different client types have divergent data requirements and the generic gateway is growing complex transformation logic.
Tradeoffs: Simplified client experience; centralised cross-cutting concerns. Single point of failure risk (mitigated by horizontal scaling). Risk of the "smart gateway" anti-pattern — business logic migrating into the gateway.
Strangler Fig Migration
Named after a vine that grows around a host tree until it replaces it, the Strangler Fig pattern incrementally migrates a legacy monolith to a new architecture without a high-risk "big bang" rewrite. A facade (usually the API Gateway or a reverse proxy) sits in front of both systems; specific traffic routes to new services as they are built, while the monolith continues to handle unextracted features.
Migration sequence: Place reverse proxy in front of monolith → identify a bounded context to extract → build the new service in parallel → redirect that traffic through the proxy → delete the equivalent monolith code → repeat until the monolith is empty (strangled).
When to use: Legacy systems that must be modernised without downtime. The big-bang rewrite has a high failure rate; Strangler Fig enables incremental delivery of value while managing risk.
Tradeoffs: Low-risk migration. Temporary complexity of running two systems simultaneously. Some tightly-coupled monolith features are very difficult to extract cleanly. Requires discipline about which traffic routes where.
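The routing facade at the heart of the migration can be as simple as a prefix table. The sketch below assumes hypothetical path prefixes and backend names; in practice this logic usually lives in reverse proxy or gateway configuration rather than application code.

```python
# Prefixes whose traffic has already been moved to new services (hypothetical).
MIGRATED_PREFIXES = ["/payments", "/users/profile"]


def route(path):
    """Return which backend should handle the request path."""
    for prefix in MIGRATED_PREFIXES:
        # Match the prefix itself or anything beneath it, but not look-alikes
        # such as '/paymentsx'.
        if path == prefix or path.startswith(prefix + "/"):
            return "new-service"
    return "monolith"


for path in ["/payments/123", "/users/profile", "/orders/1", "/paymentsx"]:
    print(path, "->", route(path))
```

Each extraction step in the migration sequence adds one entry to the table, and rolling back a bad extraction is a matter of removing it.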
UI Architecture Patterns
UI architecture patterns structure the relationship between data (Model), presentation (View), and the layer that mediates between them. The right choice depends on your platform and how much reactive, state-driven behaviour your UI requires.
MVC, MVP, and MVVM
| Pattern | Mediator Role | View–Model Relationship | Primary Use Case |
|---|---|---|---|
| MVC (Model-View-Controller) | Controller handles input, updates Model, selects View | View can observe Model directly | Server-rendered web frameworks (Rails, Django, Spring MVC) |
| MVP (Model-View-Presenter) | Presenter mediates all interaction; View is passive | View has no direct access to Model | Desktop/mobile apps requiring strict testability (Android, WinForms) |
| MVVM (Model-View-ViewModel) | ViewModel exposes observable state; View data-binds to it | Decoupled via data binding; no direct reference | Reactive UIs (WPF, Angular, Vue, SwiftUI, Jetpack Compose) |
The evolution from MVC to MVVM mirrors the shift from server-rendered pages to client-side reactive applications. In MVC, the server controller receives a request, modifies the model, and returns a rendered page. In MVVM, the ViewModel is an observable snapshot of the View's state — when it changes, the View updates automatically through data binding without requiring a page reload or knowing how the data was fetched.
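The data-binding idea behind MVVM can be approximated with a plain observer. This sketch uses a hypothetical `CounterViewModel` and a list standing in for the rendered View; UI frameworks generate this wiring for you.

```python
class CounterViewModel:
    """Holds observable view state and notifies bound views on change."""

    def __init__(self):
        self._count = 0
        self._observers = []

    def bind(self, callback):
        # Register a view callback and push the current state to it.
        self._observers.append(callback)
        callback(self._count)

    def increment(self):
        self._count += 1
        for callback in self._observers:
            callback(self._count)


rendered = []  # stands in for what the View would draw
view_model = CounterViewModel()
view_model.bind(lambda value: rendered.append(f"Count: {value}"))

view_model.increment()
view_model.increment()
print(rendered)
```

The ViewModel never references the View; it only calls whatever was bound to it, which is why it can be unit-tested without any UI toolkit.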
Resilience Patterns
In distributed systems, failure is not an exception — it is the norm. Services crash, networks partition, databases slow down. Resilience patterns define how a system degrades gracefully under failure rather than cascading to a total outage.
Circuit Breaker
The Circuit Breaker (from Michael Nygard's Release It!) prevents a failing dependency from being repeatedly called, giving it time to recover while shielding the caller from cascading failures. Three states: Closed (normal — requests pass through), Open (failure threshold exceeded — all requests fail immediately without calling downstream), Half-Open (probe — a limited number of test requests determine whether to return to Closed).
import time

class CircuitBreaker:
    CLOSED, OPEN, HALF_OPEN = 'closed', 'open', 'half_open'

    def __init__(self, failure_threshold=5, recovery_timeout=30):
        self.state = self.CLOSED
        self.failure_count = 0
        self.failure_threshold = failure_threshold  # consecutive failures before tripping
        self.recovery_timeout = recovery_timeout    # seconds to wait before probing
        self.last_failure_time = None

    def call(self, func, *args, **kwargs):
        if self.state == self.OPEN:
            if time.monotonic() - self.last_failure_time > self.recovery_timeout:
                self.state = self.HALF_OPEN  # Probe recovery
            else:
                raise RuntimeError("Circuit OPEN: service unavailable")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            self.last_failure_time = time.monotonic()
            # A failed probe in HALF_OPEN re-opens the circuit immediately
            if self.state == self.HALF_OPEN or self.failure_count >= self.failure_threshold:
                self.state = self.OPEN  # Trip the breaker
            raise
        # Any success closes the circuit and resets the consecutive-failure count
        self.state = self.CLOSED
        self.failure_count = 0
        return result
When to use: Every inter-service call in a distributed system. In practice, service mesh solutions (Istio, Linkerd) provide circuit breaking at the infrastructure level, which often reduces the need for an application-level implementation.
Bulkhead
Named after the watertight compartments in a ship's hull that contain flooding to one section, the Bulkhead pattern partitions system resources so that exhaustion in one partition cannot affect others. If your payment service and recommendation service share a single thread pool, a slow recommendation API can starve payment processing — Bulkheads prevent this by assigning each a separate pool with a hard resource cap.
In practice: Separate thread pools per downstream dependency; separate database connection pools per service; separate Kubernetes namespaces with resource quotas; separate circuit breakers per dependency class.
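The partitioning idea can be sketched with a semaphore per dependency. This is a minimal illustration rather than a production implementation: the `Bulkhead` class, the `BulkheadFullError` exception, and the concurrency caps are all hypothetical names and values chosen for the example.

```python
import threading
from contextlib import contextmanager


class BulkheadFullError(RuntimeError):
    """Raised when a partition has no free slots left."""


class Bulkhead:
    """Caps concurrent calls into one dependency (hypothetical helper)."""

    def __init__(self, name, max_concurrent):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    @contextmanager
    def slot(self):
        # Fail fast instead of queueing, so a slow dependency cannot
        # accumulate waiting callers.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(f"bulkhead '{self.name}' is full")
        try:
            yield
        finally:
            self._slots.release()


# One bulkhead per dependency; the caps are illustrative values.
payments = Bulkhead("payments", max_concurrent=2)
recommendations = Bulkhead("recommendations", max_concurrent=1)


def demo():
    results = []
    with recommendations.slot():
        # Recommendations is saturated, so a second caller is rejected...
        try:
            with recommendations.slot():
                results.append("recommendations accepted")
        except BulkheadFullError:
            results.append("recommendations rejected")
        # ...while payments still has capacity of its own.
        with payments.slot():
            results.append("payments accepted")
    return results
```

Rejecting immediately when a partition is full is one design choice; a bounded wait with a timeout is a common alternative when brief queueing is acceptable.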
Architectural Quality Attributes
Architecture decisions are fundamentally about tradeoffs between quality attributes — the non-functional requirements that determine how a system behaves under various conditions.
| Attribute | Definition | Measured By | Architecture Impact |
|---|---|---|---|
| Performance | Response time and throughput under load | Latency (p50, p95, p99), requests/second | Caching layers, async processing, database choice |
| Scalability | Ability to handle increased load | Linear vs sublinear throughput growth | Statelessness, horizontal partitioning, load balancing |
| Availability | System uptime and fault tolerance | Nines (99.9%, 99.99%), MTTR, MTBF | Redundancy, failover, circuit breakers, health checks |
| Security | Protection against threats and unauthorised access | Vulnerability count, time-to-patch, compliance | Network segmentation, auth layers, encryption at rest/transit |
| Maintainability | Ease of modification and evolution | Change lead time, defect rate after changes | Modularity, loose coupling, clear boundaries |
| Testability | Ease of verifying correctness | Test coverage achievable, test execution time | Dependency injection, interface contracts, isolation |
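The availability measures in the table are related by simple arithmetic, sketched below. The function names are illustrative, and the calculation assumes the usual steady-state definition of availability as MTBF / (MTBF + MTTR) and a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability as a fraction: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(availability_fraction):
    """Yearly downtime budget implied by an availability target."""
    return (1 - availability_fraction) * HOURS_PER_YEAR


for target in (0.999, 0.9999):
    hours = downtime_hours_per_year(target)
    print(f"{target:.2%} allows {hours:.2f} h ({hours * 60:.1f} min) per year")
```

Three nines leaves roughly 8.76 hours of downtime per year, while four nines leaves roughly 52.6 minutes, which is why each extra nine tends to demand more redundancy and faster recovery.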
Tradeoff Analysis
You cannot optimise all attributes simultaneously. Architecture is the art of choosing which attributes matter most for your context:
- Performance vs Maintainability: Optimised code is often harder to read and change
- Availability vs Consistency: Under the CAP theorem, a distributed system must give up one of the two while a network partition lasts (AP or CP)
- Security vs Usability: More security layers create more friction for users
- Scalability vs Simplicity: Distributed systems scale better but are substantially more complex to build, operate, and debug
Architectural Decision Records (ADRs)
Architectural decisions are some of the most important choices a team makes — yet they are often buried in meeting notes, Slack threads, or (worst of all) a single person's memory. When that person leaves, the why behind the architecture is lost forever.
An Architectural Decision Record (ADR) is a short document that captures one architectural decision — the context, the decision itself, and its consequences. ADRs are stored in the repository alongside the code they govern.
ADR Template
# ADR-NNN: [Short Title of Decision]
## Status
[Proposed | Accepted | Deprecated | Superseded by ADR-XXX]
## Context
[What is the issue we are facing? What forces are at play?
What constraints exist? What options did we consider?]
## Decision
[What is the change we are proposing or have agreed to?
State it clearly and definitively.]
## Consequences
[What becomes easier or harder as a result of this decision?
What are the positive, negative, and neutral consequences?]
## Alternatives Considered
[What other options were evaluated and why were they rejected?]
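Because ADRs live in the repository, creating one can be scripted. The sketch below is one possible helper, not an established tool: the `NNN-slug.md` file naming, the `create_adr` and `next_adr_number` functions, and the shortened template are assumptions made for illustration.

```python
import re
from pathlib import Path

# Abbreviated skeleton of the template above (bracketed prompts omitted).
ADR_TEMPLATE = """# ADR-{number:03d}: {title}
## Status
Proposed
## Context
## Decision
## Consequences
## Alternatives Considered
"""


def next_adr_number(adr_dir):
    """Return one more than the highest NNN among files named NNN-*.md."""
    numbers = [
        int(match.group(1))
        for path in Path(adr_dir).glob("*.md")
        if (match := re.match(r"(\d+)-", path.name))
    ]
    return max(numbers, default=0) + 1


def create_adr(adr_dir, title):
    """Write a new ADR skeleton into adr_dir and return its path."""
    adr_dir = Path(adr_dir)
    adr_dir.mkdir(parents=True, exist_ok=True)
    number = next_adr_number(adr_dir)
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    path = adr_dir / f"{number:03d}-{slug}.md"
    path.write_text(
        ADR_TEMPLATE.format(number=number, title=title), encoding="utf-8"
    )
    return path
```

Calling `create_adr("docs/adr", "Use Apache Kafka as the Event Bus")` on an empty directory would produce `docs/adr/001-use-apache-kafka-as-the-event-bus.md`; the directory path here is only an example.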
Sample ADR: Choosing an Event Bus
# ADR-007: Use Apache Kafka as the Event Bus
## Status
Accepted (2026-03-15)
## Context
Our e-commerce platform needs asynchronous communication between
services (Order, Payment, Inventory, Notification). We need:
- At-least-once delivery guarantees
- Message ordering within a partition
- Support for 50,000+ events/second at peak
- Message retention for replay (consumer catch-up)
- Multi-consumer support (same event, multiple subscribers)
Options evaluated: RabbitMQ, Apache Kafka, AWS SQS/SNS, Redis Streams.
## Decision
We will use Apache Kafka (managed via Confluent Cloud) as our
primary event bus for all inter-service communication.
## Consequences
Positive:
- High throughput (100K+ events/sec demonstrated in load testing)
- Built-in partitioning for horizontal scaling
- Log-based retention allows consumer replay
- Strong ecosystem (Schema Registry, Connect, Streams)
Negative:
- Operational complexity higher than RabbitMQ
- At-least-once delivery means duplicates are possible, so consumers must be idempotent
- Team needs Kafka-specific training (partitions, consumer groups)
- Cost: ~$1,200/month for Confluent Cloud at expected throughput
Neutral:
- Requires schema registry for event versioning (additional component)
- Consumer offset management is our responsibility
## Alternatives Considered
- RabbitMQ: Rejected due to lack of built-in log retention and
replay capability. Better for task queues, not event streaming.
- AWS SQS/SNS: Rejected to avoid AWS vendor lock-in (multi-cloud
requirement from stakeholders).
- Redis Streams: Rejected due to durability concerns and limited
ecosystem for schema management.
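The sample ADR notes that consumers must be idempotent. A minimal in-memory sketch of that idea follows; it uses no Kafka client, and the `IdempotentConsumer` class, the `event_id` field, and the stock-reservation handler are hypothetical names chosen for the example.

```python
class IdempotentConsumer:
    """Applies each event at most once, keyed by a unique event ID."""

    def __init__(self, handler):
        self._handler = handler
        # Assumption: a real system would keep these IDs in a durable
        # store, updated together with the handler's side effects.
        self._processed_ids = set()

    def consume(self, event):
        event_id = event["event_id"]
        if event_id in self._processed_ids:
            return False  # Duplicate delivery: skip the side effect
        self._handler(event)
        self._processed_ids.add(event_id)
        return True


stock = {"sku-1": 10}


def reserve_stock(event):
    stock[event["sku"]] -= event["quantity"]


consumer = IdempotentConsumer(reserve_stock)
order_placed = {"event_id": "evt-42", "sku": "sku-1", "quantity": 3}

consumer.consume(order_placed)  # First delivery: stock drops to 7
consumer.consume(order_placed)  # Redelivery: ignored, stock stays at 7
```

Deduplicating on an event ID is one approach; making the handler itself naturally idempotent (for example, setting a value rather than decrementing it) is another.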
Pattern Comparison Table
All major patterns covered in this article, organised by category. Use this as a quick reference when choosing patterns for a new system or evaluating an existing one.
| Pattern | Category | Coupling | Complexity | Scalability | Best For |
|---|---|---|---|---|---|
| Foundational / Communication | | | | | |
| Client-Server | Communication | Medium | Low | Vertical | Web apps, APIs, mobile backends |
| Layered (N-Tier) | Structure | Medium-High | Low | Vertical | Enterprise CRUD apps, small–medium teams |
| Pipe-and-Filter | Communication | Very Low | Medium | Horizontal | Data pipelines, ETL, compilers |
| Event-Driven | Communication | Very Low | High | Horizontal | Reactive systems, IoT, high-throughput |
| Deployment / Topology | | | | | |
| Monolith | Deployment | High | Low | Vertical | Startups, MVPs, teams <20 |
| Modular Monolith | Deployment | Medium | Medium | Vertical | Growing teams, pre-microservices stage |
| Microservices | Deployment | Very Low | Very High | Horizontal | Large orgs (100+ engineers), independent scaling |
| Data / State Management | | | | | |
| Primary-Replica | Data | Medium | Medium | Read-horizontal | Read-heavy workloads, high availability |
| CQRS | Data / Domain | Low | High | Asymmetric | Complex read/write asymmetry, analytics dashboards |
| Event Sourcing | Data / Domain | Low | High | Horizontal | Audit trails, compliance, fintech, SaaS |
| Domain Organisation | | | | | |
| Hexagonal (Ports & Adapters) | Structure / Domain | Very Low | Medium-High | Any | Complex domains, long-lived systems, TDD |
| Clean Architecture | Structure / Domain | Very Low | Medium-High | Any | Framework-independent business logic |
| Onion Architecture | Structure / Domain | Very Low | Medium-High | Any | Domain-heavy systems, DDD applications |
| Distributed Coordination | | | | | |
| Saga | Distributed | Low | High | Horizontal | Cross-service business transactions, microservices |
| API Gateway | Distributed | Low | Medium | Horizontal | Microservices entry point, auth, routing |
| Strangler Fig | Migration | Medium | Medium | Incremental | Legacy modernisation without big-bang rewrite |
| UI / Frontend | | | | | |
| MVC | UI / Structure | Medium | Low | Server | Server-rendered web apps (Rails, Django, Spring) |
| MVP | UI / Structure | Low | Medium | Client | Desktop/mobile with strict testability requirements |
| MVVM | UI / Structure | Very Low | Medium | Client | Reactive UIs, SPAs, mobile (Angular, Vue, SwiftUI) |
| Resilience | | | | | |
| Circuit Breaker | Resilience | Low | Low | N/A | Preventing cascading failures in distributed systems |
| Bulkhead | Resilience | Low | Low | N/A | Isolating resource exhaustion across dependencies |
Conclusion & Next Steps
Software architecture is not about choosing the "best" pattern — it is about choosing the right pattern for your context. A startup with three engineers choosing microservices is making a different mistake than an enterprise with five hundred engineers staying on a single monolith. Context determines correctness.
The patterns we explored span seven dimensions of architectural thinking: foundational styles (Client-Server, Layered, Event-Driven, Pipe-and-Filter), deployment topology (Monolith, Modular Monolith, Microservices), domain-centric structures (Hexagonal, Clean Architecture, Onion, CQRS, Event Sourcing), distributed coordination (Saga, API Gateway, Strangler Fig), UI organisation (MVC, MVP, MVVM), data management (Primary-Replica, CQRS), and resilience (Circuit Breaker, Bulkhead). The mnemonic to hold them together: Structure → Communication → Deployment → Data → Domain → Resilience.
Real systems combine patterns from multiple dimensions — a microservices deployment with Hexagonal structure within each service, CQRS for data access, Circuit Breakers for resilience, and an API Gateway as the entry point. The ADR is the mechanism for documenting why you reached for a particular combination. Together, the patterns and the ADR practice form the foundation for making and communicating architectural decisions that your future self — and your team — will understand.
In the next article, we go one level deeper — from system-level architecture down to module-level design, exploring the three forces that determine whether software is maintainable: modularity, coupling, and cohesion.
Next in the Series
In Part 7: Modularity, Coupling & Cohesion, we explore the three forces that determine whether your modules are maintainable, testable, and evolvable — from David Parnas's information hiding principle to measuring instability and abstractness.