
System Design Series Part 5: Microservices Architecture

January 25, 2026 Wasil Zafar 45 min read

Master microservices architecture patterns for building scalable, maintainable systems. Learn service decomposition, API gateways, service mesh, containerization, and Kubernetes orchestration.

Table of Contents

  1. Microservices Fundamentals
  2. Service Decomposition
  3. Architecture Patterns
  4. Serverless Architecture
  5. Next Steps

Microservices Fundamentals

Series Navigation: This is Part 5 of the 15-part System Design Series. Review Part 4: Database Design & Sharding first.

Microservices architecture structures an application as a collection of loosely coupled, independently deployable services. Each service is owned by a small team and focuses on a specific business capability.

Key Insight: Microservices aren't always the answer. Start with a monolith and extract services as complexity grows and team boundaries become clear.

Core Principles

  • Single Responsibility: Each service does one thing well
  • Loose Coupling: Services interact through well-defined interfaces
  • High Cohesion: Related functionality grouped together
  • Independent Deployment: Deploy services without affecting others
  • Decentralized Data: Each service owns its data store
  • Failure Isolation: One service failure doesn't crash the system

Monolith vs Microservices

Understanding when to use each architecture is crucial:

Aspect         | Monolith                  | Microservices
Deployment     | Deploy entire application | Deploy services independently
Scaling        | Scale entire application  | Scale individual services
Technology     | Single tech stack         | Polyglot (different stacks per service)
Team Structure | Large, cross-functional   | Small, service-owning teams
Complexity     | In-process calls          | Network calls, distributed systems
Data           | Shared database           | Database per service
Testing        | End-to-end easier         | Integration testing complex

When to Choose Monolith

  • Small team (< 10 developers)
  • Early-stage startup exploring product-market fit
  • Simple domain with clear boundaries
  • Need rapid development without distributed complexity
  • Limited DevOps expertise

# Monolith structure example
my_app/
├── app/
│   ├── models/
│   │   ├── user.py
│   │   ├── order.py
│   │   └── product.py
│   ├── services/
│   │   ├── user_service.py
│   │   ├── order_service.py
│   │   └── product_service.py
│   ├── api/
│   │   └── routes.py
│   └── database.py
├── tests/
└── requirements.txt

When to Choose Microservices

  • Large organization with multiple teams
  • Different parts of the system have different scaling needs
  • Teams need to deploy independently
  • Complex domain with clear bounded contexts
  • Strong DevOps/infrastructure capabilities

# Microservices structure example
platform/
├── services/
│   ├── user-service/
│   │   ├── src/
│   │   ├── Dockerfile
│   │   └── k8s/
│   ├── order-service/
│   │   ├── src/
│   │   ├── Dockerfile
│   │   └── k8s/
│   ├── payment-service/
│   │   ├── src/
│   │   ├── Dockerfile
│   │   └── k8s/
│   └── notification-service/
│       ├── src/
│       ├── Dockerfile
│       └── k8s/
├── api-gateway/
├── infrastructure/
└── docker-compose.yml

The Monolith First Approach: Many successful companies (Shopify, Etsy) started as monoliths and extracted services as they scaled. Don't prematurely optimize for microservices; the complexity isn't free.

Service Decomposition

Breaking a monolith into microservices requires thoughtful decomposition strategies:

Domain-Driven Design (DDD)

Use bounded contexts from DDD to identify service boundaries:

E-Commerce Bounded Contexts

# Bounded Contexts for E-Commerce Platform

# User Context - User identity and preferences
class UserService:
    """Handles user registration, authentication, profiles"""
    def register_user(self, email, password): pass
    def authenticate(self, email, password): pass
    def get_profile(self, user_id): pass

# Catalog Context - Product information
class CatalogService:
    """Manages product listings, categories, search"""
    def get_product(self, product_id): pass
    def search_products(self, query): pass
    def get_category(self, category_id): pass

# Order Context - Order lifecycle
class OrderService:
    """Handles order creation, status, history"""
    def create_order(self, user_id, items): pass
    def get_order_status(self, order_id): pass
    def cancel_order(self, order_id): pass

# Inventory Context - Stock management
class InventoryService:
    """Manages stock levels, reservations"""
    def check_availability(self, product_id, quantity): pass
    def reserve_stock(self, product_id, quantity): pass
    def release_stock(self, product_id, quantity): pass

# Payment Context - Financial transactions
class PaymentService:
    """Processes payments, refunds"""
    def process_payment(self, order_id, amount): pass
    def refund_payment(self, payment_id): pass

# Shipping Context - Fulfillment
class ShippingService:
    """Handles shipping labels, tracking"""
    def create_shipment(self, order_id): pass
    def get_tracking(self, shipment_id): pass

Decomposition Strategies

Strategy               | Description                            | When to Use
By Business Capability | Align services with business functions | Clear business domains exist
By Subdomain           | Core, supporting, generic subdomains   | Complex domain with varying importance
Strangler Fig          | Gradually replace monolith pieces      | Migrating existing monolith
By Team                | Conway's Law: match org structure      | Clear team boundaries
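
The Strangler Fig strategy can be made concrete with a small routing sketch. It assumes a hypothetical facade in front of the monolith that sends already-migrated path prefixes to new services and everything else to the legacy application; the class name, prefixes, and upstream addresses are invented for illustration.

```python
# Strangler Fig sketch: a routing facade that gradually shifts traffic
# from the monolith to extracted services. All names are hypothetical.

MONOLITH = "legacy-monolith:8000"

class StranglerRouter:
    def __init__(self, default_upstream=MONOLITH):
        self.default_upstream = default_upstream
        self.migrated = {}  # path prefix -> new service upstream

    def migrate(self, prefix, upstream):
        """Mark a path prefix as served by an extracted service."""
        self.migrated[prefix] = upstream

    def rollback(self, prefix):
        """Send a prefix back to the monolith if the new service misbehaves."""
        self.migrated.pop(prefix, None)

    def resolve(self, path):
        """Return the upstream for a path, preferring the longest matching prefix."""
        matches = [p for p in self.migrated
                   if path == p or path.startswith(p.rstrip("/") + "/")]
        if not matches:
            return self.default_upstream
        return self.migrated[max(matches, key=len)]


router = StranglerRouter()
router.migrate("/orders", "order-service:8080")
print(router.resolve("/orders/42"))  # order-service:8080
print(router.resolve("/users/7"))    # legacy-monolith:8000
```

Because unmigrated paths fall through to the monolith, each extraction can ship (and be rolled back) on its own.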

Service Boundaries

Good service boundaries minimize inter-service communication:

Good Boundaries

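# Good: Order service owns all order data
# (Order, db, and event_bus are illustrative: the domain model, the
# service's private data store, and an async message broker client.)
class OrderService:
    def __init__(self, db, event_bus):
        self.db = db                # Owned by this service only
        self.event_bus = event_bus  # Used to notify other services

    def create_order(self, user_id, items):
        # All order logic contained within service
        order = Order(user_id=user_id)
        for item in items:
            order.add_item(item)
        order.calculate_total()
        self.db.save(order)

        # Emit event for other services (async)
        self.event_bus.publish("order.created", order)
        return order

    def get_order_details(self, order_id):
        # No external calls needed
        return self.db.get(order_id)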
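
The publish step in the good-boundary example hides how other services react to the event. As a rough in-process sketch (a real system would use a message broker and deliver asynchronously), a hypothetical `InMemoryEventBus` shows the shape of the interaction: the order service publishes once and never calls its consumers directly.

```python
# In-memory stand-in for an event bus. Real deployments would use a
# message broker; this sketch delivers synchronously for simplicity.
from collections import defaultdict

class InMemoryEventBus:
    def __init__(self):
        self._handlers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, payload):
        # The publisher does not know who is listening
        for handler in self._handlers[topic]:
            handler(payload)


bus = InMemoryEventBus()
reserved = []  # pretend state inside an inventory service
emails = []    # pretend state inside a notification service

bus.subscribe("order.created", lambda e: reserved.append(e["order_id"]))
bus.subscribe("order.created", lambda e: emails.append(f"Order {e['order_id']} received"))

# The order service only publishes; consumers react on their own
bus.publish("order.created", {"order_id": 42, "user_id": 7})
print(reserved)  # [42]
print(emails)    # ['Order 42 received']
```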

Bad Boundaries (Distributed Monolith)

# Bad: Order service makes synchronous calls for every operation
class OrderService:
    def create_order(self, user_id, items):
        # Synchronous calls create tight coupling
        user = self.user_service.get_user(user_id)  # Network call
        
        for item in items:
            product = self.catalog_service.get_product(item.id)  # Network call
            price = self.pricing_service.get_price(item.id)  # Network call
            stock = self.inventory_service.check(item.id)  # Network call
        
        # Any service failure = order failure
        # Can't deploy independently
        # Latency compounds with each call

Signs of Bad Boundaries

  • Services need to call each other synchronously for simple operations
  • Circular dependencies between services
  • Shared database between services
  • Must deploy multiple services together
  • Same data modified by multiple services
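
Circular dependencies, one of the signs listed above, can be checked mechanically once you have a map of which service calls which. A minimal sketch, assuming a hand-written dependency map (the service names and the `find_cycle` helper are made up for illustration):

```python
# Detect circular dependencies in a service call graph with depth-first search.
# The dependency map below is a made-up example.

def find_cycle(dependencies):
    """Return one dependency cycle as a list of service names, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, in progress, done
    color = {service: WHITE for service in dependencies}
    path = []

    def visit(service):
        color[service] = GRAY
        path.append(service)
        for callee in dependencies.get(service, []):
            state = color.get(callee, WHITE)
            if state == GRAY:
                # Back edge: callee is already on the current path
                return path[path.index(callee):] + [callee]
            if state == WHITE:
                cycle = visit(callee)
                if cycle:
                    return cycle
        path.pop()
        color[service] = BLACK
        return None

    for service in list(dependencies):
        if color[service] == WHITE:
            cycle = visit(service)
            if cycle:
                return cycle
    return None


calls = {
    "order": ["payment", "inventory"],
    "payment": ["user"],
    "user": ["order"],  # closes the loop order -> payment -> user -> order
    "inventory": [],
}
print(find_cycle(calls))  # ['order', 'payment', 'user', 'order']
```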

Architecture Patterns

API Gateway

Single entry point that routes requests to appropriate services.

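# API Gateway responsibilities
# (Response, authenticate, rate_limiter, forward, and the *_service clients
# are illustrative placeholders, not a specific framework's API.)
class APIGateway:
    def __init__(self):
        # Path prefix -> upstream service address
        self.services = {
            "/users": "user-service:8080",
            "/orders": "order-service:8080",
            "/products": "catalog-service:8080"
        }

    def get_service(self, path):
        """Return the upstream whose prefix matches the path, or None"""
        for prefix, upstream in self.services.items():
            if path == prefix or path.startswith(prefix + "/"):
                return upstream
        return None

    def route_request(self, request):
        # 1. Authentication
        user = self.authenticate(request.headers.get("Authorization"))
        if user is None:
            return Response(status=401)

        # 2. Rate limiting
        if self.rate_limiter.is_limited(user.id):
            return Response(status=429)

        # 3. Route to service
        upstream = self.get_service(request.path)
        if upstream is None:
            return Response(status=404)

        # 4. Protocol translation (REST -> gRPC) happens inside forward()
        response = self.forward(upstream, request)

        # 5. Response aggregation (if needed)
        return response

    def aggregate_product_page(self, product_id):
        """Combine data from multiple services"""
        product = self.catalog_service.get(product_id)  # Assumed to return a dict
        reviews = self.review_service.get_for_product(product_id)
        inventory = self.inventory_service.check(product_id)

        return {
            **product,
            "reviews": reviews,
            "in_stock": inventory.available > 0
        }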
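
Rate limiting (step 2 in the gateway) is often implemented with a token bucket. The sketch below is one possible shape for the `rate_limiter` the gateway calls. The class name, parameters, and injectable clock are assumptions made for illustration, and a production gateway would usually keep the counters in a shared store rather than in process memory.

```python
import time

class TokenBucketLimiter:
    """Per-client token bucket: `rate` tokens per second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock    # injectable so the limiter can be tested
        self._buckets = {}    # client_id -> (tokens, last_refill_time)

    def is_limited(self, client_id):
        """Consume one token; return True if the client has none left."""
        now = self.clock()
        tokens, last = self._buckets.get(client_id, (self.capacity, now))
        # Refill based on elapsed time, capped at capacity
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens < 1:
            self._buckets[client_id] = (tokens, now)
            return True
        self._buckets[client_id] = (tokens - 1, now)
        return False


# Usage with a fake clock so the behaviour is deterministic
fake_now = [0.0]
limiter = TokenBucketLimiter(rate=1, capacity=2, clock=lambda: fake_now[0])
print([limiter.is_limited("alice") for _ in range(3)])  # [False, False, True]
fake_now[0] = 1.0  # one second later, one token has been refilled
print(limiter.is_limited("alice"))  # False
```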

Backend for Frontend (BFF)

Separate backend for each client type (web, mobile, IoT).

# BFF Pattern
# Each client gets an API shaped for its needs.
# (self.catalog, self.reviews, self.inventory, and self.recommendations
# are illustrative service clients injected elsewhere.)

# Mobile BFF - Minimal data, pagination
class MobileBFF:
    def get_product_list(self, page=1, limit=20):
        products = self.catalog.get_products(page, limit)
        return [{
            "id": p.id,
            "name": p.name,
            "price": p.price,
            # Only the first image; None when a product has no images
            "thumbnail": p.images[0].url if p.images else None
        } for p in products]

# Web BFF - Rich data, full details
class WebBFF:
    def get_product_list(self):
        products = self.catalog.get_products()
        # Build new dicts instead of mutating catalog objects.
        # This makes three calls per product; batch endpoints would
        # avoid the N+1 call pattern in production.
        return [{
            "id": p.id,
            "name": p.name,
            "price": p.price,
            "reviews_summary": self.reviews.get_summary(p.id),
            "availability": self.inventory.check(p.id),
            "related": self.recommendations.get_related(p.id)
        } for p in products]

Database per Service

Each service owns its data store—no shared databases.

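# Each service manages its own database
# (PostgreSQL(...), MongoDB(...), etc. are illustrative placeholders,
# not real client constructors)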
# Order Service - PostgreSQL for transactions
order_db = PostgreSQL("order-db")

# Catalog Service - MongoDB for flexible product data
catalog_db = MongoDB("catalog-db")

# Search Service - Elasticsearch for full-text search
search_db = Elasticsearch("search-cluster")

# Session Service - Redis for fast access
session_db = Redis("session-cache")

# Analytics Service - ClickHouse for time-series
analytics_db = ClickHouse("analytics-cluster")

Challenge: Cross-service queries require careful design (event sourcing, CQRS, or API composition).
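
API composition, the simplest of the three options named above, joins data in application code instead of in SQL. A minimal sketch, assuming two hypothetical clients that each wrap one service's API (stubbed here with in-memory data):

```python
# API composition: join data from two services in memory.
# OrderClient and CatalogClient are hypothetical stand-ins for HTTP/gRPC clients.

class OrderClient:
    def list_orders(self, user_id):
        return [
            {"order_id": 1, "user_id": user_id, "product_ids": [10, 11]},
            {"order_id": 2, "user_id": user_id, "product_ids": [10]},
        ]

class CatalogClient:
    _products = {10: "Keyboard", 11: "Mouse"}

    def get_products(self, product_ids):
        # One batched call instead of one call per product
        return {pid: self._products[pid] for pid in product_ids if pid in self._products}


def order_history(user_id, orders, catalog):
    """Compose a user's order history with product names."""
    user_orders = orders.list_orders(user_id)
    wanted = {pid for o in user_orders for pid in o["product_ids"]}
    names = catalog.get_products(sorted(wanted))
    return [
        {
            "order_id": o["order_id"],
            "products": [names.get(pid, "unknown") for pid in o["product_ids"]],
        }
        for o in user_orders
    ]


print(order_history(7, OrderClient(), CatalogClient()))
```

The composer makes one call per service and joins on product ID, so neither service needs access to the other's database.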


Saga Pattern

Manage distributed transactions across services using compensating transactions.

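# Saga Pattern for Order Creation (orchestration style)
# (self.inventory, self.payment, self.orders, and self.notifications
# are illustrative service clients.)
import logging

logger = logging.getLogger(__name__)

class OrderSaga:
    def execute(self, order_data):
        saga_log = []  # Completed steps, in order, for compensation

        try:
            # Step 1: Reserve inventory
            reservation = self.inventory.reserve(order_data.items)
            saga_log.append(("inventory", reservation.id))

            # Step 2: Process payment
            payment = self.payment.charge(order_data.user_id, order_data.total)
            saga_log.append(("payment", payment.id))

            # Step 3: Create order
            order = self.orders.create(order_data)
            saga_log.append(("order", order.id))
        except Exception:
            # Compensating transactions (rollback)
            self.compensate(saga_log)
            raise

        # Step 4: Notify user. A failed notification should not undo a
        # paid order, so it sits outside the compensated block.
        try:
            self.notifications.send(order_data.user_id, "Order confirmed!")
        except Exception:
            logger.exception("Notification failed for order %s", order.id)

        return order

    def compensate(self, saga_log):
        """Undo completed steps in reverse order"""
        for service, resource_id in reversed(saga_log):
            try:
                if service == "order":
                    self.orders.cancel(resource_id)
                elif service == "payment":
                    self.payment.refund(resource_id)
                elif service == "inventory":
                    self.inventory.release(resource_id)
            except Exception:
                # Keep going so one failed compensation does not block the rest
                logger.exception("Compensation failed: %s %s", service, resource_id)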

Circuit Breaker

Prevent cascading failures by stopping calls to failing services.

# Circuit Breaker Implementation
# Note: this sketch is not thread-safe; add a lock before sharing one
# breaker across threads.
from enum import Enum
import time

class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open"""

class CircuitState(Enum):
    CLOSED = "closed"      # Normal operation
    OPEN = "open"          # Failing, reject calls
    HALF_OPEN = "half_open"  # Testing recovery

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout  # Seconds to stay open
        self.failure_count = 0
        self.last_failure_time = None
        self.state = CircuitState.CLOSED

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if self._should_attempt_reset():
                self.state = CircuitState.HALF_OPEN  # Let one trial call through
            else:
                raise CircuitOpenError("Circuit is open")

        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise  # Re-raise with the original traceback
        self._on_success()
        return result

    def _on_success(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED

    def _on_failure(self):
        self.failure_count += 1
        # monotonic() is unaffected by system clock adjustments
        self.last_failure_time = time.monotonic()

        # A failed trial call in HALF_OPEN reopens the circuit immediately
        if (self.state == CircuitState.HALF_OPEN
                or self.failure_count >= self.failure_threshold):
            self.state = CircuitState.OPEN

    def _should_attempt_reset(self):
        return (time.monotonic() - self.last_failure_time) > self.recovery_timeout

# Usage (payment_service, user_id, and amount are illustrative)
payment_circuit = CircuitBreaker(failure_threshold=5, recovery_timeout=30)
try:
    result = payment_circuit.call(payment_service.charge, user_id, amount)
except CircuitOpenError:
    result = None  # Fail fast: fall back or queue the charge for later

Service Mesh

A service mesh is a dedicated infrastructure layer for handling service-to-service communication. It offloads common concerns from application code.

Service Mesh Capabilities

  • Traffic Management: Load balancing, routing, retries
  • Security: mTLS encryption, authentication
  • Observability: Metrics, tracing, logging
  • Resilience: Circuit breaking, timeouts, rate limiting
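
To see what "offloading" means in practice: without a mesh, each service would carry retry logic like the sketch below in its own code. This is an illustrative Python helper with made-up parameter names, not how any particular mesh implements retries.

```python
import random
import time

def call_with_retries(func, attempts=3, base_delay=0.1, max_delay=2.0,
                      sleep=time.sleep, rng=random.random):
    """Call func(); on failure wait with exponential backoff and full jitter."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Backoff ceiling doubles each attempt, capped at max_delay
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * ceiling)  # full jitter spreads out retry storms


# Usage: a flaky call that succeeds on the third try
calls = {"count": 0}

def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("upstream unavailable")
    return "ok"

print(call_with_retries(flaky, sleep=lambda s: None))  # ok
```

A mesh sidecar applies the same policy uniformly from configuration, so services written in different languages do not each reimplement it.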

Istio Service Mesh Example

# Istio VirtualService for traffic routing
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: order-service
spec:
  hosts:
    - order-service
  http:
    - match:
        - headers:
            x-canary:
              exact: "true"
      route:
        - destination:
            host: order-service
            subset: v2
          weight: 100
    - route:
        - destination:
            host: order-service
            subset: v1
          weight: 90
        - destination:
            host: order-service
            subset: v2
          weight: 10  # 10% canary traffic

---
# Istio DestinationRule: defines the v1/v2 subsets the VirtualService routes to, plus circuit breaking
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: order-service
spec:
  host: order-service
  subsets:  # Assumes the pods carry a "version" label
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: UPGRADE
        http1MaxPendingRequests: 100
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
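
The VirtualService routing rules can be read as a small decision procedure: a header match wins outright, otherwise a weighted random choice picks the subset. The Python sketch below mimics that logic for intuition only. It is not how Envoy is implemented, and the function name is made up.

```python
import random
from collections import Counter

def pick_subset(headers, rng=random):
    """Mimic the VirtualService: header match first, then a 90/10 split."""
    if headers.get("x-canary") == "true":
        return "v2"  # explicit canary opt-in always goes to v2
    # Weighted choice between subsets; weights mirror the YAML
    return rng.choices(["v1", "v2"], weights=[90, 10], k=1)[0]


rng = random.Random(0)  # seeded so the run is repeatable
counts = Counter(pick_subset({}, rng) for _ in range(10_000))
print(counts["v2"] / 10_000)               # roughly 0.10
print(pick_subset({"x-canary": "true"}))   # v2
```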

Popular Service Meshes

Service Mesh   | Architecture           | Best For
Istio          | Envoy sidecar          | Full-featured, large deployments
Linkerd        | Lightweight proxy      | Simplicity, performance
Consul Connect | HashiCorp ecosystem    | Multi-cloud, service discovery
AWS App Mesh   | Envoy, AWS integration | AWS-native workloads

Serverless Architecture

Key Insight: Serverless doesn't mean "no servers"—it means you don't manage servers. The cloud provider handles scaling, patching, and infrastructure automatically.

Function-as-a-Service (FaaS) is a cloud computing model where you deploy individual functions that execute in response to events. AWS Lambda, Azure Functions, and Google Cloud Functions are popular FaaS platforms.

Serverless Benefits

  • No Server Management: Focus on code, not infrastructure
  • Auto-Scaling: Scale from zero to thousands of concurrent executions automatically
  • Pay-per-Use: Only pay for actual execution time
  • Built-in HA: Multi-AZ by default

AWS Lambda Function Example

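# AWS Lambda Function - Process uploaded images
import io
import json
from urllib.parse import unquote_plus

import boto3
from PIL import Image  # Pillow must be packaged with the function or supplied via a layer

s3 = boto3.client('s3')

THUMBNAIL_PREFIX = "thumbnails/"

def lambda_handler(event, context):
    """Triggered when image uploaded to S3"""
    thumbnails = []

    for record in event['Records']:
        # Get uploaded file info (object keys arrive URL-encoded in S3 events)
        bucket = record['s3']['bucket']['name']
        key = unquote_plus(record['s3']['object']['key'])

        # Skip our own output; otherwise writing a thumbnail to the same
        # bucket would trigger this function again in a loop
        if key.startswith(THUMBNAIL_PREFIX):
            continue

        # Download image
        response = s3.get_object(Bucket=bucket, Key=key)
        image_content = response['Body'].read()

        # Process image (shrink in place to fit 200x200, keeping aspect ratio)
        image = Image.open(io.BytesIO(image_content))
        image.thumbnail((200, 200))

        # Save thumbnail (JPEG has no alpha channel, so convert to RGB first)
        buffer = io.BytesIO()
        image.convert('RGB').save(buffer, format='JPEG')

        thumbnail_key = f"{THUMBNAIL_PREFIX}{key}"
        s3.put_object(
            Bucket=bucket,
            Key=thumbnail_key,
            Body=buffer.getvalue(),
            ContentType='image/jpeg'
        )
        thumbnails.append(thumbnail_key)

    return {
        'statusCode': 200,
        'body': json.dumps({'thumbnails': thumbnails})
    }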

Serverless Patterns

Event-Driven Processing

Functions triggered by events from various sources:

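# serverless.yml - Event triggers
service: order-processor

# The Serverless Framework requires a provider block; the runtime shown is an assumption
provider:
  name: aws
  runtime: python3.12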

functions:
  processOrder:
    handler: handler.process_order
    events:
      - sqs:
          arn: arn:aws:sqs:region:account:order-queue
          batchSize: 10
  
  sendNotification:
    handler: handler.send_notification
    events:
      - sns:
          arn: arn:aws:sns:region:account:order-notifications
  
  apiEndpoint:
    handler: handler.api
    events:
      - http:
          path: /orders
          method: post
  
  scheduledReport:
    handler: handler.generate_report
    events:
      - schedule: rate(1 day)
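
The `processOrder` function in this config receives SQS messages in batches of up to 10. A hedged sketch of what `handler.process_order` might look like follows. `handle_order` is a hypothetical placeholder for business logic, and the `batchItemFailures` return value only takes effect if partial batch responses (`functionResponseType: ReportBatchItemFailures` in the Serverless Framework) are enabled on the SQS event, which the config as written does not do.

```python
import json

def handle_order(order):
    """Hypothetical business logic; raises on invalid input."""
    if "order_id" not in order:
        raise ValueError("missing order_id")
    return order["order_id"]

def process_order(event, context):
    """Process an SQS batch and report which messages failed."""
    failures = []
    for record in event.get("Records", []):
        try:
            handle_order(json.loads(record["body"]))
        except Exception:
            # Only the failed messages are returned to the queue for retry
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


# Local usage with a hand-built event
event = {"Records": [
    {"messageId": "m1", "body": json.dumps({"order_id": 1})},
    {"messageId": "m2", "body": "not json"},
]}
print(process_order(event, None))  # {'batchItemFailures': [{'itemIdentifier': 'm2'}]}
```

Without partial batch responses, one bad message causes the whole batch to be retried, so successfully processed orders may be handled twice unless the handler is idempotent.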

Function Composition

Chain functions using Step Functions or event choreography:

# AWS Step Functions - Order workflow
{
  "Comment": "Order Processing Workflow",
  "StartAt": "ValidateOrder",
  "States": {
    "ValidateOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:validateOrder",
      "Next": "CheckInventory"
    },
    "CheckInventory": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:checkInventory",
      "Next": "ProcessPayment",
      "Catch": [
        {
          "ErrorEquals": ["OutOfStockError"],
          "Next": "NotifyOutOfStock"
        }
      ]
    },
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:processPayment",
      "Next": "FulfillOrder"
    },
    "FulfillOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:fulfillOrder",
      "End": true
    },
    "NotifyOutOfStock": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:notifyUser",
      "End": true
    }
  }
}

Serverless Tradeoffs

Benefit                 | Tradeoff
No server management    | Less control over environment
Auto-scaling            | Cold start latency (100 ms to a few seconds)
Pay-per-execution       | Expensive for constant high load
Built-in HA             | Vendor lock-in
Rapid development       | Debugging/testing more complex
Event-driven simplicity | Limited execution time (15 min max on AWS Lambda)

Cost Comparison: At low traffic, serverless is usually cheaper. At sustained load, for example around 1M requests/day at 200 ms each, a right-sized EC2 instance often becomes more economical, though the exact crossover depends on memory allocation and instance type. Calculate your break-even point!
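
A back-of-the-envelope version of that break-even calculation might look like this. The prices are illustrative placeholders passed in as parameters, not quoted rates, and the model ignores free tiers, data transfer, and operational cost; substitute numbers from your provider's pricing page.

```python
def lambda_monthly_cost(requests_per_day, duration_s, memory_gb,
                        price_per_million_requests, price_per_gb_second, days=30):
    """Estimate monthly FaaS cost from request volume, duration, and memory."""
    requests = requests_per_day * days
    request_cost = requests / 1_000_000 * price_per_million_requests
    compute_cost = requests * duration_s * memory_gb * price_per_gb_second
    return request_cost + compute_cost

def break_even_requests_per_day(server_monthly_cost, duration_s, memory_gb,
                                price_per_million_requests, price_per_gb_second,
                                days=30):
    """Daily request volume at which FaaS cost equals a fixed server cost."""
    cost_per_request = (price_per_million_requests / 1_000_000
                        + duration_s * memory_gb * price_per_gb_second)
    return server_monthly_cost / (cost_per_request * days)


# Illustrative inputs only: 1M requests/day, 200 ms per request, 512 MB memory
prices = dict(price_per_million_requests=0.20, price_per_gb_second=0.0000167)
monthly = lambda_monthly_cost(1_000_000, 0.2, 0.5, **prices)
threshold = break_even_requests_per_day(60, 0.2, 0.5, **prices)
print(f"FaaS: ${monthly:.2f}/month")
print(f"Break-even vs a $60/month server: {threshold:,.0f} requests/day")
```

Doubling the memory setting roughly doubles the compute term, which is why the crossover point moves so much between workloads.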

When to Use Serverless

  • ✓ Variable/unpredictable traffic
  • ✓ Event-driven workloads
  • ✓ Short-running tasks (< 15 minutes)
  • ✓ Rapid prototyping
  • ✓ Background jobs (image processing, data ETL)
  • ✗ Long-running processes
  • ✗ Consistent high-throughput workloads
  • ✗ Latency-sensitive applications (cold starts)

Next Steps

Microservices Architecture Plan Generator

Design your microservices decomposition with bounded contexts, communication patterns, and deployment topology. Download as Word, Excel, PDF, or PowerPoint.
