Module 12: API Architecture
An API is a contract. It defines what a service promises to deliver, what inputs it expects, and what guarantees it provides about reliability and performance. Choosing the right API style isn't about technology preference — it's about matching the communication pattern to the problem domain.
REST & the Richardson Maturity Model
REST (Representational State Transfer) models your API around resources — nouns, not verbs. You don't "createUser" — you POST to /users. The HTTP method carries the intent; the URL identifies the resource.
Richardson Maturity Model measures how "RESTful" an API actually is:
- Level 0 — The Swamp of POX: Single endpoint, all operations via POST with XML/JSON body. Essentially RPC over HTTP.
- Level 1 — Resources: Individual URLs for resources (/orders/123) but still using POST for everything.
- Level 2 — HTTP Verbs: Proper use of GET, POST, PUT, DELETE, PATCH with correct status codes. Most production APIs stop here.
- Level 3 — HATEOAS: Responses include hypermedia links to related actions/resources. Self-describing API — clients discover capabilities dynamically.
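To make the jump from Level 2 to Level 3 concrete, here is a minimal sketch of a response builder that attaches hypermedia links to an order. The `_links` layout, the link names (`cancel`, `pay`, `track`), and the sub-resource URLs are illustrative assumptions, not a standard the text prescribes.

```python
import json


def order_representation(order_id: str, status: str) -> dict:
    """Build a Level 3 style response: resource state plus hypermedia links."""
    links = {"self": {"href": f"/orders/{order_id}", "method": "GET"}}
    # Only advertise the actions that make sense in the current state
    # (hypothetical link names and URLs).
    if status == "pending":
        links["cancel"] = {"href": f"/orders/{order_id}", "method": "DELETE"}
        links["pay"] = {"href": f"/orders/{order_id}/payment", "method": "POST"}
    elif status == "shipped":
        links["track"] = {"href": f"/orders/{order_id}/tracking", "method": "GET"}
    return {"id": order_id, "status": status, "_links": links}


if __name__ == "__main__":
    print(json.dumps(order_representation("123", "pending"), indent=2))
```

A client that follows `_links` rather than hardcoding URLs can discover which actions are currently available, which is the "self-describing" property Level 3 refers to.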
REST Design Principles:
- Stateless: Every request contains all information needed to process it — no server-side sessions
- Cacheable: Responses declare cacheability via headers (ETag, Cache-Control)
- Uniform Interface: Consistent URL patterns, HTTP methods, and response formats
- Resource-based: URLs represent nouns (/orders), not actions (/getOrders)
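The cacheability principle can be sketched as a conditional GET: the server derives an ETag from the representation and answers 304 Not Modified when the client already holds the current version. The handler shape, the hash-based ETag, and the `max-age` value below are assumptions chosen for illustration.

```python
import hashlib
import json
from typing import Optional


def make_etag(body: bytes) -> str:
    """Derive a quoted entity tag from the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def get_order(order: dict, if_none_match: Optional[str] = None):
    """Return (status, headers, body) for a GET on one order resource."""
    body = json.dumps(order, sort_keys=True).encode("utf-8")
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "private, max-age=30"}
    if if_none_match == etag:
        # The client's cached copy is still current: send no body.
        return 304, headers, b""
    return 200, headers, body
```

Because the function needs nothing beyond the request inputs, it also illustrates statelessness: no server-side session is consulted.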
gRPC & Protocol Buffers
gRPC uses Protocol Buffers (protobuf) for serialization and HTTP/2 for transport. It's designed for high-performance service-to-service communication where REST's text-based overhead is unacceptable.
A protobuf service definition:
// order_service.proto — gRPC service definition
syntax = "proto3";
package commerce.orders.v1;
option go_package = "github.com/myorg/commerce/orders/v1";

// Required for google.protobuf.Timestamp below
import "google/protobuf/timestamp.proto";

// The Order service handles order lifecycle operations
service OrderService {
  // Create a new order (unary RPC)
  rpc CreateOrder(CreateOrderRequest) returns (CreateOrderResponse);
  // Stream order status updates (server streaming)
  rpc WatchOrderStatus(WatchOrderRequest) returns (stream OrderStatusUpdate);
  // Batch upload order items (client streaming)
  rpc UploadOrderItems(stream OrderItem) returns (UploadSummary);
  // Real-time order negotiation (bidirectional streaming)
  rpc NegotiateOrder(stream NegotiationMessage) returns (stream NegotiationMessage);
}

// ShippingAddress, PaymentMethod, WatchOrderRequest, OrderStatusUpdate,
// UploadSummary, and NegotiationMessage are omitted here for brevity.
message CreateOrderRequest {
  string customer_id = 1;
  repeated OrderItem items = 2;
  ShippingAddress shipping_address = 3;
  PaymentMethod payment_method = 4;
}

message OrderItem {
  string product_id = 1;
  int32 quantity = 2;
  int64 unit_price_cents = 3; // Use cents to avoid float issues
}

message CreateOrderResponse {
  string order_id = 1;
  OrderStatus status = 2;
  int64 total_cents = 3;
  google.protobuf.Timestamp created_at = 4;
}

enum OrderStatus {
  ORDER_STATUS_UNSPECIFIED = 0;
  ORDER_STATUS_PENDING = 1;
  ORDER_STATUS_CONFIRMED = 2;
  ORDER_STATUS_SHIPPED = 3;
  ORDER_STATUS_DELIVERED = 4;
  ORDER_STATUS_CANCELLED = 5;
}
Why gRPC over REST?
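- Performance: Binary serialization (protobuf) typically produces payloads several times smaller than equivalent JSON and is faster to parse; 3-10× is the commonly quoted range, and the actual gain depends on the message shape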
- Streaming: HTTP/2 enables server streaming, client streaming, and bidirectional streaming natively
- Code generation: Protobuf definitions generate type-safe client/server code in 12+ languages
- Contract-first: The .proto file IS the contract — no ambiguity about field types or optionality
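To see where the size difference comes from, the sketch below hand-encodes an OrderItem in the protobuf wire format and compares it with compact JSON. It is an illustration only: real code would use classes generated from the .proto file, and this encoder assumes non-negative integers and always writes every field (proto3 serializers omit default values). The sample values are made up.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_order_item(product_id: str, quantity: int, unit_price_cents: int) -> bytes:
    """Wire-encode OrderItem {product_id = 1, quantity = 2, unit_price_cents = 3}."""
    pid = product_id.encode("utf-8")
    return (
        bytes([0x0A]) + encode_varint(len(pid)) + pid        # field 1, length-delimited
        + bytes([0x10]) + encode_varint(quantity)            # field 2, varint
        + bytes([0x18]) + encode_varint(unit_price_cents)    # field 3, varint
    )


if __name__ == "__main__":
    item = {"product_id": "sku-42", "quantity": 3, "unit_price_cents": 1999}
    binary = encode_order_item(**item)
    text = json.dumps(item, separators=(",", ":")).encode("utf-8")
    print(len(binary), "bytes as protobuf;", len(text), "bytes as JSON")
```

Most of the saving comes from replacing field names with one-byte tags, so messages with many small fields benefit more than messages dominated by long strings.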
When NOT to use gRPC:
- Browser clients (limited gRPC-Web support, needs proxy)
- Public APIs (developers expect REST + JSON for easy testing with curl)
- Simple CRUD services where REST's simplicity wins
GraphQL: Client-Driven Queries
GraphQL inverts the API power dynamic: instead of the server defining what data each endpoint returns, the client specifies exactly what it needs. One endpoint, infinite query shapes.
# GraphQL query — client requests exactly what it needs
{
"query": "query GetOrderDetails($orderId: ID!) { order(id: $orderId) { id status totalAmount customer { name email } items { product { name price } quantity } shipping { estimatedDelivery carrier trackingNumber } } }",
"variables": {
"orderId": "order_12345"
}
}
GraphQL strengths:
- No over-fetching: Mobile clients request 3 fields; web dashboards request 30 fields — same endpoint
- No under-fetching: Get related data in one request (order + customer + items) instead of 3 REST calls
- Introspection: Clients can query the schema itself — enables tooling, documentation, and auto-completion
- Strong typing: The schema defines all types, relationships, and valid operations, so queries can be validated against it before they execute
The N+1 problem: A nested query such as orders { items { product { name } } } can execute one query to get N orders, then N queries to get their items, then up to N×M queries to get the products (M items per order). Use DataLoader (batching) or JOIN-based resolvers to avoid this. Without batching, GraphQL can be slower than equivalent REST endpoints.
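The batching idea behind DataLoader can be sketched without any GraphQL library: collect all the keys a level of the query needs, then load them in one round trip. The in-memory tables and the `QUERY_LOG` counter below are stand-ins for a real database and are purely hypothetical.

```python
QUERY_LOG = []  # records every simulated database round trip

# Hypothetical data: product id -> name, order id -> product ids
PRODUCTS = {1: "Keyboard", 2: "Mouse", 3: "Monitor"}
ORDER_ITEMS = {"o1": [1, 2], "o2": [2, 3], "o3": [1, 3]}


def fetch_products(ids):
    """Simulate one query of the form SELECT ... WHERE id IN (...)."""
    QUERY_LOG.append(("products", tuple(sorted(ids))))
    return {i: PRODUCTS[i] for i in ids}


def resolve_naive(order_ids):
    """Resolve each product separately: one query per item."""
    return {
        o: [fetch_products([pid])[pid] for pid in ORDER_ITEMS[o]]
        for o in order_ids
    }


def resolve_batched(order_ids):
    """Collect every product id first, then load them in a single query."""
    wanted = {pid for o in order_ids for pid in ORDER_ITEMS[o]}
    loaded = fetch_products(wanted)
    return {o: [loaded[pid] for pid in ORDER_ITEMS[o]] for o in order_ids}
```

Both resolvers return the same data; the difference is the number of entries left in `QUERY_LOG`, which grows with the number of items in the naive version and stays at one in the batched version.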
flowchart TD
START["Choose API Style"]
Q1{"Who's the client?"}
Q2{"Performance critical?"}
Q3{"Data needs vary by client?"}
REST["REST<br/>Public APIs, CRUD,<br/>browser clients"]
GRPC["gRPC<br/>Internal services,<br/>streaming, high throughput"]
GQL["GraphQL<br/>Mobile + Web with<br/>different data needs"]
START --> Q1
Q1 -->|"External / Public"| REST
Q1 -->|"Internal services"| Q2
Q1 -->|"Multiple client types"| Q3
Q2 -->|"Yes — low latency"| GRPC
Q2 -->|"No — standard"| REST
Q3 -->|"Yes — varied needs"| GQL
Q3 -->|"No — uniform"| REST
style REST fill:#e8f4f4,stroke:#3B9797,color:#132440
style GRPC fill:#f0f4f8,stroke:#16476A,color:#132440
style GQL fill:#fdf0f0,stroke:#BF092F,color:#132440
API Gateway Patterns
An API gateway is the single entry point for all client requests. It sits between clients and backend services, handling cross-cutting concerns that no individual service should own.
flowchart LR
CLIENT["Clients<br/>(Web, Mobile, Partners)"]
subgraph GW["API Gateway"]
AUTH["Authentication<br/>& Authorization"]
RL["Rate Limiting<br/>& Throttling"]
CB["Circuit Breaker<br/>& Retry"]
ROUTE["Request Routing<br/>& Load Balancing"]
TRANSFORM["Request/Response<br/>Transformation"]
CACHE["Response<br/>Caching"]
end
subgraph SERVICES["Backend Services"]
S1["Order Service"]
S2["User Service"]
S3["Payment Service"]
S4["Inventory Service"]
end
CLIENT --> AUTH
AUTH --> RL
RL --> CB
CB --> ROUTE
ROUTE --> TRANSFORM
TRANSFORM --> CACHE
CACHE --> S1
CACHE --> S2
CACHE --> S3
CACHE --> S4
style AUTH fill:#e8f4f4,stroke:#3B9797,color:#132440
style RL fill:#e8f4f4,stroke:#3B9797,color:#132440
style CB fill:#e8f4f4,stroke:#3B9797,color:#132440
style ROUTE fill:#e8f4f4,stroke:#3B9797,color:#132440
style TRANSFORM fill:#e8f4f4,stroke:#3B9797,color:#132440
style CACHE fill:#e8f4f4,stroke:#3B9797,color:#132440
Gateway responsibilities:
- Authentication & Authorization: Validate JWT tokens, API keys, OAuth scopes before requests reach services
- Rate Limiting: Protect services from traffic spikes (per client, per endpoint, global)
- Circuit Breaking: Stop sending requests to failing services — fail fast instead of cascading
- Request Routing: Route to correct service version, handle canary deployments
- Response Caching: Cache idempotent GET responses at the edge
- Protocol Translation: Accept REST from external clients, translate to gRPC for internal services
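Of these responsibilities, circuit breaking is the least obvious in code, so here is a minimal sketch of the state machine: closed until failures reach a threshold, open (fail fast) for a cool-down period, then half-open to let one trial request through. The class name, thresholds, and injectable clock are assumptions for illustration; production gateways implement this with more nuance.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: reject immediately instead of hitting the failing service.
                raise RuntimeError("circuit open: failing fast")
            # Cool-down elapsed: half-open, allow this one trial request.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial request, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping each upstream call in `breaker.call(...)` means that once a service is known to be failing, callers get an immediate error rather than waiting on timeouts that would tie up resources and cascade.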
A typical gateway configuration (Kong/NGINX-style):
# API Gateway configuration — Kong declarative config
_format_version: "3.0"
services:
  - name: order-service
    url: http://order-service.internal:8080
    routes:
      - name: orders-route
        paths:
          - /api/v1/orders
        methods:
          - GET
          - POST
          - PUT
        strip_path: false
    plugins:
      # Rate limiting: 100 requests per minute per consumer
      - name: rate-limiting
        config:
          minute: 100
          policy: redis
          redis_host: redis.internal
          redis_port: 6379
          fault_tolerant: true
          hide_client_headers: false
      # JWT authentication
      - name: jwt
        config:
          secret_is_base64: false
          claims_to_verify:
            - exp
      # Response caching for successful GET requests (30-second TTL)
      - name: proxy-cache
        config:
          response_code:
            - 200
          request_method:
            - GET
          content_type:
            - application/json
          cache_ttl: 30
          strategy: memory
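Conceptually, a per-consumer quota like the 100 requests per minute configured above can be enforced with a fixed-window counter. The sketch below is a simplified in-process model, not Kong's implementation: the class name and response header names are illustrative assumptions, and a shared store such as Redis would replace the dictionary in a multi-node gateway.

```python
import math
import time


class FixedWindowLimiter:
    def __init__(self, limit=100, window_seconds=60, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        # (consumer, window index) -> request count; old windows are never
        # pruned here, which a real implementation would need to handle.
        self.counters = {}

    def check(self, consumer: str):
        """Return (status_code, headers) for one request from `consumer`."""
        now = self.clock()
        index = int(now // self.window)
        key = (consumer, index)
        count = self.counters.get(key, 0)
        if count >= self.limit:
            # Tell the client how long until the next window starts.
            retry_after = max(1, math.ceil((index + 1) * self.window - now))
            return 429, {"Retry-After": str(retry_after)}
        self.counters[key] = count + 1
        return 200, {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(self.limit - count - 1),
        }
```

Fixed windows are simple but allow bursts at window boundaries (up to twice the limit across two adjacent windows); sliding-window and token-bucket variants trade a little complexity for smoother enforcement.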
API Versioning & Rate Limiting
Versioning Strategies:
- URL Path: /api/v1/orders, /api/v2/orders — most common, explicit, easy to route
- Header: Accept: application/vnd.myapi.v2+json — cleaner URLs but harder to test in browser
- Query Parameter: /api/orders?version=2 — simple but pollutes the URL
- Content Negotiation: Accept: application/json; version=2 — standards-compliant but rarely used
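The four strategies differ only in where the version number lives in the request, which a small resolver makes visible. This is a sketch under assumptions: the function name, the precedence order (path, then vendor media type, then media type parameter, then query string), and the default of version 1 are all choices made for illustration.

```python
import re
from urllib.parse import parse_qs, urlsplit


def resolve_version(url: str, headers: dict, default: int = 1) -> int:
    """Extract the requested API version from a URL and request headers."""
    parts = urlsplit(url)

    # 1. URL path: /api/v2/orders
    match = re.match(r"^/api/v(\d+)/", parts.path)
    if match:
        return int(match.group(1))

    accept = headers.get("Accept", "")

    # 2. Vendor media type: application/vnd.myapi.v2+json
    match = re.search(r"application/vnd\.myapi\.v(\d+)\+json", accept)
    if match:
        return int(match.group(1))

    # 3. Media type parameter: application/json; version=2
    match = re.search(r";\s*version=(\d+)", accept)
    if match:
        return int(match.group(1))

    # 4. Query parameter: /api/orders?version=2 (non-numeric values raise ValueError)
    query = parse_qs(parts.query)
    if "version" in query:
        return int(query["version"][0])

    return default
```

In practice an API would pick one strategy and reject the others; accepting several at once, as this resolver does, is only useful for comparing them side by side.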
Rate Limiting protects your services and ensures fair usage. Test it:
# Test rate limiting with sequential requests
echo "Sending 110 requests to test rate limit (100/min)..."
for i in $(seq 1 110); do
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
    -H "Authorization: Bearer $API_TOKEN" \
    "https://api.example.com/v1/orders")
  if [ "$STATUS" = "429" ]; then
    echo "Request $i: RATE LIMITED (429 Too Many Requests)"
    # Check retry-after header
    curl -s -I -H "Authorization: Bearer $API_TOKEN" \
      "https://api.example.com/v1/orders" | grep -i "retry-after"
    break
  else
    echo "Request $i: OK ($STATUS)"
  fi
done

# Check rate limit headers in response
echo ""
echo "Rate limit headers:"
curl -s -I -H "Authorization: Bearer $API_TOKEN" \
  "https://api.example.com/v1/orders" | grep -i "x-ratelimit"
Module 13: Cloud-Native Architecture
Cloud-native isn't "running on the cloud." It's building systems that exploit the cloud's unique properties — elastic resources, managed services, global distribution, and pay-per-use economics. The principles emerged from organizations that succeeded (and failed) at cloud-first design.
Immutable Infrastructure: Never Patch, Always Replace
Traditional infrastructure is mutable: you SSH into a server, install updates, change configs. Over time, servers drift — "configuration drift" means no two servers are identical, bugs are unreproducible, and "works on my machine" extends to "works on server-3 but not server-7."
Immutable infrastructure eliminates drift: never modify a running server. Replace it.
flowchart LR
subgraph BUILD["Build Phase (Once)"]
CODE["Source Code"]
IMG["Container Image<br/>v2.3.1-sha-abc123"]
REG["Container Registry"]
end
subgraph DEPLOY["Deploy Phase (Replace)"]
OLD["Running v2.3.0<br/>(3 replicas)"]
NEW["New v2.3.1<br/>(3 replicas)"]
LB["Load Balancer"]
end
CODE -->|"docker build"| IMG
IMG -->|"docker push"| REG
REG -->|"kubectl apply"| NEW
OLD -->|"drain traffic"| LB
LB -->|"route to new"| NEW
OLD -->|"terminate"| GONE["Destroyed"]
style CODE fill:#e8f4f4,stroke:#3B9797,color:#132440
style IMG fill:#e8f4f4,stroke:#3B9797,color:#132440
style REG fill:#e8f4f4,stroke:#3B9797,color:#132440
style NEW fill:#f0f4f8,stroke:#16476A,color:#132440
style OLD fill:#fdf0f0,stroke:#BF092F,color:#132440
Principles:
- Servers are cattle, not pets. They have numbers, not names. If one is sick, replace it — don't nurse it back to health.
- The artifact is the image. A Docker image, AMI, or VM image built from source. Same image in dev, staging, and production.
- Rollback = deploy the previous image. No "undo the last 47 config changes" — just point to the previous known-good artifact.
- No SSH in production. If you need to debug, attach a debugger or ship logs. The moment you SSH and change something, you've broken immutability.
Declarative Over Imperative
Imperative: "Create a load balancer. Add server-1. Add server-2. Set health check interval to 30s." You describe the steps.
Declarative: "I want a load balancer with 2 servers and 30s health checks." You describe the desired state. The system figures out how to get there.
# Declarative Kubernetes deployment — desired state specification
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  labels:
    app: order-service
    version: v2.3.1
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
        version: v2.3.1
    spec:
      containers:
        - name: order-service
          image: registry.example.com/order-service:v2.3.1-sha-abc123
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 15
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: order-db-credentials
                  key: connection-string
Why declarative wins:
- Idempotent: Apply the same declaration 100 times — same result. No "ran the script twice and now I have duplicate resources"
- Self-healing: The system continuously reconciles actual state with desired state. If a pod crashes, Kubernetes recreates it
- Version controlled: Your infrastructure IS your code. Git history = infrastructure history
- Reviewable: Infrastructure changes go through pull requests just like application code
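Idempotency and self-healing both fall out of the same mechanism: a loop that compares desired state with actual state and applies only the difference. The sketch below models that loop with plain dictionaries; the resource names and the action tuples are hypothetical, and a real controller would call a cloud or cluster API instead of mutating a dict.

```python
def reconcile(desired: dict, actual: dict) -> list:
    """Move `actual` toward `desired` in place and return the actions taken."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
        else:
            continue  # already in the desired state: nothing to do
        actual[name] = dict(spec)
    for name in list(actual):
        if name not in desired:
            actions.append(("delete", name))
            del actual[name]
    return actions


if __name__ == "__main__":
    desired = {"load-balancer": {"servers": 2, "health_check_seconds": 30}}
    actual = {}
    print(reconcile(desired, actual))  # first run creates the resource
    print(reconcile(desired, actual))  # second run finds nothing to change
```

Running the function again after something deletes or alters a resource produces exactly the actions needed to restore it, which is the behavior Kubernetes controllers exhibit when a pod crashes.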
Elasticity: Scale to Zero, Scale to Millions
Cloud elasticity means your infrastructure matches your actual load — not your peak projection from 6 months ago. You pay for what you use, scale up during traffic spikes, and scale down (even to zero) during quiet periods.
Scaling dimensions:
- Horizontal (scale out): Add more instances. Stateless services scale horizontally trivially.
- Vertical (scale up): Bigger machines. Quick fix but has hard limits and often requires a restart or downtime.
- Scale to zero: Serverless platforms (AWS Lambda, Google Cloud Run) run zero instances when idle, so you pay nothing for compute until a request or event arrives.
- Auto-scaling triggers: CPU, memory, request queue depth, custom metrics (e.g., Kafka consumer lag).
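Metric-based triggers usually reduce to a proportional rule. The Kubernetes Horizontal Pod Autoscaler, for example, documents its core calculation as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The sketch below applies that formula with min/max bounds; it deliberately leaves out the tolerance band and stabilization windows a real autoscaler uses, and the default bounds are assumptions.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Proportional scaling rule, clamped to [min_replicas, max_replicas]."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))
```

For example, three replicas averaging 90% CPU against a 60% target yield ceil(3 × 90 / 60) = 5 replicas, while the same three replicas at 20% CPU scale down to 1.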
The Twelve-Factor App
Twelve-Factor methodology (Adam Wiggins, Heroku, 2011) defines principles for building cloud-native applications that are portable, scalable, and operationally excellent. Every factor addresses a specific failure mode the authors observed across thousands of Heroku deployments.
The Twelve Factors — Complete Reference
| # | Factor | Principle | Anti-Pattern |
|---|---|---|---|
| I | Codebase | One codebase per app, tracked in version control, many deploys | Multiple apps sharing one repo with divergent branches |
| II | Dependencies | Explicitly declare and isolate dependencies | Relying on system packages or implicit global installs |
| III | Config | Store config in environment variables | Hardcoded connection strings, config files in repo |
| IV | Backing Services | Treat backing services as attached resources (swappable via URL) | Tight coupling to local filesystem paths or specific DB instances |
| V | Build, Release, Run | Strict separation between build, release, and run stages | Modifying code at runtime, building in production |
| VI | Processes | Execute as stateless processes; persist state in backing services | Sticky sessions, in-memory state without replication |
| VII | Port Binding | Export services via port binding (self-contained) | Requiring an external web server (Apache/Tomcat) to run |
| VIII | Concurrency | Scale out via the process model | Single monolithic process handling everything via threads |
| IX | Disposability | Maximize robustness with fast startup and graceful shutdown | 30-second startup times, losing in-flight requests on shutdown |
| X | Dev/Prod Parity | Keep dev, staging, and production as similar as possible | "Works on my machine" — SQLite in dev, PostgreSQL in prod |
| XI | Logs | Treat logs as event streams (stdout, not files) | Writing to local log files, custom log rotation |
| XII | Admin Processes | Run admin/management tasks as one-off processes | SSH into production to run migrations or data fixes |
flowchart TD
subgraph DEV["Development"]
CODE["Codebase<br/>(I: Git repo)"]
DEPS["Dependencies<br/>(II: package.json)"]
CONFIG["Config<br/>(III: env vars)"]
end
subgraph BUILD["Build & Release"]
BLD["Build Stage<br/>(V: compile + bundle)"]
REL["Release<br/>(V: build + config)"]
end
subgraph RUN["Runtime"]
PROC["Stateless Processes<br/>(VI: share-nothing)"]
PORT["Port Binding<br/>(VII: self-contained)"]
CONC["Concurrency<br/>(VIII: process types)"]
DISP["Disposability<br/>(IX: fast start/stop)"]
end
subgraph OPS["Operations"]
LOGS["Logs as Streams<br/>(XI: stdout)"]
ADMIN["Admin Processes<br/>(XII: one-off tasks)"]
PARITY["Dev/Prod Parity<br/>(X: same stack)"]
end
CODE --> BLD
DEPS --> BLD
BLD --> REL
CONFIG --> REL
REL --> PROC
PROC --> PORT
PORT --> CONC
CONC --> DISP
style CODE fill:#e8f4f4,stroke:#3B9797,color:#132440
style DEPS fill:#e8f4f4,stroke:#3B9797,color:#132440
style CONFIG fill:#e8f4f4,stroke:#3B9797,color:#132440
style BLD fill:#f0f4f8,stroke:#16476A,color:#132440
style REL fill:#f0f4f8,stroke:#16476A,color:#132440
style PROC fill:#e8f4f4,stroke:#3B9797,color:#132440
style PORT fill:#e8f4f4,stroke:#3B9797,color:#132440
style CONC fill:#e8f4f4,stroke:#3B9797,color:#132440
style DISP fill:#e8f4f4,stroke:#3B9797,color:#132440
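Factor III is the one most directly visible in application code. A minimal sketch, assuming a service that needs a database URL, a port, and a log level: `DATABASE_URL` matches the variable injected by the Deployment shown earlier, while `PORT`, `LOG_LEVEL`, and their defaults are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    port: int
    log_level: str


def load_settings(env=os.environ) -> Settings:
    """Read all configuration from the environment, failing fast if incomplete."""
    missing = [key for key in ("DATABASE_URL",) if key not in env]
    if missing:
        raise RuntimeError(f"missing required config: {', '.join(missing)}")
    return Settings(
        database_url=env["DATABASE_URL"],
        port=int(env.get("PORT", "8080")),        # assumed default
        log_level=env.get("LOG_LEVEL", "INFO"),   # assumed default
    )
```

Because the same build reads different values in each environment, one artifact can move from dev to staging to production unchanged, which is what keeps the build and release stages (Factor V) separate.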
Service Mesh Integration
A service mesh (Istio, Linkerd, Consul Connect) moves networking concerns out of application code and into infrastructure. Instead of every service implementing retries, circuit breakers, mTLS, and observability, the mesh handles it transparently via sidecar proxies.
What a service mesh provides:
- Mutual TLS (mTLS): Encrypted service-to-service communication without application code changes
- Traffic management: Canary releases, traffic splitting, fault injection for chaos testing
- Observability: Distributed tracing, metrics, and access logs for all service-to-service calls
- Resilience: Retries, timeouts, circuit breakers configured at the infrastructure layer
Case Studies
Stripe: API Design Excellence
Stripe's API-First Philosophy
Stripe is widely regarded as having one of the best-designed APIs in the industry. Their principles:
- Versioning via date: Stripe-Version: 2026-05-01. Each API version is a snapshot — old versions remain supported indefinitely. No forced migrations.
- Consistent resource naming: All resources follow the same URL pattern (/v1/customers, /v1/charges). Nested resources use clear hierarchy (/v1/customers/:id/sources).
- Idempotency keys: Every mutating request can include an Idempotency-Key header. Retry the same request safely — Stripe returns the same response without processing again.
- Expandable objects: GET /v1/charges/ch_123?expand[]=customer — inline related resources in the response instead of requiring follow-up calls.
- Rich error objects: Errors include type, code, message, param (which field failed), and documentation URL. No guessing what went wrong.
Impact: Stripe's developer experience is a competitive advantage. Integration takes hours instead of weeks. Their API design guide has become an industry reference.
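The idempotency-key pattern is easy to adopt in any API. The sketch below shows the server-side idea in its simplest form: store the response under the client-supplied key and replay it on retries. It is not Stripe's implementation; the class, the in-memory storage, and the charge ID format are assumptions, and a real service would persist keys with an expiry and guard against concurrent duplicates.

```python
class PaymentAPI:
    def __init__(self):
        self.responses = {}  # idempotency key -> stored response
        self.charges = []    # side effects actually performed

    def create_charge(self, amount_cents: int, idempotency_key: str) -> dict:
        """Create a charge once per key; replay the stored response on retries."""
        if idempotency_key in self.responses:
            return self.responses[idempotency_key]
        charge = {"id": f"ch_{len(self.charges) + 1}", "amount": amount_cents}
        self.charges.append(charge)
        self.responses[idempotency_key] = charge
        return charge
```

A client that times out can resend the same request with the same key and be certain the customer is charged at most once.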
Heroku: The Birth of Twelve-Factor
Learning from Thousands of Failed Deployments
Heroku hosted hundreds of thousands of applications by 2011. Adam Wiggins and the Heroku team observed recurring patterns in apps that scaled well versus those that broke constantly. The Twelve-Factor methodology was distilled from these observations.
Problems they solved:
- Config in code: Apps hardcoded database URLs. Moving between environments required code changes. Factor III (Config) fixed this with environment variables.
- Stateful processes: Apps stored session data in memory. When a dyno restarted, users lost their sessions. Factor VI (Processes) mandated stateless processes with external session stores.
- Slow startup: Apps with 60-second boot times couldn't scale quickly or recover from crashes. Factor IX (Disposability) demanded fast startup.
- Log files: Apps wrote to /var/log/app.log. On ephemeral containers, those files disappear. Factor XI (Logs) redirected everything to stdout for external aggregation.
Legacy: Twelve-Factor strongly influenced container and Kubernetes conventions, Docker best practices, and the PaaS platforms that followed. Many of its principles are now so widespread that they are assumed rather than stated.
Conclusion & Next Steps
Modules 12 and 13 covered the external interface (APIs) and the deployment philosophy (cloud-native) of modern distributed systems.
The key takeaways:
- REST is the default for public APIs. Level 2 of Richardson Maturity (proper HTTP verbs + status codes) is sufficient. HATEOAS is elegant but rarely worth the implementation cost.
- gRPC wins for internal service-to-service communication. Binary serialization, streaming support, and code generation make it superior when you control both sides. Don't expose it to browsers.
- GraphQL solves the multiple-client-types problem. When mobile needs 3 fields and web needs 30, one flexible endpoint beats dozens of bespoke REST endpoints. But solve N+1 queries or you'll regret it.
- API gateways centralize cross-cutting concerns. Authentication, rate limiting, circuit breaking, and caching belong at the edge — not duplicated in every service.
- Immutability eliminates drift. Never patch a running server. Build an image, deploy it everywhere, roll back by deploying the previous image.
- Twelve-Factor methodology is cloud-native's constitution. Config in env vars, stateless processes, logs to stdout, fast startup — these principles underpin every successful cloud deployment.
Next in the Series
In Part 9: Scalability Fundamentals, we'll dive into the mechanics of scaling — horizontal vs vertical scaling strategies, load balancing algorithms, caching layers (CDN, application, database), and designing systems that handle traffic growth gracefully without rewriting.