Part 19: Container Orchestration Readiness

From Single Host to Orchestration

Docker Compose gives you declarative multi-container management — but only on a single machine. When that machine runs out of CPU, memory, or disk, your application has nowhere to go. When it crashes, every service goes down. Production demands more.

What Orchestration Solves

Single Host vs Orchestrated Cluster

flowchart TB
    subgraph SingleHost["Single Host (Docker Compose)"]
        direction TB
        HOST["One Machine"]
        HOST --> C1["webapp"]
        HOST --> C2["database"]
        HOST --> C3["cache"]
        HOST -->|"Single point of failure"| FAIL["❌ Host dies = Everything dies"]
    end

    subgraph Orchestrated["Orchestrated Cluster (Kubernetes)"]
        direction TB
        SCHED["Scheduler"]
        SCHED --> N1["Node 1"]
        SCHED --> N2["Node 2"]
        SCHED --> N3["Node 3"]
        N1 --> C4["webapp-1"]
        N1 --> C5["cache-1"]
        N2 --> C6["webapp-2"]
        N2 --> C7["database"]
        N3 --> C8["webapp-3"]
        N3 --> C9["cache-2"]
        N2 -->|"Node 2 dies"| HEAL["✅ Scheduler moves workloads to Node 1/3"]
    end

Container orchestrators solve five fundamental problems:

Problem	Docker Alone	With Orchestrator
Scheduling	Manual placement on one host	Auto-places containers on nodes with capacity
Scaling	Manual `docker compose up --scale`	Auto-scales based on CPU/memory/custom metrics
Self-healing	restart: always (same host only)	Reschedules to healthy nodes, replaces failed pods
Service discovery	Docker DNS (single network)	Cluster-wide DNS, load balancing, traffic routing
Rolling updates	Stop old, start new (downtime)	Zero-downtime rolling deployments with rollback

                            
                            Key Insight: Orchestration doesn't just run containers differently — it requires containers to be designed differently. A container that works perfectly under docker compose up may fail repeatedly under Kubernetes if it wasn't built with orchestration principles in mind.
                        

The Twelve-Factor App

The Twelve-Factor App methodology (Adam Wiggins, 2011) predates containers but perfectly describes container-native design. These principles ensure applications are portable, scalable, and orchestration-friendly.

The 12 Factors for Containers

#	Factor	Container Application	Criticality
I	Codebase	One repo per service, one image per repo	Medium
II	Dependencies	All deps in Dockerfile, no implicit host packages	High
III	Config	Environment variables, ConfigMaps, not baked into image	Critical
IV	Backing Services	Databases, caches as attached resources (URLs)	Critical
V	Build, Release, Run	Separate build (CI), release (tag), run (orchestrator)	High
VI	Processes	Stateless processes, share-nothing architecture	Critical
VII	Port Binding	Self-contained, expose via port (no external webserver)	High
VIII	Concurrency	Scale horizontally (add replicas, not bigger containers)	Critical
IX	Disposability	Fast startup, graceful shutdown (SIGTERM handling)	Critical
X	Dev/Prod Parity	Same image in dev/staging/prod, config differs	High
XI	Logs	Write to stdout/stderr, let platform aggregate	Critical
XII	Admin Processes	One-off tasks as separate containers (Jobs/CronJobs)	Medium

                            
                            Most Violated Factors: Factors III (Config), VI (Processes), IX (Disposability), and XI (Logs) are the most commonly violated in container deployments. If your containers bake configuration into images, store state locally, don't handle SIGTERM, or write logs to files — they will fail under orchestration.
                        

Stateless vs Stateful Containers

The single most important design decision for orchestration is whether your container is stateless (can be killed and replaced at any time) or stateful (holds data that must survive restarts). Orchestrators strongly prefer stateless containers because they can be freely scheduled, scaled, and replaced.

Stateless Architecture Pattern

flowchart LR
    subgraph Stateless["Stateless Tier (Scale Freely)"]
        W1["webapp-1"]
        W2["webapp-2"]
        W3["webapp-3"]
    end

    subgraph External["External State (Managed)"]
        DB[("PostgreSQL Database")]
        CACHE[("Redis Session Store")]
        S3[("Object Storage (Files/Uploads)")]
    end

    LB["Load Balancer"] --> W1
    LB --> W2
    LB --> W3
    W1 --> DB
    W2 --> DB
    W3 --> DB
    W1 --> CACHE
    W2 --> CACHE
    W3 --> CACHE
    W1 --> S3
    W2 --> S3
    W3 --> S3

Aspect	Stateless Container	Stateful Container
Data persistence	No local data — externalized to DB/cache/S3	Requires persistent volumes
Scaling	Add/remove replicas freely	Complex (data migration, quorum)
Replacement	Kill and recreate instantly	Must drain, migrate, then replace
Scheduling	Any node in the cluster	Node with attached volume
Recovery time	Seconds (fresh start)	Minutes (volume reattach, replay)
Examples	API servers, web frontends, workers	Databases, message queues, caches
Kubernetes type	Deployment	StatefulSet

Externalizing state — the key to making containers stateless:

# Anti-pattern: storing sessions in container memory
# If this container dies, all user sessions are lost
docker run -e SESSION_STORE=memory myapp:latest

# Pattern: externalize sessions to Redis
# Container can be killed/replaced without losing sessions
docker run \
  -e SESSION_STORE=redis \
  -e REDIS_URL=redis://redis-cluster:6379 \
  myapp:latest

# Anti-pattern: storing uploads in a volume local to one host
# The anonymous volume stays on this node; a replacement container
# scheduled elsewhere starts without the files
docker run -v /app/uploads myapp:latest

# Pattern: externalize files to object storage
docker run \
  -e UPLOAD_BACKEND=s3 \
  -e S3_BUCKET=myapp-uploads \
  -e S3_REGION=us-east-1 \
  myapp:latest
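
To make the difference concrete, here is a small simulation of two replicas behind a round-robin load balancer. It is only a sketch: the `Replica` class, the `run_scenario` helper, and the plain dictionary standing in for Redis are illustrative assumptions, not part of any real framework.

```python
from itertools import cycle


class Replica:
    """One application instance. `shared_store` stands in for Redis (hypothetical)."""

    def __init__(self, name, shared_store=None):
        self.name = name
        # With no shared store, sessions live in this replica's own memory.
        self.sessions = shared_store if shared_store is not None else {}

    def login(self, session_id, user):
        self.sessions[session_id] = user

    def whoami(self, session_id):
        return self.sessions.get(session_id)


def run_scenario(replicas):
    """Log in on one replica, then read the session from the next one."""
    balancer = cycle(replicas)  # naive round-robin load balancer
    next(balancer).login("abc-123", "alice")
    return next(balancer).whoami("abc-123")


# In-memory sessions: the second replica has never seen the session.
local_result = run_scenario([Replica("webapp-1"), Replica("webapp-2")])

# Externalized sessions: both replicas read and write the same store.
store = {}
shared_result = run_scenario([Replica("webapp-1", store), Replica("webapp-2", store)])

print(local_result)   # None
print(shared_result)  # alice
```

The same reasoning applies to uploads and caches: any data a request may need later has to live somewhere every replica can reach.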

Container Design Principles for Orchestration

Beyond statelessness, orchestration-ready containers follow specific design principles that enable the orchestrator to manage their lifecycle effectively.

Single Process per Container

Each container should run one primary process. This makes health monitoring, logging, scaling, and resource allocation straightforward.

# Anti-pattern: multiple processes in one container
# Dockerfile that runs nginx + php-fpm + cron in one container
CMD ["supervisord", "-c", "/etc/supervisor.conf"]
# Problems: Can't scale web and cron independently,
# can't monitor health of individual services,
# logs are mixed, failure of one affects all

# Pattern: separate containers for separate concerns
# Container 1: nginx (reverse proxy)
# Container 2: php-fpm (application)  
# Container 3: cron (scheduled tasks)
# Each can scale, restart, and be monitored independently

Immutable Infrastructure

# Anti-pattern: mutable container (SSH in and fix things)
docker exec -it production-web bash
apt-get update && apt-get install -y hotfix-package
# Container is now different from its image
# Next deployment reverts the fix
# No audit trail of what changed

# Pattern: immutable containers
# Fix the Dockerfile, build a new image, deploy
# Every container matches its image exactly
# Rollback = deploy previous image tag

Fast Startup

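# Measure container start overhead (runtime setup, not application boot)
# Note: passing a command replaces the image's CMD, so the app itself never starts here.
# To time the app, measure how long the readiness endpoint takes to first return 200.
time docker run --rm myapp:latest echo "ready"
# Target: under 5 seconds for web services
# Target: under 30 seconds for complex services (JVM, ML models)

# Strategies for fast startup:
# 1. Small images (Alpine, distroless): less to pull/extract
# 2. Pre-compile/pre-build in image (don't compile at startup)
# 3. Lazy initialization (connect to DB on first request, not at boot)
# 4. A start_period (Docker HEALTHCHECK) or startupProbe (Kubernetes) to cover warm-up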
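
Strategy 3 (lazy initialization) can be sketched in a few lines of Python. `FakeConnection` is a hypothetical stand-in for a real database client, and the 50 ms sleep only simulates a slow handshake.

```python
import functools
import time


class FakeConnection:
    """Stand-in for a real database client (hypothetical)."""

    def __init__(self):
        time.sleep(0.05)  # pretend the handshake is slow

    def query(self, sql):
        return f"result of {sql}"


@functools.lru_cache(maxsize=None)
def get_connection():
    """Create the connection on first use, then reuse it."""
    return FakeConnection()


def handle_request():
    return get_connection().query("SELECT 1")


boot_started = time.monotonic()
# The process is "up" at this point without having paid the connection cost.
boot_seconds = time.monotonic() - boot_started

first = handle_request()   # pays the connection cost once
second = handle_request()  # reuses the cached connection
```

The trade-off is that the first request is slower and connection errors surface later, so a readiness check that exercises the dependency is still worth having.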

Health Probes

Orchestrators don't just check if a process is running — they use health probes to determine if a container is functioning correctly. Without probes, the orchestrator can't make intelligent decisions about routing traffic or restarting unhealthy instances.

Probe Type	Purpose	Failure Action	When to Use
Liveness	Is the process alive and not deadlocked?	Kill and restart container	Always (detect stuck processes)
Readiness	Is it ready to serve traffic?	Remove from load balancer (don't kill)	Services with warm-up or dependencies
Startup	Has initial boot completed?	Liveness/readiness checks are held off until it succeeds; container is restarted if it never does	JVM apps, ML model loading

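// Express.js health endpoint implementation
const express = require('express');
const app = express();

// Hypothetical application module: `db` exposes query(), and the two
// helpers perform whatever start-up work the service needs.
const { db, connectToDatabase, warmUpCache } = require('./app-deps');

let isReady = false;
let isHealthy = true;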

// Simulate initialization (DB connection, cache warm-up)
async function initialize() {
    await connectToDatabase();
    await warmUpCache();
    isReady = true;
    console.log('Application ready to serve traffic');
}

// Liveness probe: Am I alive and not deadlocked?
// Should be LIGHTWEIGHT — no external dependencies
app.get('/healthz', (req, res) => {
    if (isHealthy) {
        res.status(200).json({ status: 'alive' });
    } else {
        res.status(503).json({ status: 'unhealthy' });
    }
});

// Readiness probe: Can I serve traffic right now?
// Check dependencies: DB connected? Cache available?
app.get('/readyz', async (req, res) => {
    if (!isReady) {
        return res.status(503).json({ status: 'initializing' });
    }
    
    try {
        await db.query('SELECT 1');  // Quick DB check
        res.status(200).json({ status: 'ready' });
    } catch (err) {
        // Temporarily not ready — don't kill, just stop traffic
        res.status(503).json({ status: 'not ready', error: err.message });
    }
});

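initialize().catch((err) => {
    // Startup failed: report unhealthy so the liveness probe restarts the container
    console.error('Initialization failed:', err);
    isHealthy = false;
});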
app.listen(3000);

# Kubernetes probe configuration (for reference)
# Kubernetes ignores the Dockerfile HEALTHCHECK; probes are declared in the pod spec
apiVersion: v1
kind: Pod
metadata:
  name: webapp
spec:
  containers:
    - name: webapp
      image: myapp:latest
      ports:
        - containerPort: 3000
      
      livenessProbe:
        httpGet:
          path: /healthz
          port: 3000
        initialDelaySeconds: 5
        periodSeconds: 10
        failureThreshold: 3      # Kill after 3 failures
      
      readinessProbe:
        httpGet:
          path: /readyz
          port: 3000
        initialDelaySeconds: 5
        periodSeconds: 5
        failureThreshold: 2      # Remove from LB after 2 failures
      
      startupProbe:
        httpGet:
          path: /healthz
          port: 3000
        failureThreshold: 30     # Allow up to 30 * 10s = 5 min to start
        periodSeconds: 10

                            
                            Critical Rule: Liveness probes must NEVER check external dependencies (database, cache, API). If your liveness probe fails because the database is down, the orchestrator kills your container — which doesn't fix the database. Use readiness probes for dependency checks.
                        

Resource Requests & Limits

Orchestrators need to know how much CPU and memory your container requires to schedule it on an appropriate node. Without resource specifications, the scheduler can overcommit nodes, leading to OOM kills and CPU starvation.

# Docker resource flags (translate to Kubernetes resource spec)
# Memory limit: container is OOM-killed if it exceeds this
docker run --memory=256m myapp:latest

# CPU limit: container is throttled (not killed) when exceeding
docker run --cpus=1.0 myapp:latest

# Memory reservation (soft limit): applied when the host is under memory pressure
docker run --memory-reservation=128m --memory=256m myapp:latest

# Connection to cgroups (Part 3):
# These flags set cgroup v2 limits under /sys/fs/cgroup/
# memory.max = --memory value in bytes
# cpu.max = "<quota> <period>" in microseconds; quota = --cpus value * 100000
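
The mapping from flags to cgroup values can be worked through in Python. This is a sketch that assumes cgroup v2 and the default 100 ms CFS period; `cpu_max` and `memory_max` are hypothetical helper names, not Docker APIs.

```python
CFS_PERIOD_US = 100_000  # default CFS period: 100 ms, in microseconds


def cpu_max(cpus: float, period_us: int = CFS_PERIOD_US) -> str:
    """Return the cgroup v2 cpu.max value ("<quota> <period>") for a --cpus setting."""
    quota_us = int(round(cpus * period_us))
    return f"{quota_us} {period_us}"


def memory_max(limit: str) -> int:
    """Convert a Docker-style memory string such as "256m" to bytes."""
    units = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}
    suffix = limit[-1].lower()
    if suffix in units:
        return int(limit[:-1]) * units[suffix]
    return int(limit)  # a bare number is already bytes


print(cpu_max(1.0))        # 100000 100000
print(cpu_max(0.5))        # 50000 100000
print(memory_max("256m"))  # 268435456
```

A quota equal to the period means one full core; half the period means the container is throttled after 50 ms of CPU time in every 100 ms window.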

# Kubernetes resource specification
# requests = guaranteed minimum (used for scheduling)
# limits = maximum allowed (enforced by cgroups)
apiVersion: v1
kind: Pod
metadata:
  name: webapp
spec:
  containers:
    - name: webapp
      image: myapp:latest
      resources:
        requests:
          memory: "128Mi"   # Scheduler ensures node has 128Mi available
          cpu: "250m"       # 250 millicores = 0.25 CPU cores
        limits:
          memory: "256Mi"   # OOM-killed if exceeds 256Mi
          cpu: "1000m"      # Throttled (not killed) above 1 CPU core

Quality of Service (QoS) Classes — determined by how you set requests and limits:

QoS Class	Condition	Eviction Priority	Best For
Guaranteed	Every container sets requests = limits for both CPU and memory	Last to be evicted	Critical services (databases, payment)
Burstable	At least one request or limit set, but Guaranteed criteria not met	Evicted after BestEffort	Most production workloads
BestEffort	No requests or limits set	First to be evicted	Batch jobs, development only
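
The classification rules in the table can be expressed as a short function. This is a simplified sketch for a single-container pod: `qos_class` is a hypothetical helper, and it compares quantity strings literally, so `"1"` and `"1000m"` would be treated as different values.

```python
def qos_class(requests: dict, limits: dict) -> str:
    """Classify a single-container pod from its resource requests and limits."""
    if not requests and not limits:
        return "BestEffort"
    # Kubernetes defaults a missing request to the limit for that resource.
    effective_requests = {**limits, **requests}
    guaranteed = all(
        res in limits and effective_requests[res] == limits[res]
        for res in ("cpu", "memory")
    )
    return "Guaranteed" if guaranteed else "Burstable"


# The manifest shown earlier: requests are lower than limits.
print(qos_class({"cpu": "250m", "memory": "128Mi"},
                {"cpu": "1000m", "memory": "256Mi"}))    # Burstable
print(qos_class({}, {"cpu": "500m", "memory": "256Mi"}))  # Guaranteed
print(qos_class({}, {}))                                  # BestEffort
```

Note that setting only limits still yields Guaranteed, because the requests are filled in from the limits.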

                            
                            Sizing Strategy: Start with generous limits, observe actual usage with monitoring (Part 20), then right-size. For memory: set requests to P95 usage and limits to P99. For CPU: set requests to average usage and limits to burst capacity. Use docker stats or cAdvisor to measure.
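
As a rough illustration of the memory rule of thumb, the percentiles can be computed from collected usage samples with the standard library. `suggest_memory` is a hypothetical helper, and the sample data is synthetic; real input would come from your monitoring system.

```python
import statistics


def suggest_memory(samples_mib: list[float]) -> dict:
    """Suggest a memory request (P95) and limit (P99) from usage samples in MiB."""
    # quantiles(n=100) returns the 99 cut points P1..P99.
    cuts = statistics.quantiles(samples_mib, n=100, method="inclusive")
    return {"request_mib": round(cuts[94]), "limit_mib": round(cuts[98])}


# Synthetic samples: 100 readings spread evenly from 100 to 199 MiB.
samples = [float(v) for v in range(100, 200)]
suggestion = suggest_memory(samples)
print(suggestion)
```

In practice it is sensible to add headroom on top of the P99 value, since exceeding the memory limit means an OOM kill rather than throttling.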
                        

Graceful Shutdown & Signal Handling

When an orchestrator needs to stop a container (scaling down, rolling update, node drain), it sends SIGTERM and waits a grace period (30 seconds by default in Kubernetes; 10 seconds for docker stop) before sending SIGKILL. Containers that handle SIGTERM properly finish in-flight requests and close connections cleanly. Those that don't risk dropped connections and, for workloads killed mid-write, data corruption.

Container Shutdown Sequence

sequenceDiagram
    participant K as Orchestrator
    participant C as Container (PID 1)
    participant LB as Load Balancer

    K->>LB: Remove container from service endpoints
    K->>C: Send SIGTERM
    Note over C: Grace period starts (30s default)
    C->>C: Stop accepting new connections
    C->>C: Finish in-flight requests
    C->>C: Close database connections
    C->>C: Flush buffers/caches
    C->>K: Exit code 0 (clean shutdown)
    Note over K: If still running after grace period:
    K->>C: Send SIGKILL (force kill)

// Node.js: Proper SIGTERM handling
const http = require('http');

const server = http.createServer((req, res) => {
    // Normal request handling
    res.writeHead(200);
    res.end('Hello World');
});

server.listen(3000);

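// Graceful shutdown on SIGTERM
// `db` is assumed to be a connection pool created elsewhere whose end() returns a Promise
process.on('SIGTERM', () => {
    console.log('SIGTERM received. Starting graceful shutdown...');

    // Force exit after timeout (safety net); unref() so the timer
    // itself never keeps the process alive
    setTimeout(() => {
        console.error('Forced shutdown after timeout');
        process.exit(1);
    }, 25000).unref();  // 25s < 30s grace period

    // Stop accepting new connections; the callback runs once
    // in-flight requests have finished
    server.close(() => {
        console.log('HTTP server closed. All connections drained.');

        // Close database connections
        db.end()
            .then(() => {
                console.log('Database connections closed.');
                process.exit(0);  // Clean exit
            })
            .catch((err) => {
                console.error('Error closing database connections:', err);
                process.exit(1);
            });
    });
});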

#!/usr/bin/env python3
# Python: Proper SIGTERM handling with asyncio
import signal
import asyncio
from aiohttp import web

app = web.Application()

async def handle(request):
    return web.Response(text="Hello World")

app.router.add_get('/', handle)

async def main():
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()

    def on_signal(sig):
        print(f'Received {sig.name}. Shutting down...')
        stop.set()

    # Register signal handlers (Unix only)
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, on_signal, sig)

    runner = web.AppRunner(app)
    await runner.setup()
    site = web.TCPSite(runner, '0.0.0.0', 8080)
    await site.start()
    print('Server started on port 8080')

    # Run until a signal arrives
    await stop.wait()

    # Stop accepting new connections and shut the server down cleanly.
    # Returning from main() lets asyncio.run() close the loop without errors.
    await runner.cleanup()
    print('Shutdown complete.')

asyncio.run(main())

The PID 1 Problem

In Linux, PID 1 has special signal handling: the kernel does not apply default signal actions to it, so it only reacts to signals it explicitly registers handlers for. And if your application is not PID 1 (for example, because a shell wraps it), SIGTERM may never reach it.

# BAD: Shell wraps your app — SIGTERM goes to shell, not app
CMD node server.js
# Docker actually runs: /bin/sh -c "node server.js"
# PID 1 = /bin/sh, PID 2 = node
# SIGTERM hits sh, which doesn't forward to node

# GOOD: Exec form — your app IS PID 1
CMD ["node", "server.js"]
# PID 1 = node server.js
# SIGTERM goes directly to your application

# ALTERNATIVE: Use tini as init process (handles signal forwarding)
RUN apk add --no-cache tini
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "server.js"]
# PID 1 = tini, PID 2 = node
# tini forwards SIGTERM to child processes correctly

Configuration Management

Factor III (Config) states: store config in the environment, not in code. For orchestration, this means the same image deploys to dev, staging, and production — only the configuration changes.

# Configuration hierarchy for orchestrated containers
# Priority (highest to lowest):
# 1. Secrets (sensitive: passwords, API keys, certificates)
# 2. Environment variables (non-sensitive runtime config)
# 3. ConfigMaps/config files (mounted as volumes)
# 4. Defaults baked into the application code

# Kubernetes ConfigMap example
apiVersion: v1
kind: ConfigMap
metadata:
  name: webapp-config
data:
  LOG_LEVEL: "info"
  MAX_CONNECTIONS: "100"
  FEATURE_FLAG_NEW_UI: "true"
  config.yaml: |
    server:
      port: 8080
      read_timeout: 30s
    cache:
      ttl: 300s
      max_size: 1000

# Why NOT to bake config into images:
# 1. Different environments need different values
#    dev: DATABASE_URL=postgres://localhost/dev
#    prod: DATABASE_URL=postgres://rds.amazonaws.com/prod

# 2. Config changes shouldn't require rebuilding
#    Changing a feature flag should be instant, not a 5-min CI pipeline

# 3. Secrets in images are visible to anyone with pull access
#    docker inspect and docker history reveal ENV values set during build

# 4. Same image everywhere = confidence that code is identical
#    "It works in staging" actually means something

# Pattern: application reads config at startup
docker run \
  -e DATABASE_URL=postgres://prod-db:5432/myapp \
  -e REDIS_URL=redis://prod-cache:6379 \
  -e LOG_LEVEL=warn \
  -v /etc/app/config.yaml:/app/config.yaml:ro \
  myapp:latest   # Same image used in dev, staging, prod
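
On the application side, reading configuration at startup might look like the following Python sketch. The `Settings` class and `load_settings` function are illustrative names; the variable names match the `docker run` example above, with `MAX_CONNECTIONS` borrowed from the ConfigMap.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    log_level: str = "info"
    max_connections: int = 100


def load_settings(env=os.environ) -> Settings:
    """Build settings from environment variables, failing fast on missing ones."""
    missing = [key for key in ("DATABASE_URL", "REDIS_URL") if key not in env]
    if missing:
        raise RuntimeError(f"missing required config: {', '.join(missing)}")
    return Settings(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        log_level=env.get("LOG_LEVEL", "info"),
        max_connections=int(env.get("MAX_CONNECTIONS", "100")),
    )


# Passing a dict instead of os.environ keeps the example self-contained.
settings = load_settings({
    "DATABASE_URL": "postgres://prod-db:5432/myapp",
    "REDIS_URL": "redis://prod-cache:6379",
    "LOG_LEVEL": "warn",
})
print(settings.log_level)  # warn
```

Failing at startup when a required value is missing is deliberate: the orchestrator sees the crash immediately instead of routing traffic to a half-configured instance.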

Logging for Orchestration

In orchestrated environments, containers are ephemeral — they start, stop, move, and get replaced. Logs written to files inside containers are lost when the container is destroyed. The standard is to write to stdout/stderr and let the platform handle collection, aggregation, and storage.

# Anti-pattern: writing logs to files inside container
# These logs disappear when the container is killed
CMD ["node", "server.js", "--log-file=/var/log/app.log"]
# docker logs shows nothing, logs are trapped in the container

# Pattern: write to stdout/stderr
CMD ["node", "server.js"]
# docker logs shows everything
# Kubernetes automatically collects from stdout/stderr
# Log aggregators (ELK, Loki, CloudWatch) scrape container stdout

// Structured logging (JSON format) — machine-parseable
{"timestamp":"2026-05-14T10:30:00Z","level":"info","service":"webapp","msg":"Request processed","method":"GET","path":"/api/users","status":200,"duration_ms":45,"request_id":"abc-123"}
{"timestamp":"2026-05-14T10:30:01Z","level":"error","service":"webapp","msg":"Database connection failed","error":"connection refused","retry_count":3,"request_id":"def-456"}
{"timestamp":"2026-05-14T10:30:02Z","level":"warn","service":"webapp","msg":"Rate limit approaching","client_ip":"10.0.1.5","requests_per_minute":95,"limit":100}

// Structured logging implementation (Node.js with pino)
const pino = require('pino');

const logger = pino({
    level: process.env.LOG_LEVEL || 'info',
    // JSON output to stdout — orchestrator collects it
    formatters: {
        level: (label) => ({ level: label }),
    },
    base: {
        service: 'webapp',
        version: process.env.APP_VERSION || 'unknown',
        environment: process.env.NODE_ENV || 'development',
    },
});

// Usage
logger.info({ method: 'GET', path: '/users', duration_ms: 45 }, 'Request processed');
logger.error({ err, request_id: req.id }, 'Database query failed');
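
The same idea works without third-party libraries. Below is a minimal Python sketch using only the standard `logging` module; `JsonFormatter`, `build_logger`, and the convention of passing extra fields under a `fields` key are assumptions made for this example, not an established API.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, timezone.utc).isoformat(),
            "level": record.levelname.lower(),
            "service": "webapp",
            "msg": record.getMessage(),
        }
        # Fields passed via extra={"fields": {...}} are merged into the entry.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


def build_logger(stream=sys.stdout):
    """Return a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("webapp")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


logger = build_logger()
logger.info("Request processed", extra={"fields": {"method": "GET", "status": 200}})
```

Because every line is a complete JSON object on stdout, the platform's log collector can parse it without any application-specific configuration.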

                            
                            Why Structured Logging? In a cluster with hundreds of containers, you need to filter logs by service, request ID, error level, and time range. JSON logs enable powerful queries in aggregation platforms: "Show me all ERROR logs from the webapp service in the last hour with request_id=abc-123."
                        

Service Discovery Patterns

When containers are scheduled across multiple nodes and can be started/stopped/moved at any time, hardcoded addresses are impossible. Service discovery enables containers to find each other dynamically.

DNS-Based Service Discovery (Kubernetes)

flowchart LR
    subgraph ClusterDNS["Cluster DNS (CoreDNS)"]
        DNS["webapp.default.svc.cluster.local → ClusterIP 10.96.0.50"]
    end

    subgraph Service["Kubernetes Service (Load Balancer)"]
        SVC["ClusterIP: 10.96.0.50, Port: 80"]
    end

    subgraph Pods["Backend Pods"]
        P1["webapp-abc 10.244.1.5:3000"]
        P2["webapp-def 10.244.2.8:3000"]
        P3["webapp-ghi 10.244.3.2:3000"]
    end

    CLIENT["Other Container"] -->|"GET http://webapp:80"| DNS
    DNS --> SVC
    SVC -->|round-robin| P1
    SVC -->|round-robin| P2
    SVC -->|round-robin| P3

Pattern	How It Works	Used By
DNS-based	Service name resolves to virtual IP or pod IPs	Kubernetes Services, Docker Compose
Environment-based	Injected env vars with service addresses	Docker links (deprecated), K8s env
Registry-based	Services register in Consul/etcd, clients query	HashiCorp Consul, Netflix Eureka

# In Kubernetes, every Service gets a DNS name:
# Format: <service-name>.<namespace>.svc.cluster.local

# From any pod in the same namespace:
curl http://webapp:80/api/users        # Short name works within namespace
curl http://webapp.default.svc.cluster.local:80/api/users  # Fully qualified

# Cross-namespace communication (use the protocol's own client for non-HTTP ports):
psql -h database.data-tier.svc.cluster.local -p 5432

# Headless service (returns pod IPs directly, not ClusterIP):
# Useful for stateful services where clients need specific pods
nslookup database-headless.default.svc.cluster.local
# Returns: 10.244.1.5, 10.244.2.8, 10.244.3.2 (individual pod IPs)
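
From application code, the same lookups go through the ordinary resolver. The sketch below builds a Service's fully qualified name and resolves it; `service_fqdn` and `resolve` are hypothetical helpers, and `cluster.local` is assumed to be the cluster domain, which is the common default but configurable.

```python
import socket


def service_fqdn(service: str, namespace: str = "default",
                 cluster_domain: str = "cluster.local") -> str:
    """Build the fully qualified DNS name of a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def resolve(hostname: str, port: int) -> list[str]:
    """Return the distinct IP addresses a name resolves to, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


print(service_fqdn("webapp"))
print(service_fqdn("database", "data-tier"))
# Inside a cluster, a headless Service would yield one address per pod here.
print(resolve(service_fqdn("database-headless"), 5432))
```

Outside a cluster the last call simply prints an empty list, which makes the helper safe to run anywhere.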

Container Anti-Patterns for Orchestration

These patterns work fine on a single Docker host but break catastrophically when containers are orchestrated across a cluster:

Anti-Pattern	Why It Fails	Correct Pattern
SSH into containers	Containers are ephemeral; changes are lost on restart	Debug with `kubectl exec`, fix in Dockerfile and redeploy
Manual scaling	Can't react to load spikes fast enough	Horizontal Pod Autoscaler (HPA) based on metrics
Pets (named, irreplaceable)	Can't be scheduled elsewhere or replaced	Cattle (numbered, disposable, identical)
Hardcoded IPs/hosts	Pods get new IPs on every restart	Service discovery via DNS names
Local file storage	Files lost when pod moves to another node	PersistentVolumes or object storage (S3)
In-memory sessions	Lost on pod replacement; sticky sessions break scaling	Redis/Memcached session store
Long startup time	Slow scaling, slow recovery from failures	Optimize image size, lazy init, startup probes
No health endpoints	Orchestrator can't detect unhealthy instances	Implement /healthz and /readyz endpoints
Running as root	Security risk, violates Pod Security Standards	Non-root user, read-only filesystem
Ignoring SIGTERM	Connections dropped during rolling updates	Handle SIGTERM, drain connections gracefully
Fat containers	Slow pulls, waste cluster resources	Multi-stage builds, distroless, <50MB images
Latest tag	Non-reproducible deployments, no rollback	Immutable tags (git SHA, semver)
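
Some of these anti-patterns are easy to catch automatically. As one example, the "Latest tag" row could be enforced with a small check in CI. The sketch below is a simplified, hypothetical linter (`tag_problem` is not part of any tool) that handles the common image reference forms.

```python
def tag_problem(image: str):
    """Return a reason the image reference is not reproducible, or None if it looks pinned."""
    if "@sha256:" in image:
        return None  # pinned by digest
    last_segment = image.rsplit("/", 1)[-1]  # ignore a registry host:port prefix
    if ":" not in last_segment:
        return "no tag (defaults to latest)"
    tag = last_segment.split(":", 1)[1]
    if tag == "latest":
        return "mutable 'latest' tag"
    return None


for ref in ("myapp:latest", "localhost:5000/myapp", "registry.example.com/myapp:1.4.2"):
    print(ref, "->", tag_problem(ref) or "ok")
```

Any tag can technically be pushed again with different content, so a check like this only catches the obvious cases; pinning by digest is the stricter option.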

Mental Model

Pets vs Cattle

Pets are servers you name, care for, and nurse back to health when they're sick (e.g., "db-master-1" that's been running for 3 years). Cattle are servers that are numbered, identical, and replaced when they fail (e.g., "webapp-7f8d9-xk2p4" — if it's unhealthy, kill it and start a new one).

Orchestration requires the cattle mindset: any container can be killed at any time and a fresh replacement will take its place. If your container requires manual intervention to recover, it's a pet — and it will be a constant source of incidents in production.


Docker Swarm vs Kubernetes

Two primary orchestrators exist in the container ecosystem. Understanding their trade-offs helps you choose the right platform for your workloads.

Dimension	Docker Swarm	Kubernetes
Complexity	Simple — `docker swarm init` and done	Complex — multi-component control plane
Learning curve	Hours (if you know Docker)	Weeks to months
Scaling	Hundreds of nodes	Thousands of nodes (officially supported up to 5,000)
Auto-scaling	Manual replica count	HPA, VPA, Cluster Autoscaler
Networking	Overlay network, ingress routing mesh	CNI plugins, Ingress controllers, service mesh
Storage	Basic volume plugins	CSI drivers, dynamic provisioning, StatefulSets
Configuration	Docker configs and secrets	ConfigMaps, Secrets, external operators
Ecosystem	Limited third-party tools	Massive ecosystem (Helm, Operators, service mesh)
CI/CD integration	Basic (docker stack deploy)	GitOps (ArgoCD, Flux), Helm, Kustomize
Multi-tenancy	Limited	Namespaces, RBAC, Network Policies
Managed offerings	Few (Docker Enterprise was sold to Mirantis in 2019)	EKS, AKS, GKE, and dozens more
Best for	Small teams, simple apps, Docker-native orgs	Any scale, complex workloads, enterprise

# Docker Swarm: Initialize and deploy (simple)
docker swarm init
docker stack deploy -c compose.yaml myapp
docker service scale myapp_webapp=5

# Kubernetes: Deploy (more verbose but more powerful)
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl autoscale deployment webapp --min=2 --max=10 --cpu-percent=70

                            
                            Recommendation: If you're a small team with straightforward web applications and want minimal operational overhead, Docker Swarm is a pragmatic choice. If you need auto-scaling, complex networking, a rich ecosystem, or you're operating at scale — Kubernetes is the industry standard. The good news: containers built with the principles in this article work well on both platforms.
                        

Exercises

                            
                            Exercise 1: Take an existing containerised application and make it stateless. Identify all local state (sessions, uploads, cache) and externalize it to appropriate backing services (Redis, S3, database). Verify by running 3 replicas behind a load balancer — each should serve any request correctly.
                        

                            
                            Exercise 2: Implement proper health endpoints (/healthz and /readyz) in a web application. The readiness probe should check database connectivity. Simulate a database failure and observe that the readiness probe fails while liveness remains healthy.
                        

                            
                            Exercise 3: Implement graceful SIGTERM handling in your preferred language. The application should: stop accepting new connections, finish in-flight requests (with a timeout), close database connections, then exit cleanly. Test with docker stop and verify no requests are dropped using a load testing tool.
                        

                            
                            Exercise 4: Configure resource requests and limits for a container. Run a stress test to exceed the memory limit and observe OOM behavior. Then set CPU limits and observe throttling with docker stats. Document how QoS class affects eviction priority.
                        

                            
                            Exercise 5: Audit an existing application against all 12 factors. Create a checklist scoring each factor as "compliant", "partial", or "non-compliant". Then fix the top 3 most critical violations (Config, Processes, Disposability).
                        

Conclusion & Next Steps

Container orchestration readiness isn't about learning Kubernetes commands — it's about designing containers that thrive in a dynamic, distributed environment. The principles we've covered form a checklist for orchestration-ready containers:

Stateless design — externalize all state to backing services
Twelve-factor compliance — config via environment, logs to stdout, single-process, disposable
Health probes — liveness (am I alive?), readiness (can I serve?), startup (am I done booting?)
Resource declarations — requests for scheduling, limits for protection
Graceful shutdown — handle SIGTERM, drain connections, exit cleanly within grace period
Structured logging — JSON to stdout for aggregation and querying
Service discovery — use DNS names, never hardcode addresses
Immutable images — same image everywhere, config changes externally

With these principles in place, your containers are ready for Kubernetes, Docker Swarm, AWS ECS, Google Cloud Run, or any orchestration platform. The next step is making these running containers observable — so you know what's happening inside your distributed system.

Next in the Series

In Part 20: Container Monitoring & Observability, we'll instrument containers with metrics (Prometheus, cAdvisor), structured logging (ELK, Loki), and distributed tracing — turning opaque containers into transparent, debuggable systems.
