
Part 19: Container Orchestration Readiness

May 14, 2026 Wasil Zafar 24 min read

Docker Compose manages containers on a single host — but production systems require scheduling across clusters, automatic scaling, self-healing, and zero-downtime deployments. Before adopting Kubernetes or any orchestrator, your containers must be designed for this world. This article establishes the design principles, patterns, and anti-patterns that determine whether your containers thrive or fail under orchestration.

Table of Contents

  1. From Single Host to Orchestration
  2. The Twelve-Factor App
  3. Stateless vs Stateful
  4. Container Design Principles
  5. Health Probes
  6. Resource Requests & Limits
  7. Graceful Shutdown
  8. Configuration Management
  9. Logging for Orchestration
  10. Service Discovery Patterns
  11. Container Anti-Patterns
  12. Docker Swarm vs Kubernetes
  13. Exercises
  14. Conclusion & Next Steps

From Single Host to Orchestration

Docker Compose gives you declarative multi-container management — but only on a single machine. When that machine runs out of CPU, memory, or disk, your application has nowhere to go. When it crashes, every service goes down. Production demands more.

What Orchestration Solves

Single Host vs Orchestrated Cluster
flowchart TB
    subgraph SingleHost["Single Host (Docker Compose)"]
        direction TB
        HOST["One Machine"]
        HOST --> C1["webapp"]
        HOST --> C2["database"]
        HOST --> C3["cache"]
        HOST -->|"Single point<br/>of failure"| FAIL["❌ Host dies =<br/>Everything dies"]
    end

    subgraph Orchestrated["Orchestrated Cluster (Kubernetes)"]
        direction TB
        SCHED["Scheduler"]
        SCHED --> N1["Node 1"]
        SCHED --> N2["Node 2"]
        SCHED --> N3["Node 3"]
        N1 --> C4["webapp-1"]
        N1 --> C5["cache-1"]
        N2 --> C6["webapp-2"]
        N2 --> C7["database"]
        N3 --> C8["webapp-3"]
        N3 --> C9["cache-2"]
        N2 -->|"Node 2 dies"| HEAL["✅ Scheduler moves<br/>workloads to Node 1/3"]
    end

Container orchestrators solve five fundamental problems:

| Problem | Docker Alone | With Orchestrator |
| --- | --- | --- |
| Scheduling | Manual placement on one host | Auto-places containers on nodes with capacity |
| Scaling | Manual docker compose up --scale | Auto-scales based on CPU/memory/custom metrics |
| Self-healing | restart: always (same host only) | Reschedules to healthy nodes, replaces failed pods |
| Service discovery | Docker DNS (single network) | Cluster-wide DNS, load balancing, traffic routing |
| Rolling updates | Stop old, start new (downtime) | Zero-downtime rolling deployments with rollback |

Key Insight: Orchestration doesn't just run containers differently — it requires containers to be designed differently. A container that works perfectly under docker compose up may fail repeatedly under Kubernetes if it wasn't built with orchestration principles in mind.
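
To make the scheduling row concrete, here is a toy first-fit placement sketch in Python. It is only an illustration: real schedulers filter and score nodes on many more criteria, and the `Node` class, the `schedule` function, and the capacities below are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu_m: int      # free CPU in millicores
    free_mem_mi: int     # free memory in MiB
    containers: list = field(default_factory=list)


def schedule(nodes, name, cpu_m, mem_mi):
    """Place a container on the first node with enough free capacity."""
    for node in nodes:
        if node.free_cpu_m >= cpu_m and node.free_mem_mi >= mem_mi:
            node.free_cpu_m -= cpu_m
            node.free_mem_mi -= mem_mi
            node.containers.append(name)
            return node.name
    return None  # nothing fits: the container stays pending


nodes = [Node("node-1", 500, 512), Node("node-2", 2000, 4096)]
placements = [schedule(nodes, f"webapp-{i}", 400, 256) for i in range(1, 4)]
print(placements)  # ['node-1', 'node-2', 'node-2']
```

The placement only works because each container declares what it needs up front, which is one reason resource declarations matter later in this article.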

The Twelve-Factor App

The Twelve-Factor App methodology (Adam Wiggins, 2011) predates containers but perfectly describes container-native design. These principles ensure applications are portable, scalable, and orchestration-friendly.

The 12 Factors for Containers

| # | Factor | Container Application | Criticality |
| --- | --- | --- | --- |
| I | Codebase | One repo per service, one image per repo | Medium |
| II | Dependencies | All deps in Dockerfile, no implicit host packages | High |
| III | Config | Environment variables, ConfigMaps, not baked into image | Critical |
| IV | Backing Services | Databases, caches as attached resources (URLs) | Critical |
| V | Build, Release, Run | Separate build (CI), release (tag), run (orchestrator) | High |
| VI | Processes | Stateless processes, share-nothing architecture | Critical |
| VII | Port Binding | Self-contained, expose via port (no external webserver) | High |
| VIII | Concurrency | Scale horizontally (add replicas, not bigger containers) | Critical |
| IX | Disposability | Fast startup, graceful shutdown (SIGTERM handling) | Critical |
| X | Dev/Prod Parity | Same image in dev/staging/prod, config differs | High |
| XI | Logs | Write to stdout/stderr, let platform aggregate | Critical |
| XII | Admin Processes | One-off tasks as separate containers (Jobs/CronJobs) | Medium |

Most Violated Factors: Factors III (Config), VI (Processes), IX (Disposability), and XI (Logs) are the most commonly violated in container deployments. If your containers bake configuration into images, store state locally, don't handle SIGTERM, or write logs to files — they will fail under orchestration.

Stateless vs Stateful Containers

The single most important design decision for orchestration is whether your container is stateless (can be killed and replaced at any time) or stateful (holds data that must survive restarts). Orchestrators strongly prefer stateless containers because they can be freely scheduled, scaled, and replaced.

Stateless Architecture Pattern
flowchart LR
    subgraph Stateless["Stateless Tier (Scale Freely)"]
        W1["webapp-1"]
        W2["webapp-2"]
        W3["webapp-3"]
    end

    subgraph External["External State (Managed)"]
        DB[(PostgreSQL<br/>Database)]
        CACHE[(Redis<br/>Session Store)]
        S3[(Object Storage<br/>Files/Uploads)]
    end

    LB["Load Balancer"] --> W1
    LB --> W2
    LB --> W3
    W1 --> DB
    W2 --> DB
    W3 --> DB
    W1 --> CACHE
    W2 --> CACHE
    W3 --> CACHE
    W1 --> S3
    W2 --> S3
    W3 --> S3

| Aspect | Stateless Container | Stateful Container |
| --- | --- | --- |
| Data persistence | No local data; externalized to DB/cache/S3 | Requires persistent volumes |
| Scaling | Add/remove replicas freely | Complex (data migration, quorum) |
| Replacement | Kill and recreate instantly | Must drain, migrate, then replace |
| Scheduling | Any node in the cluster | Node with attached volume |
| Recovery time | Seconds (fresh start) | Minutes (volume reattach, replay) |
| Examples | API servers, web frontends, workers | Databases, message queues, caches |
| Kubernetes type | Deployment | StatefulSet |

Externalizing state — the key to making containers stateless:

# Anti-pattern: storing sessions in container memory
# If this container dies, all user sessions are lost
docker run -e SESSION_STORE=memory myapp:latest

# Pattern: externalize sessions to Redis
# Container can be killed/replaced without losing sessions
docker run \
  -e SESSION_STORE=redis \
  -e REDIS_URL=redis://redis-cluster:6379 \
  myapp:latest

# Anti-pattern: storing uploads in container filesystem
# Files are lost when container is replaced
docker run -v /app/uploads myapp:latest

# Pattern: externalize files to object storage
docker run \
  -e UPLOAD_BACKEND=s3 \
  -e S3_BUCKET=myapp-uploads \
  -e S3_REGION=us-east-1 \
  myapp:latest
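
The difference between the two session patterns above can be sketched in a few lines of Python. This is a simplified model, not a real Redis client: `SharedSessionStore` uses a plain dict as a stand-in for the external store, and every class and name here is hypothetical.

```python
class InMemorySessionStore:
    """Anti-pattern: each replica keeps its own private sessions."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id)

    def set(self, session_id, value):
        self._data[session_id] = value


class SharedSessionStore:
    """Stand-in for Redis: every replica talks to the same backend."""

    def __init__(self, backend):
        self._backend = backend

    def get(self, session_id):
        return self._backend.get(session_id)

    def set(self, session_id, value):
        self._backend[session_id] = value


class Replica:
    def __init__(self, store):
        self.store = store

    def login(self, session_id, user):
        self.store.set(session_id, {"user": user})

    def whoami(self, session_id):
        session = self.store.get(session_id)
        return session["user"] if session else None


# Local state: the login on replica a is invisible to replica b
a, b = Replica(InMemorySessionStore()), Replica(InMemorySessionStore())
a.login("s1", "alice")
local_result = b.whoami("s1")

# External state: any replica can serve the session
backend = {}  # plays the role of the external Redis instance
c, d = Replica(SharedSessionStore(backend)), Replica(SharedSessionStore(backend))
c.login("s1", "alice")
shared_result = d.whoami("s1")

print(local_result, shared_result)  # None alice
```

With the shared store, a load balancer can send each request to any replica, and a replica can be killed without logging anyone out.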

Container Design Principles for Orchestration

Beyond statelessness, orchestration-ready containers follow specific design principles that enable the orchestrator to manage their lifecycle effectively.

Single Process per Container

Each container should run one primary process. This makes health monitoring, logging, scaling, and resource allocation straightforward.

# Anti-pattern: multiple processes in one container
# Dockerfile that runs nginx + php-fpm + cron in one container
CMD ["supervisord", "-c", "/etc/supervisor.conf"]
# Problems: Can't scale web and cron independently,
# can't monitor health of individual services,
# logs are mixed, failure of one affects all

# Pattern: separate containers for separate concerns
# Container 1: nginx (reverse proxy)
# Container 2: php-fpm (application)  
# Container 3: cron (scheduled tasks)
# Each can scale, restart, and be monitored independently

Immutable Infrastructure

# Anti-pattern: mutable container (SSH in and fix things)
docker exec -it production-web bash
apt-get update && apt-get install -y hotfix-package
# Container is now different from its image
# Next deployment reverts the fix
# No audit trail of what changed

# Pattern: immutable containers
# Fix the Dockerfile, build a new image, deploy
# Every container matches its image exactly
# Rollback = deploy previous image tag

Fast Startup

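# Measure container start overhead (create + start + exit)
# Note: echo replaces the image's default command, so this does not
# include the application's own boot time
time docker run --rm myapp:latest echo "ready"
# Target: under 5 seconds for web services
# Target: under 30 seconds for complex services (JVM, ML models)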

# Strategies for fast startup:
# 1. Small images (Alpine, distroless) — less to pull/extract
# 2. Pre-compile/pre-build in image (don't compile at startup)
# 3. Lazy initialization (connect to DB on first request, not at boot)
# 4. Health probe start_period to handle warm-up
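
Strategy 3 (lazy initialization) might look like the following Python sketch. `connect_to_database` is a hypothetical placeholder for a slow dependency setup, and the `lru_cache` approach is just one way to defer the work until first use.

```python
import functools
import time


def connect_to_database():
    """Hypothetical slow setup step (stands in for a real DB client)."""
    time.sleep(0.05)
    return {"connected": True}


@functools.lru_cache(maxsize=None)
def get_db():
    """Connect on first use, then reuse the same connection object."""
    return connect_to_database()


def handle_request():
    # The process is already listening; the first request pays the connect cost
    return get_db()["connected"]


first = get_db()
second = get_db()
print(first is second)  # True
```

The trade-off is that the first request is slower. A readiness check that touches `get_db()` can absorb that cost before real traffic arrives.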

Health Probes

Orchestrators don't just check if a process is running — they use health probes to determine if a container is functioning correctly. Without probes, the orchestrator can't make intelligent decisions about routing traffic or restarting unhealthy instances.

| Probe Type | Purpose | Failure Action | When to Use |
| --- | --- | --- | --- |
| Liveness | Is the process alive and not deadlocked? | Kill and restart container | Always (detect stuck processes) |
| Readiness | Is it ready to serve traffic? | Remove from load balancer (don't kill) | Services with warm-up or dependencies |
| Startup | Has initial boot completed? | Liveness and readiness checks wait until it succeeds; container is restarted if it never succeeds within failureThreshold × periodSeconds | JVM apps, ML model loading |

// Express.js health endpoint implementation
const express = require('express');
const app = express();

let isReady = false;
let isHealthy = true;

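// Simulate initialization (DB connection, cache warm-up)
// connectToDatabase(), warmUpCache(), and db are application-specific
// and assumed to be defined elsewhere
async function initialize() {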
    await connectToDatabase();
    await warmUpCache();
    isReady = true;
    console.log('Application ready to serve traffic');
}

// Liveness probe: Am I alive and not deadlocked?
// Should be LIGHTWEIGHT — no external dependencies
app.get('/healthz', (req, res) => {
    if (isHealthy) {
        res.status(200).json({ status: 'alive' });
    } else {
        res.status(503).json({ status: 'unhealthy' });
    }
});

// Readiness probe: Can I serve traffic right now?
// Check dependencies: DB connected? Cache available?
app.get('/readyz', async (req, res) => {
    if (!isReady) {
        return res.status(503).json({ status: 'initializing' });
    }
    
    try {
        await db.query('SELECT 1');  // Quick DB check
        res.status(200).json({ status: 'ready' });
    } catch (err) {
        // Temporarily not ready — don't kill, just stop traffic
        res.status(503).json({ status: 'not ready', error: err.message });
    }
});

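initialize().catch((err) => {
    // Initialization failed: report unhealthy so the orchestrator restarts us
    console.error('Initialization failed:', err);
    isHealthy = false;
});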
app.listen(3000);
# Kubernetes probe configuration (for reference)
# Kubernetes ignores a Dockerfile HEALTHCHECK; probes are declared in the Pod spec
apiVersion: v1
kind: Pod
metadata:
  name: webapp   # placeholder name
spec:
  containers:
    - name: webapp
      image: myapp:latest
      ports:
        - containerPort: 3000
      
      livenessProbe:
        httpGet:
          path: /healthz
          port: 3000
        initialDelaySeconds: 5
        periodSeconds: 10
        failureThreshold: 3      # Kill after 3 failures
      
      readinessProbe:
        httpGet:
          path: /readyz
          port: 3000
        initialDelaySeconds: 5
        periodSeconds: 5
        failureThreshold: 2      # Remove from LB after 2 failures
      
      startupProbe:
        httpGet:
          path: /healthz
          port: 3000
        failureThreshold: 30     # Allow up to 30 * 10s = 5 min to start
        periodSeconds: 10
Critical Rule: Liveness probes must NEVER check external dependencies (database, cache, API). If your liveness probe fails because the database is down, the orchestrator kills your container — which doesn't fix the database. Use readiness probes for dependency checks.
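
The same liveness/readiness split can be sketched with Python's standard library alone. This is a minimal illustration under assumptions: the `state` dict, the `probe` function, and `serve` are hypothetical names, and a real service would update `state` from its own initialization and dependency checks.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

state = {"ready": False, "db_ok": True}  # hypothetical application state


def probe(path, state):
    """Return (status_code, body) for a probe path."""
    if path == "/healthz":
        # Liveness: no external dependencies, only "is the process responsive?"
        return 200, {"status": "alive"}
    if path == "/readyz":
        if not state["ready"]:
            return 503, {"status": "initializing"}
        if not state["db_ok"]:
            # Dependency trouble: stop traffic, but do not ask for a restart
            return 503, {"status": "not ready"}
        return 200, {"status": "ready"}
    return 404, {"status": "unknown path"}


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        code, body = probe(self.path, state)
        payload = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=3000):
    """Blocking helper: call this from the application's entry point."""
    state["ready"] = True  # set only after real initialization completes
    HTTPServer(("0.0.0.0", port), ProbeHandler).serve_forever()
```

Note that a database outage flips only `/readyz` to 503. `/healthz` keeps returning 200, so the orchestrator stops routing traffic without restarting a process that is otherwise fine.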

Resource Requests & Limits

Orchestrators need to know how much CPU and memory your container requires to schedule it on an appropriate node. Without resource specifications, the scheduler can overcommit nodes, leading to OOM kills and CPU starvation.

# Docker resource flags (translate to Kubernetes resource spec)
# Memory limit: container is OOM-killed if it exceeds this
docker run --memory=256m myapp:latest

# CPU limit: container is throttled (not killed) when exceeding
docker run --cpus=1.0 myapp:latest

# Memory reservation (soft limit): enforced only when the host is low on memory
docker run --memory-reservation=128m --memory=256m myapp:latest

# Connection to cgroups (Part 3):
# These flags set cgroup v2 limits under /sys/fs/cgroup/
# memory.max = --memory value in bytes
# cpu.max = "<quota> <period>" in microseconds; quota = --cpus value * 100000
# Kubernetes resource specification
# requests = guaranteed minimum (used for scheduling)
# limits = maximum allowed (enforced by cgroups)
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: webapp
      image: myapp:latest
      resources:
        requests:
          memory: "128Mi"   # Scheduler ensures node has 128Mi available
          cpu: "250m"       # 250 millicores = 0.25 CPU cores
        limits:
          memory: "256Mi"   # OOM-killed if exceeds 256Mi
          cpu: "1000m"      # Throttled (not killed) above 1 CPU core
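
The CPU arithmetic behind these limits can be worked through in a short Python sketch. It assumes cgroup v2 and the default 100000 µs period. `parse_cpu` and `cpu_max` are hypothetical helper names, and the parser handles only plain and millicore quantities.

```python
def parse_cpu(quantity):
    """Convert a CPU quantity such as '250m', '1', or '1.5' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def cpu_max(limit, period_us=100_000):
    """Return the cgroup v2 cpu.max line ('<quota> <period>') for a CPU limit."""
    quota_us = parse_cpu(limit) * period_us // 1000
    return f"{quota_us} {period_us}"


print(cpu_max("1000m"))  # 100000 100000
print(cpu_max("250m"))   # 25000 100000
print(cpu_max("1.5"))    # 150000 100000
```

A limit of 250m therefore means the container may run for 25 ms out of every 100 ms period before it is throttled.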

Quality of Service (QoS) Classes — determined by how you set requests and limits:

| QoS Class | Condition | Eviction Priority | Best For |
| --- | --- | --- | --- |
| Guaranteed | requests = limits (both CPU and memory) | Last to be evicted | Critical services (databases, payment) |
| Burstable | requests < limits (at least one set) | Evicted after BestEffort | Most production workloads |
| BestEffort | No requests or limits set | First to be evicted | Batch jobs, development only |

Sizing Strategy: Start with generous limits, observe actual usage with monitoring (Part 20), then right-size. For memory: set requests to P95 usage and limits to P99. For CPU: set requests to average usage and limits to burst capacity. Use docker stats or cAdvisor to measure.
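
The QoS rules can be expressed as a small classifier. The Python sketch below is a simplification: it compares quantities as raw strings (so "1" and "1000m" would count as different), and the dict layout and the `qos_class` name are assumptions made for illustration.

```python
def qos_class(containers):
    """Classify a pod from its containers' resource settings.

    Each container is a dict with optional "requests" and "limits" dicts,
    each holding optional "cpu" and "memory" keys.
    """
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit when a limit is set
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


webapp = {
    "requests": {"cpu": "250m", "memory": "128Mi"},
    "limits": {"cpu": "1000m", "memory": "256Mi"},
}
print(qos_class([webapp]))  # Burstable
```

The webapp spec shown earlier lands in Burstable because its requests are lower than its limits. Setting both to the same values would move it to Guaranteed.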

Graceful Shutdown & Signal Handling

When an orchestrator needs to stop a container (scaling down, rolling update, node drain), it sends SIGTERM and waits a grace period (30 seconds by default in Kubernetes, 10 seconds for docker stop) before sending SIGKILL. Containers that handle SIGTERM properly finish in-flight requests and close connections cleanly. Those that don't can drop connections mid-request and leave writes half-finished.

Container Shutdown Sequence
sequenceDiagram
    participant K as Orchestrator
    participant C as Container (PID 1)
    participant LB as Load Balancer

    K->>LB: Remove container from service endpoints
    K->>C: Send SIGTERM
    Note over C: Grace period starts (30s default)
    C->>C: Stop accepting new connections
    C->>C: Finish in-flight requests
    C->>C: Close database connections
    C->>C: Flush buffers/caches
    C->>K: Exit code 0 (clean shutdown)
    Note over K: If still running after grace period:
    K->>C: Send SIGKILL (force kill)
                            
// Node.js: Proper SIGTERM handling
const http = require('http');

const server = http.createServer((req, res) => {
    // Normal request handling
    res.writeHead(200);
    res.end('Hello World');
});

server.listen(3000);

// Graceful shutdown on SIGTERM
process.on('SIGTERM', () => {
    console.log('SIGTERM received. Starting graceful shutdown...');
    
    // Stop accepting new connections
    server.close(() => {
        console.log('HTTP server closed. All connections drained.');
        
        // Close database connections (db is the application's database client,
        // assumed to be defined elsewhere)
        db.end().then(() => {
            console.log('Database connections closed.');
            process.exit(0);  // Clean exit
        });
    });
    
    // Force exit after timeout (safety net)
    setTimeout(() => {
        console.error('Forced shutdown after timeout');
        process.exit(1);
    }, 25000);  // 25s < 30s grace period
});
#!/usr/bin/env python3
# Python: Proper SIGTERM handling with asyncio
import asyncio
import signal

from aiohttp import web


async def handle(request):
    return web.Response(text="Hello World")


async def main():
    app = web.Application()
    app.router.add_get('/', handle)

    loop = asyncio.get_running_loop()
    stop_event = asyncio.Event()

    def request_shutdown(sig):
        """Signal handler: only flag the shutdown, do the work in main()."""
        print(f'Received {sig.name}. Shutting down...')
        stop_event.set()

    # Register signal handlers (Unix event loops only)
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, request_shutdown, sig)

    runner = web.AppRunner(app)
    await runner.setup()
    site = web.TCPSite(runner, '0.0.0.0', 8080)
    await site.start()
    print('Server started on port 8080')

    # Block until a shutdown signal arrives
    await stop_event.wait()

    # Stop accepting new connections and let in-flight requests finish
    await runner.cleanup()
    print('Shutdown complete.')


asyncio.run(main())

The PID 1 Problem

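In Linux, PID 1 is treated specially: the kernel does not apply default signal actions to it, so a signal such as SIGTERM is ignored unless the process has installed a handler for it. And if your application is not PID 1 because a shell wraps it, SIGTERM is delivered to the shell and may never reach the application.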

# BAD: Shell wraps your app — SIGTERM goes to shell, not app
CMD node server.js
# Docker actually runs: /bin/sh -c "node server.js"
# PID 1 = /bin/sh, PID 2 = node
# SIGTERM hits sh, which doesn't forward to node

# GOOD: Exec form — your app IS PID 1
CMD ["node", "server.js"]
# PID 1 = node server.js
# SIGTERM goes directly to your application

# ALTERNATIVE: Use tini as init process (handles signal forwarding)
RUN apk add --no-cache tini
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "server.js"]
# PID 1 = tini, PID 2 = node
# tini forwards SIGTERM to child processes correctly
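
To show what signal forwarding means in practice, here is a minimal Python sketch of the idea. It is not a replacement for tini (it does not reap orphaned grandchildren, for example), and `run_and_forward` is a hypothetical name.

```python
import signal
import subprocess
import sys


def run_and_forward(cmd):
    """Run cmd as a child process and forward SIGTERM/SIGINT to it."""
    child = subprocess.Popen(cmd)

    def forward(signum, _frame):
        # Pass the signal on instead of exiting and orphaning the child
        child.send_signal(signum)

    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, forward)

    code = child.wait()  # also reaps the child
    # Shell convention: 128 + signal number when the child was killed by a signal
    return 128 - code if code < 0 else code


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_and_forward(sys.argv[1:]))
```

Saved under an assumed name such as `init.py`, it would be used as `python init.py node server.js`. A child terminated by SIGTERM then surfaces as exit code 143.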

Configuration Management

Factor III (Config) states: store config in the environment, not in code. For orchestration, this means the same image deploys to dev, staging, and production — only the configuration changes.

# Configuration hierarchy for orchestrated containers
# Priority (highest to lowest):
# 1. Secrets (sensitive: passwords, API keys, certificates)
# 2. Environment variables (non-sensitive runtime config)
# 3. ConfigMaps/config files (mounted as volumes)
# 4. Defaults baked into the application code

# Kubernetes ConfigMap example
apiVersion: v1
kind: ConfigMap
metadata:
  name: webapp-config
data:
  LOG_LEVEL: "info"
  MAX_CONNECTIONS: "100"
  FEATURE_FLAG_NEW_UI: "true"
  config.yaml: |
    server:
      port: 8080
      read_timeout: 30s
    cache:
      ttl: 300s
      max_size: 1000
# Why NOT to bake config into images:
# 1. Different environments need different values
#    dev: DATABASE_URL=postgres://localhost/dev
#    prod: DATABASE_URL=postgres://rds.amazonaws.com/prod

# 2. Config changes shouldn't require rebuilding
#    Changing a feature flag should be instant, not a 5-min CI pipeline

# 3. Secrets in images are visible to anyone with pull access
#    docker inspect reveals all ENV values set during build

# 4. Same image everywhere = confidence that code is identical
#    "It works in staging" actually means something

# Pattern: application reads config at startup
docker run \
  -e DATABASE_URL=postgres://prod-db:5432/myapp \
  -e REDIS_URL=redis://prod-cache:6379 \
  -e LOG_LEVEL=warn \
  -v /etc/app/config.yaml:/app/config.yaml:ro \
  myapp:latest   # Same image used in dev, staging, prod
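
On the application side, reading that configuration at startup might look like the following Python sketch. The `Config` fields and the choice of which variables are required are assumptions for illustration. The variable names mirror the ones used above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    database_url: str
    redis_url: str
    log_level: str
    max_connections: int


def load_config(env=os.environ):
    """Build Config from environment variables, failing fast if required ones are missing."""
    missing = [key for key in ("DATABASE_URL", "REDIS_URL") if key not in env]
    if missing:
        raise RuntimeError(f"Missing required config: {', '.join(missing)}")
    return Config(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        log_level=env.get("LOG_LEVEL", "info"),            # default baked into code
        max_connections=int(env.get("MAX_CONNECTIONS", "100")),
    )


config = load_config({
    "DATABASE_URL": "postgres://prod-db:5432/myapp",
    "REDIS_URL": "redis://prod-cache:6379",
    "LOG_LEVEL": "warn",
})
print(config.log_level, config.max_connections)  # warn 100
```

Failing at startup on missing required values keeps a misconfigured container from ever passing its readiness probe, which is easier to diagnose than a failure on the first request.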

Logging for Orchestration

In orchestrated environments, containers are ephemeral — they start, stop, move, and get replaced. Logs written to files inside containers are lost when the container is destroyed. The standard is to write to stdout/stderr and let the platform handle collection, aggregation, and storage.

# Anti-pattern: writing logs to files inside container
# These logs disappear when the container is killed
CMD ["node", "server.js", "--log-file=/var/log/app.log"]
# docker logs shows nothing, logs are trapped in the container

# Pattern: write to stdout/stderr
CMD ["node", "server.js"]
# docker logs shows everything
# Kubernetes automatically collects from stdout/stderr
# Log aggregators (ELK, Loki, CloudWatch) scrape container stdout
// Structured logging (JSON format) — machine-parseable
{"timestamp":"2026-05-14T10:30:00Z","level":"info","service":"webapp","msg":"Request processed","method":"GET","path":"/api/users","status":200,"duration_ms":45,"request_id":"abc-123"}
{"timestamp":"2026-05-14T10:30:01Z","level":"error","service":"webapp","msg":"Database connection failed","error":"connection refused","retry_count":3,"request_id":"def-456"}
{"timestamp":"2026-05-14T10:30:02Z","level":"warn","service":"webapp","msg":"Rate limit approaching","client_ip":"10.0.1.5","requests_per_minute":95,"limit":100}
// Structured logging implementation (Node.js with pino)
const pino = require('pino');

const logger = pino({
    level: process.env.LOG_LEVEL || 'info',
    // JSON output to stdout — orchestrator collects it
    formatters: {
        level: (label) => ({ level: label }),
    },
    base: {
        service: 'webapp',
        version: process.env.APP_VERSION || 'unknown',
        environment: process.env.NODE_ENV || 'development',
    },
});

// Usage
logger.info({ method: 'GET', path: '/users', duration_ms: 45 }, 'Request processed');
logger.error({ err, request_id: req.id }, 'Database query failed');
Why Structured Logging? In a cluster with hundreds of containers, you need to filter logs by service, request ID, error level, and time range. JSON logs enable powerful queries in aggregation platforms: "Show me all ERROR logs from the webapp service in the last hour with request_id=abc-123."
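
The same idea works with Python's standard `logging` module. The sketch below is one possible formatter, not a standard API: the `JsonFormatter` class and the convention of passing extra fields under a `fields` key are assumptions.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, timezone.utc).isoformat(),
            "level": record.levelname.lower(),
            "service": "webapp",
            "msg": record.getMessage(),
        }
        # Fields passed via extra={"fields": {...}} are merged into the entry
        entry.update(getattr(record, "fields", {}))
        if record.exc_info:
            entry["error"] = self.formatException(record.exc_info)
        return json.dumps(entry)


handler = logging.StreamHandler(sys.stdout)  # stdout, not a file
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("webapp")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

logger.info(
    "Request processed",
    extra={"fields": {"method": "GET", "path": "/api/users", "status": 200}},
)
```

Each call emits a single line of JSON on stdout, so the platform's log collector can parse it without any file handling inside the container.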

Service Discovery Patterns

When containers are scheduled across multiple nodes and can be started/stopped/moved at any time, hardcoded addresses are impossible. Service discovery enables containers to find each other dynamically.

DNS-Based Service Discovery (Kubernetes)
flowchart LR
    subgraph ClusterDNS["Cluster DNS (CoreDNS)"]
        DNS["webapp.default.svc.cluster.local<br/>→ ClusterIP 10.96.0.50"]
    end

    subgraph Service["Kubernetes Service (Load Balancer)"]
        SVC["ClusterIP: 10.96.0.50<br/>Port: 80"]
    end

    subgraph Pods["Backend Pods"]
        P1["webapp-abc<br/>10.244.1.5:3000"]
        P2["webapp-def<br/>10.244.2.8:3000"]
        P3["webapp-ghi<br/>10.244.3.2:3000"]
    end

    CLIENT["Other Container"] -->|"GET http://webapp:80"| DNS
    DNS --> SVC
    SVC -->|round-robin| P1
    SVC -->|round-robin| P2
    SVC -->|round-robin| P3

| Pattern | How It Works | Used By |
| --- | --- | --- |
| DNS-based | Service name resolves to virtual IP or pod IPs | Kubernetes Services, Docker Compose |
| Environment-based | Injected env vars with service addresses | Docker links (deprecated), K8s env |
| Registry-based | Services register in Consul/etcd, clients query | HashiCorp Consul, Netflix Eureka |

# In Kubernetes, every Service gets a DNS name:
# Format: <service-name>.<namespace>.svc.cluster.local

# From any pod in the same namespace:
curl http://webapp:80/api/users        # Short name works within namespace
curl http://webapp.default.svc.cluster.local:80/api/users  # Fully qualified

# Cross-namespace communication:
curl http://database.data-tier.svc.cluster.local:5432

# Headless service (returns pod IPs directly, not ClusterIP):
# Useful for stateful services where clients need specific pods
nslookup database-headless.default.svc.cluster.local
# Returns: 10.244.1.5, 10.244.2.8, 10.244.3.2 (individual pod IPs)
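
From application code, DNS-based discovery is just ordinary name resolution. The Python sketch below assumes the naming scheme described above. `resolve_service` and `service_url` are hypothetical helpers, and `localhost` is used in the example only so it runs outside a cluster.

```python
import socket


def resolve_service(name, port):
    """Return the sorted, de-duplicated IP addresses a service name resolves to."""
    infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def service_url(name, port, namespace=None):
    """Build a URL from a service name; add the namespace for cross-namespace calls."""
    host = f"{name}.{namespace}.svc.cluster.local" if namespace else name
    return f"http://{host}:{port}"


print(service_url("webapp", 80))
print(service_url("database", 5432, namespace="data-tier"))
print(resolve_service("localhost", 80))
```

Against a headless service, `resolve_service` would return one address per pod. Against a regular Service it would return the single ClusterIP.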

Container Anti-Patterns for Orchestration

These patterns work fine on a single Docker host but break catastrophically when containers are orchestrated across a cluster:

| Anti-Pattern | Why It Fails | Correct Pattern |
| --- | --- | --- |
| SSH into containers | Containers are ephemeral; changes are lost on restart | Debug with kubectl exec, fix in Dockerfile and redeploy |
| Manual scaling | Can't react to load spikes fast enough | Horizontal Pod Autoscaler (HPA) based on metrics |
| Pets (named, irreplaceable) | Can't be scheduled elsewhere or replaced | Cattle (numbered, disposable, identical) |
| Hardcoded IPs/hosts | Pods get new IPs on every restart | Service discovery via DNS names |
| Local file storage | Files lost when pod moves to another node | PersistentVolumes or object storage (S3) |
| In-memory sessions | Lost on pod replacement; sticky sessions break scaling | Redis/Memcached session store |
| Long startup time | Slow scaling, slow recovery from failures | Optimize image size, lazy init, startup probes |
| No health endpoints | Orchestrator can't detect unhealthy instances | Implement /healthz and /readyz endpoints |
| Running as root | Security risk, violates Pod Security Standards | Non-root user, read-only filesystem |
| Ignoring SIGTERM | Connections dropped during rolling updates | Handle SIGTERM, drain connections gracefully |
| Fat containers | Slow pulls, waste cluster resources | Multi-stage builds, distroless, <50MB images |
| Latest tag | Non-reproducible deployments, no rollback | Immutable tags (git SHA, semver) |

Mental Model
Pets vs Cattle

Pets are servers you name, care for, and nurse back to health when they're sick (e.g., "db-master-1" that's been running for 3 years). Cattle are servers that are numbered, identical, and replaced when they fail (e.g., "webapp-7f8d9-xk2p4" — if it's unhealthy, kill it and start a new one).

Orchestration requires the cattle mindset: any container can be killed at any time and a fresh replacement will take its place. If your container requires manual intervention to recover, it's a pet — and it will be a constant source of incidents in production.


Docker Swarm vs Kubernetes

Two primary orchestrators exist in the container ecosystem. Understanding their trade-offs helps you choose the right platform for your workloads.

| Dimension | Docker Swarm | Kubernetes |
| --- | --- | --- |
| Complexity | Simple: docker swarm init and done | Complex: multi-component control plane |
| Learning curve | Hours (if you know Docker) | Weeks to months |
| Scaling | Hundreds of nodes | Thousands of nodes (tested at 5,000+) |
| Auto-scaling | Manual replica count | HPA, VPA, Cluster Autoscaler |
| Networking | Overlay network, ingress routing mesh | CNI plugins, Ingress controllers, service mesh |
| Storage | Basic volume plugins | CSI drivers, dynamic provisioning, StatefulSets |
| Configuration | Docker configs and secrets | ConfigMaps, Secrets, external operators |
| Ecosystem | Limited third-party tools | Massive ecosystem (Helm, Operators, service mesh) |
| CI/CD integration | Basic (docker stack deploy) | GitOps (ArgoCD, Flux), Helm, Kustomize |
| Multi-tenancy | Limited | Namespaces, RBAC, Network Policies |
| Managed offerings | Few (Docker Enterprise deprecated) | EKS, AKS, GKE, and dozens more |
| Best for | Small teams, simple apps, Docker-native orgs | Any scale, complex workloads, enterprise |

# Docker Swarm: Initialize and deploy (simple)
docker swarm init
docker stack deploy -c compose.yaml myapp
docker service scale myapp_webapp=5

# Kubernetes: Deploy (more verbose but more powerful)
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl autoscale deployment webapp --min=2 --max=10 --cpu-percent=70
Recommendation: If you're a small team with straightforward web applications and want minimal operational overhead, Docker Swarm is a pragmatic choice. If you need auto-scaling, complex networking, a rich ecosystem, or you're operating at scale — Kubernetes is the industry standard. The good news: containers built with the principles in this article work well on both platforms.
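
The `kubectl autoscale` command above creates a Horizontal Pod Autoscaler. Its core calculation is desired = ceil(current replicas × current metric / target metric), clamped to the configured bounds. The Python sketch below shows only that arithmetic. The real controller also applies a tolerance band and stabilization windows, and `desired_replicas` is a hypothetical name.

```python
import math


def desired_replicas(current_replicas, current_cpu_percent, target_cpu_percent,
                     min_replicas, max_replicas):
    """Scale proportionally to the metric ratio, clamped to [min, max]."""
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    return max(min_replicas, min(max_replicas, raw))


# Settings from the command above: --min=2 --max=10 --cpu-percent=70
print(desired_replicas(3, 90, 70, 2, 10))   # 4  (3 * 90/70 = 3.86, rounded up)
print(desired_replicas(3, 35, 70, 2, 10))   # 2  (1.5 rounded up to 2)
print(desired_replicas(8, 140, 70, 2, 10))  # 10 (16 clamped to the maximum)
```

Because the percentage is measured against each pod's CPU request, autoscaling on CPU only behaves sensibly when requests are set and roughly right-sized.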

Exercises

Exercise 1: Take an existing containerised application and make it stateless. Identify all local state (sessions, uploads, cache) and externalize it to appropriate backing services (Redis, S3, database). Verify by running 3 replicas behind a load balancer — each should serve any request correctly.
Exercise 2: Implement proper health endpoints (/healthz and /readyz) in a web application. The readiness probe should check database connectivity. Simulate a database failure and observe that the readiness probe fails while liveness remains healthy.
Exercise 3: Implement graceful SIGTERM handling in your preferred language. The application should: stop accepting new connections, finish in-flight requests (with a timeout), close database connections, then exit cleanly. Test with docker stop and verify no requests are dropped using a load testing tool.
Exercise 4: Configure resource requests and limits for a container. Run a stress test to exceed the memory limit and observe OOM behavior. Then set CPU limits and observe throttling with docker stats. Document how QoS class affects eviction priority.
Exercise 5: Audit an existing application against all 12 factors. Create a checklist scoring each factor as "compliant", "partial", or "non-compliant". Then fix the top 3 most critical violations (Config, Processes, Disposability).

Conclusion & Next Steps

Container orchestration readiness isn't about learning Kubernetes commands — it's about designing containers that thrive in a dynamic, distributed environment. The principles we've covered form a checklist for orchestration-ready containers:

  • Stateless design — externalize all state to backing services
  • Twelve-factor compliance — config via environment, logs to stdout, single-process, disposable
  • Health probes — liveness (am I alive?), readiness (can I serve?), startup (am I done booting?)
  • Resource declarations — requests for scheduling, limits for protection
  • Graceful shutdown — handle SIGTERM, drain connections, exit cleanly within grace period
  • Structured logging — JSON to stdout for aggregation and querying
  • Service discovery — use DNS names, never hardcode addresses
  • Immutable images — same image everywhere, config changes externally

With these principles in place, your containers are ready for Kubernetes, Docker Swarm, AWS ECS, Google Cloud Run, or any orchestration platform. The next step is making these running containers observable — so you know what's happening inside your distributed system.

Next in the Series

In Part 20: Container Monitoring & Observability, we'll instrument containers with metrics (Prometheus, cAdvisor), structured logging (ELK, Loki), and distributed tracing — turning opaque containers into transparent, debuggable systems.