Part 11: Containers & Orchestration

The Container Revolution

Containers represent the most significant shift in application packaging and deployment since virtual machines. By providing lightweight, portable, and reproducible environments, containers have fundamentally changed how we build, ship, and run software at scale.

                            
Key Insight: Containers package an application with all its dependencies into a standardized unit. Unlike VMs, which virtualize hardware, containers virtualize the operating system: they share the host kernel while keeping processes isolated. This typically makes them far smaller and much faster to start.
                        

Containers vs Virtual Machines

Understanding the architectural difference between containers and VMs is fundamental to appreciating why containers have become the dominant deployment model:

VM vs Container Architecture

flowchart LR
    subgraph VM["Virtual Machine Stack"]
        direction TB
        HW1[Physical Hardware]
        HV[Hypervisor]
        G1[Guest OS 1]
        G2[Guest OS 2]
        A1[App A + Libs]
        A2[App B + Libs]
        HW1 --> HV
        HV --> G1
        HV --> G2
        G1 --> A1
        G2 --> A2
    end
    subgraph CT["Container Stack"]
        direction TB
        HW2[Physical Hardware]
        HOS[Host OS]
        CR[Container Runtime]
        C1[App A + Libs]
        C2[App B + Libs]
        C3[App C + Libs]
        HW2 --> HOS
        HOS --> CR
        CR --> C1
        CR --> C2
        CR --> C3
    end

Aspect	Virtual Machines	Containers
Isolation	Full hardware virtualization	OS-level process isolation
Size	GBs (includes full OS)	MBs (shares host kernel)
Startup Time	Minutes	Seconds (or less)
Density	10-20 per host	100s-1000s per host
Resource Overhead	High (dedicated OS per VM)	Minimal (shared kernel)
Portability	Tied to hypervisor and image format	Any host with a compatible kernel and OCI runtime
Security	Strong isolation (separate kernels)	Shared kernel (namespace isolation)
Use Case	Multi-tenant, different OS needs	Microservices, CI/CD, scaling

Container Ecosystem Overview

Container Ecosystem

flowchart TD
    DEV[Developer] --> DF[Dockerfile]
    DF --> BUILD[docker build]
    BUILD --> IMG[Container Image]
    IMG --> REG["Registry<br/>Docker Hub / ECR / ACR / GCR"]
    REG --> PULL[docker pull]
    PULL --> RUN[Container Runtime]
    RUN --> ORCH["Orchestrator<br/>Kubernetes / Swarm / ECS"]
    ORCH --> PROD[Production Workloads]

The container ecosystem spans development, building, distribution, and orchestration — each layer with purpose-built tools and standards (OCI specifications) ensuring interoperability.

Docker Fundamentals

Docker remains the most widely used container platform. Understanding its architecture and tooling is essential for working with containers in any environment.

Docker Architecture

flowchart LR
    CLI["Docker CLI<br/>docker build/run/push"] -->|REST API| DAEMON["Docker Daemon<br/>dockerd"]
    DAEMON --> IMAGES[Images]
    DAEMON --> CONTAINERS[Containers]
    DAEMON --> NETWORKS[Networks]
    DAEMON --> VOLUMES[Volumes]
    DAEMON -->|pull/push| REGISTRY[Container Registry]

Dockerfile Anatomy

A Dockerfile is a text file containing the instructions to build a container image. Instructions that change the filesystem (RUN, COPY, ADD) each create a new layer; the others only record metadata in the image configuration:

# syntax=docker/dockerfile:1

# Base image - always start with FROM
FROM node:20-alpine

# Set metadata
LABEL maintainer="dev@example.com"
LABEL version="1.0"

# Set working directory inside container
WORKDIR /app

# Copy dependency files first (layer caching optimization)
COPY package.json package-lock.json ./

# Install production dependencies only (--omit=dev replaces the deprecated --only=production)
RUN npm ci --omit=dev

# Copy application source code
COPY src/ ./src/
COPY public/ ./public/

# Create non-root user for security
RUN addgroup -g 1001 appuser && \
    adduser -u 1001 -G appuser -s /bin/sh -D appuser
USER appuser

# Document the port the app uses
EXPOSE 3000

# Health check
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
    CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

# Define the command to run
CMD ["node", "src/server.js"]
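
The "copy dependency files first" ordering in the Dockerfile above can be illustrated with a toy model of the build cache. This is a simplified sketch, assuming each step's cache key is a hash of the previous key, the instruction text, and the contents of any copied files; the real builder differs in detail, and the function names are hypothetical.

```python
import hashlib

def cache_keys(steps, files):
    """Return one cache key per build step.

    steps: list of (instruction, [paths copied by that instruction])
    files: dict mapping path -> file content
    Each key depends on the previous key, so a change invalidates
    that step and every step after it.
    """
    keys, parent = [], ""
    for instruction, paths in steps:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(instruction.encode())
        for path in sorted(paths):
            h.update(files[path].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys

def reused_steps(before, after):
    """Count the leading steps whose keys are unchanged (cache hits)."""
    count = 0
    for old, new in zip(before, after):
        if old != new:
            break
        count += 1
    return count

deps_first = [
    ("COPY package.json package-lock.json ./", ["package.json", "package-lock.json"]),
    ("RUN npm ci", []),
    ("COPY src/ ./src/", ["src/server.js"]),
]
source_first = [
    ("COPY . .", ["package.json", "package-lock.json", "src/server.js"]),
    ("RUN npm ci", []),
]

files_v1 = {"package.json": "{}", "package-lock.json": "{}", "src/server.js": "v1"}
files_v2 = dict(files_v1, **{"src/server.js": "v2"})  # only source code changed

good = reused_steps(cache_keys(deps_first, files_v1), cache_keys(deps_first, files_v2))
bad = reused_steps(cache_keys(source_first, files_v1), cache_keys(source_first, files_v2))
print(good, bad)
```

In this model a source-only change keeps the dependency install cached when the dependency files are copied first, and invalidates it when everything is copied in one step.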

                            
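CMD vs ENTRYPOINT: ENTRYPOINT sets the executable that runs by default (it can still be replaced with docker run --entrypoint). CMD supplies default arguments, or the whole default command when no ENTRYPOINT is set, and is replaced by any arguments passed to docker run. Combine them, for example ENTRYPOINT ["node"] with CMD ["src/server.js"], so users can override just the script path.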
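
The interaction between the two instructions can be modelled in a few lines. This is a minimal sketch of the exec-form rules only (shell form is not covered); `effective_command` and the `src/worker.js` script are hypothetical names used for illustration.

```python
def effective_command(entrypoint, cmd, run_args=None, entrypoint_override=None):
    """Model how the container's argv is assembled (exec form only).

    entrypoint / cmd: lists from the image's ENTRYPOINT and CMD
    run_args: arguments given after the image name in `docker run`
    entrypoint_override: value passed via `docker run --entrypoint`
    """
    if entrypoint_override is not None:
        # --entrypoint replaces ENTRYPOINT and clears the image's default CMD
        entrypoint, cmd = [entrypoint_override], []
    args = run_args if run_args else cmd
    return list(entrypoint) + list(args)

# Defaults from the image
default = effective_command(["node"], ["src/server.js"])
# `docker run image src/worker.js` replaces only CMD
custom = effective_command(["node"], ["src/server.js"], run_args=["src/worker.js"])
# `docker run --entrypoint sh image` replaces the executable
shell = effective_command(["node"], ["src/server.js"], entrypoint_override="sh")
print(default, custom, shell)
```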
                        

Building and Running Containers

# Build an image from a Dockerfile
docker build -t myapp:1.0 .

# Build with a specific Dockerfile and build context
docker build -f Dockerfile.prod -t myapp:1.0-prod ./app

# Run a container in detached mode with port mapping
docker run -d \
    --name my-web-app \
    -p 8080:3000 \
    -e NODE_ENV=production \
    -v app-data:/app/data \
    --restart unless-stopped \
    myapp:1.0

# View running containers
docker ps

# View container logs (follow mode)
docker logs -f my-web-app

# Execute a command inside a running container
docker exec -it my-web-app /bin/sh

# Stop and remove a container
docker stop my-web-app
docker rm my-web-app

# Remove all stopped containers
docker container prune

Container Lifecycle

# Full lifecycle commands
docker create --name app myapp:1.0    # Create (not started)
docker start app                       # Start a created/stopped container
docker pause app                       # Pause all processes
docker unpause app                     # Resume processes
docker stop app                        # Graceful shutdown (SIGTERM, then SIGKILL after the grace period, 10s by default)
docker kill app                        # Force kill (SIGKILL)
docker restart app                     # Stop + start
docker rm app                          # Remove container
docker rm -f app                       # Force remove (even if running)

# Inspect container details
docker inspect app
docker stats app                       # Live resource usage
docker top app                         # Running processes
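
A graceful `docker stop` only works if the application reacts to SIGTERM. A process running as PID 1 in a container gets no default SIGTERM action, so without a handler it usually keeps running until the SIGKILL arrives. The sketch below shows one way an application might handle the signal on a POSIX system; the loop is a stand-in for real request handling, and all names are hypothetical.

```python
import os
import signal
import time

state = {"stopping": False}

def handle_sigterm(signum, frame):
    """Only record the request; the main loop performs the cleanup."""
    state["stopping"] = True

signal.signal(signal.SIGTERM, handle_sigterm)

def serve_until_stopped(poll_seconds=0.05):
    """Stand-in main loop: idle until SIGTERM has been received."""
    while not state["stopping"]:
        time.sleep(poll_seconds)  # a real service would handle requests here
    # Close connections, flush buffers, and so on before exiting
    return "clean shutdown"

# Simulate `docker stop` by sending SIGTERM to this process
os.kill(os.getpid(), signal.SIGTERM)
result = serve_until_stopped()
print(result)
```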

Multi-Stage Builds

Multi-stage builds dramatically reduce image size by separating build dependencies from the runtime image:

# Stage 1: Build
FROM golang:1.22-alpine AS builder

WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /app/server ./cmd/server

# Stage 2: Runtime (minimal image)
FROM alpine:3.19

# Install ca-certificates for HTTPS calls
RUN apk --no-cache add ca-certificates

# Create non-root user
RUN adduser -D -u 1001 appuser
USER appuser

WORKDIR /app
COPY --from=builder /app/server .

EXPOSE 8080
ENTRYPOINT ["./server"]

                            
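Size Reduction: As a rough guide, a Go image can shrink from ~800MB (full build toolchain) to ~15MB (Alpine plus the static binary) with a multi-stage build. A Node.js image can drop from ~1GB (with dev dependencies) to ~150MB. Actual sizes depend on the base image and the application's dependencies.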
                        

Dockerfile Best Practices

# ✅ GOOD: Use specific tags, not :latest
FROM node:20.11-alpine3.19

# ✅ GOOD: Combine RUN commands to reduce layers
# (apt-get applies to Debian/Ubuntu-based images; on Alpine use: apk add --no-cache curl)
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*

# ✅ GOOD: Copy dependency files before source (caching)
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
COPY . .

# ✅ GOOD: Use .dockerignore to exclude unnecessary files
# .dockerignore content:
# node_modules
# .git
# *.md
# .env
# tests/

# ✅ GOOD: Run as non-root
USER 1001

# ❌ BAD: Running as root (security risk)
# ❌ BAD: Using :latest tag (non-reproducible)
# ❌ BAD: COPY . . before installing dependencies (breaks cache)
# ❌ BAD: Storing secrets in image layers

Container Networking

Docker provides multiple network drivers to support different use cases, from isolated development environments to multi-host production deployments.

Docker Network Types

flowchart TD
    subgraph BRIDGE["Bridge Network (default)"]
        B1["Container A<br/>172.17.0.2"] <--> BR[docker0 bridge]
        B2["Container B<br/>172.17.0.3"] <--> BR
        BR <--> HOST1[Host eth0]
    end
    subgraph CUSTOM["User-Defined Bridge"]
        C1["Container C<br/>app-net"] <--> CBR[custom-bridge]
        C2["Container D<br/>app-net"] <--> CBR
        CBR <--> HOST2[Host eth0]
    end
    subgraph HOSTNET["Host Network"]
        H1["Container E<br/>shares host network stack"]
    end

Network Type	Use Case	Container-to-Container	External Access
bridge (default)	Standalone containers on same host	Via IP only	Port mapping (-p)
User-defined bridge	Application stacks needing DNS	Via container name (DNS)	Port mapping (-p)
host	Performance-critical, no NAT	Via localhost	Direct (no mapping needed)
overlay	Multi-host (Docker Swarm)	Cross-host communication	Via routing mesh
none	Complete network isolation	Not possible	Not possible

Creating Networks and Connecting Containers

# Create a user-defined bridge network
docker network create --driver bridge app-network

# Run containers on the custom network
docker run -d --name api-server \
    --network app-network \
    -e DB_HOST=postgres-db \
    myapi:latest

docker run -d --name postgres-db \
    --network app-network \
    -e POSTGRES_PASSWORD=secret \
    postgres:16-alpine

# Containers can now communicate by name:
# api-server can reach postgres-db at "postgres-db:5432"

# List networks
docker network ls

# Inspect network (shows connected containers)
docker network inspect app-network

# Connect an existing container to a network
docker network connect app-network existing-container

# Disconnect a container from a network
docker network disconnect app-network existing-container

                            
                            DNS Resolution: User-defined bridge networks provide automatic DNS resolution between containers. The default bridge network does not — containers can only communicate via IP addresses. Always create custom networks for multi-container applications.
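
Name resolution only tells a client where a service lives; the service may still be starting. The sketch below shows one way a client container might wait for a dependency such as `postgres-db:5432` to accept TCP connections. It is a minimal helper under stated assumptions (`wait_for_port` is a hypothetical name, and a successful TCP connect does not prove the database is ready for queries).

```python
import socket
import time

def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds,
    or False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not reachable yet, retry shortly
    return False
```

Inside a container on the same user-defined network, a call such as `wait_for_port("postgres-db", 5432)` would rely on Docker's embedded DNS to resolve the container name.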
                        

Container Volumes & Storage

Containers are ephemeral by design — when a container is removed, its filesystem is gone. Volumes provide persistent storage that survives container lifecycle events.

Storage Type	Location	Managed By	Use Case
Named Volumes	/var/lib/docker/volumes/	Docker	Database data, application state
Bind Mounts	Any host path	User	Development (live code reload)
tmpfs Mounts	Host memory only	Kernel	Secrets, temp files (non-persistent)

Data Persistence Patterns

# Create a named volume
docker volume create postgres-data

# Run with named volume (recommended for production)
docker run -d --name db \
    -v postgres-data:/var/lib/postgresql/data \
    -e POSTGRES_PASSWORD=mysecret \
    postgres:16-alpine

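# Bind mount for development (host directory mapped into container)
# -v /app/node_modules creates an anonymous volume so dependencies installed
# in the image are not shadowed if a parent directory is bind-mounted
docker run -d --name dev-app \
    -v "$(pwd)/src:/app/src" \
    -v /app/node_modules \
    -p 3000:3000 \
    myapp:dev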

# tmpfs mount (in-memory, not persisted)
docker run -d --name secure-app \
    --tmpfs /app/secrets:rw,size=64m \
    myapp:latest

# List volumes
docker volume ls

# Inspect a volume
docker volume inspect postgres-data

# Backup a volume
docker run --rm \
    -v postgres-data:/source:ro \
    -v $(pwd):/backup \
    alpine tar czf /backup/postgres-backup.tar.gz -C /source .

# Remove unused volumes
docker volume prune

Docker Compose

Docker Compose defines and runs multi-container applications using a declarative YAML file. It solves the problem of coordinating multiple containers that form a single application stack.

Complete 3-Tier Application

# docker-compose.yml - Complete 3-tier application
services:
  # Frontend - React application
  frontend:
    build:
      context: ./frontend
      dockerfile: Dockerfile
    ports:
      - "3000:80"
    depends_on:
      api:
        condition: service_healthy
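    environment:
      # Browser code cannot resolve the Compose service name "api", so point it at
      # the published API port. Create React App reads REACT_APP_* at build time,
      # so the value must also be available when the image is built.
      - REACT_APP_API_URL=http://localhost:8080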
    networks:
      - frontend-net

  # Backend API - Node.js
  api:
    build:
      context: ./api
      dockerfile: Dockerfile
    ports:
      - "8080:8080"
    environment:
      - NODE_ENV=production
      - DATABASE_URL=postgresql://appuser:secret@db:5432/myapp
      - REDIS_URL=redis://cache:6379
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_started
    healthcheck:
      # Assumes curl is installed in the API image; Alpine-based images ship BusyBox wget instead
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    networks:
      - frontend-net
      - backend-net
    restart: unless-stopped

  # Database - PostgreSQL
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: appuser
      POSTGRES_PASSWORD: secret
    volumes:
      - postgres-data:/var/lib/postgresql/data
      - ./init-scripts:/docker-entrypoint-initdb.d
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U appuser -d myapp"]
      interval: 5s
      timeout: 3s
      retries: 5
    networks:
      - backend-net
    restart: unless-stopped

  # Cache - Redis
  cache:
    image: redis:7-alpine
    command: redis-server --maxmemory 128mb --maxmemory-policy allkeys-lru
    volumes:
      - redis-data:/data
    networks:
      - backend-net
    restart: unless-stopped

volumes:
  postgres-data:
    driver: local
  redis-data:
    driver: local

networks:
  frontend-net:
    driver: bridge
  backend-net:
    driver: bridge
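
The `depends_on` entries in the file above form a dependency graph, and the startup order is a topological sort of it. The sketch below reproduces that ordering with Python's standard `graphlib` module (Python 3.9+). It models ordering only; the `service_healthy` conditions additionally make Compose wait for a passing health check before starting dependents.

```python
from graphlib import TopologicalSorter

# service -> services it depends on (mirrors depends_on in the Compose file)
depends_on = {
    "frontend": {"api"},
    "api": {"db", "cache"},
    "db": set(),
    "cache": set(),
}

def startup_order(graph):
    """Return services in an order where every dependency comes first."""
    return list(TopologicalSorter(graph).static_order())

order = startup_order(depends_on)
print(order)
```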

Essential Compose Commands

# Start all services (build if needed)
docker compose up -d --build

# View running services
docker compose ps

# View logs for all services (follow)
docker compose logs -f

# View logs for specific service
docker compose logs -f api

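# Scale a service (the fixed "8080:8080" host port mapping on api must be removed
# or changed to a range first, otherwise the extra replicas cannot bind the port)
docker compose up -d --scale api=3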

# Execute command in a running service
docker compose exec api npm run migrate

# Stop and remove containers and networks (named volumes are preserved)
docker compose down

# Stop and remove volumes (DESTRUCTIVE)
docker compose down -v

# Rebuild a single service
docker compose build api
docker compose up -d api

# View running processes in each service
docker compose top

# View live resource usage for running containers
docker stats

                            
                            Production Warning: Docker Compose is excellent for development and testing but is not recommended for production orchestration. For production, use Kubernetes, ECS, or another orchestrator that provides high availability, automatic failover, and rolling deployments.
                        

Container Registries

Container registries store and distribute container images. Choosing the right registry depends on your cloud provider, security requirements, and team workflow.

Registry	Provider	Free Tier	Scanning	Best For
Docker Hub	Docker	1 private repo	Basic	Public images, OSS
Amazon ECR	AWS	500 MB/month	Built-in	AWS workloads
Azure ACR	Microsoft	None (paid)	Microsoft Defender	Azure workloads
Google Artifact Registry	GCP	500 MB/month	Built-in	GCP workloads
GitHub Container Registry	GitHub	Public unlimited	Dependabot	GitHub Actions CI/CD

Image Tagging Strategies

# Tag with semantic version
docker tag myapp:latest registry.example.com/myapp:1.2.3
docker tag myapp:latest registry.example.com/myapp:1.2
docker tag myapp:latest registry.example.com/myapp:1

# Tag with git SHA (immutable, traceable)
GIT_SHA=$(git rev-parse --short HEAD)
docker tag myapp:latest registry.example.com/myapp:${GIT_SHA}

# Tag with build metadata
docker tag myapp:latest registry.example.com/myapp:1.2.3-build.456

# Push to registry
docker push registry.example.com/myapp:1.2.3
docker push registry.example.com/myapp:${GIT_SHA}

# Pull from registry
docker pull registry.example.com/myapp:1.2.3
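
In a CI pipeline the tag set above is usually derived rather than typed by hand. The sketch below expands a `MAJOR.MINOR.PATCH` version into the same three tags plus an optional commit SHA. `image_tags` is a hypothetical helper, and the SHA value in the usage line is a placeholder.

```python
import re

def image_tags(repository, version, git_sha=None):
    """Expand a MAJOR.MINOR.PATCH version into full image references."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = match.groups()
    tags = [f"{major}.{minor}.{patch}", f"{major}.{minor}", major]
    if git_sha:
        tags.append(git_sha)
    return [f"{repository}:{tag}" for tag in tags]

refs = image_tags("registry.example.com/myapp", "1.2.3", git_sha="abc1234")
print(refs)
```

Only the full version and SHA tags identify one build; the `1.2` and `1` tags are moved forward on each release.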

Image Scanning for Vulnerabilities

# Scan with Docker Scout (CLI plugin, bundled with Docker Desktop)
docker scout cves myapp:latest

# Scan with Trivy (open-source, comprehensive)
trivy image myapp:latest

# Scan with Trivy - fail on HIGH/CRITICAL
trivy image --severity HIGH,CRITICAL --exit-code 1 myapp:latest

# Scan in CI pipeline (example output)
# myapp:latest (alpine 3.19.1)
# Total: 2 (HIGH: 1, CRITICAL: 1)
# ┌───────────────┬────────────────┬──────────┬─────────────────┐
# │   Library     │ Vulnerability  │ Severity │  Fixed Version  │
# ├───────────────┼────────────────┼──────────┼─────────────────┤
# │ libssl3       │ CVE-2024-XXXX  │ CRITICAL │ 3.1.5-r0        │
# │ curl          │ CVE-2024-YYYY  │ HIGH     │ 8.5.0-r0        │
# └───────────────┴────────────────┴──────────┴─────────────────┘

Kubernetes Architecture

Kubernetes (K8s) is the industry-standard container orchestration platform. It automates deployment, scaling, and management of containerized applications across clusters of machines.

Kubernetes Cluster Architecture

flowchart TD
    subgraph CP["Control Plane"]
        API[API Server]
        ETCD[("etcd<br/>cluster state")]
        SCHED[Scheduler]
        CM[Controller Manager]
        API <--> ETCD
        API <--> SCHED
        API <--> CM
    end
    subgraph W1["Worker Node 1"]
        KL1[kubelet]
        KP1[kube-proxy]
        CR1[Container Runtime]
        P1[Pod A]
        P2[Pod B]
        KL1 --> CR1
        CR1 --> P1
        CR1 --> P2
    end
    subgraph W2["Worker Node 2"]
        KL2[kubelet]
        KP2[kube-proxy]
        CR2[Container Runtime]
        P3[Pod C]
        P4[Pod D]
        KL2 --> CR2
        CR2 --> P3
        CR2 --> P4
    end
    API --> KL1
    API --> KL2

                            
                            Control Plane Components:

                            • API Server — Front door for all operations (REST API, kubectl)

                            • etcd — Distributed key-value store holding all cluster state

                            • Scheduler — Assigns pods to nodes based on resources and constraints

                            • Controller Manager — Runs control loops (ReplicaSet, Deployment, Node controllers)
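
The control loops run by the Controller Manager all follow the same pattern: observe the current state, compare it with the desired state, and act on the difference. The sketch below shows one pass of a ReplicaSet-style loop. It is a deliberately simplified model with hypothetical names, not the actual controller logic.

```python
def reconcile(desired_replicas, running_pods):
    """One pass of a ReplicaSet-style control loop.

    Returns the actions needed to move the observed state
    (running_pods, a list of pod names) toward the desired count.
    """
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        return [("create", None)] * diff
    if diff < 0:
        # Remove surplus pods; which ones go first is an arbitrary choice here
        return [("delete", name) for name in sorted(running_pods)[diff:]]
    return []

print(reconcile(3, ["web-api-a"]))
print(reconcile(1, ["web-api-a", "web-api-b", "web-api-c"]))
```

Running the same function repeatedly against fresh observations is what makes the system self-healing: a crashed pod simply shows up as a positive difference on the next pass.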

Worker Node Components

                            
• kubelet: Agent that ensures the containers described in each pod spec are running and healthy

• kube-proxy: Network proxy implementing Service abstractions (iptables/IPVS)

• Container Runtime: containerd or CRI-O (Kubernetes 1.24 removed dockershim, so Docker Engine is no longer supported directly as a runtime; images built with Docker still run)

kubectl Essentials

# Cluster info
kubectl cluster-info
kubectl get nodes -o wide

# Namespace operations
kubectl get namespaces
kubectl create namespace staging

# Get resources (pods, deployments, services)
kubectl get pods -n default
kubectl get deployments -o wide
kubectl get services --all-namespaces

# Describe a resource (detailed info + events)
kubectl describe pod my-pod-name
kubectl describe node worker-1

# Apply a manifest (create or update)
kubectl apply -f deployment.yaml
kubectl apply -f ./k8s/               # Apply all YAML in directory

# Delete resources
kubectl delete -f deployment.yaml
kubectl delete pod my-pod-name

# View logs
kubectl logs my-pod-name
kubectl logs -f my-pod-name --tail=100    # Follow with tail
kubectl logs my-pod-name -c sidecar       # Specific container

# Execute command in pod
kubectl exec -it my-pod-name -- /bin/sh

# Port forward (local debugging)
kubectl port-forward svc/my-service 8080:80

# Watch resources in real-time
kubectl get pods -w

Kubernetes Core Objects

Pods & Deployments

A Pod is the smallest deployable unit — one or more containers sharing network and storage. A Deployment manages pod replicas and rolling updates.

# deployment.yaml - Complete Deployment manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api
  namespace: production
  labels:
    app: web-api
    version: v1.2.3
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # Max pods above desired during update
      maxUnavailable: 0    # Zero downtime
  template:
    metadata:
      labels:
        app: web-api
        version: v1.2.3
    spec:
      containers:
        - name: api
          image: registry.example.com/web-api:1.2.3
          ports:
            - containerPort: 8080
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: url
            - name: LOG_LEVEL
              valueFrom:
                configMapKeyRef:
                  name: app-config
                  key: log-level
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
      restartPolicy: Always
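
The effect of `maxSurge: 1` and `maxUnavailable: 0` in the manifest above can be checked with a small simulation. This is a sketch under a strong simplifying assumption (new pods become ready as soon as they are created), and `rolling_update` is a hypothetical helper rather than the Deployment controller's algorithm.

```python
def rolling_update(replicas, max_surge, max_unavailable):
    """Simulate a rolling update; return the (old, new) pod counts over time."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    old, new = replicas, 0
    history = [(old, new)]
    while old > 0 or new < replicas:
        # Scale up new pods without exceeding replicas + max_surge in total
        add = min(replicas - new, replicas + max_surge - (old + new))
        if add > 0:
            new += add
            history.append((old, new))
        # Scale down old pods without dropping below replicas - max_unavailable
        remove = min(old, (old + new) - (replicas - max_unavailable))
        if remove > 0:
            old -= remove
            history.append((old, new))
    return history

history = rolling_update(replicas=3, max_surge=1, max_unavailable=0)
print(history)
```

With these settings the total never falls below 3 pods and never exceeds 4, which is why the manifest can claim zero downtime as long as readiness probes are accurate.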

Services & Ingress

Kubernetes Service Networking

flowchart TD
    CLIENT[External Client] --> ING["Ingress Controller<br/>nginx / ALB"]
    ING -->|/api| SVC1["Service: api<br/>ClusterIP"]
    ING -->|/web| SVC2["Service: frontend<br/>ClusterIP"]
    SVC1 --> P1[Pod api-1]
    SVC1 --> P2[Pod api-2]
    SVC1 --> P3[Pod api-3]
    SVC2 --> P4[Pod web-1]
    SVC2 --> P5[Pod web-2]

# service.yaml - ClusterIP Service (internal)
apiVersion: v1
kind: Service
metadata:
  name: web-api
  namespace: production
spec:
  type: ClusterIP
  selector:
    app: web-api
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP

---
# service-lb.yaml - LoadBalancer Service (external)
apiVersion: v1
kind: Service
metadata:
  name: web-api-public
  namespace: production
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  selector:
    app: web-api
  ports:
    - port: 443
      targetPort: 8080
      protocol: TCP

---
# ingress.yaml - Ingress resource (path-based routing)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  namespace: production
  annotations:
    # No rewrite-target here: a bare "/" target would rewrite every matched path
    # (for example /api/users) to "/" before it reaches the backend
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.example.com
      secretName: api-tls
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: web-api
                port:
                  number: 80
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend
                port:
                  number: 80
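
The two `pathType: Prefix` rules in the Ingress above overlap, since `/` matches every request. Prefix matching compares whole path segments, and the longest matching path wins. The sketch below models that selection; it is a simplified illustration with hypothetical function names, and real controllers add host matching and other rules on top.

```python
def prefix_matches(rule_path, request_path):
    """pathType: Prefix compares whole path segments, not raw characters."""
    rule = [seg for seg in rule_path.split("/") if seg]
    request = [seg for seg in request_path.split("/") if seg]
    return request[:len(rule)] == rule

def route(rules, request_path):
    """Return the backend of the longest matching Prefix rule, or None."""
    matches = [(path, backend) for path, backend in rules
               if prefix_matches(path, request_path)]
    if not matches:
        return None
    return max(matches, key=lambda item: len(item[0]))[1]

rules = [("/api", "web-api"), ("/", "frontend")]
for path in ("/api/users", "/apix", "/"):
    print(path, "->", route(rules, path))
```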

ConfigMaps & Secrets

# configmap.yaml - Non-sensitive configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: production
data:
  log-level: "info"
  max-connections: "100"
  feature-flags: |
    {
      "new-ui": true,
      "beta-api": false
    }

---
# secret.yaml - Sensitive data (values are base64-encoded, which is not encryption)
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
  namespace: production
type: Opaque
data:
  url: cG9zdGdyZXNxbDovL3VzZXI6cGFzc0BkYjo1NDMyL215YXBw    # base64 encoded
  password: c3VwZXJzZWNyZXQ=                                    # base64 encoded

# Create secret from command line (easier than manual base64)
kubectl create secret generic db-credentials \
    --from-literal=url='postgresql://user:pass@db:5432/myapp' \
    --from-literal=password='supersecret' \
    -n production

# Create configmap from file
kubectl create configmap nginx-config \
    --from-file=nginx.conf \
    -n production
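
The values under `data:` in a Secret manifest are plain base64, which anyone with read access can reverse. The short sketch below encodes and decodes the two example values from the manifest above to make that concrete; the helper names are hypothetical.

```python
import base64

def encode_secret_value(plaintext):
    """Encode a string the way Secret `data:` fields expect."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")

def decode_secret_value(encoded):
    """Reverse the encoding; no key or password is involved."""
    return base64.b64decode(encoded).decode("utf-8")

encoded_password = encode_secret_value("supersecret")
decoded_url = decode_secret_value("cG9zdGdyZXNxbDovL3VzZXI6cGFzc0BkYjo1NDMyL215YXBw")
print(encoded_password)
print(decoded_url)
```

Because decoding needs no key, access to Secrets is normally restricted through RBAC and, where required, encryption at rest.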

Managed Kubernetes Services

Managed Kubernetes services abstract away control plane management, letting teams focus on deploying workloads rather than maintaining cluster infrastructure.

Feature	AWS EKS	Azure AKS	GCP GKE
Control Plane Cost	$0.10/hr (~$73/mo)	Free tier available; Standard tier $0.10/hr	$0.10/hr (free-tier credit covers one zonal or Autopilot cluster)
Node Types	EC2, Fargate (serverless)	VMs, Virtual Nodes (ACI)	VMs, Autopilot (serverless)
Auto-scaling	Cluster Autoscaler, Karpenter	Cluster Autoscaler, KEDA	Node Auto-Provisioning
Networking	VPC CNI, Calico	Azure CNI, Kubenet	VPC-native, Dataplane V2
Service Mesh	App Mesh, Istio	Istio, Open Service Mesh	Anthos Service Mesh
Registry	ECR	ACR	Artifact Registry
Max Nodes	5,000	5,000	15,000
GPU Support	Yes (P4, A100)	Yes (T4, A100)	Yes (T4, A100, H100)
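
The "~$73/mo" figure in the table follows from the hourly rate and an average month length. A quick check, assuming a 365-day year spread evenly over 12 months:

```python
HOURS_PER_YEAR = 24 * 365

def monthly_cost(hourly_rate):
    """Average monthly cost for a resource billed per hour, running all year."""
    return hourly_rate * HOURS_PER_YEAR / 12

control_plane = round(monthly_cost(0.10), 2)
print(control_plane)
```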

Terraform Deployment — AWS EKS

# providers.tf
terraform {
  required_version = ">= 1.5"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

# eks-cluster.tf
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "production-cluster"
  cluster_version = "1.29"

  # module.vpc is assumed to be defined elsewhere (for example with terraform-aws-modules/vpc/aws)
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  cluster_endpoint_public_access = true

  eks_managed_node_groups = {
    general = {
      instance_types = ["m6i.large"]
      min_size       = 2
      max_size       = 10
      desired_size   = 3

      labels = {
        workload-type = "general"
      }
    }

    spot = {
      instance_types = ["m6i.large", "m5.large", "m5a.large"]
      capacity_type  = "SPOT"
      min_size       = 0
      max_size       = 20
      desired_size   = 2

      labels = {
        workload-type = "batch"
      }

      taints = [{
        key    = "spot"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]
    }
  }

  tags = {
    Environment = "production"
    ManagedBy   = "terraform"
  }
}

Terraform — Azure AKS

# aks-cluster.tf (written for azurerm 3.x; 4.x renames enable_auto_scaling to auto_scaling_enabled)
resource "azurerm_kubernetes_cluster" "main" {
  name                = "production-aks"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  dns_prefix          = "prod-aks"
  kubernetes_version  = "1.29"

  default_node_pool {
    name                = "system"
    vm_size             = "Standard_D4s_v3"
    node_count          = 3
    min_count           = 2
    max_count           = 10
    enable_auto_scaling = true
    os_disk_size_gb     = 100
    vnet_subnet_id      = azurerm_subnet.aks.id
  }

  identity {
    type = "SystemAssigned"
  }

  network_profile {
    network_plugin    = "azure"
    load_balancer_sku = "standard"
    service_cidr      = "10.0.0.0/16"
    dns_service_ip    = "10.0.0.10"
  }

  tags = {
    Environment = "production"
    ManagedBy   = "terraform"
  }
}

resource "azurerm_kubernetes_cluster_node_pool" "worker" {
  name                  = "worker"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.main.id
  vm_size               = "Standard_D8s_v3"
  min_count             = 1
  max_count             = 20
  enable_auto_scaling   = true
  os_disk_size_gb       = 200

  node_labels = {
    "workload-type" = "application"
  }
}

Production Patterns

Horizontal Pod Autoscaler (HPA)

# hpa.yaml - Auto-scale on CPU and memory utilization
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
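
The replica count the HPA aims for comes from a simple ratio: desired = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance (10% by default) and clamped to the min/max bounds. With several metrics, the largest result is used. The sketch below implements only that core formula; the `behavior` policies and stabilization windows from the manifest are not modelled.

```python
import math

def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas, max_replicas, tolerance=0.10):
    """Core HPA calculation for a single metric."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))

# 4 replicas, CPU at 90% (target 70) and memory at 60% (target 80)
by_cpu = desired_replicas(4, 90, 70, min_replicas=3, max_replicas=50)
by_memory = desired_replicas(4, 60, 80, min_replicas=3, max_replicas=50)
target = max(by_cpu, by_memory)
print(by_cpu, by_memory, target)
```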

Liveness & Readiness Probes

# Comprehensive probe configuration
spec:
  containers:
    - name: api
      image: myapp:latest
      # Liveness: Is the container alive? Restart if not
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 30    # Wait for app startup
        periodSeconds: 10          # Check every 10s
        timeoutSeconds: 3          # Timeout per check
        failureThreshold: 3        # 3 failures = restart

      # Readiness: Can it serve traffic? Remove from LB if not
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 5
        timeoutSeconds: 2
        failureThreshold: 3

      # Startup: Is it still starting? Protect slow-starting containers
      startupProbe:
        httpGet:
          path: /healthz
          port: 8080
        failureThreshold: 30       # 30 * 10s = 5 min max startup
        periodSeconds: 10

                            
                            Probe Best Practices:

                            • Liveness — Detects deadlocks. Keep simple (don't check dependencies).

                            • Readiness — Checks dependencies (DB, cache). Removes pod from service load balancer.

                            • Startup — Use for slow-starting apps to prevent premature liveness kills.
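
On the application side, the probes above only need two small HTTP endpoints. The sketch below uses Python's standard library to serve `/healthz` and `/ready`. The `ready` flag stands in for real dependency checks, and the port is chosen at random, both assumptions made for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

ready = threading.Event()  # set once dependencies (DB, cache) are reachable

class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            status = 200  # liveness: the process is up and answering
        elif self.path == "/ready":
            status = 200 if ready.is_set() else 503  # readiness
        else:
            status = 404
        self.send_response(status)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging for the example

def probe(port, path):
    """Issue one GET against the local server and return the status code."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=2)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()

server = HTTPServer(("127.0.0.1", 0), ProbeHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_port

before = (probe(port, "/healthz"), probe(port, "/ready"))
ready.set()  # dependencies are now available
after = (probe(port, "/healthz"), probe(port, "/ready"))

server.shutdown()
server.server_close()
print(before, after)
```

While `/ready` returns 503 the pod stays out of Service endpoints but is not restarted, because `/healthz` keeps answering 200.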

Rolling Updates and Rollbacks

# Update deployment image (triggers rolling update)
kubectl set image deployment/web-api \
    api=registry.example.com/web-api:1.3.0 \
    -n production

# Watch rollout status
kubectl rollout status deployment/web-api -n production

# View rollout history
kubectl rollout history deployment/web-api -n production

# Rollback to previous version
kubectl rollout undo deployment/web-api -n production

# Rollback to specific revision
kubectl rollout undo deployment/web-api --to-revision=3 -n production

# Pause/resume rollout (for canary testing)
kubectl rollout pause deployment/web-api -n production
kubectl rollout resume deployment/web-api -n production

Helm Charts & GitOps

# Add chart repositories
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

# Install nginx ingress controller
helm install ingress-nginx ingress-nginx/ingress-nginx \
    --namespace ingress-nginx \
    --create-namespace \
    --set controller.replicaCount=2

# Install with custom values file
helm install my-app ./charts/my-app \
    -f values-production.yaml \
    --namespace production

# Upgrade an existing release
helm upgrade my-app ./charts/my-app \
    -f values-production.yaml \
    --namespace production

# List releases
helm list --all-namespaces

# Rollback a release
helm rollback my-app 1 --namespace production

# Chart.yaml - Helm chart metadata
apiVersion: v2
name: web-api
description: Production web API deployment
version: 1.2.3
appVersion: "1.2.3"

# values.yaml - Default values
replicaCount: 3

image:
  repository: registry.example.com/web-api
  tag: "1.2.3"
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 80

ingress:
  enabled: true
  hosts:
    - host: api.example.com
      paths:
        - path: /
          pathType: Prefix

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 50
  targetCPUUtilization: 70

                            
GitOps with Argo CD: GitOps uses Git as the single source of truth for declarative infrastructure. Tools like Argo CD and Flux continuously compare cluster state with the Git repository. With automated sync enabled, drift is corrected automatically, and every change is traceable through Git history.
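
The reconciliation step at the heart of GitOps is a comparison between what Git declares and what the cluster runs. The sketch below computes the resulting actions for two dictionaries of resources. It is an illustrative model with hypothetical names and resources; real tools diff full manifests, and whether deletions are applied depends on their configuration.

```python
def plan_sync(desired, live):
    """Compare manifests from Git (desired) with cluster state (live).

    Both arguments map a resource name to its spec. Returns the actions
    a reconciler would take, sorted by resource name.
    """
    actions = []
    for name in sorted(desired.keys() | live.keys()):
        if name not in live:
            actions.append(("create", name))
        elif name not in desired:
            actions.append(("prune", name))
        elif desired[name] != live[name]:
            actions.append(("update", name))
    return actions

desired = {"web-api": {"replicas": 3}, "app-config": {"log-level": "info"}}
live = {"web-api": {"replicas": 5}, "debug-pod": {}}  # manual changes in the cluster
plan = plan_sync(desired, live)
print(plan)
```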
                        

Hands-On Exercises

Exercise 1 Difficulty: Beginner

Build and Run a Multi-Stage Docker Image

Create a Node.js application with a multi-stage Dockerfile that separates build and runtime dependencies.

1. Create a simple Express.js API with a /health endpoint
2. Write a multi-stage Dockerfile: Stage 1 installs all dependencies and builds; Stage 2 copies only production artifacts
3. Build the image and compare sizes: docker images
4. Run the container with port mapping and verify the health endpoint
5. Add a .dockerignore file and rebuild, noting the smaller build context

Success Criteria: Final image is under 150MB. Health check passes. Container runs as non-root user.

Tags: Docker, Multi-stage, Best Practices

Exercise 2 Difficulty: Intermediate

Deploy a 3-Tier Application with Docker Compose

Build a complete application stack with frontend, API, database, and cache using Docker Compose.

1. Create a docker-compose.yml with 4 services: React frontend, Node.js API, PostgreSQL, Redis
2. Configure health checks for the API and database services
3. Use depends_on with conditions to enforce startup order
4. Create named volumes for database persistence
5. Use separate networks: frontend-net (frontend ↔ API) and backend-net (API ↔ DB/Redis)
6. Test with docker compose up -d and verify all services are healthy

Success Criteria: All 4 services start in the correct order. Database data survives docker compose down && docker compose up. Frontend cannot directly access the database.

Tags: Docker Compose, Networking, Multi-container

Exercise 3 Difficulty: Intermediate

Deploy to Kubernetes with kubectl

Deploy a containerized application to Kubernetes with proper production patterns.

1. Create a Deployment manifest with 3 replicas, resource limits, and rolling update strategy
2. Add liveness, readiness, and startup probes
3. Create a ClusterIP Service and an Ingress resource for external access
4. Store configuration in a ConfigMap and credentials in a Secret
5. Apply all manifests: kubectl apply -f k8s/
6. Test rolling update: change image tag and watch rollout
7. Practice rollback: kubectl rollout undo

Success Criteria: Zero-downtime rolling update. Rollback completes in under 30s. Pod restarts when liveness fails.

Tags: Kubernetes, Deployments, Services

Exercise 4 Difficulty: Advanced

Create Terraform for a Managed Kubernetes Cluster

Provision a production-ready managed Kubernetes cluster using Terraform.

1. Choose a cloud provider (EKS, AKS, or GKE)
2. Write Terraform for: VPC/networking, cluster control plane, managed node group (general purpose), spot/preemptible node pool (cost optimization)
3. Configure cluster autoscaler with min/max node counts
4. Enable RBAC and integrate with cloud IAM
5. Output kubeconfig and verify cluster access: kubectl get nodes
6. Deploy a sample workload using the Kubernetes Terraform provider

Success Criteria: terraform apply creates a functional cluster. Nodes auto-scale when load increases. Spot nodes have appropriate taints.

Tags: Terraform, IaC, Managed K8s

Conclusion & Coming Next

Containers and Kubernetes have become the foundation of modern application deployment. In this article, we covered the full journey from Docker fundamentals — images, networking, volumes, and Compose — through Kubernetes orchestration with its declarative object model, managed services, and production patterns like autoscaling, health probes, and Helm packaging.

                            
                            Key Takeaways:

                            • Containers provide lightweight, portable, reproducible application environments

                            • Docker is the foundation: master Dockerfiles, networking, volumes, and Compose

                            • Kubernetes orchestrates containers at scale with declarative desired-state management

                            • Managed services (EKS/AKS/GKE) eliminate control plane operations overhead

                            • Production requires proper probes, resource management, autoscaling, and GitOps workflows

Next in the Series

In Part 12: CI/CD Pipelines for Infrastructure, we explore GitHub Actions, GitLab CI, and Jenkins pipelines purpose-built for infrastructure automation. Learn how to lint Terraform, run security scans, deploy with approval gates, and implement full GitOps workflows for infrastructure changes.
