CRDs & Operators - Part 12

Extending the Kubernetes API

Why Extend?

Kubernetes ships with built-in resources (Pods, Deployments, Services), but real-world infrastructure needs domain-specific concepts. You need to manage databases, message queues, ML training jobs, certificates, DNS records — things Kubernetes doesn't know about natively.

                            
The Vision: What if you could manage a PostgreSQL cluster the same way you manage a Deployment? Write a YAML manifest declaring "I want a 3-node PostgreSQL cluster with streaming replication, automated failover, and daily backups", and Kubernetes makes it so. That's what CRDs and Operators enable.
                        

# Instead of manual database operations, declare what you want:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: production-db
spec:
  instances: 3
  postgresql:
    parameters:
      shared_buffers: "512MB"
      max_connections: "200"
  storage:
    size: 100Gi
    storageClass: fast-ssd
  backup:
    barmanObjectStore:
      destinationPath: s3://backups/production-db/
    retentionPolicy: "30d"
  monitoring:
    enablePodMonitor: true

# The Operator handles:
# ✓ Provisioning 3 PostgreSQL instances
# ✓ Configuring streaming replication
# ✓ Automated failover if primary fails
# ✓ WAL archiving to S3 (daily base backups need an additional ScheduledBackup resource)
# ✓ Monitoring integration
# ✓ Rolling upgrades
# ✓ Connection pooling (through an additional Pooler resource)

Extension Mechanisms

Mechanism	Purpose	Complexity	Use Case
CRD + Controller	New resource types with reconciliation	Medium	Operators, platform APIs
Admission Webhooks	Validate/mutate resources on create/update	Low	Policy enforcement, defaults
API Aggregation	Full custom API server	High	metrics-server, custom-metrics
Scheduler Extenders	Custom scheduling logic	Medium	GPU scheduling, data locality
kubectl Plugins	Extend kubectl CLI	Low	Developer experience

Custom Resource Definitions

CRD Anatomy

A CRD teaches the Kubernetes API Server about a new resource type. Once applied, you can create, read, update, and delete instances of your custom resource just like any built-in resource:

# Step 1: Define the CRD (teaches K8s about "Database" resources)
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.mycompany.io    # plural.group
spec:
  group: mycompany.io             # API group
  names:
    plural: databases             # kubectl get databases
    singular: database            # kubectl get database my-db
    kind: Database                # YAML kind field
    shortNames:
    - db                          # kubectl get db
  scope: Namespaced               # Namespaced or Cluster
  versions:
  - name: v1alpha1
    served: true                  # API serves this version
    storage: true                 # Stored in etcd in this version
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            required: ["engine", "version", "replicas"]
            properties:
              engine:
                type: string
                enum: ["postgres", "mysql", "mongodb"]
              version:
                type: string
              replicas:
                type: integer
                minimum: 1
                maximum: 7
              storage:
                type: object
                properties:
                  size:
                    type: string
                    pattern: "^[0-9]+(Gi|Ti)$"
                  storageClass:
                    type: string
          status:
            type: object
            properties:
              phase:
                type: string
              readyReplicas:
                type: integer
              endpoint:
                type: string
    subresources:
      status: {}                  # Enable /status subresource
    additionalPrinterColumns:
    - name: Engine
      type: string
      jsonPath: .spec.engine
    - name: Version
      type: string
      jsonPath: .spec.version
    - name: Replicas
      type: integer
      jsonPath: .spec.replicas
    - name: Phase
      type: string
      jsonPath: .status.phase
    - name: Age
      type: date
      jsonPath: .metadata.creationTimestamp

# Step 2: Apply the CRD
kubectl apply -f database-crd.yaml

# Step 3: Create instances of the custom resource
kubectl apply -f my-database.yaml

# Step 4: Interact with it like any K8s resource:
kubectl get databases
# NAME          ENGINE    VERSION   REPLICAS   PHASE   AGE
# production    postgres  16        3          Ready   2d
# staging       postgres  16        1          Ready   5h
# analytics     mongodb   7.0       3          Ready   1d

kubectl get db production -o yaml
kubectl describe db production
kubectl delete db staging

# my-database.yaml: the custom resource instance applied in Step 3
apiVersion: mycompany.io/v1alpha1
kind: Database
metadata:
  name: production
  namespace: default
spec:
  engine: postgres
  version: "16"
  replicas: 3
  storage:
    size: "100Gi"
    storageClass: fast-ssd

Schema Validation

OpenAPI v3 schemas in CRDs provide automatic validation — the API Server rejects invalid resources before they're stored:

# Schema validation catches errors at admission time:
spec:
  engine: "redis"    # ❌ Rejected: not in enum ["postgres","mysql","mongodb"]
  replicas: 10       # ❌ Rejected: maximum is 7
  storage:
    size: "100MB"    # ❌ Rejected: doesn't match pattern "^[0-9]+(Gi|Ti)$"

# Validation types available:
# - type: string/integer/boolean/object/array
# - enum: ["a", "b", "c"]
# - minimum/maximum (numbers)
# - minLength/maxLength (strings)
# - pattern (regex)
# - required: ["field1", "field2"]
# - default: "value" (set if not provided)
# - x-kubernetes-validations (CEL expressions for complex rules)

# Advanced: CEL validation rules (beta since K8s 1.25, GA in 1.29)
schema:
  openAPIV3Schema:
    type: object
    properties:
      spec:
        type: object
        x-kubernetes-validations:
        # has() guards the optional field; a rule that reads an unset field fails with an error
        - rule: "!has(self.minAvailable) || self.replicas >= self.minAvailable"
          message: "replicas must be >= minAvailable"
        - rule: "self.engine == 'postgres' || self.version != '16'"
          message: "version 16 is only available for postgres"
        properties:
          replicas:
            type: integer
          minAvailable:
            type: integer
          engine:
            type: string
          version:
            type: string
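
To make the basic schema rules concrete (enum, minimum/maximum, pattern), here is a minimal Go sketch that mimics them outside the cluster. It is not how the API Server implements validation; the `DatabaseSpec` struct and `validate` function are hypothetical names chosen for this illustration, and only the rule values come from the Database CRD shown earlier.

```go
package main

import (
	"fmt"
	"regexp"
)

// DatabaseSpec mirrors a few spec fields of the example Database CRD.
// The struct itself is hypothetical and exists only for this sketch.
type DatabaseSpec struct {
	Engine      string
	Replicas    int
	StorageSize string // optional, like storage.size in the schema
}

// sizePattern is the same regular expression as the schema's "pattern" rule.
var sizePattern = regexp.MustCompile(`^[0-9]+(Gi|Ti)$`)

// validate returns one message per violated rule; an empty result means valid.
func validate(s DatabaseSpec) []string {
	var errs []string
	switch s.Engine {
	case "postgres", "mysql", "mongodb":
		// allowed by the enum
	default:
		errs = append(errs, fmt.Sprintf("engine %q is not in the enum", s.Engine))
	}
	if s.Replicas < 1 || s.Replicas > 7 {
		errs = append(errs, fmt.Sprintf("replicas %d is outside 1..7", s.Replicas))
	}
	if s.StorageSize != "" && !sizePattern.MatchString(s.StorageSize) {
		errs = append(errs, fmt.Sprintf("storage.size %q does not match the pattern", s.StorageSize))
	}
	return errs
}

func main() {
	bad := DatabaseSpec{Engine: "redis", Replicas: 10, StorageSize: "100MB"}
	for _, e := range validate(bad) {
		fmt.Println("rejected:", e)
	}
	good := DatabaseSpec{Engine: "postgres", Replicas: 3, StorageSize: "100Gi"}
	fmt.Println("violations for the valid spec:", len(validate(good)))
}
```

In a real cluster none of this code is needed: declaring the rules in the CRD is enough, which is the main appeal of schema validation.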

Versions & Conversion

As your CRD evolves, you'll need multiple versions. Kubernetes supports serving multiple versions simultaneously with conversion webhooks to translate between them:

# Multi-version CRD with conversion:
spec:
  group: mycompany.io
  versions:
  - name: v1alpha1
    served: true
    storage: false     # Old version, still served but not stored
  - name: v1beta1
    served: true
    storage: false
  - name: v1
    served: true
    storage: true      # New version, stored in etcd
  conversion:
    strategy: Webhook
    webhook:
      conversionReviewVersions: ["v1"]
      clientConfig:
        service:
          namespace: system
          name: database-operator-webhook
          path: /convert
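
What a conversion webhook does at its core is translate objects between versions without losing information. The sketch below shows that idea with two plain Go structs. The field rename (`Replicas` to `Instances`) is invented purely for illustration, since the tutorial does not specify how the versions differ, and a real webhook would additionally handle the ConversionReview request and response envelope.

```go
package main

import "fmt"

// SpecV1alpha1 is the older shape. The field set is hypothetical.
type SpecV1alpha1 struct {
	Engine   string
	Replicas int
}

// SpecV1 is the storage version. In this sketch it renames Replicas to
// Instances; the rename is an assumption made for illustration.
type SpecV1 struct {
	Engine    string
	Instances int
}

// toV1 converts an old object to the storage version.
func toV1(old SpecV1alpha1) SpecV1 {
	return SpecV1{Engine: old.Engine, Instances: old.Replicas}
}

// toV1alpha1 converts a stored object back for clients that still
// request the old version.
func toV1alpha1(cur SpecV1) SpecV1alpha1 {
	return SpecV1alpha1{Engine: cur.Engine, Replicas: cur.Instances}
}

func main() {
	old := SpecV1alpha1{Engine: "postgres", Replicas: 3}
	stored := toV1(old)
	fmt.Printf("stored as v1:       %+v\n", stored)
	fmt.Printf("served as v1alpha1: %+v\n", toV1alpha1(stored))
}
```

The property worth testing in a real conversion is the round trip: converting to the storage version and back should return the original object unchanged.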

Printer Columns & Status

# additionalPrinterColumns control what "kubectl get" shows:
kubectl get databases
# NAME          ENGINE    VERSION   REPLICAS   PHASE   AGE
# production    postgres  16        3          Ready   2d

# The /status subresource separates spec (user intent) from status (system state):
# - Users update .spec (desired state)
# - Controllers update .status (actual state)
# - RBAC can grant different permissions to each

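# Controllers write status through the /status subresource (r.Status().Update in Go).
# The kubectl equivalent (requires kubectl 1.24+ for --subresource):
kubectl patch database production --type=merge --subresource=status \
  -p '{"status":{"phase":"Ready","readyReplicas":3,"endpoint":"production.default.svc:5432"}}'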

The Operator Pattern

What Is an Operator?

                            
Definition: An Operator is a Kubernetes controller that uses Custom Resources to manage applications and their components. It encodes the operational knowledge of a human operator (DBA, SRE) into software, automating tasks like provisioning, scaling, backup, failover, and upgrades.
                        

The Operator Pattern

flowchart TD
    subgraph User
        CR[Custom Resource<br/>Desired State YAML]
    end
    subgraph K8s API
        API[API Server + etcd]
    end
    subgraph Operator Pod
        CTRL[Controller<br/>Reconcile Loop]
        KNOW[Domain Knowledge<br/>How to manage DB]
    end
    subgraph Managed Resources
        STS[StatefulSet]
        SVC[Services]
        CM[ConfigMaps]
        SEC[Secrets]
        PDB[PodDisruptionBudget]
        MON[ServiceMonitor]
    end
    
    CR --> API
    API --> CTRL
    CTRL --> KNOW
    KNOW --> STS
    KNOW --> SVC
    KNOW --> CM
    KNOW --> SEC
    KNOW --> PDB
    KNOW --> MON

The key insight: an Operator encodes human operational expertise as code. Instead of a runbook that says "if the primary database fails, promote the replica with the least replication lag," the Operator's reconcile loop does this automatically.
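
As a small illustration of a runbook rule turned into code, the following Go sketch picks the failover candidate with the least replication lag. The `Replica` type, its fields, and `pickPromotionCandidate` are hypothetical; a real operator would read health and lag from the database itself and would weigh more factors than this.

```go
package main

import "fmt"

// Replica is a hypothetical view of one standby instance.
type Replica struct {
	Name      string
	Healthy   bool
	LagMillis int64 // replication lag behind the failed primary
}

// pickPromotionCandidate returns the healthy replica with the smallest lag.
// The boolean is false when no healthy replica is available.
func pickPromotionCandidate(replicas []Replica) (Replica, bool) {
	var best Replica
	found := false
	for _, r := range replicas {
		if !r.Healthy {
			continue // never promote an unhealthy instance
		}
		if !found || r.LagMillis < best.LagMillis {
			best, found = r, true
		}
	}
	return best, found
}

func main() {
	replicas := []Replica{
		{Name: "db-1", Healthy: true, LagMillis: 800},
		{Name: "db-2", Healthy: true, LagMillis: 200},
		{Name: "db-3", Healthy: false, LagMillis: 0},
	}
	if c, ok := pickPromotionCandidate(replicas); ok {
		fmt.Println("promote:", c.Name)
	} else {
		fmt.Println("no healthy replica to promote")
	}
}
```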

Operator vs Controller

Aspect	Controller	Operator
Scope	Manages built-in resources	Manages custom resources (custom domain)
Knowledge	Generic (replicas, pods)	Domain-specific (backup, failover, replication)
Complexity	Stateless reconciliation	Stateful lifecycle management
Examples	Deployment controller, ReplicaSet controller	Postgres Operator, Kafka Operator, cert-manager
Day 2 Ops	Basic (restart, scale)	Full lifecycle (backup, restore, upgrade, failover)

Operator Maturity Model

The Operator Maturity Model (the Operator Framework's capability levels, as shown on OperatorHub.io) defines five levels of sophistication:

Level	Name	Capabilities	Example
1	Basic Install	Automated provisioning	Create DB from CR
2	Seamless Upgrades	Version upgrades, patch management	Rolling upgrade Postgres 15→16
3	Full Lifecycle	Backup, restore, failure recovery	Automated backup + point-in-time restore
4	Deep Insights	Metrics, alerts, log integration	Custom Prometheus metrics + Grafana dashboards
5	Auto Pilot	Auto-scaling, auto-tuning, anomaly detection	Auto-scale replicas based on query load

Building Operators

Kubebuilder

Kubebuilder is a framework for building Kubernetes operators in Go, maintained under Kubernetes SIGs (kubernetes-sigs/kubebuilder). It generates project scaffolding, CRDs, controllers, webhooks, and RBAC, letting you focus on business logic:

# Initialize a new operator project:
mkdir database-operator && cd database-operator
kubebuilder init --domain mycompany.io --repo github.com/mycompany/database-operator

# Create a new API (CRD + Controller):
kubebuilder create api --group db --version v1alpha1 --kind Database
# Create Resource [y/n]: y
# Create Controller [y/n]: y
# Note: the resulting API group is db.mycompany.io (--group plus --domain),
# not the bare mycompany.io group used in the hand-written CRD earlier.

# Generated project structure:
# ├── api/v1alpha1/
# │   ├── database_types.go        ← CRD Go types (spec, status)
# │   └── zz_generated.deepcopy.go ← Auto-generated
# ├── internal/controller/
# │   └── database_controller.go   ← Reconcile logic (YOUR CODE)
# ├── config/
# │   ├── crd/                     ← Generated CRD YAML
# │   ├── rbac/                    ← RBAC manifests
# │   └── manager/                 ← Deployment manifest
# ├── cmd/main.go                  ← Entry point
# ├── Dockerfile                   ← Container image
# └── Makefile                     ← Build commands

# Generate CRD manifests:
make manifests

# Run locally (for development):
make run

# Build and deploy:
make docker-build docker-push IMG=myregistry/database-operator:v0.1.0
make deploy IMG=myregistry/database-operator:v0.1.0
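
For orientation, here is a rough sketch of the kind of Go types that end up in `api/v1alpha1/database_types.go`. It is deliberately stdlib-only so it runs on its own: the real scaffold also embeds `metav1.TypeMeta` and `metav1.ObjectMeta`, which are omitted here. The `+kubebuilder:validation` comment markers are what `make manifests` reads to generate the OpenAPI schema; the field set is assumed to match the Database CRD from earlier, and `encode` is a helper invented for the demo.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DatabaseSpec is the desired state, written by users.
type DatabaseSpec struct {
	// +kubebuilder:validation:Enum=postgres;mysql;mongodb
	Engine  string `json:"engine"`
	Version string `json:"version"`
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:validation:Maximum=7
	Replicas int32 `json:"replicas"`
}

// DatabaseStatus is the observed state, written by the controller.
type DatabaseStatus struct {
	Phase         string `json:"phase,omitempty"`
	ReadyReplicas int32  `json:"readyReplicas,omitempty"`
	Endpoint      string `json:"endpoint,omitempty"`
}

// Database ties spec and status together (object metadata omitted in this sketch).
type Database struct {
	Spec   DatabaseSpec   `json:"spec"`
	Status DatabaseStatus `json:"status,omitempty"`
}

// encode shows how the JSON tags map Go fields to manifest field names.
func encode(db Database) string {
	out, err := json.Marshal(db)
	if err != nil {
		return ""
	}
	return string(out)
}

func main() {
	db := Database{Spec: DatabaseSpec{Engine: "postgres", Version: "16", Replicas: 3}}
	fmt.Println(encode(db))
}
```

The JSON tags are what connect the Go field names to the keys you write in a manifest, so `Replicas` in Go is `replicas` in YAML.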

Operator SDK

Operator SDK (from Red Hat) wraps Kubebuilder with additional features and supports multiple languages:

Framework	Language	Best For	Maturity
Kubebuilder (Go)	Go	Production operators, performance	Most mature
Operator SDK (Go)	Go	Kubebuilder + OLM integration	Most mature
Operator SDK (Ansible)	Ansible	Teams with Ansible expertise	Stable
Operator SDK (Helm)	Helm charts	Simple install/upgrade operators	Stable
KUDO	Declarative YAML	No-code operator definitions	Archived
Metacontroller	Any (webhooks)	Simple controllers in any language	Active

The Reconcile Function

The reconcile function is the heart of every operator. It runs whenever the custom resource or a watched owned resource changes, as well as on periodic resync and controller restart, and must move the system from its current state toward the desired state:

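# Reconcile pseudocode (Go-like):
#
# // Imports assumed: context, fmt, time, plus
# //   appsv1 "k8s.io/api/apps/v1"
# //   apierrors "k8s.io/apimachinery/pkg/api/errors"
# //   ctrl "sigs.k8s.io/controller-runtime"
# //   "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

# func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
#
#   // 1. Fetch the custom resource
#   database := &v1alpha1.Database{}
#   if err := r.Get(ctx, req.NamespacedName, database); err != nil {
#       if apierrors.IsNotFound(err) {
#           return ctrl.Result{}, nil  // CR deleted; owned objects are garbage collected
#       }
#       return ctrl.Result{}, err      // Returning an error requeues with backoff
#   }
#
#   // 2. Check if StatefulSet exists, create if not
#   //    (the StatefulSet shares the CR's name and namespace)
#   sts := &appsv1.StatefulSet{}
#   err := r.Get(ctx, req.NamespacedName, sts)
#   if apierrors.IsNotFound(err) {
#       sts = r.buildStatefulSet(database)
#       // Set the owner reference BEFORE Create so it is persisted with the object
#       if err := controllerutil.SetControllerReference(database, sts, r.Scheme); err != nil {
#           return ctrl.Result{}, err
#       }
#       if err := r.Create(ctx, sts); err != nil {
#           return ctrl.Result{}, err
#       }
#       return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
#   } else if err != nil {
#       return ctrl.Result{}, err
#   }
#
#   // 3. Ensure desired state matches actual state
#   //    (assumes database.Spec.Replicas is an int32)
#   if sts.Spec.Replicas == nil || *sts.Spec.Replicas != database.Spec.Replicas {
#       sts.Spec.Replicas = &database.Spec.Replicas
#       if err := r.Update(ctx, sts); err != nil {
#           return ctrl.Result{}, err
#       }
#       return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
#   }
#
#   // 4. Update status through the /status subresource
#   database.Status.ReadyReplicas = sts.Status.ReadyReplicas
#   database.Status.Phase = "Provisioning"
#   if sts.Status.ReadyReplicas == database.Spec.Replicas {
#       database.Status.Phase = "Ready"
#   }
#   database.Status.Endpoint = fmt.Sprintf("%s.%s.svc:5432", database.Name, database.Namespace)
#   if err := r.Status().Update(ctx, database); err != nil {
#       return ctrl.Result{}, err
#   }
#
#   return ctrl.Result{}, nil  // Reconciliation complete
# }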

Reconcile Loop Flow

flowchart TD
    A[Event: CR Changed] --> B[Fetch CR from API Server]
    B --> C{CR exists?}
    C -->|No| D[Return<br/>owned resources are garbage collected]
    C -->|Yes| E[Check owned StatefulSet]
    E --> F{STS exists?}
    F -->|No| G[Create StatefulSet<br/>with owner reference]
    F -->|Yes| H{Spec matches?}
    H -->|No| I[Update StatefulSet]
    H -->|Yes| J[Check Service]
    J --> K{Service exists?}
    K -->|No| L[Create Service]
    K -->|Yes| M[Update CR Status]
    G --> N[Requeue after 10s]
    I --> N
    L --> N
    M --> O[Done — wait for next event]
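
The flow above can be tried out without a cluster. The following Go sketch replaces the API Server with an in-memory struct and performs at most one corrective action per call, asking to be requeued until nothing is left to fix. All names here (`cluster`, `reconcile`, `runToCompletion`) are hypothetical and exist only to show the level-triggered shape of the loop.

```go
package main

import "fmt"

// cluster is an in-memory stand-in for what the API Server would report
// about the objects owned by one Database resource.
type cluster struct {
	stsExists     bool
	stsReplicas   int
	serviceExists bool
	phase         string
}

// reconcile compares desired and actual state, fixes the first difference
// it finds, and reports whether it wants to be called again.
func reconcile(desiredReplicas int, c *cluster) (requeue bool) {
	switch {
	case !c.stsExists:
		c.stsExists, c.stsReplicas = true, desiredReplicas // create StatefulSet
		return true
	case c.stsReplicas != desiredReplicas:
		c.stsReplicas = desiredReplicas // update StatefulSet
		return true
	case !c.serviceExists:
		c.serviceExists = true // create Service
		return true
	default:
		c.phase = "Ready" // nothing to fix: update status and stop
		return false
	}
}

// runToCompletion calls reconcile until it stops asking for a requeue and
// returns how many calls were needed.
func runToCompletion(desired int, c *cluster) int {
	calls := 1
	for reconcile(desired, c) {
		calls++
	}
	return calls
}

func main() {
	c := &cluster{}
	fmt.Println("calls from empty cluster:", runToCompletion(3, c))
	fmt.Println("calls when converged:   ", runToCompletion(3, c))
	c.stsReplicas = 1 // simulate drift, e.g. a manual scale-down
	fmt.Println("calls after drift:      ", runToCompletion(3, c))
}
```

Note that `reconcile` never looks at what changed, only at the difference between desired and actual state. That is why the same function handles creation, drift, and the no-op case.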

Status & Conditions

Operators should report detailed status using the Kubernetes conditions pattern: standard fields that tools understand, for example kubectl wait --for=condition=Ready:

# Status with conditions (standard pattern):
status:
  phase: Running
  readyReplicas: 3
  endpoint: production.default.svc.cluster.local:5432
  conditions:
  - type: Ready
    status: "True"
    lastTransitionTime: "2026-05-14T10:30:00Z"
    reason: AllReplicasReady
    message: "3/3 replicas are ready and accepting connections"
  - type: BackupReady
    status: "True"
    lastTransitionTime: "2026-05-14T02:00:00Z"
    reason: BackupCompleted
    message: "Last backup: 2026-05-14T02:00:00Z (30 retained)"
  - type: ReplicationHealthy
    status: "True"
    lastTransitionTime: "2026-05-14T10:29:55Z"
    reason: ReplicationLagNormal
    message: "Max replication lag: 0.2s (threshold: 30s)"
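
One detail of the conditions pattern that is easy to get wrong is lastTransitionTime: it should change only when the condition's status flips, not on every reconcile. The stdlib-only Go sketch below shows that rule. The `Condition` struct and `setCondition` helper are simplified stand-ins written for this example; in a real operator you would more likely use `metav1.Condition` together with the `meta.SetStatusCondition` helper from apimachinery, which follows the same idea.

```go
package main

import "fmt"

// Condition is a simplified stand-in for a Kubernetes status condition.
type Condition struct {
	Type               string
	Status             string // "True", "False" or "Unknown"
	Reason             string
	Message            string
	LastTransitionTime string
}

// setCondition inserts or replaces the condition with the same Type.
// LastTransitionTime is kept from the existing entry unless Status changed.
func setCondition(conds []Condition, next Condition) []Condition {
	for i := range conds {
		if conds[i].Type != next.Type {
			continue
		}
		if conds[i].Status == next.Status {
			next.LastTransitionTime = conds[i].LastTransitionTime
		}
		conds[i] = next
		return conds
	}
	return append(conds, next)
}

func main() {
	var conds []Condition
	conds = setCondition(conds, Condition{Type: "Ready", Status: "False",
		Reason: "Provisioning", LastTransitionTime: "10:00:00"})
	// Same status on a later reconcile: the transition time must not move.
	conds = setCondition(conds, Condition{Type: "Ready", Status: "False",
		Reason: "Provisioning", LastTransitionTime: "10:00:30"})
	fmt.Println(conds[0].Status, conds[0].LastTransitionTime)
	// Status flips: now the transition time is updated.
	conds = setCondition(conds, Condition{Type: "Ready", Status: "True",
		Reason: "AllReplicasReady", LastTransitionTime: "10:01:00"})
	fmt.Println(conds[0].Status, conds[0].LastTransitionTime)
}
```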

Production Operators

PostgreSQL Operator (CloudNativePG)

# CloudNativePG: production PostgreSQL on Kubernetes
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: production-db
spec:
  instances: 3                    # 1 primary + 2 replicas
  primaryUpdateStrategy: unsupervised
  
  postgresql:
    parameters:
      shared_buffers: "1GB"
      effective_cache_size: "3GB"
      max_connections: "300"
      work_mem: "16MB"
  
  storage:
    size: 200Gi
    storageClass: premium-ssd
  
  backup:
    barmanObjectStore:
      destinationPath: s3://backups/production/
      s3Credentials:
        accessKeyId:
          name: s3-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: s3-creds
          key: SECRET_ACCESS_KEY
    retentionPolicy: "30d"
  
  monitoring:
    enablePodMonitor: true
    customQueriesConfigMap:
    - name: custom-queries
      key: queries

# What the operator does automatically:
# ✓ Provisions 3-node HA cluster (1 primary, 2 read replicas)
# ✓ Configures streaming replication
# ✓ Detects primary failure → promotes healthiest replica (typically within seconds)
# ✓ Continuous WAL archiving to S3
# ✓ Point-in-time recovery capability
# ✓ Rolling upgrades (minor version)
# ✓ Prometheus metrics + PodMonitor
#
# Needs an additional resource:
# - Scheduled base backups: ScheduledBackup
# - Connection pooling via PgBouncer: Pooler (a separate Deployment, not a sidecar)

Kafka (Strimzi) Operator

# Strimzi: Apache Kafka on Kubernetes
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: production-kafka
spec:
  kafka:
    version: "3.7.0"
    replicas: 3
    listeners:
    - name: plain
      port: 9092
      type: internal
      tls: false
    - name: tls
      port: 9093
      type: internal
      tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      default.replication.factor: 3
      min.insync.replicas: 2
    storage:
      type: jbod
      volumes:
      - id: 0
        type: persistent-claim
        size: 500Gi
        class: fast-ssd
  zookeeper:                      # ZooKeeper mode; Strimzi 0.46+ supports only KRaft with KafkaNodePool resources
    replicas: 3
    storage:
      type: persistent-claim
      size: 50Gi
  entityOperator:
    topicOperator: {}
    userOperator: {}

# Additional CRDs the operator provides:
# - KafkaTopic: manage topics declaratively
# - KafkaUser: manage ACLs and authentication
# - KafkaConnect: managed Kafka Connect clusters
# - KafkaMirrorMaker2: cross-cluster replication

Prometheus Operator

# Prometheus Operator: monitoring as code
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: production
spec:
  replicas: 2
  retention: 30d
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 500Gi
  serviceMonitorSelector:
    matchLabels:
      monitoring: enabled
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
---
# ServiceMonitor: auto-discover targets
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: payment-service
  labels:
    monitoring: enabled
spec:
  selector:
    matchLabels:
      app: payment
  endpoints:
  - port: metrics
    interval: 15s
    path: /metrics
---
# PrometheusRule: alerting rules as CRs
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: payment-alerts
spec:
  groups:
  - name: payment.rules
    rules:
    - alert: PaymentHighErrorRate
      expr: rate(http_requests_total{job="payment",status=~"5.."}[5m]) > 0.05
      for: 5m
      labels:
        severity: critical

Popular Operators in Production

Operator	Manages	Maturity Level (approximate; check each project's current OperatorHub.io listing)
CloudNativePG	PostgreSQL	Level 5 (AutoPilot)
Strimzi	Apache Kafka	Level 4
Prometheus Operator	Prometheus/Alertmanager	Level 5
cert-manager	TLS Certificates	Level 4
Elastic Cloud on K8s	Elasticsearch	Level 4
Rook	Ceph Storage	Level 4
Crossplane	Cloud Infrastructure	Level 3
ArgoCD	GitOps Deployments	Level 4

Operator Best Practices

Idempotency

                            
Critical Rule: The reconcile function MUST be idempotent. It will be called multiple times: on resource changes, on resync, on controller restart. If called 100 times with the same input, it must produce the same result as being called once. Never assume reconcile is called exactly once per change.
                        

# Idempotent patterns:
# ✅ GOOD: "Ensure X exists with this spec"
#   - Check if StatefulSet exists → if not, create it
#   - If it exists, compare spec → update if different
#   - If identical, do nothing

# ❌ BAD: "Create X"
#   - Creates duplicate resources on re-reconciliation
#   - Fails on second call ("already exists")

# ✅ GOOD: Use CreateOrUpdate / CreateOrPatch
#   controllerutil.CreateOrUpdate(ctx, r.Client, sts, func() error {
#       sts.Spec.Replicas = &desired.Replicas
#       return controllerutil.SetControllerReference(cr, sts, r.Scheme)
#   })

# ✅ GOOD: Owner references for garbage collection
#   - Set CR as owner of all created resources
#   - When CR is deleted, all owned resources are automatically cleaned up
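
The "ensure X exists with this spec" pattern is easy to demonstrate in isolation. In this Go sketch a map stands in for the API Server, and the hypothetical `ensure` function reports which action it had to take, loosely modelled on the created/updated/unchanged result that `controllerutil.CreateOrUpdate` returns.

```go
package main

import "fmt"

// store is an in-memory stand-in for the API Server: object name -> replicas.
type store map[string]int

// ensure makes the named object exist with the wanted replica count and
// reports which action was needed: "created", "updated" or "unchanged".
func ensure(s store, name string, replicas int) string {
	current, ok := s[name]
	switch {
	case !ok:
		s[name] = replicas
		return "created"
	case current != replicas:
		s[name] = replicas
		return "updated"
	default:
		return "unchanged"
	}
}

func main() {
	s := store{}
	// Repeated calls with the same input converge after the first one.
	for i := 0; i < 3; i++ {
		fmt.Println(ensure(s, "production", 3))
	}
	// A changed spec results in exactly one update.
	fmt.Println(ensure(s, "production", 5))
	fmt.Println("objects in store:", len(s))
}
```

However often `ensure` runs, the store ends up with one object in the desired state, which is the property the reconcile function as a whole must have.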

Error Handling & Requeueing

# Reconcile return values control retry behavior:

# Success — no requeue:
# return ctrl.Result{}, nil

# Requeue with rate limiting (same backoff queue as errors, but nothing is reported as an error):
# return ctrl.Result{Requeue: true}, nil

# Requeue after delay (waiting for resource to be ready):
# return ctrl.Result{RequeueAfter: 30 * time.Second}, nil

# Error: requeued with exponential backoff:
# return ctrl.Result{}, fmt.Errorf("failed to create StatefulSet: %w", err)
# Default backoff: 5ms → 10ms → 20ms → ... → 1000s (about 16m40s, capped)

# Best practice: distinguish between:
# - Transient errors (network timeout): requeue with backoff
# - Permanent errors (invalid spec): update status, don't requeue
# - Waiting (resource not ready yet): requeue after fixed delay
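
Capped exponential backoff itself is only a few lines. The sketch below is a stdlib-only illustration, not the workqueue's actual code; the 5ms base and 1000s cap passed in `main` are assumptions based on the defaults of controller-runtime's per-item rate limiter as I understand them, so check the version you use.

```go
package main

import (
	"fmt"
	"time"
)

// backoff returns the delay before the next retry after the given number of
// consecutive failures: base doubled once per failure, capped at max.
func backoff(failures int, base, max time.Duration) time.Duration {
	d := base
	// Stop doubling as soon as the cap is reached to avoid overflow.
	for i := 0; i < failures && d < max; i++ {
		d *= 2
	}
	if d > max {
		return max
	}
	return d
}

func main() {
	base := 5 * time.Millisecond // assumed default base delay
	max := 1000 * time.Second    // assumed default cap
	for _, failures := range []int{0, 1, 2, 3, 10, 30} {
		fmt.Printf("after %2d failures: wait %v\n", failures, backoff(failures, base, max))
	}
}
```

A successful reconcile resets the failure count for that object, so the long delays only build up for resources that keep failing.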

Testing Operators

# Testing strategy for operators:

# 1. Unit tests: test reconcile logic with fake client
#    - Use fake.NewClientBuilder() to create in-memory K8s client
#    - Test each reconcile path independently
#    - Verify created/updated resources match expectations

# 2. Integration tests: test against real API server (envtest)
#    - envtest starts a real API server + etcd (no kubelet/scheduler)
#    - CRDs are installed, controllers run normally
#    - Tests verify end-to-end reconciliation
#    make test  # Uses envtest automatically

# 3. E2E tests: test on real cluster (Kind)
#    - Deploy operator to Kind cluster
#    - Create custom resources
#    - Verify actual pods, services, etc. are created
#    - Test failure scenarios (kill pods, network partitions)

# Run tests:
make test                    # Unit + integration (envtest)
make test-e2e               # Full E2E on Kind cluster

# Coverage:
go test ./... -coverprofile cover.out
go tool cover -html=cover.out

Exercises

                            
Exercise 1 - Create a CRD: Define a CRD for a "WebApplication" resource with fields: image, replicas, port, and environment (dev/staging/prod). Add schema validation (replicas 1-10, port 1-65535). Apply it, create instances, and verify kubectl get webapplications shows printer columns.

Exercise 2 - Deploy a Production Operator: Install the Prometheus Operator (via Helm or manifests). Create a ServiceMonitor for an existing application. Verify Prometheus discovers and scrapes the target. Explore the CRDs it installs (kubectl get crd | grep monitoring).

Exercise 3 - Scaffold an Operator: Use Kubebuilder to scaffold a simple operator that manages a "Guestbook" custom resource. Implement the reconcile function to create a Deployment and Service based on the CR spec. Test locally with make run.

Exercise 4 - Operator Lifecycle: Install CloudNativePG or Strimzi. Create a small cluster (1 instance). Scale it up (change instances or replicas in the CR). Simulate a failure (delete a pod). Observe the operator's reconciliation: how quickly does it recover? Check the CR's status conditions throughout.
                        

Conclusion

CRDs and Operators are what make Kubernetes a true platform — not just a container orchestrator. They let you:

Extend the API — define any domain concept as a first-class Kubernetes resource
Encode expertise — turn operational runbooks into automated reconciliation loops
Standardise operations — manage databases, queues, and infrastructure the same way you manage Deployments
Achieve Level 5 maturity — from basic install to fully autonomous operation

In Part 13, we'll cover Cluster Operations & Reliability — upgrade strategies, node management, resource quotas, PodDisruptionBudgets, and the practices that keep production clusters healthy at scale.
