Part 12: CRDs & Operators

May 14, 2026 Wasil Zafar 42 min read

Custom Resource Definitions let you extend the Kubernetes API with your own object types. Operators encode operational knowledge into controllers — turning complex "Day 2" operations (backup, failover, scaling) into automated reconciliation loops.

Table of Contents

  1. Extending the Kubernetes API
  2. Custom Resource Definitions
  3. The Operator Pattern
  4. Building Operators
  5. Production Operators
  6. Operator Best Practices
  7. Exercises
  8. Conclusion

Extending the Kubernetes API

Why Extend?

Kubernetes ships with built-in resources (Pods, Deployments, Services), but real-world infrastructure needs domain-specific concepts. You need to manage databases, message queues, ML training jobs, certificates, DNS records — things Kubernetes doesn't know about natively.

The Vision: What if you could manage a PostgreSQL cluster the same way you manage a Deployment? Write a YAML manifest declaring "I want a 3-node PostgreSQL cluster with streaming replication, automated failover, and daily backups" — and Kubernetes makes it so. That's what CRDs and Operators enable.
# Instead of manual database operations, declare what you want:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: production-db
spec:
  instances: 3
  postgresql:
    parameters:
      shared_buffers: "512MB"
      max_connections: "200"
  storage:
    size: 100Gi
    storageClass: fast-ssd
  backup:
    barmanObjectStore:
      destinationPath: s3://backups/production-db/
    retentionPolicy: "30d"
  monitoring:
    enablePodMonitor: true

# The Operator handles:
# ✓ Provisioning 3 PostgreSQL instances
# ✓ Configuring streaming replication
# ✓ Automated failover if primary fails
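# ✓ Backups and WAL archiving to S3 (a daily schedule needs a separate ScheduledBackup resource)
# ✓ Monitoring integration
# ✓ Rolling upgrades
# ✓ Connection pooling (PgBouncer, via a separate Pooler resource)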

Extension Mechanisms

Mechanism | Purpose | Complexity | Use Case
CRD + Controller | New resource types with reconciliation | Medium | Operators, platform APIs
Admission Webhooks | Validate/mutate resources on create/update | Low | Policy enforcement, defaults
API Aggregation | Full custom API server | High | metrics-server, custom-metrics
Scheduler Extenders | Custom scheduling logic | Medium | GPU scheduling, data locality
kubectl Plugins | Extend kubectl CLI | Low | Developer experience

Custom Resource Definitions

CRD Anatomy

A CRD teaches the Kubernetes API Server about a new resource type. Once applied, you can create, read, update, and delete instances of your custom resource just like any built-in resource:

# Step 1: Define the CRD (teaches K8s about "Database" resources)
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.mycompany.io    # plural.group
spec:
  group: mycompany.io             # API group
  names:
    plural: databases             # kubectl get databases
    singular: database            # kubectl get database my-db
    kind: Database                # YAML kind field
    shortNames:
    - db                          # kubectl get db
  scope: Namespaced               # Namespaced or Cluster
  versions:
  - name: v1alpha1
    served: true                  # API serves this version
    storage: true                 # Stored in etcd in this version
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            required: ["engine", "version", "replicas"]
            properties:
              engine:
                type: string
                enum: ["postgres", "mysql", "mongodb"]
              version:
                type: string
              replicas:
                type: integer
                minimum: 1
                maximum: 7
              storage:
                type: object
                properties:
                  size:
                    type: string
                    pattern: "^[0-9]+(Gi|Ti)$"
                  storageClass:
                    type: string
          status:
            type: object
            properties:
              phase:
                type: string
              readyReplicas:
                type: integer
              endpoint:
                type: string
    subresources:
      status: {}                  # Enable /status subresource
    additionalPrinterColumns:
    - name: Engine
      type: string
      jsonPath: .spec.engine
    - name: Version
      type: string
      jsonPath: .spec.version
    - name: Replicas
      type: integer
      jsonPath: .spec.replicas
    - name: Phase
      type: string
      jsonPath: .status.phase
    - name: Age
      type: date
      jsonPath: .metadata.creationTimestamp
# Step 2: Apply the CRD
kubectl apply -f database-crd.yaml

# Step 3: Create instances of the custom resource
kubectl apply -f my-database.yaml

# Step 4: Interact with it like any K8s resource:
kubectl get databases
# NAME          ENGINE    VERSION   REPLICAS   PHASE   AGE
# production    postgres  16        3          Ready   2d
# staging       postgres  16        1          Ready   5h
# analytics     mongodb   7.0       3          Ready   1d

kubectl get db production -o yaml
kubectl describe db production
kubectl delete db staging
# my-database.yaml: the custom resource instance applied in Step 3
apiVersion: mycompany.io/v1alpha1
kind: Database
metadata:
  name: production
  namespace: default
spec:
  engine: postgres
  version: "16"
  replicas: 3
  storage:
    size: "100Gi"
    storageClass: fast-ssd
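
On the Go side, an operator usually works with typed structs that mirror this schema (Kubebuilder generates them in api/v1alpha1/database_types.go). The following is a minimal stdlib-only sketch, not the generated code: the struct layout and the parseDatabase helper are assumptions made for illustration, and ObjectMeta is reduced to two fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DatabaseSpec mirrors .spec in the CRD schema (user intent).
type DatabaseSpec struct {
	Engine   string       `json:"engine"`
	Version  string       `json:"version"`
	Replicas int32        `json:"replicas"`
	Storage  *StorageSpec `json:"storage,omitempty"`
}

// StorageSpec mirrors .spec.storage.
type StorageSpec struct {
	Size         string `json:"size,omitempty"`
	StorageClass string `json:"storageClass,omitempty"`
}

// DatabaseStatus mirrors .status (written by the controller).
type DatabaseStatus struct {
	Phase         string `json:"phase,omitempty"`
	ReadyReplicas int32  `json:"readyReplicas,omitempty"`
	Endpoint      string `json:"endpoint,omitempty"`
}

// ObjectMeta is a simplified stand-in for the real Kubernetes metadata type.
type ObjectMeta struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace,omitempty"`
}

// Database is the top-level custom resource object.
type Database struct {
	APIVersion string         `json:"apiVersion"`
	Kind       string         `json:"kind"`
	Metadata   ObjectMeta     `json:"metadata"`
	Spec       DatabaseSpec   `json:"spec"`
	Status     DatabaseStatus `json:"status,omitempty"`
}

// parseDatabase (hypothetical helper) decodes the JSON form of a Database.
func parseDatabase(data []byte) (Database, error) {
	var db Database
	err := json.Unmarshal(data, &db)
	return db, err
}

func main() {
	raw := []byte(`{
	  "apiVersion": "mycompany.io/v1alpha1",
	  "kind": "Database",
	  "metadata": {"name": "production", "namespace": "default"},
	  "spec": {"engine": "postgres", "version": "16", "replicas": 3,
	           "storage": {"size": "100Gi", "storageClass": "fast-ssd"}}
	}`)
	db, err := parseDatabase(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s/%s: %s %s, %d replicas, %s\n",
		db.Metadata.Namespace, db.Metadata.Name,
		db.Spec.Engine, db.Spec.Version, db.Spec.Replicas, db.Spec.Storage.Size)
}
```

Keeping spec and status as separate structs makes the split between user intent and observed state visible in the type system as well as in the YAML.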

Schema Validation

OpenAPI v3 schemas in CRDs provide automatic validation — the API Server rejects invalid resources before they're stored:

# Schema validation catches errors at admission time:
spec:
  engine: "redis"    # ❌ Rejected: not in enum ["postgres","mysql","mongodb"]
  replicas: 10       # ❌ Rejected: maximum is 7
  storage:
    size: "100MB"    # ❌ Rejected: doesn't match pattern "^[0-9]+(Gi|Ti)$"

# Validation types available:
# - type: string/integer/boolean/object/array
# - enum: ["a", "b", "c"]
# - minimum/maximum (numbers)
# - minLength/maxLength (strings)
# - pattern (regex)
# - required: ["field1", "field2"]
# - default: "value" (set if not provided)
# - x-kubernetes-validations (CEL expressions for complex rules)
# Advanced: CEL validation rules (beta and on by default in K8s 1.25, GA in 1.29)
schema:
  openAPIV3Schema:
    type: object
    properties:
      spec:
        type: object
        x-kubernetes-validations:
        # Guard optional fields with has(); reading an unset field is an evaluation error
        - rule: "!has(self.minAvailable) || self.replicas >= self.minAvailable"
          message: "replicas must be >= minAvailable"
        - rule: "self.engine == 'postgres' || self.version != '16'"
          message: "version 16 is only available for postgres"
        properties:
          replicas:
            type: integer
          minAvailable:
            type: integer
          engine:
            type: string
          version:
            type: string
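
To make the effect of these rules concrete, here is a small Go sketch that re-implements the same checks by hand. It is only an illustration of what the API server evaluates for you from the schema, not how the API server is implemented; validateSpec and its parameter list are assumptions made for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same constraints as the CRD schema: enum, min/max, and pattern.
var (
	allowedEngines = map[string]bool{"postgres": true, "mysql": true, "mongodb": true}
	sizePattern    = regexp.MustCompile(`^[0-9]+(Gi|Ti)$`)
)

// validateSpec (hypothetical) returns one message per violated rule.
func validateSpec(engine, version string, replicas int, size string) []string {
	var errs []string
	if !allowedEngines[engine] {
		errs = append(errs, fmt.Sprintf("engine %q is not in the enum", engine))
	}
	if replicas < 1 || replicas > 7 {
		errs = append(errs, fmt.Sprintf("replicas %d is outside 1..7", replicas))
	}
	if size != "" && !sizePattern.MatchString(size) {
		errs = append(errs, fmt.Sprintf("size %q does not match the pattern", size))
	}
	// Cross-field rule, equivalent to the CEL expression
	// self.engine == 'postgres' || self.version != '16'
	if engine != "postgres" && version == "16" {
		errs = append(errs, "version 16 is only available for postgres")
	}
	return errs
}

func main() {
	for _, e := range validateSpec("redis", "7.2", 10, "100MB") {
		fmt.Println("rejected:", e)
	}
	fmt.Println("valid spec errors:", len(validateSpec("postgres", "16", 3, "100Gi")))
}
```

The point of declaring the rules in the CRD instead is that invalid objects never reach the controller, so the reconcile code does not need to repeat these checks.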

Versions & Conversion

As your CRD evolves, you'll need multiple versions. Kubernetes supports serving multiple versions simultaneously with conversion webhooks to translate between them:

# Multi-version CRD with conversion (per-version schemas omitted for brevity):
spec:
  group: mycompany.io
  versions:
  - name: v1alpha1
    served: true
    storage: false     # Old version, still served but not stored
  - name: v1beta1
    served: true
    storage: false
  - name: v1
    served: true
    storage: true      # New version, stored in etcd
  conversion:
    strategy: Webhook
    webhook:
      conversionReviewVersions: ["v1"]
      clientConfig:
        service:
          namespace: system
          name: database-operator-webhook
          path: /convert
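
At its core, a conversion webhook is a pair of functions that translate between versions. The sketch below assumes a purely hypothetical schema change (the article does not define one): v1alpha1 stores a flat StorageSize string, while v1 nests storage settings. All type and function names are invented for illustration.

```go
package main

import "fmt"

// DatabaseSpecV1Alpha1 is a hypothetical old shape with a flat storage size.
type DatabaseSpecV1Alpha1 struct {
	Engine      string
	Replicas    int32
	StorageSize string
}

// StorageV1 is the hypothetical nested storage block introduced in v1.
type StorageV1 struct {
	Size         string
	StorageClass string
}

// DatabaseSpecV1 is the hypothetical storage version.
type DatabaseSpecV1 struct {
	Engine   string
	Replicas int32
	Storage  StorageV1
}

// toV1 upgrades an old object to the storage version.
func toV1(old DatabaseSpecV1Alpha1) DatabaseSpecV1 {
	return DatabaseSpecV1{
		Engine:   old.Engine,
		Replicas: old.Replicas,
		Storage:  StorageV1{Size: old.StorageSize},
	}
}

// toV1Alpha1 downgrades for clients that still request the old version.
// StorageClass has no field in v1alpha1, so it is dropped in this sketch.
func toV1Alpha1(cur DatabaseSpecV1) DatabaseSpecV1Alpha1 {
	return DatabaseSpecV1Alpha1{
		Engine:      cur.Engine,
		Replicas:    cur.Replicas,
		StorageSize: cur.Storage.Size,
	}
}

func main() {
	old := DatabaseSpecV1Alpha1{Engine: "postgres", Replicas: 3, StorageSize: "100Gi"}
	upgraded := toV1(old)
	fmt.Printf("v1: %+v\n", upgraded)
	fmt.Println("round trip preserved:", toV1Alpha1(upgraded) == old)
}
```

A field that exists only in the newer version is lost on the way down in this sketch. Conversions are expected to round-trip without losing data, so a production webhook has to carry such fields somewhere, for example in an annotation.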

Printer Columns & Status

# additionalPrinterColumns control what "kubectl get" shows:
kubectl get databases
# NAME          ENGINE    VERSION   REPLICAS   PHASE   AGE
# production    postgres  16        3          Ready   2d

# The /status subresource separates spec (user intent) from status (system state):
# - Users update .spec (desired state)
# - Controllers update .status (actual state)
# - RBAC can grant different permissions to each

# Controllers write status through the /status endpoint (r.Status().Update in Go).
# The manual equivalent with kubectl 1.24+:
kubectl patch database production --type=merge --subresource=status \
  -p '{"status":{"phase":"Ready","readyReplicas":3,"endpoint":"production.default.svc:5432"}}'
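
The spec/status split can be modelled in a few lines of Go. This is a simplified in-memory sketch of the behaviour once the /status subresource is enabled, with invented type and function names: writes to the main endpoint leave the stored status alone, and writes to /status leave the stored spec alone.

```go
package main

import "fmt"

// Spec is what users declare; Status is what the controller observes.
type Spec struct {
	Replicas int
}

type Status struct {
	Phase         string
	ReadyReplicas int
}

type Database struct {
	Spec   Spec
	Status Status
}

// updateMain models a write to the main resource endpoint:
// only the spec from the request is accepted, stored status is kept.
func updateMain(stored, incoming Database) Database {
	stored.Spec = incoming.Spec
	return stored
}

// updateStatus models a write to the /status subresource:
// only the status from the request is accepted, stored spec is kept.
func updateStatus(stored, incoming Database) Database {
	stored.Status = incoming.Status
	return stored
}

func main() {
	stored := Database{Spec: Spec{Replicas: 1}, Status: Status{Phase: "Pending"}}

	// A user scales up and (accidentally or not) also sends a status.
	stored = updateMain(stored, Database{
		Spec:   Spec{Replicas: 3},
		Status: Status{Phase: "Ready", ReadyReplicas: 3},
	})
	fmt.Printf("after user update:       %+v\n", stored)

	// The controller reports progress; its view of the spec is ignored.
	stored = updateStatus(stored, Database{
		Spec:   Spec{Replicas: 99},
		Status: Status{Phase: "Ready", ReadyReplicas: 3},
	})
	fmt.Printf("after controller update: %+v\n", stored)
}
```

This separation is also what lets RBAC grant users access to the resource and the controller access to the status subresource independently.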

The Operator Pattern

What Is an Operator?

Definition: An Operator is a Kubernetes controller that uses Custom Resources to manage applications and their components. It encodes the operational knowledge of a human operator (DBA, SRE) into software — automating tasks like provisioning, scaling, backup, failover, and upgrades.
The Operator Pattern
flowchart TD
    subgraph User
        CR[Custom Resource<br/>Desired State YAML]
    end
    subgraph K8s API
        API[API Server + etcd]
    end
    subgraph Operator Pod
        CTRL[Controller<br/>Reconcile Loop]
        KNOW[Domain Knowledge<br/>How to manage DB]
    end
    subgraph Managed Resources
        STS[StatefulSet]
        SVC[Services]
        CM[ConfigMaps]
        SEC[Secrets]
        PDB[PodDisruptionBudget]
        MON[ServiceMonitor]
    end
    CR --> API
    API --> CTRL
    CTRL --> KNOW
    KNOW --> STS
    KNOW --> SVC
    KNOW --> CM
    KNOW --> SEC
    KNOW --> PDB
    KNOW --> MON

The key insight: an Operator encodes human operational expertise as code. Instead of a runbook that says "if the primary database fails, promote the replica with the least replication lag," the Operator's reconcile loop does this automatically.
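
The runbook rule quoted above translates almost directly into code. The following Go sketch picks a promotion candidate from a list of replicas. The Replica type and pickPromotionCandidate are invented for illustration, and real operators weigh more signals than lag alone.

```go
package main

import "fmt"

// Replica is a simplified view of one standby instance.
type Replica struct {
	Name       string
	Healthy    bool
	LagSeconds float64
}

// pickPromotionCandidate returns the healthy replica with the least
// replication lag. The bool is false when no healthy replica exists.
func pickPromotionCandidate(replicas []Replica) (Replica, bool) {
	var best Replica
	found := false
	for _, r := range replicas {
		if !r.Healthy {
			continue // never promote an unhealthy instance
		}
		if !found || r.LagSeconds < best.LagSeconds {
			best, found = r, true
		}
	}
	return best, found
}

func main() {
	replicas := []Replica{
		{Name: "production-db-2", Healthy: true, LagSeconds: 2.5},
		{Name: "production-db-3", Healthy: false, LagSeconds: 0.1},
		{Name: "production-db-4", Healthy: true, LagSeconds: 0.4},
	}
	if candidate, ok := pickPromotionCandidate(replicas); ok {
		fmt.Println("promote:", candidate.Name)
	} else {
		fmt.Println("no healthy replica available")
	}
}
```

Once the rule lives in a function like this, it runs on every reconcile pass instead of waiting for someone to open the runbook.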

Operator vs Controller

Aspect | Controller | Operator
Scope | Manages built-in resources | Manages custom resources defined by CRDs (custom domain)
Knowledge | Generic (replicas, pods) | Domain-specific (backup, failover, replication)
Complexity | Stateless reconciliation | Stateful lifecycle management
Examples | Deployment controller, ReplicaSet controller | Postgres Operator, Kafka Operator, Cert-Manager
Day 2 Ops | Basic (restart, scale) | Full lifecycle (backup, restore, upgrade, failover)

Operator Maturity Model

The Operator Maturity Model (from the Operator Framework, displayed on OperatorHub.io listings) defines five capability levels:

Level | Name | Capabilities | Example
1 | Basic Install | Automated provisioning | Create DB from CR
2 | Seamless Upgrades | Version upgrades, patch management | Rolling upgrade Postgres 15→16
3 | Full Lifecycle | Backup, restore, failure recovery | Automated backup + point-in-time restore
4 | Deep Insights | Metrics, alerts, log integration | Custom Prometheus metrics + Grafana dashboards
5 | Auto Pilot | Auto-scaling, auto-tuning, anomaly detection | Auto-scale replicas based on query load

Building Operators

Kubebuilder

Kubebuilder is the official framework for building Kubernetes operators in Go. It generates project scaffolding, CRDs, controllers, webhooks, and RBAC — letting you focus on business logic:

# Initialize a new operator project:
mkdir database-operator && cd database-operator
kubebuilder init --domain mycompany.io --repo github.com/mycompany/database-operator

# Create a new API (CRD + Controller):
# (--group db with --domain mycompany.io yields API group db.mycompany.io)
kubebuilder create api --group db --version v1alpha1 --kind Database
# Create Resource [y/n]: y
# Create Controller [y/n]: y

# Generated project structure:
# ├── api/v1alpha1/
# │   ├── database_types.go        ← CRD Go types (spec, status)
# │   └── zz_generated.deepcopy.go ← Auto-generated
# ├── internal/controller/
# │   └── database_controller.go   ← Reconcile logic (YOUR CODE)
# ├── config/
# │   ├── crd/                     ← Generated CRD YAML
# │   ├── rbac/                    ← RBAC manifests
# │   └── manager/                 ← Deployment manifest
# ├── cmd/main.go                  ← Entry point
# ├── Dockerfile                   ← Container image
# └── Makefile                     ← Build commands

# Generate CRD manifests:
make manifests

# Run locally (for development):
make run

# Build and deploy:
make docker-build docker-push IMG=myregistry/database-operator:v0.1.0
make deploy IMG=myregistry/database-operator:v0.1.0

Operator SDK

Operator SDK (from Red Hat) wraps Kubebuilder with additional features and supports multiple languages:

Framework | Language | Best For | Maturity
Kubebuilder (Go) | Go | Production operators, performance | Most mature
Operator SDK (Go) | Go | Kubebuilder + OLM integration | Most mature
Operator SDK (Ansible) | Ansible | Teams with Ansible expertise | Stable
Operator SDK (Helm) | Helm charts | Simple install/upgrade operators | Stable
KUDO | Declarative YAML | No-code operator definitions | Archived
Metacontroller | Any (webhooks) | Simple controllers in any language | Active

The Reconcile Function

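The reconcile function is the heart of every operator. It runs whenever the custom resource changes, whenever a watched object such as an owned StatefulSet changes, and again on periodic resync or controller restart. Each run must move the system from its current state toward the desired state: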

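// Reconcile sketch in Go (controller-runtime). DatabaseReconciler is the struct
// Kubebuilder scaffolds; buildStatefulSet is a helper you write yourself.
package controller

import (
	"context"
	"fmt"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	v1alpha1 "github.com/mycompany/database-operator/api/v1alpha1"
)

func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// 1. Fetch the custom resource
	database := &v1alpha1.Database{}
	if err := r.Get(ctx, req.NamespacedName, database); err != nil {
		// NotFound: the CR was deleted, nothing to do. Other errors are retried.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// 2. Check if the StatefulSet exists, create it if not
	sts := &appsv1.StatefulSet{}
	key := types.NamespacedName{Name: database.Name, Namespace: database.Namespace}
	if err := r.Get(ctx, key, sts); apierrors.IsNotFound(err) {
		sts = r.buildStatefulSet(database)
		// Set the owner reference BEFORE Create so the STS is garbage collected with the CR
		if err := controllerutil.SetControllerReference(database, sts, r.Scheme); err != nil {
			return ctrl.Result{}, err
		}
		if err := r.Create(ctx, sts); err != nil {
			return ctrl.Result{}, err
		}
		return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
	} else if err != nil {
		return ctrl.Result{}, err // Any other read error: retry with backoff
	}

	// 3. Ensure desired state matches actual state (assumes Spec.Replicas is int32)
	if sts.Spec.Replicas == nil || *sts.Spec.Replicas != database.Spec.Replicas {
		sts.Spec.Replicas = &database.Spec.Replicas
		if err := r.Update(ctx, sts); err != nil {
			return ctrl.Result{}, err
		}
		return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
	}

	// 4. Update status through the /status subresource
	database.Status.Phase = "Ready"
	database.Status.ReadyReplicas = sts.Status.ReadyReplicas
	database.Status.Endpoint = fmt.Sprintf("%s.%s.svc:5432", database.Name, database.Namespace)
	if err := r.Status().Update(ctx, database); err != nil {
		return ctrl.Result{}, err
	}

	return ctrl.Result{}, nil // Reconciliation complete
}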
Reconcile Loop Flow
flowchart TD
    A[Event: CR Changed] --> B[Fetch CR from API Server]
    B --> C{CR exists?}
    C -->|No| D[Cleanup owned resources<br/>Return]
    C -->|Yes| E[Check owned StatefulSet]
    E --> F{STS exists?}
    F -->|No| G[Create StatefulSet<br/>Set owner reference]
    F -->|Yes| H{Spec matches?}
    H -->|No| I[Update StatefulSet]
    H -->|Yes| J[Check Service]
    J --> K{Service exists?}
    K -->|No| L[Create Service]
    K -->|Yes| M[Update CR Status]
    G --> N[Requeue after 10s]
    I --> N
    L --> N
    M --> O[Done: wait for next event]
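
The flowchart can be read as a pure decision function: given what currently exists, return one action and whether to requeue. The Go sketch below simulates that loop in memory with no Kubernetes dependencies. State, Step, nextStep, apply, and converge are names invented for this illustration.

```go
package main

import "fmt"

// State is what one reconcile pass observes.
type State struct {
	CRExists          bool
	StatefulSetExists bool
	SpecMatches       bool
	ServiceExists     bool
}

// Step is the single action one pass takes, plus the requeue decision.
type Step struct {
	Action  string
	Requeue bool
}

// nextStep follows the branches of the flowchart from top to bottom.
func nextStep(s State) Step {
	switch {
	case !s.CRExists:
		return Step{"cleanup", false}
	case !s.StatefulSetExists:
		return Step{"create-statefulset", true}
	case !s.SpecMatches:
		return Step{"update-statefulset", true}
	case !s.ServiceExists:
		return Step{"create-service", true}
	default:
		return Step{"update-status", false}
	}
}

// apply simulates the effect an action has on the cluster.
func apply(s State, action string) State {
	switch action {
	case "create-statefulset":
		s.StatefulSetExists, s.SpecMatches = true, true
	case "update-statefulset":
		s.SpecMatches = true
	case "create-service":
		s.ServiceExists = true
	}
	return s
}

// converge runs passes until one of them does not ask for a requeue.
func converge(s State, maxPasses int) []string {
	var actions []string
	for i := 0; i < maxPasses; i++ {
		step := nextStep(s)
		actions = append(actions, step.Action)
		if !step.Requeue {
			break
		}
		s = apply(s, step.Action)
	}
	return actions
}

func main() {
	// A freshly created CR with nothing provisioned yet.
	fmt.Println(converge(State{CRExists: true}, 10))
}
```

Each pass does one thing and then re-observes, which is why the loop reaches the same end state regardless of where it starts.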

Status & Conditions

Operators should report detailed status using the Kubernetes conditions pattern — standard fields that tools like kubectl understand:

# Status with conditions (standard pattern):
status:
  phase: Ready
  readyReplicas: 3
  endpoint: production.default.svc.cluster.local:5432
  conditions:
  - type: Ready
    status: "True"
    lastTransitionTime: "2026-05-14T10:30:00Z"
    reason: AllReplicasReady
    message: "3/3 replicas are ready and accepting connections"
  - type: BackupReady
    status: "True"
    lastTransitionTime: "2026-05-14T02:00:00Z"
    reason: BackupCompleted
    message: "Last backup: 2026-05-14T02:00:00Z (30 retained)"
  - type: ReplicationHealthy
    status: "True"
    lastTransitionTime: "2026-05-14T10:29:55Z"
    reason: ReplicationLagNormal
    message: "Max replication lag: 0.2s (threshold: 30s)"
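
One detail of the conditions pattern is easy to get wrong: lastTransitionTime should change only when the status value flips, not on every reconcile. The Go sketch below shows that rule with a hand-rolled Condition type and string timestamps to stay dependency-free. The apimachinery library ships a helper with similar semantics (meta.SetStatusCondition), which is what an operator would normally use. setCondition here is an illustrative stand-in.

```go
package main

import "fmt"

// Condition is a reduced version of the standard condition fields.
type Condition struct {
	Type               string
	Status             string // "True", "False", or "Unknown"
	Reason             string
	Message            string
	LastTransitionTime string // RFC 3339 timestamp, kept as a string here
}

// setCondition inserts or updates the condition with the same Type.
// LastTransitionTime moves only when Status actually changes.
func setCondition(conds []Condition, c Condition, now string) []Condition {
	for i := range conds {
		if conds[i].Type != c.Type {
			continue
		}
		if conds[i].Status != c.Status {
			conds[i].Status = c.Status
			conds[i].LastTransitionTime = now
		}
		// Reason and message may be refreshed without a transition.
		conds[i].Reason = c.Reason
		conds[i].Message = c.Message
		return conds
	}
	c.LastTransitionTime = now
	return append(conds, c)
}

func main() {
	var conds []Condition
	conds = setCondition(conds, Condition{Type: "Ready", Status: "False",
		Reason: "Provisioning", Message: "1/3 replicas ready"}, "2026-05-14T10:25:00Z")
	conds = setCondition(conds, Condition{Type: "Ready", Status: "False",
		Reason: "Provisioning", Message: "2/3 replicas ready"}, "2026-05-14T10:28:00Z")
	conds = setCondition(conds, Condition{Type: "Ready", Status: "True",
		Reason: "AllReplicasReady", Message: "3/3 replicas ready"}, "2026-05-14T10:30:00Z")
	for _, c := range conds {
		fmt.Printf("%s=%s since %s (%s)\n", c.Type, c.Status, c.LastTransitionTime, c.Reason)
	}
}
```

Keeping the timestamp stable between transitions is what lets users and alerting answer "how long has this been unhealthy?" from the condition alone.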

Production Operators

PostgreSQL Operator (CloudNativePG)

# CloudNativePG: production PostgreSQL on Kubernetes
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: production-db
spec:
  instances: 3                    # 1 primary + 2 replicas
  primaryUpdateStrategy: unsupervised
  
  postgresql:
    parameters:
      shared_buffers: "1GB"
      effective_cache_size: "3GB"
      max_connections: "300"
      work_mem: "16MB"
  
  storage:
    size: 200Gi
    storageClass: premium-ssd
  
  backup:
    barmanObjectStore:
      destinationPath: s3://backups/production/
      s3Credentials:
        accessKeyId:
          name: s3-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: s3-creds
          key: SECRET_ACCESS_KEY
    retentionPolicy: "30d"
  
  monitoring:
    enablePodMonitor: true
    customQueriesConfigMap:
    - name: custom-queries
      key: queries

# What the operator does automatically:
# ✓ Provisions 3-node HA cluster (1 primary, 2 read replicas)
# ✓ Configures streaming replication
# ✓ Detects primary failure → promotes the most up-to-date replica
# ✓ Continuous WAL archiving to S3
# ✓ Point-in-time recovery capability
# ✓ Rolling upgrades (minor version)
# ✓ Prometheus metrics + PodMonitor
# Connection pooling (PgBouncer) is available through a separate Pooler resource

Kafka (Strimzi) Operator

# Strimzi: Apache Kafka on Kubernetes
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: production-kafka
spec:
  kafka:
    version: "3.7.0"
    replicas: 3
    listeners:
    - name: plain
      port: 9092
      type: internal
      tls: false
    - name: tls
      port: 9093
      type: internal
      tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      default.replication.factor: 3
      min.insync.replicas: 2
    storage:
      type: jbod
      volumes:
      - id: 0
        type: persistent-claim
        size: 500Gi
        class: fast-ssd
  zookeeper:                      # ZooKeeper mode; recent Strimzi releases are KRaft-only and drop this block
    replicas: 3
    storage:
      type: persistent-claim
      size: 50Gi
  entityOperator:
    topicOperator: {}
    userOperator: {}

# Additional CRDs the operator provides:
# - KafkaTopic: manage topics declaratively
# - KafkaUser: manage ACLs and authentication
# - KafkaConnect: managed Kafka Connect clusters
# - KafkaMirrorMaker2: cross-cluster replication

Prometheus Operator

# Prometheus Operator: monitoring as code
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: production
spec:
  replicas: 2
  retention: 30d
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 500Gi
  serviceMonitorSelector:
    matchLabels:
      monitoring: enabled
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
---
# ServiceMonitor: auto-discover targets
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: payment-service
  labels:
    monitoring: enabled
spec:
  selector:
    matchLabels:
      app: payment
  endpoints:
  - port: metrics
    interval: 15s
    path: /metrics
---
# PrometheusRule: alerting rules as CRs
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: payment-alerts
spec:
  groups:
  - name: payment.rules
    rules:
    - alert: PaymentHighErrorRate
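      # Ratio of 5xx responses to all responses, so 0.05 means 5% errors
      expr: |
        sum(rate(http_requests_total{job="payment",status=~"5.."}[5m]))
          / sum(rate(http_requests_total{job="payment"}[5m])) > 0.05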
      for: 5m
      labels:
        severity: critical
Popular Operators in Production
Operator | Manages | Maturity Level
CloudNativePG | PostgreSQL | Level 5 (Auto Pilot)
Strimzi | Apache Kafka | Level 4
Prometheus Operator | Prometheus/Alertmanager | Level 5
cert-manager | TLS Certificates | Level 4
Elastic Cloud on K8s | Elasticsearch | Level 4
Rook | Ceph Storage | Level 4
Crossplane | Cloud Infrastructure | Level 3
ArgoCD | GitOps Deployments | Level 4

Operator Best Practices

Idempotency

Critical Rule: The reconcile function MUST be idempotent. It will be called multiple times — on resource changes, on resync, on controller restart. If called 100 times with the same input, it must produce the same result as being called once. Never assume reconcile is called exactly once per change.
# Idempotent patterns:
# ✅ GOOD: "Ensure X exists with this spec"
#   - Check if StatefulSet exists → if not, create it
#   - If it exists, compare spec → update if different
#   - If identical, do nothing

# ❌ BAD: "Create X"
#   - Creates duplicate resources on re-reconciliation
#   - Fails on second call ("already exists")

# ✅ GOOD: Use CreateOrUpdate / CreateOrPatch
#   controllerutil.CreateOrUpdate(ctx, r.Client, sts, func() error {
#       sts.Spec.Replicas = &desired.Replicas
#       return controllerutil.SetControllerReference(cr, sts, r.Scheme)
#   })

# ✅ GOOD: Owner references for garbage collection
#   - Set CR as owner of all created resources
#   - When CR is deleted, all owned resources are automatically cleaned up
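
The "ensure" pattern can be demonstrated without a cluster. The Go sketch below uses an in-memory map as a stand-in for the API server. The Cluster type and its ensure method are invented for illustration and only mimic the create-or-update idea.

```go
package main

import "fmt"

// StatefulSet is a reduced stand-in for the real object.
type StatefulSet struct {
	Name     string
	Replicas int
	Owner    string
}

// Cluster is an in-memory stand-in for the API server.
type Cluster struct {
	objects map[string]StatefulSet
	writes  int // counts how often state was actually modified
}

func newCluster() *Cluster {
	return &Cluster{objects: map[string]StatefulSet{}}
}

// ensure makes the stored object match desired and reports what it did.
// Calling it again with the same input performs no further writes.
func (c *Cluster) ensure(desired StatefulSet) string {
	current, exists := c.objects[desired.Name]
	switch {
	case !exists:
		c.objects[desired.Name] = desired
		c.writes++
		return "created"
	case current != desired:
		c.objects[desired.Name] = desired
		c.writes++
		return "updated"
	default:
		return "unchanged"
	}
}

func main() {
	c := newCluster()
	desired := StatefulSet{Name: "production", Replicas: 3, Owner: "Database/production"}
	for i := 0; i < 100; i++ {
		c.ensure(desired)
	}
	// prints "objects=1 writes=1"
	fmt.Printf("objects=%d writes=%d\n", len(c.objects), c.writes)
}
```

A hundred calls with the same input leave one object and one write, which is the property the reconcile function has to preserve.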

Error Handling & Requeueing

# Reconcile return values control retry behavior:

# Success — no requeue:
# return ctrl.Result{}, nil

# Requeue through the rate limiter (retry soon, with backoff on repeats):
# return ctrl.Result{Requeue: true}, nil

# Requeue after delay (waiting for resource to be ready):
# return ctrl.Result{RequeueAfter: 30 * time.Second}, nil

# Error: requeued with per-item exponential backoff:
# return ctrl.Result{}, fmt.Errorf("failed to create StatefulSet: %w", err)
# Default backoff: 5ms → 10ms → 20ms → ... → 1000s (about 16.7 min, capped)

# Best practice: distinguish between:
# - Transient errors (network timeout): requeue with backoff
# - Permanent errors (invalid spec): update status, don't requeue
# - Waiting (resource not ready yet): requeue after fixed delay
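
The backoff applied to failing items is a capped doubling. The Go sketch below computes it for a given number of earlier failures. The backoff function is an illustrative re-implementation, and the 5ms base and 1000s cap used in main are assumed to match the defaults of controller-runtime's per-item rate limiter. Check the version you depend on.

```go
package main

import (
	"fmt"
	"time"
)

// backoff returns the delay before the next retry of one item.
// failures is the number of earlier failed attempts for that item.
// The delay doubles per failure, starting at base and never exceeding maxDelay.
func backoff(failures int, base, maxDelay time.Duration) time.Duration {
	if base >= maxDelay {
		return maxDelay
	}
	d := base
	for i := 0; i < failures; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay // stop early, which also avoids overflow
		}
	}
	return d
}

func main() {
	base, maxDelay := 5*time.Millisecond, 1000*time.Second // assumed defaults
	for _, n := range []int{0, 1, 2, 3, 10, 20} {
		fmt.Printf("after %2d failures: retry in %v\n", n, backoff(n, base, maxDelay))
	}
}
```

Because the early retries are only milliseconds apart, a reconcile that fails on a permanent condition can retry many times in its first second. That is one more reason to record permanent errors in status instead of returning them.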

Testing Operators

# Testing strategy for operators:

# 1. Unit tests: test reconcile logic with fake client
#    - Use fake.NewClientBuilder() to create in-memory K8s client
#    - Test each reconcile path independently
#    - Verify created/updated resources match expectations

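# 2. Integration tests: test against real API server (envtest)
#    - envtest starts a real API server + etcd (no kubelet, scheduler, or built-in controllers)
#    - Pods never actually run and owner-reference garbage collection does not happen
#    - CRDs are installed and your controllers run against it
#    - Tests verify reconciliation at the API-object level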
#    make test  # Uses envtest automatically

# 3. E2E tests: test on real cluster (Kind)
#    - Deploy operator to Kind cluster
#    - Create custom resources
#    - Verify actual pods, services, etc. are created
#    - Test failure scenarios (kill pods, network partitions)

# Run tests:
make test                    # Unit + integration (envtest)
make test-e2e               # Full E2E on Kind cluster

# Coverage:
go test ./... -coverprofile cover.out
go tool cover -html=cover.out

Exercises

Exercise 1 — Create a CRD: Define a CRD for a "WebApplication" resource with fields: image, replicas, port, and environment (dev/staging/prod). Add schema validation (replicas 1-10, port 1-65535). Apply it, create instances, and verify kubectl get webapplications shows printer columns.
Exercise 2 — Deploy a Production Operator: Install the Prometheus Operator (via Helm or manifests). Create a ServiceMonitor for an existing application. Verify Prometheus discovers and scrapes the target. Explore the CRDs it installs (kubectl get crd | grep monitoring).
Exercise 3 — Scaffold an Operator: Use Kubebuilder to scaffold a simple operator that manages a "Guestbook" custom resource. Implement the reconcile function to create a Deployment and Service based on the CR spec. Test locally with make run.
Exercise 4 — Operator Lifecycle: Install CloudNativePG or Strimzi. Create a small cluster (1 instance). Scale it up by changing the instance or replica count in the CR (instances for CloudNativePG, replicas for Strimzi). Simulate a failure (delete a pod). Observe the operator's reconciliation: how quickly does it recover? Check the CR's status conditions throughout.

Conclusion

CRDs and Operators are what make Kubernetes a true platform — not just a container orchestrator. They let you:

  • Extend the API — define any domain concept as a first-class Kubernetes resource
  • Encode expertise — turn operational runbooks into automated reconciliation loops
  • Standardise operations — manage databases, queues, and infrastructure the same way you manage Deployments
  • Achieve Level 5 maturity — from basic install to fully autonomous operation

In Part 13, we'll cover Cluster Operations & Reliability — upgrade strategies, node management, resource quotas, PodDisruptionBudgets, and the practices that keep production clusters healthy at scale.