Extending the Kubernetes API
Why Extend?
Kubernetes ships with built-in resources (Pods, Deployments, Services), but real-world infrastructure needs domain-specific concepts. You need to manage databases, message queues, ML training jobs, certificates, DNS records — things Kubernetes doesn't know about natively.
# Instead of manual database operations, declare what you want:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: production-db
spec:
  instances: 3
  postgresql:
    parameters:
      shared_buffers: "512MB"
      max_connections: "200"
  storage:
    size: 100Gi
    storageClass: fast-ssd
  backup:
    barmanObjectStore:
      destinationPath: s3://backups/production-db/
    retentionPolicy: "30d"
  monitoring:
    enablePodMonitor: true
# The Operator handles:
# ✓ Provisioning 3 PostgreSQL instances
# ✓ Configuring streaming replication
# ✓ Automated failover if primary fails
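# ✓ Backups and WAL archiving to S3 (a daily schedule needs a separate ScheduledBackup resource)
# ✓ Monitoring integration
# ✓ Rolling upgrades
# ✓ Connection pooling (through a separate Pooler resource)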
Extension Mechanisms
| Mechanism | Purpose | Complexity | Use Case |
|---|---|---|---|
| CRD + Controller | New resource types with reconciliation | Medium | Operators, platform APIs |
| Admission Webhooks | Validate/mutate resources on create/update | Low | Policy enforcement, defaults |
| API Aggregation | Full custom API server | High | metrics-server, custom-metrics |
| Scheduler Extenders | Custom scheduling logic | Medium | GPU scheduling, data locality |
| kubectl Plugins | Extend kubectl CLI | Low | Developer experience |
Custom Resource Definitions
CRD Anatomy
A CRD teaches the Kubernetes API Server about a new resource type. Once applied, you can create, read, update, and delete instances of your custom resource just like any built-in resource:
# Step 1: Define the CRD (teaches K8s about "Database" resources)
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.mycompany.io    # plural.group
spec:
  group: mycompany.io             # API group
  names:
    plural: databases             # kubectl get databases
    singular: database            # kubectl get database my-db
    kind: Database                # YAML kind field
    shortNames:
      - db                        # kubectl get db
  scope: Namespaced               # Namespaced or Cluster
  versions:
    - name: v1alpha1
      served: true                # API serves this version
      storage: true               # Stored in etcd in this version
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              required: ["engine", "version", "replicas"]
              properties:
                engine:
                  type: string
                  enum: ["postgres", "mysql", "mongodb"]
                version:
                  type: string
                replicas:
                  type: integer
                  minimum: 1
                  maximum: 7
                storage:
                  type: object
                  properties:
                    size:
                      type: string
                      pattern: "^[0-9]+(Gi|Ti)$"
                    storageClass:
                      type: string
            status:
              type: object
              properties:
                phase:
                  type: string
                readyReplicas:
                  type: integer
                endpoint:
                  type: string
      subresources:
        status: {}                # Enable /status subresource
      additionalPrinterColumns:
        - name: Engine
          type: string
          jsonPath: .spec.engine
        - name: Version
          type: string
          jsonPath: .spec.version
        - name: Replicas
          type: integer
          jsonPath: .spec.replicas
        - name: Phase
          type: string
          jsonPath: .status.phase
        - name: Age
          type: date
          jsonPath: .metadata.creationTimestamp
# Step 2: Apply the CRD
kubectl apply -f database-crd.yaml
# Step 3: Create instances of the custom resource
kubectl apply -f my-database.yaml
# Step 4: Interact with it like any K8s resource:
kubectl get databases
# NAME ENGINE VERSION REPLICAS PHASE AGE
# production postgres 16 3 Ready 2d
# staging postgres 16 1 Ready 5h
# analytics mongodb 7.0 3 Ready 1d
kubectl get db production -o yaml
kubectl describe db production
kubectl delete db staging
# my-database.yaml: the custom resource instance applied in Step 3
apiVersion: mycompany.io/v1alpha1
kind: Database
metadata:
  name: production
  namespace: default
spec:
  engine: postgres
  version: "16"
  replicas: 3
  storage:
    size: "100Gi"
    storageClass: fast-ssd
Schema Validation
OpenAPI v3 schemas in CRDs provide automatic validation — the API Server rejects invalid resources before they're stored:
# Schema validation catches errors at admission time:
spec:
  engine: "redis"     # ❌ Rejected: not in enum ["postgres","mysql","mongodb"]
  replicas: 10        # ❌ Rejected: maximum is 7
  storage:
    size: "100MB"     # ❌ Rejected: doesn't match pattern "^[0-9]+(Gi|Ti)$"
# Validation types available:
# - type: string/integer/boolean/object/array
# - enum: ["a", "b", "c"]
# - minimum/maximum (numbers)
# - minLength/maxLength (strings)
# - pattern (regex)
# - required: ["field1", "field2"]
# - default: "value" (set if not provided)
# - x-kubernetes-validations (CEL expressions for complex rules)
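To make the rejection rules concrete, here is a minimal stdlib-only Go sketch that applies the same enum, range, and pattern checks to a spec. The type and function names (DatabaseSpec, validateSpec) are illustrative assumptions, not part of any Kubernetes library, and the API server performs these checks itself from the CRD schema; the sketch also treats an empty string as a missing required field, which is a simplification.

```go
package main

import (
	"fmt"
	"regexp"
)

// DatabaseSpec is a hypothetical flat mirror of the spec fields in the CRD schema.
type DatabaseSpec struct {
	Engine      string
	Version     string
	Replicas    int
	StorageSize string // optional, e.g. "100Gi"
}

var (
	allowedEngines = map[string]bool{"postgres": true, "mysql": true, "mongodb": true}
	sizePattern    = regexp.MustCompile(`^[0-9]+(Gi|Ti)$`)
)

// validateSpec returns one message per violated rule; an empty result means valid.
func validateSpec(s DatabaseSpec) []string {
	var errs []string
	if !allowedEngines[s.Engine] {
		errs = append(errs, fmt.Sprintf("spec.engine: %q is not one of postgres, mysql, mongodb", s.Engine))
	}
	if s.Version == "" {
		errs = append(errs, "spec.version: required")
	}
	if s.Replicas < 1 || s.Replicas > 7 {
		errs = append(errs, fmt.Sprintf("spec.replicas: %d is outside 1..7", s.Replicas))
	}
	if s.StorageSize != "" && !sizePattern.MatchString(s.StorageSize) {
		errs = append(errs, fmt.Sprintf("spec.storage.size: %q does not match %s", s.StorageSize, sizePattern))
	}
	return errs
}

func main() {
	// The same invalid values as in the YAML example above.
	invalid := DatabaseSpec{Engine: "redis", Version: "7", Replicas: 10, StorageSize: "100MB"}
	for _, msg := range validateSpec(invalid) {
		fmt.Println(msg)
	}
}
```

Collecting every violation instead of stopping at the first mirrors how the API server reports all schema errors in a single response.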
# Advanced: CEL validation rules (beta and enabled by default since K8s 1.25, GA in 1.29)
schema:
  openAPIV3Schema:
    type: object
    properties:
      spec:
        type: object
        x-kubernetes-validations:
          # has() guards the optional minAvailable field; referencing an unset field is an error
          - rule: "!has(self.minAvailable) || self.replicas >= self.minAvailable"
            message: "replicas must be >= minAvailable"
          - rule: "self.engine == 'postgres' || self.version != '16'"
            message: "version 16 is only available for postgres"
        properties:
          replicas:
            type: integer
          minAvailable:
            type: integer
          engine:
            type: string
          version:
            type: string
Versions & Conversion
As your CRD evolves, you'll need multiple versions. Kubernetes supports serving multiple versions simultaneously with conversion webhooks to translate between them:
# Multi-version CRD with conversion:
spec:
  group: mycompany.io
  versions:
    - name: v1alpha1
      served: true
      storage: false    # Old version, still served but not stored
    - name: v1beta1
      served: true
      storage: false
    - name: v1
      served: true
      storage: true     # New version, stored in etcd
  conversion:
    strategy: Webhook
    webhook:
      conversionReviewVersions: ["v1"]
      clientConfig:
        service:
          namespace: system
          name: database-operator-webhook
          path: /convert
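What the conversion webhook does internally is translate objects between version shapes without losing data. The sketch below assumes a purely hypothetical difference between versions (v1 renames replicas to instances and nests the storage size); the source does not define how the versions differ, so the struct fields and the toV1/toV1alpha1 helpers are illustrative only.

```go
package main

import "fmt"

// DatabaseV1alpha1 is the older, served-only shape (hypothetical fields).
type DatabaseV1alpha1 struct {
	Engine      string
	Version     string
	Replicas    int
	StorageSize string
}

// StorageV1 is the nested storage block assumed for v1.
type StorageV1 struct {
	Size string
}

// DatabaseV1 is the storage version (hypothetical fields).
type DatabaseV1 struct {
	Engine    string
	Version   string
	Instances int
	Storage   StorageV1
}

// toV1 upgrades an old object to the storage version.
func toV1(in DatabaseV1alpha1) DatabaseV1 {
	return DatabaseV1{
		Engine:    in.Engine,
		Version:   in.Version,
		Instances: in.Replicas,
		Storage:   StorageV1{Size: in.StorageSize},
	}
}

// toV1alpha1 downgrades a stored object for clients that still request the old version.
func toV1alpha1(in DatabaseV1) DatabaseV1alpha1 {
	return DatabaseV1alpha1{
		Engine:      in.Engine,
		Version:     in.Version,
		Replicas:    in.Instances,
		StorageSize: in.Storage.Size,
	}
}

func main() {
	old := DatabaseV1alpha1{Engine: "postgres", Version: "16", Replicas: 3, StorageSize: "100Gi"}
	stored := toV1(old)
	fmt.Printf("stored as v1: %+v\n", stored)
	// A conversion must round-trip, or clients on the old version would see data loss.
	fmt.Println("round trip lossless:", toV1alpha1(stored) == old)
}
```

Checking the round trip in tests is a cheap way to catch fields that exist in one version but have no home in the other.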
Printer Columns & Status
# additionalPrinterColumns control what "kubectl get" shows:
kubectl get databases
# NAME ENGINE VERSION REPLICAS PHASE AGE
# production postgres 16 3 Ready 2d
# The /status subresource separates spec (user intent) from status (system state):
# - Users update .spec (desired state)
# - Controllers update .status (actual state)
# - RBAC can grant different permissions to each
# Update status by hand (controllers do the same through the /status endpoint;
# --subresource requires kubectl 1.24+):
kubectl patch database production --type=merge --subresource=status \
  -p '{"status":{"phase":"Ready","readyReplicas":3,"endpoint":"production.default.svc:5432"}}'
The Operator Pattern
What Is an Operator?
flowchart TD
subgraph User
CR[Custom Resource<br/>Desired State YAML]
end
subgraph K8s API
API[API Server + etcd]
end
subgraph Operator Pod
CTRL[Controller<br/>Reconcile Loop]
KNOW[Domain Knowledge<br/>How to manage DB]
end
subgraph Managed Resources
STS[StatefulSet]
SVC[Services]
CM[ConfigMaps]
SEC[Secrets]
PDB[PodDisruptionBudget]
MON[ServiceMonitor]
end
CR --> API
API --> CTRL
CTRL --> KNOW
KNOW --> STS
KNOW --> SVC
KNOW --> CM
KNOW --> SEC
KNOW --> PDB
KNOW --> MON
The key insight: an Operator encodes human operational expertise as code. Instead of a runbook that says "if the primary database fails, promote the replica with the least replication lag," the Operator's reconcile loop does this automatically.
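As an illustration of that runbook rule turned into code, the sketch below picks the healthy replica with the least replication lag. The Replica type and pickPromotionCandidate function are hypothetical names for this example; a real operator would read lag from the database itself and handle fencing, timeouts, and ties far more carefully.

```go
package main

import (
	"fmt"
	"time"
)

// Replica is a hypothetical view of one standby instance.
type Replica struct {
	Name    string
	Healthy bool
	Lag     time.Duration // replication lag behind the failed primary
}

// pickPromotionCandidate returns the healthy replica with the least lag.
// The boolean is false when no healthy replica exists.
func pickPromotionCandidate(replicas []Replica) (Replica, bool) {
	var best Replica
	found := false
	for _, r := range replicas {
		if !r.Healthy {
			continue // never promote an unhealthy instance, however fresh it is
		}
		if !found || r.Lag < best.Lag {
			best, found = r, true
		}
	}
	return best, found
}

func main() {
	replicas := []Replica{
		{Name: "production-db-2", Healthy: true, Lag: 2 * time.Second},
		{Name: "production-db-3", Healthy: true, Lag: 200 * time.Millisecond},
		{Name: "production-db-4", Healthy: false, Lag: 50 * time.Millisecond},
	}
	if candidate, ok := pickPromotionCandidate(replicas); ok {
		fmt.Println("promote:", candidate.Name)
	} else {
		fmt.Println("no healthy replica available")
	}
}
```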
Operator vs Controller
| Aspect | Controller | Operator |
|---|---|---|
| Scope | Manages built-in resources | Manages custom resources defined by CRDs |
| Knowledge | Generic (replicas, pods) | Domain-specific (backup, failover, replication) |
| Complexity | Stateless reconciliation | Stateful lifecycle management |
| Examples | Deployment controller, ReplicaSet controller | Postgres Operator, Kafka Operator, cert-manager |
| Day 2 Ops | Basic (restart, scale) | Full lifecycle (backup, restore, upgrade, failover) |
Operator Maturity Model
The Operator Maturity Model (the Operator Framework's capability levels, as displayed on OperatorHub.io) defines five levels of sophistication:
| Level | Name | Capabilities | Example |
|---|---|---|---|
| 1 | Basic Install | Automated provisioning | Create DB from CR |
| 2 | Seamless Upgrades | Version upgrades, patch management | Rolling upgrade Postgres 15→16 |
| 3 | Full Lifecycle | Backup, restore, failure recovery | Automated backup + point-in-time restore |
| 4 | Deep Insights | Metrics, alerts, log integration | Custom Prometheus metrics + Grafana dashboards |
| 5 | Auto Pilot | Auto-scaling, auto-tuning, anomaly detection | Auto-scale replicas based on query load |
Building Operators
Kubebuilder
Kubebuilder is a Kubernetes SIG project (kubernetes-sigs/kubebuilder) for building Kubernetes operators in Go. It generates project scaffolding, CRDs, controllers, webhooks, and RBAC, letting you focus on business logic:
# Initialize a new operator project:
mkdir database-operator && cd database-operator
kubebuilder init --domain mycompany.io --repo github.com/mycompany/database-operator
# Create a new API (CRD + Controller):
kubebuilder create api --group db --version v1alpha1 --kind Database
# Create Resource [y/n]: y
# Create Controller [y/n]: y
# The resulting API group is db.mycompany.io (--group plus --domain)
# Generated project structure:
# ├── api/v1alpha1/
# │   ├── database_types.go           ← CRD Go types (spec, status)
# │   └── zz_generated.deepcopy.go    ← Auto-generated
# ├── internal/controller/
# │   └── database_controller.go      ← Reconcile logic (YOUR CODE)
# ├── config/
# │   ├── crd/                        ← Generated CRD YAML
# │   ├── rbac/                       ← RBAC manifests
# │   └── manager/                    ← Deployment manifest
# ├── cmd/main.go                     ← Entry point
# ├── Dockerfile                      ← Container image
# └── Makefile                        ← Build commands
# Generate CRD manifests:
make manifests
# Install the CRDs into the current cluster, then run the controller locally (for development):
make install
make run
# Build and deploy:
make docker-build docker-push IMG=myregistry/database-operator:v0.1.0
make deploy IMG=myregistry/database-operator:v0.1.0
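The database_types.go file in the scaffold holds plain Go structs whose JSON tags define the resource's wire format. The sketch below is a rough stdlib-only approximation of what those types might look like for the Database resource; real scaffolding embeds metav1.TypeMeta and metav1.ObjectMeta and carries kubebuilder validation markers, all omitted here. The Metadata struct and parseDatabase helper are assumptions made so the example runs on its own.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StorageSpec mirrors spec.storage in the CRD schema.
type StorageSpec struct {
	Size         string `json:"size,omitempty"`
	StorageClass string `json:"storageClass,omitempty"`
}

// DatabaseSpec is the desired state written by users.
type DatabaseSpec struct {
	Engine   string      `json:"engine"`
	Version  string      `json:"version"`
	Replicas int32       `json:"replicas"`
	Storage  StorageSpec `json:"storage"`
}

// DatabaseStatus is the observed state written by the controller.
type DatabaseStatus struct {
	Phase         string `json:"phase,omitempty"`
	ReadyReplicas int32  `json:"readyReplicas,omitempty"`
	Endpoint      string `json:"endpoint,omitempty"`
}

// Metadata is a simplified stand-in for metav1.ObjectMeta.
type Metadata struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace,omitempty"`
}

// Database is the top-level custom resource.
type Database struct {
	APIVersion string         `json:"apiVersion"`
	Kind       string         `json:"kind"`
	Metadata   Metadata       `json:"metadata"`
	Spec       DatabaseSpec   `json:"spec"`
	Status     DatabaseStatus `json:"status"`
}

// parseDatabase decodes a JSON manifest into the typed struct.
func parseDatabase(data []byte) (Database, error) {
	var db Database
	err := json.Unmarshal(data, &db)
	return db, err
}

func main() {
	manifest := []byte(`{
	  "apiVersion": "mycompany.io/v1alpha1",
	  "kind": "Database",
	  "metadata": {"name": "production", "namespace": "default"},
	  "spec": {"engine": "postgres", "version": "16", "replicas": 3,
	           "storage": {"size": "100Gi", "storageClass": "fast-ssd"}}
	}`)
	db, err := parseDatabase(manifest)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%s/%s: %s %s with %d replicas\n",
		db.Metadata.Namespace, db.Metadata.Name, db.Spec.Engine, db.Spec.Version, db.Spec.Replicas)
}
```

Keeping Spec and Status as separate structs is what lets the /status subresource and RBAC treat them independently.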
Operator SDK
Operator SDK (from Red Hat) wraps Kubebuilder with additional features and supports multiple languages:
| Framework | Language | Best For | Maturity |
|---|---|---|---|
| Kubebuilder (Go) | Go | Production operators, performance | Most mature |
| Operator SDK (Go) | Go | Kubebuilder + OLM integration | Most mature |
| Operator SDK (Ansible) | Ansible | Teams with Ansible expertise | Stable |
| Operator SDK (Helm) | Helm charts | Simple install/upgrade operators | Stable |
| KUDO | Declarative YAML | No-code operator definitions | Archived |
| Metacontroller | Any (webhooks) | Simple controllers in any language | Active |
The Reconcile Function
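The reconcile function is the heart of every operator. It runs whenever the custom resource or a resource the controller watches (such as an owned StatefulSet) changes, and again after any requested requeue. Each run must move the system from its current state toward the desired state: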
# Reconcile pseudocode (Go-like; apierrors is k8s.io/apimachinery/pkg/api/errors):
# func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
#
#     // 1. Fetch the custom resource
#     database := &v1alpha1.Database{}
#     if err := r.Get(ctx, req.NamespacedName, database); err != nil {
#         if apierrors.IsNotFound(err) {
#             return ctrl.Result{}, nil // Resource deleted, nothing to do
#         }
#         return ctrl.Result{}, err // Requeue on error
#     }
#
#     // 2. Check if StatefulSet exists, create if not
#     sts := &appsv1.StatefulSet{}
#     err := r.Get(ctx, types.NamespacedName{Name: database.Name, Namespace: database.Namespace}, sts)
#     if apierrors.IsNotFound(err) {
#         sts = r.buildStatefulSet(database)
#         // Set the owner reference BEFORE Create so the STS is garbage collected with the CR
#         if err := controllerutil.SetControllerReference(database, sts, r.Scheme); err != nil {
#             return ctrl.Result{}, err
#         }
#         if err := r.Create(ctx, sts); err != nil {
#             return ctrl.Result{}, err
#         }
#         return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
#     }
#     if err != nil {
#         return ctrl.Result{}, err // Any other read error: retry with backoff
#     }
#
#     // 3. Ensure desired state matches actual state
#     if sts.Spec.Replicas == nil || *sts.Spec.Replicas != database.Spec.Replicas {
#         sts.Spec.Replicas = &database.Spec.Replicas
#         if err := r.Update(ctx, sts); err != nil {
#             return ctrl.Result{}, err
#         }
#         return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
#     }
#
#     // 4. Update status through the /status subresource
#     database.Status.Phase = "Ready"
#     database.Status.ReadyReplicas = sts.Status.ReadyReplicas
#     database.Status.Endpoint = fmt.Sprintf("%s.%s.svc:5432", database.Name, database.Namespace)
#     if err := r.Status().Update(ctx, database); err != nil {
#         return ctrl.Result{}, err
#     }
#
#     return ctrl.Result{}, nil // Reconciliation complete
# }
flowchart TD
A[Event: CR Changed] --> B[Fetch CR from API Server]
B --> C{CR exists?}
C -->|No| D[Cleanup owned resources<br/>Return]
C -->|Yes| E[Check owned StatefulSet]
E --> F{STS exists?}
F -->|No| G[Create StatefulSet<br/>Set owner reference]
F -->|Yes| H{Spec matches?}
H -->|No| I[Update StatefulSet]
H -->|Yes| J[Check Service]
J --> K{Service exists?}
K -->|No| L[Create Service]
K -->|Yes| M[Update CR Status]
G --> N[Requeue after 10s]
I --> N
L --> N
M --> O[Done — wait for next event]
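The flow above can be simulated without a cluster. The sketch below replaces the API server with an in-memory struct and has reconcile take at most one corrective action per call, requeueing until nothing is left to fix. The cluster and desired types are stand-ins invented for this example, not controller-runtime types.

```go
package main

import "fmt"

// cluster is an in-memory stand-in for the API server.
type cluster struct {
	statefulSets map[string]int  // StatefulSet name -> replica count
	services     map[string]bool // Service name -> exists
}

// desired is the state declared in the custom resource.
type desired struct {
	Name     string
	Replicas int
}

// reconcile takes at most one corrective action and reports whether
// another pass is needed, mirroring the branches of the flowchart.
func reconcile(c *cluster, d desired) (requeue bool, action string) {
	replicas, exists := c.statefulSets[d.Name]
	switch {
	case !exists:
		c.statefulSets[d.Name] = d.Replicas
		return true, "created StatefulSet"
	case replicas != d.Replicas:
		c.statefulSets[d.Name] = d.Replicas
		return true, "updated StatefulSet"
	case !c.services[d.Name]:
		c.services[d.Name] = true
		return true, "created Service"
	default:
		return false, "in sync"
	}
}

func main() {
	c := &cluster{statefulSets: map[string]int{}, services: map[string]bool{}}
	d := desired{Name: "production", Replicas: 3}
	for {
		requeue, action := reconcile(c, d)
		fmt.Println(action)
		if !requeue {
			break
		}
	}
}
```

Because every pass starts from observed state rather than from remembered steps, the loop converges no matter where a previous run stopped, which is the level-triggered property that makes reconcilers robust to crashes and missed events.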
Status & Conditions
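Operators should report detailed status using the Kubernetes conditions pattern: a list of entries with standard fields (type, status, reason, message, lastTransitionTime) that tooling understands, for example kubectl wait --for=condition=Ready: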
# Status with conditions (standard pattern):
status:
  phase: Running
  readyReplicas: 3
  endpoint: production.default.svc.cluster.local:5432
  conditions:
    - type: Ready
      status: "True"
      lastTransitionTime: "2026-05-14T10:30:00Z"
      reason: AllReplicasReady
      message: "3/3 replicas are ready and accepting connections"
    - type: BackupReady
      status: "True"
      lastTransitionTime: "2026-05-14T02:00:00Z"
      reason: BackupCompleted
      message: "Last backup: 2026-05-14T02:00:00Z (30 retained)"
    - type: ReplicationHealthy
      status: "True"
      lastTransitionTime: "2026-05-14T10:29:55Z"
      reason: ReplicationLagNormal
      message: "Max replication lag: 0.2s (threshold: 30s)"
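One subtlety of the conditions pattern is that lastTransitionTime should change only when the status value flips, not on every reconcile. The stdlib-only sketch below shows that rule; the Condition type and setCondition helper are simplified assumptions modelled on the behaviour of meta.SetStatusCondition in k8s.io/apimachinery, which real operators would normally use instead.

```go
package main

import (
	"fmt"
	"time"
)

// Condition is a simplified version of the standard condition shape.
type Condition struct {
	Type               string
	Status             string // "True", "False", or "Unknown"
	Reason             string
	Message            string
	LastTransitionTime time.Time
}

// setCondition inserts or updates the condition of the given type.
// LastTransitionTime moves only when Status actually changes.
func setCondition(conds []Condition, next Condition, now time.Time) []Condition {
	for i := range conds {
		if conds[i].Type != next.Type {
			continue
		}
		if conds[i].Status != next.Status {
			conds[i].Status = next.Status
			conds[i].LastTransitionTime = now
		}
		// Reason and message may be refreshed without counting as a transition.
		conds[i].Reason = next.Reason
		conds[i].Message = next.Message
		return conds
	}
	next.LastTransitionTime = now
	return append(conds, next)
}

func main() {
	start := time.Date(2026, 5, 14, 10, 30, 0, 0, time.UTC)
	conds := setCondition(nil, Condition{
		Type: "Ready", Status: "True", Reason: "AllReplicasReady", Message: "3/3 replicas ready",
	}, start)

	// A later reconcile with the same status keeps the original timestamp.
	conds = setCondition(conds, Condition{
		Type: "Ready", Status: "True", Reason: "AllReplicasReady", Message: "3/3 replicas ready",
	}, start.Add(5*time.Minute))
	fmt.Println("unchanged since:", conds[0].LastTransitionTime.Format(time.RFC3339))

	// A status flip records a new transition time.
	conds = setCondition(conds, Condition{
		Type: "Ready", Status: "False", Reason: "ReplicaDown", Message: "2/3 replicas ready",
	}, start.Add(10*time.Minute))
	fmt.Println("transitioned at:", conds[0].LastTransitionTime.Format(time.RFC3339))
}
```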
Production Operators
PostgreSQL Operator (CloudNativePG)
# CloudNativePG: production PostgreSQL on Kubernetes
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: production-db
spec:
  instances: 3                        # 1 primary + 2 replicas
  primaryUpdateStrategy: unsupervised
  postgresql:
    parameters:
      shared_buffers: "1GB"
      effective_cache_size: "3GB"
      max_connections: "300"
      work_mem: "16MB"
  storage:
    size: 200Gi
    storageClass: premium-ssd
  backup:
    barmanObjectStore:
      destinationPath: s3://backups/production/
      s3Credentials:
        accessKeyId:
          name: s3-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: s3-creds
          key: SECRET_ACCESS_KEY
    retentionPolicy: "30d"
  monitoring:
    enablePodMonitor: true
    customQueriesConfigMap:
      - name: custom-queries
        key: queries
# What the operator does automatically:
# ✓ Provisions 3-node HA cluster (1 primary, 2 read replicas)
# ✓ Configures streaming replication
# ✓ Detects primary failure → promotes healthiest replica (~5s)
# ✓ Continuous WAL archiving to S3
# ✓ Point-in-time recovery capability
# ✓ Rolling upgrades (minor version)
# ✓ Connection pooling via PgBouncer (deployed through a separate Pooler resource, not a sidecar)
# ✓ Prometheus metrics + PodMonitor
Kafka (Strimzi) Operator
# Strimzi: Apache Kafka on Kubernetes
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: production-kafka
spec:
  kafka:
    version: "3.7.0"
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      default.replication.factor: 3
      min.insync.replicas: 2
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 500Gi
          class: fast-ssd
  zookeeper:                # ZooKeeper mode (Kafka 3.x); Kafka 4.0+ is KRaft-only
    replicas: 3
    storage:
      type: persistent-claim
      size: 50Gi
  entityOperator:
    topicOperator: {}
    userOperator: {}
# Additional CRDs the operator provides:
# - KafkaTopic: manage topics declaratively
# - KafkaUser: manage ACLs and authentication
# - KafkaConnect: managed Kafka Connect clusters
# - KafkaMirrorMaker2: cross-cluster replication
Prometheus Operator
# Prometheus Operator: monitoring as code
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: production
spec:
  replicas: 2
  retention: 30d
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 500Gi
  serviceMonitorSelector:
    matchLabels:
      monitoring: enabled
  alerting:
    alertmanagers:
      - name: alertmanager-main
        namespace: monitoring
        port: web
---
# ServiceMonitor: auto-discover targets
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: payment-service
  labels:
    monitoring: enabled
spec:
  selector:
    matchLabels:
      app: payment
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics
---
# PrometheusRule: alerting rules as CRs
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: payment-alerts
spec:
  groups:
    - name: payment.rules
      rules:
        - alert: PaymentHighErrorRate
          expr: rate(http_requests_total{job="payment",status=~"5.."}[5m]) > 0.05
          for: 5m
          labels:
            severity: critical
Popular Operators in Production
| Operator | Manages | Maturity Level |
|---|---|---|
| CloudNativePG | PostgreSQL | Level 5 (Auto Pilot) |
| Strimzi | Apache Kafka | Level 4 |
| Prometheus Operator | Prometheus/Alertmanager | Level 5 |
| cert-manager | TLS Certificates | Level 4 |
| Elastic Cloud on K8s | Elasticsearch | Level 4 |
| Rook | Ceph Storage | Level 4 |
| Crossplane | Cloud Infrastructure | Level 3 |
| ArgoCD | GitOps Deployments | Level 4 |
Operator Best Practices
Idempotency
# Idempotent patterns:
# ✅ GOOD: "Ensure X exists with this spec"
# - Check if StatefulSet exists → if not, create it
# - If it exists, compare spec → update if different
# - If identical, do nothing
# ❌ BAD: "Create X"
# - Fails on the second reconcile ("already exists") when the name is fixed
# - Creates duplicate resources on every reconcile when the name is generated
# ✅ GOOD: Use CreateOrUpdate / CreateOrPatch
# _, err := controllerutil.CreateOrUpdate(ctx, r.Client, sts, func() error {
#     sts.Spec.Replicas = &desired.Replicas
#     return controllerutil.SetControllerReference(cr, sts, r.Scheme)
# })
# ✅ GOOD: Owner references for garbage collection
# - Set CR as owner of all created resources
# - When CR is deleted, all owned resources are automatically cleaned up
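The "ensure X exists with this spec" pattern can be sketched without a cluster. In the example below a map stands in for the API server, and ensureStatefulSet reports which of the three outcomes occurred; the StatefulSet struct, the store type, and the helper name are all hypothetical simplifications, not controller-runtime types.

```go
package main

import "fmt"

// StatefulSet is a drastically simplified stand-in for the real resource.
type StatefulSet struct {
	Name     string
	Replicas int
	Owner    string // name of the owning custom resource
}

// store is an in-memory stand-in for the API server, keyed by resource name.
type store map[string]StatefulSet

// ensureStatefulSet makes the stored object match want and reports what it did.
// Calling it repeatedly with the same input is safe: the second call is a no-op.
func ensureStatefulSet(s store, want StatefulSet) string {
	have, exists := s[want.Name]
	switch {
	case !exists:
		s[want.Name] = want
		return "created"
	case have != want:
		s[want.Name] = want
		return "updated"
	default:
		return "unchanged"
	}
}

func main() {
	s := store{}
	want := StatefulSet{Name: "production", Replicas: 3, Owner: "production"}

	fmt.Println(ensureStatefulSet(s, want)) // first reconcile
	fmt.Println(ensureStatefulSet(s, want)) // repeated reconcile, same input

	want.Replicas = 5
	fmt.Println(ensureStatefulSet(s, want)) // spec changed
	fmt.Println("objects in store:", len(s))
}
```

The fixed, deterministic name is what makes the operation idempotent: the same input always addresses the same object, so repeated calls cannot multiply resources.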
Error Handling & Requeueing
# Reconcile return values control retry behavior:
# Success — no requeue:
# return ctrl.Result{}, nil
# Requeue through the rate limiter (soon, but not strictly immediate):
# return ctrl.Result{Requeue: true}, nil
# Requeue after delay (waiting for resource to be ready):
# return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
# Error: requeued with exponential backoff
# return ctrl.Result{}, fmt.Errorf("failed to create StatefulSet: %w", err)
# Default per-item backoff: 5ms → 10ms → 20ms → ... → 1000s (about 16m40s, capped)
# Best practice: distinguish between:
# - Transient errors (network timeout): requeue with backoff
# - Permanent errors (invalid spec): update status, don't requeue
#   (recent controller-runtime versions offer reconcile.TerminalError for this)
# - Waiting (resource not ready yet): requeue after fixed delay
Testing Operators
# Testing strategy for operators:
# 1. Unit tests: test reconcile logic with fake client
# - Use fake.NewClientBuilder() to create in-memory K8s client
# - Test each reconcile path independently
# - Verify created/updated resources match expectations
# 2. Integration tests: test against real API server (envtest)
# - envtest starts a real API server + etcd (no kubelet/scheduler)
# - CRDs are installed, controllers run normally
# - Tests verify end-to-end reconciliation
# make test # Uses envtest automatically
# 3. E2E tests: test on real cluster (Kind)
# - Deploy operator to Kind cluster
# - Create custom resources
# - Verify actual pods, services, etc. are created
# - Test failure scenarios (kill pods, network partitions)
# Run tests:
make test # Unit + integration (envtest)
make test-e2e # Full E2E on Kind cluster
# Coverage:
go test ./... -coverprofile cover.out
go tool cover -html=cover.out
Exercises
- […] kubectl get webapplications shows printer columns.
- […] (kubectl get crd | grep monitoring).
- […] make run.
Conclusion
CRDs and Operators are what make Kubernetes a true platform — not just a container orchestrator. They let you:
- Extend the API — define any domain concept as a first-class Kubernetes resource
- Encode expertise — turn operational runbooks into automated reconciliation loops
- Standardise operations — manage databases, queues, and infrastructure the same way you manage Deployments
- Achieve Level 5 maturity — from basic install to fully autonomous operation
In Part 13, we'll cover Cluster Operations & Reliability — upgrade strategies, node management, resource quotas, PodDisruptionBudgets, and the practices that keep production clusters healthy at scale.