Parts 6–15 gave you mastery of Kubernetes itself. But no production cluster runs on bare Kubernetes alone — it's surrounded by an ecosystem of tools for deployment automation, configuration management, progressive delivery, and platform engineering. This final part maps that ecosystem, explains when to use each tool, and includes hands-on exercises so you leave with practical skill, not just awareness.
GitOps
Principles
The four principles of GitOps (OpenGitOps):
- Declarative: The entire system is described declaratively (YAML manifests)
- Versioned & Immutable: Desired state is stored in Git (versioned, auditable, rollback-able)
- Pulled Automatically: Agents pull desired state and apply it (no manual kubectl apply)
- Continuously Reconciled: Agents detect drift and self-heal to match Git
```mermaid
flowchart LR
    DEV[Developer] -->|Pull Request| GIT[Git Repository<br/>Source of Truth]
    GIT -->|Webhook/Poll| AGENT[GitOps Agent<br/>ArgoCD / Flux]
    AGENT -->|Reconcile| K8S[Kubernetes Cluster<br/>Actual State]
    K8S -->|Drift Detection| AGENT
    AGENT -->|Status| GIT
```
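To make "continuously reconciled" concrete, here is a minimal Python sketch of one reconcile pass. It is not how Argo CD or Flux are implemented. The dict-based state model and the `plan`/`reconcile` names are hypothetical. The `prune` parameter mirrors the flag of the same name in the Argo CD and Flux examples: without it, objects removed from Git are left running.

```python
from typing import Dict, List, Tuple

# Hypothetical model: resource key such as "Deployment/payment" -> spec dict.
State = Dict[str, dict]


def plan(desired: State, live: State, prune: bool = True) -> List[Tuple[str, str]]:
    """Return the (action, key) pairs that would make live match desired."""
    actions: List[Tuple[str, str]] = []
    for key in sorted(desired):
        if key not in live:
            actions.append(("create", key))
        elif live[key] != desired[key]:
            actions.append(("update", key))  # drift: changed outside Git
    if prune:
        # Objects still in the cluster but no longer declared in Git.
        for key in sorted(live.keys() - desired.keys()):
            actions.append(("delete", key))
    return actions


def reconcile(desired: State, live: State, prune: bool = True) -> State:
    """Apply the plan to a copy of the live state and return the result."""
    new_live = dict(live)
    for action, key in plan(desired, live, prune):
        if action == "delete":
            del new_live[key]
        else:
            new_live[key] = desired[key]
    return new_live


desired = {"Deployment/payment": {"replicas": 3}}
live = {"Deployment/payment": {"replicas": 1}, "ConfigMap/old": {"k": "v"}}
print(plan(desired, live))
# [('update', 'Deployment/payment'), ('delete', 'ConfigMap/old')]
```

Running `reconcile` again on its own output yields an empty plan. That idempotence is what lets an agent loop safely on a timer or webhook.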
ArgoCD
```yaml
# ArgoCD Application: declare what to deploy and where
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payment-service
  namespace: argocd
spec:
  project: production
  source:
    repoURL: https://github.com/mycompany/k8s-manifests.git
    targetRevision: main
    path: apps/payment-service/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true      # Delete resources removed from Git
      selfHeal: true   # Auto-fix manual changes (drift)
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: 3
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
```
```bash
# ArgoCD CLI:
argocd app list
# Abridged output (the real table has more columns):
# NAME             CLUSTER     NAMESPACE   STATUS  HEALTH   SYNCPOLICY
# payment-service  in-cluster  production  Synced  Healthy  Auto-Prune

argocd app get payment-service
# Shows: sync status, health, last sync time, resources managed

argocd app sync payment-service      # Trigger a sync now
argocd app rollback payment-service  # Roll back to the previously deployed revision
                                     # (refused while automated sync is enabled)
argocd app diff payment-service      # Show what would change
```
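The `retry` block in the Application above determines how long Argo CD waits between failed sync attempts. The sketch below assumes that the delay before retry *n* (0-based) is `duration * factor**n`, capped at `maxDuration`. This models the documented meaning of the fields; it is not Argo CD's source.

```python
def backoff_schedule(limit: int, duration_s: float, factor: float,
                     max_duration_s: float) -> list:
    """Return the wait (seconds) before each of the `limit` retries.

    Assumption: exponential backoff, duration * factor**attempt,
    never exceeding max_duration_s.
    """
    delays = []
    for attempt in range(limit):
        delay = duration_s * factor ** attempt
        delays.append(min(delay, max_duration_s))
    return delays


# Values from the Application: limit 3, duration 5s, factor 2, maxDuration 3m.
print(backoff_schedule(3, 5, 2, 180))  # [5, 10, 20]
```

With only three retries, the 3-minute cap never applies. It matters only if `limit` is raised, for example to 8, where the last delay would otherwise be 640 s.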
Flux
```yaml
# Flux: lightweight GitOps with CRDs (no built-in UI, pure K8s native)
# GitRepository: source definition
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: k8s-manifests
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/mycompany/k8s-manifests.git
  ref:
    branch: main
---
# Kustomization: what to deploy from the source
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: payment-service
  namespace: flux-system
spec:
  interval: 5m
  path: ./apps/payment-service/overlays/production
  prune: true
  sourceRef:
    kind: GitRepository
    name: k8s-manifests
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: payment-service
      namespace: production
```
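The `healthChecks` entry makes Flux wait until the Deployment is actually rolled out before reporting the Kustomization as ready. Flux evaluates this with the kstatus library. The sketch below is a simplified readiness rule in the same spirit, using real Deployment status field names. The exact rule set is an assumption, not a copy of kstatus.

```python
def deployment_ready(obj: dict) -> bool:
    """Simplified 'is this Deployment fully rolled out?' check (assumed rules)."""
    meta, spec, status = obj["metadata"], obj["spec"], obj.get("status", {})
    desired = spec.get("replicas", 1)  # Deployment default is 1 replica
    if status.get("observedGeneration", 0) < meta.get("generation", 0):
        return False  # controller has not processed the latest spec yet
    return (
        status.get("updatedReplicas", 0) == desired       # all pods on new template
        and status.get("readyReplicas", 0) == desired
        and status.get("availableReplicas", 0) == desired
        and status.get("replicas", 0) == desired          # no old pods left over
    )


dep = {
    "metadata": {"generation": 2},
    "spec": {"replicas": 3},
    "status": {"observedGeneration": 2, "updatedReplicas": 3,
               "readyReplicas": 3, "availableReplicas": 3, "replicas": 3},
}
print(deployment_ready(dep))  # True
```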
Application Packaging
Helm Charts
```bash
# Helm: the package manager for Kubernetes
# Install a chart:
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install redis bitnami/redis \
  --namespace cache --create-namespace \
  --set auth.password=mySecretPass \
  --set replica.replicaCount=3

# Upgrade (same namespace; --reuse-values keeps the values set at install):
helm upgrade redis bitnami/redis --namespace cache \
  --reuse-values --set replica.replicaCount=5

# Rollback to revision 1 (the rollback itself is recorded as revision 3):
helm rollback redis 1 --namespace cache

# List releases in all namespaces (UPDATED column omitted):
helm list -A
# NAME   NAMESPACE  REVISION  STATUS    CHART         APP VERSION
# redis  cache      3         deployed  redis-18.4.0  7.2.4

# Chart structure:
# my-chart/
# ├── Chart.yaml          # Metadata (name, version, dependencies)
# ├── values.yaml         # Default configuration values
# ├── templates/
# │   ├── deployment.yaml # Templated K8s manifest
# │   ├── service.yaml
# │   ├── ingress.yaml
# │   ├── _helpers.tpl    # Template helpers
# │   └── NOTES.txt       # Post-install message
# └── charts/             # Sub-chart dependencies
```
```yaml
# Example template (templates/deployment.yaml):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "my-chart.fullname" . }}
  labels:
    {{- include "my-chart.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      {{- include "my-chart.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:   # Pod labels must match spec.selector
        {{- include "my-chart.selectorLabels" . | nindent 8 }}
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          ports:
            - containerPort: {{ .Values.service.port }}
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
```
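Before rendering templates, Helm merges values in layers: the chart's `values.yaml`, then any `-f` files, then `--set` flags, with later sources winning. The Python sketch below models only the defaults-then-`--set` part. It handles dotted keys and int, bool, and string scalars. Real `--set` syntax (lists, escaping, `--set-string`) is richer, and the `nginx` defaults are hypothetical.

```python
import copy


def parse_set(expr: str) -> dict:
    """Turn 'replica.replicaCount=5' into {'replica': {'replicaCount': 5}}."""
    path, raw = expr.split("=", 1)
    if raw in ("true", "false"):
        value = raw == "true"
    elif raw.lstrip("-").isdigit():
        value = int(raw)
    else:
        value = raw  # everything else stays a string in this sketch
    result: dict = {}
    node = result
    keys = path.split(".")
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return result


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into a copy of base (override wins)."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def render_values(defaults: dict, set_flags: list) -> dict:
    values = defaults
    for expr in set_flags:  # later --set flags override earlier ones
        values = deep_merge(values, parse_set(expr))
    return values


defaults = {"replicaCount": 1, "image": {"repository": "nginx", "tag": "1.25"}}
print(render_values(defaults, ["replicaCount=5", "image.tag=1.27"]))
# {'replicaCount': 5, 'image': {'repository': 'nginx', 'tag': '1.27'}}
```

The merge is deep, so overriding `image.tag` leaves `image.repository` intact. That is why a single `--set` does not wipe out sibling keys.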
Kustomize
```yaml
# Kustomize: template-free configuration customization
# Base (shared across environments):
# base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
  - hpa.yaml
```

```yaml
# Overlay for production:
# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
namePrefix: prod-
namespace: production
patches:
  - target:
      kind: Deployment
      name: payment
    patch: |
      - op: replace
        path: /spec/replicas
        value: 5
  - target:
      kind: HorizontalPodAutoscaler
      name: payment
    patch: |
      - op: replace
        path: /spec/maxReplicas
        value: 20
images:
  - name: payment-service
    newTag: v2.3.1
configMapGenerator:
  - name: app-config
    literals:
      - DATABASE_HOST=prod-db.internal
      - LOG_LEVEL=warn
```

```bash
# Apply with Kustomize (built into kubectl):
kubectl apply -k overlays/production/
# Preview what would be applied:
kubectl kustomize overlays/production/

# Directory structure:
# ├── base/
# │   ├── kustomization.yaml
# │   ├── deployment.yaml
# │   ├── service.yaml
# │   └── hpa.yaml
# └── overlays/
#     ├── dev/
#     │   └── kustomization.yaml
#     ├── staging/
#     │   └── kustomization.yaml
#     └── production/
#         └── kustomization.yaml
```
Helm vs Kustomize
| Aspect | Helm | Kustomize |
|---|---|---|
| Approach | Templating (Go templates) | Patching (overlays on base) |
| Learning curve | Higher (template syntax) | Lower (plain YAML) |
| Third-party apps | Excellent (huge chart ecosystem) | Limited (patch others' YAML) |
| Release management | Built-in (versions, rollback) | None (rely on Git/GitOps) |
| Complexity | Can become complex (conditionals, loops) | Stays simple (pure overlays) |
| GitOps support | Argo CD renders charts natively; Flux uses a HelmRelease CRD | Native in Argo CD and Flux (also kubectl -k) |
| Best for | Distributing apps, complex configs | Internal apps, environment overrides |
CI/CD Pipelines
Pipeline Architecture
```mermaid
flowchart LR
    subgraph CI [Continuous Integration]
        A[Code Push] --> B[Build & Test]
        B --> C[Image Build]
        C --> D[Image Scan]
        D --> E[Sign Image]
        E --> F[Push to Registry]
    end
    subgraph CD [Continuous Delivery - GitOps]
        F --> G[Update Manifest<br/>in Git Repo]
        G --> H[ArgoCD/Flux<br/>Detects Change]
        H --> I[Deploy to Staging]
        I --> J[Integration Tests]
        J --> K[Promote to Prod]
    end
```
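The hand-off between CI and CD is the "Update Manifest in Git Repo" step: CI does not touch the cluster; it only commits a new image tag. A minimal sketch of that edit on a parsed overlay `kustomization.yaml` follows. The dict stands in for the parsed file, and reading, writing, and committing it are omitted. The `kustomize edit set image` command performs this edit in real pipelines.

```python
import copy


def set_image_tag(kustomization: dict, image: str, tag: str) -> dict:
    """Return a copy of the kustomization with `image` pinned to `tag`."""
    updated = copy.deepcopy(kustomization)
    images = updated.setdefault("images", [])
    for entry in images:
        if entry.get("name") == image:
            entry["newTag"] = tag
            break
    else:
        # No override for this image yet: add one.
        images.append({"name": image, "newTag": tag})
    return updated


overlay = {"resources": ["../../base"],
           "images": [{"name": "payment-service", "newTag": "v2.3.1"}]}
print(set_image_tag(overlay, "payment-service", "v2.3.2")["images"])
# [{'name': 'payment-service', 'newTag': 'v2.3.2'}]
```

Keeping this step a pure Git commit means promotion to production is just another reviewed change to the production overlay.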
Progressive Delivery
```yaml
# Argo Rollouts: canary deployment with automatic analysis
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: payment-service
spec:
  replicas: 10
  selector:
    matchLabels:
      app: payment-service
  workloadRef:                     # Reuse the pod template of an existing
    apiVersion: apps/v1            # Deployment named payment-service
    kind: Deployment
    name: payment-service
  strategy:
    canary:
      steps:
        - setWeight: 10            # 10% to canary (by pod ratio if no trafficRouting)
        - pause: {duration: 5m}    # Wait 5 minutes
        - analysis:                # Run automated analysis
            templates:
              - templateName: success-rate
            args:
              - name: service-name
                value: payment-service
        - setWeight: 30            # 30% if analysis passes
        - pause: {duration: 5m}
        - setWeight: 60
        - pause: {duration: 5m}
        - setWeight: 100           # Full rollout
      canaryMetadata:
        labels:
          role: canary
      stableMetadata:
        labels:
          role: stable
---
# AnalysisTemplate: automated canary verification
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m
      count: 5
      successCondition: result[0] >= 0.99
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",
              status=~"2.."}[5m]))
            / sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
```
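Two rough Python models of the Rollout above follow; both are assumptions for illustration, not Argo Rollouts code. First, with no `trafficRouting` configured, `setWeight` can only be approximated by pod counts; this sketch rounds the canary count up. Second, the analysis takes `count` measurements and is treated as failed once failed measurements exceed `failureLimit` (0 by default). Inconclusive results and provider errors are not modeled.

```python
import math


def canary_split(replicas: int, weight: int) -> tuple:
    """Approximate (canary, stable) pod counts for a weight (rounding up is assumed)."""
    canary = math.ceil(replicas * weight / 100)
    return canary, replicas - canary


def analysis_result(success_rates: list, threshold: float = 0.99,
                    count: int = 5, failure_limit: int = 0) -> str:
    """Evaluate measurements against successCondition: result[0] >= threshold."""
    failures = 0
    for rate in success_rates[:count]:
        if rate < threshold:
            failures += 1
            if failures > failure_limit:
                return "Failed"  # the Rollout aborts and shifts back to stable
    return "Successful" if len(success_rates) >= count else "Running"


for w in (10, 30, 60, 100):
    print(w, canary_split(10, w))
# 10 (1, 9)
# 30 (3, 7)
# 60 (6, 4)
# 100 (10, 0)
```

With 10 replicas, each weight step maps cleanly onto whole pods. With, say, 3 replicas, a 10% step already sends about a third of the pods to the canary. Exact percentages require a traffic router such as a service mesh or ingress controller.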
Platform Engineering
Backstage (Developer Portal)
```yaml
# Backstage Software Catalog (catalog-info.yaml):
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payment-service
  description: Processes payments and refunds
  tags: ["go", "grpc", "production"]
  annotations:
    github.com/project-slug: mycompany/payment-service
    argocd/app-name: payment-service
    prometheus.io/rule: payment-alerts
    pagerduty.com/service-id: P123ABC
  links:
    - url: https://grafana.internal/d/payment
      title: Grafana Dashboard
    - url: https://wiki.internal/payment
      title: Documentation
spec:
  type: service
  lifecycle: production
  owner: team-payments
  system: checkout
  dependsOn:
    - resource:production-db
    - component:notification-service
  providesApis:
    - payment-api
```
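Entries such as `resource:production-db` are Backstage entity references. Their format is `[kind:][namespace/]name`, with the namespace defaulting to `default`. The Python parser below is an illustrative sketch of that format, not the `@backstage/catalog-model` implementation. Lower-casing the kind reflects that Backstage compares kinds case-insensitively.

```python
from typing import NamedTuple, Optional


class EntityRef(NamedTuple):
    kind: str
    namespace: str
    name: str


def parse_entity_ref(ref: str, default_kind: Optional[str] = None,
                     default_namespace: str = "default") -> EntityRef:
    """Parse '[kind:][namespace/]name' into its three parts."""
    kind, sep, rest = ref.partition(":")
    if not sep:  # no "kind:" prefix, e.g. "payment-api" under providesApis
        kind, rest = default_kind, ref
    if not kind:
        raise ValueError(f"entity ref {ref!r} has no kind and no default kind")
    namespace, sep, name = rest.partition("/")
    if not sep:
        namespace, name = default_namespace, rest
    return EntityRef(kind.lower(), namespace, name)


def to_string(ref: EntityRef) -> str:
    return f"{ref.kind}:{ref.namespace}/{ref.name}"


print(to_string(parse_entity_ref("resource:production-db")))
# resource:default/production-db
```

The field provides the missing kind. That is why `providesApis` can list a bare `payment-api` while `dependsOn`, which may point at several kinds, spells the kind out.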
Crossplane (Infrastructure as Code)
```yaml
# Crossplane: manage cloud infrastructure with Kubernetes CRDs
# "kubectl apply" to create AWS/Azure/GCP resources!
apiVersion: database.aws.crossplane.io/v1beta1
kind: RDSInstance
metadata:
  name: production-db
spec:
  forProvider:
    region: us-east-1
    dbInstanceClass: db.r6g.xlarge
    engine: postgres
    engineVersion: "16"
    allocatedStorage: 100
    masterUsername: dbadmin   # RDS for PostgreSQL rejects reserved names such as "admin"
    masterPasswordSecretRef:
      name: db-master-password
      namespace: crossplane-system
      key: password
    vpcSecurityGroupIds:
      - sg-abc123
    publiclyAccessible: false
  writeConnectionSecretToRef:
    name: production-db-conn
    namespace: production
# Result: Crossplane provisions an actual RDS instance in AWS.
# Connection details (endpoint, port, username, password) are written to the
# production-db-conn Secret in the production namespace.
# Same declarative, reconciled pattern as everything else in K8s.
```
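The "same reconciled pattern" is an observe-then-act loop against the cloud API. It runs repeatedly, so drift in the real database gets corrected too. In the Python sketch below, the step names echo crossplane-runtime's `ExternalClient` interface (Observe, Create, Update, Delete). `FakeCloud` and its `example.internal` endpoint are hypothetical stand-ins for AWS, and this is not provider code.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakeCloud:
    """Hypothetical in-memory stand-in for the cloud provider's API."""
    instances: Dict[str, dict] = field(default_factory=dict)

    def describe(self, name: str):
        return self.instances.get(name)

    def create(self, name: str, params: dict) -> dict:
        self.instances[name] = {**params, "endpoint": f"{name}.example.internal",
                                "port": 5432}
        return self.instances[name]

    def modify(self, name: str, params: dict) -> None:
        self.instances[name].update(params)


def reconcile(cloud: FakeCloud, name: str, for_provider: dict, secrets: dict) -> str:
    observed = cloud.describe(name)                           # Observe
    if observed is None:
        observed = cloud.create(name, for_provider)           # Create
        action = "created"
    elif any(observed.get(k) != v for k, v in for_provider.items()):
        cloud.modify(name, for_provider)                      # Update (drift or spec change)
        action = "updated"
    else:
        action = "up-to-date"
    # Publish connection details, like writeConnectionSecretToRef.
    secrets[name + "-conn"] = {"endpoint": observed["endpoint"],
                               "port": str(observed["port"])}
    return action


cloud, secrets = FakeCloud(), {}
params = {"dbInstanceClass": "db.r6g.xlarge", "allocatedStorage": 100}
print(reconcile(cloud, "production-db", params, secrets))  # created
print(reconcile(cloud, "production-db", params, secrets))  # up-to-date
```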
Multi-Cluster Management
| Tool | Approach | Best For |
|---|---|---|
| ArgoCD (ApplicationSet) | Central Argo CD instance generates one Application per registered cluster | Multi-cluster deployments via Git |
| Cluster API (CAPI) | Manage cluster lifecycle as K8s resources | Provisioning/upgrading clusters |
| Rancher | Centralized management UI | Multi-cloud cluster fleet management |
| KubeFed | Federated resources across clusters | Active-active multi-cluster (project archived) |
| Liqo / Admiralty | Virtual nodes spanning clusters | Seamless multi-cluster scheduling |
```yaml
# ArgoCD ApplicationSet: deploy to multiple clusters from one definition
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: payment-service
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            env: production
  template:
    metadata:
      name: 'payment-{{name}}'
    spec:
      project: production
      source:
        repoURL: https://github.com/mycompany/k8s-manifests.git
        path: apps/payment-service/overlays/production
        targetRevision: main
      destination:
        server: '{{server}}'
        namespace: production
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```
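The cluster generator selects registered clusters by label and exposes per-cluster parameters such as `{{name}}` and `{{server}}` to the template. The Python sketch below shows that expansion with simple string substitution. The cluster list is hypothetical, and real ApplicationSets offer many more parameters and a Go-template mode.

```python
import re


def render(template: str, params: dict) -> str:
    """Replace each {{key}} with the matching parameter value."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: params[m.group(1)], template)


def generate(clusters: list, match_labels: dict) -> list:
    """One generated Application (name + destination) per matching cluster."""
    apps = []
    for cluster in clusters:
        labels = cluster.get("labels", {})
        if all(labels.get(k) == v for k, v in match_labels.items()):
            params = {"name": cluster["name"], "server": cluster["server"]}
            apps.append({"name": render("payment-{{name}}", params),
                         "server": render("{{server}}", params)})
    return apps


clusters = [  # hypothetical registered clusters
    {"name": "prod-eu", "server": "https://prod-eu.example:6443",
     "labels": {"env": "production"}},
    {"name": "staging", "server": "https://staging.example:6443",
     "labels": {"env": "staging"}},
]
print(generate(clusters, {"env": "production"}))
# [{'name': 'payment-prod-eu', 'server': 'https://prod-eu.example:6443'}]
```

Adding a production cluster then requires only registering it with the `env: production` label. The ApplicationSet picks it up without any change in Git.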
FinOps & Cost Optimization
```bash
# Kubernetes cost optimization strategies:

# 1. Right-size resources (VPA recommendations):
kubectl describe vpa --all-namespaces | grep -A5 "Target:"
# Requests set well above observed usage are common; compare them with the VPA target

# 2. Kubecost / OpenCost: cost visibility per namespace/team
# Shows: cost per pod, namespace, label, cluster
# Identifies: idle resources, over-provisioned workloads

# 3. Spot/preemptible capacity for stateless workloads:
# AWS: Spot Instances (often 60-90% cheaper than on-demand)
# GCP: Spot VMs (successor to Preemptible VMs)
# Azure: Spot VMs
# Use with tolerations + PodDisruptionBudgets

# 4. Cluster Autoscaler + node auto-provisioning:
# Cluster Autoscaler considers a node for removal when its pods' CPU and memory
# requests are both below 50% of allocatable (--scale-down-utilization-threshold=0.5)
# Mix instance types for bin-packing efficiency

# 5. Request-based billing alignment:
# Set requests close to observed usage (the scheduler reserves capacity by requests, not limits)
# Unused requests = wasted money in multi-tenant clusters

# 6. Namespace cost allocation:
# Label all resources with team/project
# ResourceQuotas per team prevent runaway costs
# Chargeback: actual usage * per-unit cost → team invoice
```
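To put numbers on "unused requests = wasted money", here is a back-of-the-envelope Python sketch. The $0.04 per vCPU-hour price and the workload figures are hypothetical; Kubecost and OpenCost derive these from real billing data. The chargeback function bills `max(request, usage)` rather than usage alone. This is a common allocation rule, on the reasoning that requested capacity cannot be scheduled for anyone else.

```python
HOURS_PER_MONTH = 730  # common approximation (8,760 h / 12)


def monthly_idle_cost(requested_cores: float, used_cores: float,
                      price_per_core_hour: float) -> float:
    """Cost of CPU reserved by requests but not actually used."""
    idle = max(requested_cores - used_cores, 0.0)
    return idle * price_per_core_hour * HOURS_PER_MONTH


def chargeback(requested_cores: float, used_cores: float,
               price_per_core_hour: float) -> float:
    """Bill a team for the larger of what it requested and what it used."""
    return max(requested_cores, used_cores) * price_per_core_hour * HOURS_PER_MONTH


# Hypothetical team: requests 20 cores, uses 5 on average, $0.04 per core-hour.
print(round(monthly_idle_cost(20, 5, 0.04), 2))  # 438.0
print(round(chargeback(20, 5, 0.04), 2))         # 584.0
```

Billing on requests gives teams a direct incentive to act on the VPA recommendations in step 1.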
Top Cost Optimization Actions
| Action | Typical Savings | Effort |
|---|---|---|
| Right-size CPU requests (VPA) | 20-40% | Low |
| Spot instances for stateless | 60-90% on those nodes | Medium |
| Scale dev/staging to 0 at night | 50% on non-prod | Low |
| Cluster Autoscaler tuning | 15-30% | Low |
| Reserved instances for baseline | 30-50% vs on-demand | Medium |
| Delete unused PVCs and LBs | 5-15% | Low |
CNCF Landscape
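The Cloud Native Computing Foundation (CNCF) hosts many of the projects that form the cloud native ecosystem. Key graduated and incubating projects by category are listed below. Maturity levels are as of late 2024; projects move between levels, so check the CNCF landscape for current status.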
| Category | Graduated Projects | Incubating |
|---|---|---|
| Orchestration | Kubernetes | — |
| Runtime | containerd, CRI-O | — |
| Networking | Envoy, CoreDNS, Cilium, Istio | CNI, Contour |
| Storage | Rook | Longhorn |
| Observability | Prometheus, Fluentd, Jaeger | OpenTelemetry, Thanos, Cortex |
| CI/CD | Argo, Flux | Keptn |
| Security | OPA, Falco, TUF, cert-manager | Kyverno, Notary Project |
| Serverless & event-driven | KEDA, Dapr | Knative |
| Package Mgmt | Helm | Artifact Hub |
Exercises
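Install Argo CD (kubectl create namespace argocd && kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml). Create a Git repository with a simple Deployment manifest. Configure an Argo CD Application with automated sync enabled that points to your repo. Push a change (update the image tag) and watch Argo CD sync it automatically. Verify with argocd app get.
Create a chart with helm create my-app. Edit values.yaml to customise replicas and image. Install it (helm install my-release ./my-app), then reinstall with values overridden at install time (--set replicaCount=5). Upgrade the release with a new image tag. Roll back to the previous revision with helm rollback my-release 1.
Create a Kustomize base with two overlays: dev/ (1 replica, latest image) and prod/ (3 replicas, pinned image tag, resource limits). Give each overlay its own namespace so the two do not collide. Apply each with kubectl apply -k overlays/dev/ and kubectl apply -k overlays/prod/. Compare the rendered output of kubectl kustomize overlays/dev/ and kubectl kustomize overlays/prod/ to verify the differences.
Simulate a canary by hand. Keep the current version in a stable Deployment, then create a second, canary Deployment running the new image (use kubectl set image to change its image). Give its pods the same label the Service selects, so traffic splits roughly by pod count (for example, 9 stable replicas and 1 canary for about 10%). Shift the weight with kubectl scale on the two Deployments; scaling the new ReplicaSet directly does not work, because the Deployment controller resets it. Monitor error rates with kubectl logs. If healthy, move the stable Deployment to the new image and delete the canary; if not, delete the canary. Discuss: how would Argo Rollouts or Flagger automate this?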
Tie the entire series together. Deploy a multi-tier application (frontend + API + database) managed entirely through Git:
- Create a Git repo with Kustomize base + prod overlay
- Frontend: Deployment + Service + Ingress (Parts 7, 9)
- API: Deployment + ConfigMap + Secret (Parts 7, 10)
- Database: StatefulSet + PVC + headless Service (Parts 7, 9, 10)
- Network Policy: restrict DB access to API only (Parts 8, 14)
- Argo CD Application syncing from the repo (this part)
- Push a config change to Git and verify Argo CD detects and applies it
This single exercise validates: workload management, networking, storage, security, and GitOps — everything from Parts 6–16.
Series Conclusion
Over 16 parts, we've journeyed from the theoretical foundations of distributed systems to the practical realities of operating Kubernetes at scale. Here's what we covered:
| Phase | Parts | Core Lessons |
|---|---|---|
| Theory | 1–5 | CAP theorem, consensus (Raft/Paxos), replication, failure modes, resilience patterns |
| Kubernetes Core | 6–10 | Architecture, object model, networking, services/mesh, storage |
| Advanced K8s | 11–12 | Internals (informers, controllers, scheduler), CRDs & operators |
| Production | 13–16 | Operations, security, observability, cloud native ecosystem |
- Declarative over imperative — declare desired state, let controllers reconcile
- Immutable infrastructure — replace, don't patch
- Defence in depth — RBAC + network policies + pod security + admission control
- Observe everything — metrics, traces, logs (you can't fix what you can't see)
- Automate operations — GitOps, operators, autoscaling eliminate human error
- Design for failure — retries, circuit breakers, PDBs, multi-AZ, chaos engineering
- Start simple, evolve deliberately — don't adopt service mesh on day one
The distributed systems and Kubernetes landscape continues to evolve rapidly. The fundamentals covered in this series — consensus, fault tolerance, declarative reconciliation, defence in depth — remain constant even as tools change. Master the principles, and new tools become straightforward to adopt.