GitOps
Principles
GitOps Definition: GitOps is an operational framework where Git is the single source of truth for declarative infrastructure and applications. Changes to the desired state happen via pull requests, and an automated agent ensures the actual state matches the declared state.
The four principles of GitOps (OpenGitOps):
- Declarative: The entire system is described declaratively (for example, Kubernetes YAML manifests)
- Versioned & Immutable: Desired state is stored in Git (versioned, auditable, rollback-able)
- Pulled Automatically: Agents pull desired state and apply it (no manual kubectl apply)
- Continuously Reconciled: Agents detect drift and self-heal to match Git
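The last two principles come down to a loop that compares desired state with live state and acts on the difference. The sketch below is a toy model, not Argo CD's or Flux's code. Resources are plain dicts keyed by a hypothetical `kind/name` string, and dict equality stands in for the field-level comparison a real agent performs after normalising server-side defaults. The `prune` flag plays the role of the `prune: true` setting in the ArgoCD and Flux manifests later in this section.

```python
"""Toy model of one GitOps reconcile pass (not Argo CD's or Flux's real code)."""
from typing import Dict, List, Tuple

# Hypothetical representation: "kind/name" -> spec fields.
State = Dict[str, dict]


def plan(desired: State, live: State, prune: bool = True) -> List[Tuple[str, str]]:
    """Return the (action, resource) pairs needed to make `live` match `desired`."""
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in live:
            actions.append(("create", name))      # declared in Git, missing in cluster
        elif live[name] != desired[name]:
            actions.append(("update", name))      # drift: live object was changed
    if prune:
        for name in sorted(live.keys() - desired.keys()):
            actions.append(("delete", name))      # in cluster, no longer declared in Git
    return actions


def reconcile(desired: State, live: State, prune: bool = True) -> State:
    """Apply the plan and return the new live state (the self-heal step)."""
    new_live = {name: dict(spec) for name, spec in live.items()}
    for action, name in plan(desired, live, prune):
        if action == "delete":
            del new_live[name]
        else:
            new_live[name] = dict(desired[name])
    return new_live


if __name__ == "__main__":
    git = {"deployment/payment": {"replicas": 3}, "service/payment": {"port": 80}}
    cluster = {
        "deployment/payment": {"replicas": 1},    # someone ran `kubectl scale`
        "configmap/old-config": {"k": "v"},       # no longer declared in Git
    }
    print(plan(git, cluster))
    print(reconcile(git, cluster) == git)
```

Running it prints an update for the manually scaled Deployment, a create for the missing Service and a delete for the orphaned ConfigMap, followed by `True`: after one pass the cluster matches Git exactly. With `prune=False` the orphan would survive, which is why pruning is an explicit opt-in in both tools.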
GitOps Workflow
```mermaid
flowchart LR
    DEV[Developer] -->|Pull Request| GIT["Git Repository<br/>Source of Truth"]
    GIT -->|Webhook/Poll| AGENT["GitOps Agent<br/>ArgoCD / Flux"]
    AGENT -->|Reconcile| K8S["Kubernetes Cluster<br/>Actual State"]
    K8S -->|Drift Detection| AGENT
    AGENT -->|Status| GIT
```
ArgoCD
```yaml
# ArgoCD Application: declare what to deploy and where
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payment-service
  namespace: argocd
spec:
  project: production
  source:
    repoURL: https://github.com/mycompany/k8s-manifests.git
    targetRevision: main
    path: apps/payment-service/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true      # Delete resources removed from Git
      selfHeal: true   # Auto-fix manual changes (drift)
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: 3
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
```
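To see what the `retry` block above means in wall-clock terms, the sketch below computes the wait before each retry. It assumes each wait is `duration * factor**n`, capped at `maxDuration`; that is a simplified model of Argo CD's back-off, and `retry_delays` is a hypothetical helper, not an Argo CD API.

```python
"""Sketch of the retry schedule implied by syncPolicy.retry (simplified model)."""
from datetime import timedelta
from typing import List


def retry_delays(limit: int, duration: timedelta, factor: int,
                 max_duration: timedelta) -> List[timedelta]:
    """Return the wait before each of the `limit` retries."""
    delays: List[timedelta] = []
    delay = duration
    for _ in range(limit):
        delays.append(min(delay, max_duration))   # never wait longer than maxDuration
        delay = delay * factor                    # exponential growth between attempts
    return delays


if __name__ == "__main__":
    schedule = retry_delays(limit=3, duration=timedelta(seconds=5), factor=2,
                            max_duration=timedelta(minutes=3))
    print([int(d.total_seconds()) for d in schedule])
```

With `limit: 3` the cap never applies: the waits are 5, 10 and 20 seconds, 35 seconds of back-off in total before the sync is marked failed. `maxDuration: 3m` only starts to matter at the seventh retry, when the doubled delay (320 s) would exceed 180 s.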
```bash
# ArgoCD CLI:
argocd app list
# NAME             CLUSTER     NAMESPACE   STATUS  HEALTH   SYNCPOLICY
# payment-service  in-cluster  production  Synced  Healthy  Auto-Prune
# (output abridged; exact columns vary by Argo CD version)
argocd app get payment-service
# Shows: sync status, health, last sync time, resources managed
argocd app sync payment-service       # Force sync now
argocd app rollback payment-service   # Roll back to the previous deployed revision
                                      # (refused while automated sync is enabled)
argocd app diff payment-service       # Show what would change
```
Flux
```yaml
# Flux: lightweight GitOps with CRDs (no built-in UI, pure K8s native)
# GitRepository: source definition
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: k8s-manifests
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/mycompany/k8s-manifests.git
  ref:
    branch: main
---
# Kustomization: what to deploy from the source
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: payment-service
  namespace: flux-system
spec:
  interval: 5m
  path: ./apps/payment-service/overlays/production
  prune: true
  sourceRef:
    kind: GitRepository
    name: k8s-manifests
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: payment-service
      namespace: production
```
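`prune: true` in a Flux Kustomization is narrower than "delete whatever is not in Git": Flux records an inventory of the objects it applied (in the Kustomization's status) and only garbage-collects entries that drop out of that inventory. The toy model below illustrates the idea; `ObjRef` and `apply_and_prune` are hypothetical names, not kustomize-controller code.

```python
"""Toy model of inventory-based pruning (not kustomize-controller's real code)."""
from typing import NamedTuple, Set, Tuple


class ObjRef(NamedTuple):
    kind: str
    namespace: str
    name: str


def apply_and_prune(rendered: Set[ObjRef], previous_inventory: Set[ObjRef],
                    cluster: Set[ObjRef], prune: bool = True) -> Tuple[Set[ObjRef], Set[ObjRef]]:
    """Return (new cluster contents, new inventory) after one reconciliation."""
    new_cluster = cluster | rendered               # apply everything the overlay renders
    if prune:
        stale = previous_inventory - rendered      # applied last time, gone from Git now
        new_cluster -= stale
    return new_cluster, set(rendered)              # inventory = what was just applied


if __name__ == "__main__":
    deploy = ObjRef("Deployment", "production", "payment-service")
    svc = ObjRef("Service", "production", "payment-service")
    legacy_cm = ObjRef("ConfigMap", "production", "legacy-config")
    by_hand = ObjRef("Secret", "production", "created-by-hand")

    cluster, inventory = apply_and_prune(
        rendered={deploy, svc},
        previous_inventory={deploy, legacy_cm},
        cluster={deploy, legacy_cm, by_hand},
    )
    print(sorted(f"{r.kind}/{r.name}" for r in cluster))
    print(sorted(f"{r.kind}/{r.name}" for r in inventory))
```

The legacy ConfigMap is removed because Flux applied it before, while the hand-made Secret survives: it was never in the inventory, so this Kustomization does not consider itself its owner.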
Application Packaging
Helm Charts
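```bash
# Helm: the package manager for Kubernetes
# Install a chart (inline password for demo only; it ends up in shell history):
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install redis bitnami/redis \
  --namespace cache --create-namespace \
  --set auth.password=mySecretPass \
  --set replica.replicaCount=3
# Upgrade (same namespace; --reuse-values keeps the values set at install time):
helm upgrade redis bitnami/redis --namespace cache \
  --reuse-values --set replica.replicaCount=5
# Roll back to revision 1 (recorded as a new revision, 3):
helm rollback redis 1 --namespace cache
# List releases in all namespaces:
helm list -A
# NAME   NAMESPACE  REVISION  STATUS    CHART         APP VERSION   (UPDATED column omitted)
# redis  cache      3         deployed  redis-18.4.0  7.2.4
```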
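A detail that trips people up: `helm rollback` does not rewind the revision counter. It records a new revision whose chart and values are copied from the target, so after install (1), upgrade (2) and `rollback redis 1`, the release sits at revision 3. The toy model below mimics only that bookkeeping; the `Release` and `Revision` classes are hypothetical and say nothing about how Helm stores releases (Helm 3 keeps them as Secrets in the release namespace by default).

```python
"""Toy model of Helm's revision history (not Helm's storage format)."""
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Revision:
    number: int
    chart: str
    values: Dict[str, object]
    status: str = "deployed"


@dataclass
class Release:
    name: str
    history: List[Revision] = field(default_factory=list)

    def _record(self, chart: str, values: Dict[str, object]) -> Revision:
        for rev in self.history:
            if rev.status == "deployed":
                rev.status = "superseded"      # only one revision is live at a time
        rev = Revision(len(self.history) + 1, chart, dict(values))
        self.history.append(rev)
        return rev

    def install(self, chart: str, values: Dict[str, object]) -> Revision:
        return self._record(chart, values)

    def upgrade(self, chart: str, values: Dict[str, object]) -> Revision:
        return self._record(chart, values)

    def rollback(self, to: int) -> Revision:
        target = self.history[to - 1]
        return self._record(target.chart, target.values)   # a new revision, not a rewind


if __name__ == "__main__":
    redis = Release("redis")
    redis.install("redis-18.4.0", {"replica.replicaCount": 3})
    redis.upgrade("redis-18.4.0", {"replica.replicaCount": 5})
    current = redis.rollback(1)
    print(current.number, current.values)
    print([(r.number, r.status) for r in redis.history])
```

Because history only grows, a rollback is itself auditable and can be rolled back, the same property GitOps gets from reverting a commit instead of rewriting history.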
```text
# Chart structure:
my-chart/
├── Chart.yaml            # Metadata (name, version, dependencies)
├── values.yaml           # Default configuration values
├── templates/
│   ├── deployment.yaml   # Templated K8s manifest
│   ├── service.yaml
│   ├── ingress.yaml
│   ├── _helpers.tpl      # Template helpers
│   └── NOTES.txt         # Post-install message
└── charts/               # Sub-chart dependencies
```
```yaml
# Example template (templates/deployment.yaml):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "my-chart.fullname" . }}
  labels:
    {{- include "my-chart.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      {{- include "my-chart.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      # Pod labels must match spec.selector.matchLabels, or the Deployment is rejected
      labels:
        {{- include "my-chart.selectorLabels" . | nindent 8 }}
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          ports:
            - containerPort: {{ .Values.service.port }}
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
```
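The `.Values` object this template reads is the chart's `values.yaml` with user overrides layered on top: nested maps are merged key by key rather than replaced wholesale, and later sources (`-f` files, then `--set`) win. The sketch below models that merge with hypothetical default values for `my-chart`; its `set_flag` helper handles only plain dotted keys, not the list indices, escaping and type parsing that real `--set` supports.

```python
"""Sketch of how chart defaults and overrides combine into .Values (simplified)."""
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return base with override applied; nested dicts are merged recursively."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def set_flag(path: str, value: Any) -> Dict[str, Any]:
    """Turn `--set image.tag=v2` style input into {'image': {'tag': 'v2'}}."""
    node: Any = value
    for key in reversed(path.split(".")):
        node = {key: node}
    return node


if __name__ == "__main__":
    chart_defaults = {  # hypothetical my-chart/values.yaml
        "replicaCount": 1,
        "image": {"repository": "payment-service", "tag": "latest"},
        "service": {"port": 8080},
    }
    values = chart_defaults
    for flag in (set_flag("image.tag", "v2.3.1"), set_flag("replicaCount", 3)):
        values = deep_merge(values, flag)
    print(values["replicaCount"])
    print(f'{values["image"]["repository"]}:{values["image"]["tag"]}')
```

`image.repository` survives because only `image.tag` was overridden, so the rendered `image:` line becomes `payment-service:v2.3.1` and `replicas:` becomes 3.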
Kustomize
```yaml
# Kustomize: template-free configuration customization
# Base (shared across environments):
# base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
  - hpa.yaml
```
```yaml
# Overlay for production:
# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
namePrefix: prod-
namespace: production
patches:
  - target:
      kind: Deployment
      name: payment
    patch: |
      - op: replace
        path: /spec/replicas
        value: 5
  - target:
      kind: HorizontalPodAutoscaler
      name: payment
    patch: |
      - op: replace
        path: /spec/maxReplicas
        value: 20
images:
  - name: payment-service
    newTag: v2.3.1
# The generated ConfigMap gets a content-hash name suffix and references to it
# are rewritten, so a config change rolls the pods
configMapGenerator:
  - name: app-config
    literals:
      - DATABASE_HOST=prod-db.internal
      - LOG_LEVEL=warn
```
```bash
# Apply with kustomize (built into kubectl):
kubectl apply -k overlays/production/
# Preview what would be applied:
kubectl kustomize overlays/production/
```

```text
# Directory structure:
├── base/
│   ├── kustomization.yaml
│   ├── deployment.yaml
│   ├── service.yaml
│   └── hpa.yaml
├── overlays/
│   ├── dev/
│   │   └── kustomization.yaml
│   ├── staging/
│   │   └── kustomization.yaml
│   └── production/
│       └── kustomization.yaml
```
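What `kubectl kustomize overlays/production/` does to the base can be approximated in a few lines. The sketch covers only the transformations this overlay uses: JSON Patch (RFC 6902) `replace` operations, `namePrefix`, `namespace` and an `images` tag override. It is a simplified model, not kustomize's implementation: it assumes, for illustration, that patch targets are matched against the unprefixed base names, it ignores image digests, and the base values (2 replicas, tag `v2.3.0`, `maxReplicas: 10`) are made up.

```python
"""Simplified model of the production overlay (not kustomize's implementation)."""
import copy
from typing import Any, Dict, List, Optional, Tuple


def split_image(image: str) -> Tuple[str, Optional[str]]:
    """Split 'repo:tag'; if the text after the last ':' has a '/', it was a registry port."""
    name, sep, tag = image.rpartition(":")
    if not sep or "/" in tag:
        return image, None
    return name, tag


def json_patch_replace(doc: Any, path: str, value: Any) -> None:
    """RFC 6902 'replace': the target location must already exist."""
    # RFC 6901 pointer: drop the leading '', unescape ~1 -> '/' before ~0 -> '~'
    parts = [p.replace("~1", "/").replace("~0", "~") for p in path.split("/")[1:]]
    parent = doc
    for part in parts[:-1]:
        parent = parent[int(part)] if isinstance(parent, list) else parent[part]
    last = parts[-1]
    if isinstance(parent, list):
        index = int(last)
        if not 0 <= index < len(parent):
            raise IndexError(path)
        parent[index] = value
    elif last in parent:
        parent[last] = value
    else:
        raise KeyError(path)


def build_overlay(base: List[Dict[str, Any]], name_prefix: str, namespace: str,
                  patches: List[Dict[str, Any]], images: Dict[str, str]) -> List[Dict[str, Any]]:
    """Return transformed copies of the base resources; the base is left untouched."""
    output = []
    for resource in copy.deepcopy(base):
        meta = resource["metadata"]
        for patch in patches:  # assumption: targets match the base (unprefixed) name
            if patch["kind"] == resource["kind"] and patch["name"] == meta["name"]:
                for op in patch["ops"]:
                    if op["op"] != "replace":
                        raise NotImplementedError(op["op"])
                    json_patch_replace(resource, op["path"], op["value"])
        meta["name"] = name_prefix + meta["name"]
        meta["namespace"] = namespace
        if resource["kind"] == "Deployment":
            for container in resource["spec"]["template"]["spec"]["containers"]:
                repo, _ = split_image(container["image"])
                if repo in images:
                    container["image"] = f"{repo}:{images[repo]}"
        output.append(resource)
    return output


if __name__ == "__main__":
    base = [  # hypothetical base/deployment.yaml and base/hpa.yaml, already parsed
        {"kind": "Deployment", "metadata": {"name": "payment"},
         "spec": {"replicas": 2, "template": {"spec": {"containers": [
             {"name": "payment", "image": "payment-service:v2.3.0"}]}}}},
        {"kind": "HorizontalPodAutoscaler", "metadata": {"name": "payment"},
         "spec": {"maxReplicas": 10}},
    ]
    patches = [
        {"kind": "Deployment", "name": "payment",
         "ops": [{"op": "replace", "path": "/spec/replicas", "value": 5}]},
        {"kind": "HorizontalPodAutoscaler", "name": "payment",
         "ops": [{"op": "replace", "path": "/spec/maxReplicas", "value": 20}]},
    ]
    for res in build_overlay(base, "prod-", "production", patches, {"payment-service": "v2.3.1"}):
        print(res["kind"], res["metadata"], res["spec"])
```

The base stays untouched, and every environment is the same base plus a small, reviewable diff, which is exactly what makes overlays easy to audit in a GitOps repository.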
Helm vs Kustomize
| Aspect | Helm | Kustomize |
|---|---|---|
| Approach | Templating (Go templates) | Patching (overlays on base) |
| Learning curve | Higher (template syntax) | Lower (plain YAML) |
| Third-party apps | Excellent (huge chart ecosystem) | Limited (patch others' YAML) |
| Release management | Built-in (versions, rollback) | None (rely on Git/GitOps) |
| Complexity | Can become complex (conditionals, loops) | Stays simple (pure overlays) |
| GitOps friendly | Argo CD renders charts directly; Flux uses a HelmRelease CRD | Native (kubectl -k; Argo CD and Flux build overlays directly) |
| Best for | Distributing apps, complex configs | Internal apps, environment overrides |
CI/CD Pipelines
Pipeline Architecture
Cloud Native CI/CD Pipeline

```mermaid
flowchart LR
    subgraph CI [Continuous Integration]
        A[Code Push] --> B["Build & Test"]
        B --> C[Image Build]
        C --> D[Image Scan]
        D --> E[Sign Image]
        E --> F[Push to Registry]
    end
    subgraph CD [Continuous Delivery - GitOps]
        G["Update Manifest<br/>in Git Repo"] --> H["ArgoCD/Flux<br/>Detects Change"]
        H --> I[Deploy to Staging]
        I --> J[Integration Tests]
        J --> K[Promote to Prod]
    end
    F --> G
```
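The hand-off between CI and CD is the "Update Manifest in Git Repo" step: CI never touches the cluster, it only edits the overlay's image tag and commits. In practice this is often done with `kustomize edit set image` or by an image-update bot such as Flux's image automation controllers. The sketch below performs the same edit on an already-parsed `kustomization.yaml` (a Python dict) and models promotion as copying a tag from one overlay to another; `set_image_tag`, `current_tag` and `promote` are hypothetical names.

```python
"""Sketch of the CI -> Git hand-off: bump an image tag in a kustomization."""
import copy
from typing import Any, Dict, Optional, Tuple

Kustomization = Dict[str, Any]


def set_image_tag(kustomization: Kustomization, image: str, tag: str) -> Tuple[Kustomization, bool]:
    """Return (updated copy, changed?). Re-running with the same tag is a no-op."""
    updated = copy.deepcopy(kustomization)
    images = updated.setdefault("images", [])
    for entry in images:
        if entry.get("name") == image:
            if entry.get("newTag") == tag:
                return updated, False          # nothing to commit
            entry["newTag"] = tag
            return updated, True
    images.append({"name": image, "newTag": tag})
    return updated, True


def current_tag(kustomization: Kustomization, image: str) -> Optional[str]:
    for entry in kustomization.get("images", []):
        if entry.get("name") == image:
            return entry.get("newTag")
    return None


def promote(source: Kustomization, target: Kustomization, image: str) -> Tuple[Kustomization, bool]:
    """Promotion = copy the tag that passed staging into the production overlay."""
    tag = current_tag(source, image)
    if tag is None:
        raise ValueError(f"{image} has no tag in the source overlay")
    return set_image_tag(target, image, tag)


if __name__ == "__main__":
    staging = {"resources": ["../../base"], "images": [{"name": "payment-service", "newTag": "v2.3.1"}]}
    production = {"resources": ["../../base"], "images": [{"name": "payment-service", "newTag": "v2.3.1"}]}

    staging, changed = set_image_tag(staging, "payment-service", "v2.3.2")   # CI built v2.3.2
    print(changed, current_tag(staging, "payment-service"))
    production, changed = promote(staging, production, "payment-service")   # after integration tests
    print(changed, current_tag(production, "payment-service"))
```

The `changed` flag lets a pipeline skip empty commits when it is re-run, and opening the promotion as a pull request rather than a direct push keeps the production change reviewable like any other.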
Progressive Delivery
```yaml
# Argo Rollouts: canary deployment with automatic analysis
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: payment-service
spec:
  replicas: 10
  selector:
    matchLabels:
      app: payment-service
  template:                        # pod template, same shape as a Deployment's
    metadata:
      labels:
        app: payment-service
    spec:
      containers:
        - name: payment-service
          image: payment-service:v2.3.1   # illustrative image reference
  strategy:
    canary:
      steps:
        - setWeight: 10            # 10% traffic to canary
        - pause: {duration: 5m}    # Wait 5 minutes
        - analysis:                # Run automated analysis
            templates:
              - templateName: success-rate
            args:
              - name: service-name
                value: payment-service
        - setWeight: 30            # 30% if analysis passes
        - pause: {duration: 5m}
        - setWeight: 60
        - pause: {duration: 5m}
        - setWeight: 100           # Full rollout
      canaryMetadata:
        labels:
          role: canary
      stableMetadata:
        labels:
          role: stable
---
# AnalysisTemplate: automated canary verification
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m
      count: 5
      successCondition: result[0] >= 0.99
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",
              status=~"2.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
```
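The Rollout walks its `steps` in order, and the analysis step acts as a gate. The toy model below captures that control flow, with hypothetical success-rate numbers standing in for the Prometheus query results. Each measurement must satisfy `result >= 0.99`; assuming the AnalysisTemplate's default `failureLimit` of 0 (none is set above), a single failed measurement fails the analysis and the rollout is aborted back to the stable version. Pauses are not simulated, and `run_canary` with its `"promoted"`/`"aborted"` outcomes is illustrative, not Argo Rollouts API.

```python
"""Toy model of the canary steps above (not the Argo Rollouts controller)."""
from typing import Any, Dict, List, Sequence, Tuple

STEPS: List[Dict[str, Any]] = [
    {"setWeight": 10}, {"pause": "5m"}, {"analysis": "success-rate"},
    {"setWeight": 30}, {"pause": "5m"},
    {"setWeight": 60}, {"pause": "5m"},
    {"setWeight": 100},
]


def analysis_passes(measurements: Sequence[float], threshold: float = 0.99,
                    count: int = 5, failure_limit: int = 0) -> bool:
    """Check up to `count` measurements against successCondition result >= threshold."""
    failures = sum(1 for value in measurements[:count] if value < threshold)
    return failures <= failure_limit


def run_canary(steps: List[Dict[str, Any]], measurements: Sequence[float]) -> Tuple[str, List[int]]:
    """Return (outcome, canary weight after each weight change)."""
    weights: List[int] = []
    for step in steps:
        if "setWeight" in step:
            weights.append(step["setWeight"])
        elif "analysis" in step and not analysis_passes(measurements):
            weights.append(0)                   # abort: all traffic back to stable
            return "aborted", weights
        # "pause" steps only wait; nothing to model here
    return "promoted", weights


if __name__ == "__main__":
    print(run_canary(STEPS, [0.997, 0.995, 0.999, 0.996, 0.998]))
    print(run_canary(STEPS, [0.997, 0.962, 0.999, 0.996, 0.998]))
```

In the second run one bad minute (96.2% success) is enough to stop the rollout at 10% weight, so at most a tenth of traffic ever saw the faulty version.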
Backstage (Developer Portal)
Platform Engineering: The discipline of building Internal Developer Platforms (IDPs) that let developers self-serve. Instead of filing tickets with ops ("please create me a database"), developers use a catalog or portal to provision what they need, with guardrails and compliance built in.
```yaml
# Backstage Software Catalog (catalog-info.yaml):
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payment-service
  description: Processes payments and refunds
  tags: ["go", "grpc", "production"]
  annotations:
    github.com/project-slug: mycompany/payment-service
    argocd/app-name: payment-service
    prometheus.io/rule: payment-alerts
    pagerduty.com/service-id: P123ABC
  links:
    - url: https://grafana.internal/d/payment
      title: Grafana Dashboard
    - url: https://wiki.internal/payment
      title: Documentation
spec:
  type: service
  lifecycle: production
  owner: team-payments
  system: checkout
  dependsOn:
    - resource:production-db
    - component:notification-service
  providesApis:
    - payment-api
```
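Relations such as `dependsOn` and `providesApis` hold entity references of the form `[<kind>:][<namespace>/]<name>`; the namespace defaults to `default`, and for `providesApis` the kind defaults to `API`. A small parser makes the shorthand explicit. It is a sketch, not the parser in Backstage's `@backstage/catalog-model` package, and it skips validation of the allowed characters.

```python
"""Sketch of Backstage-style entity reference parsing (simplified)."""
from typing import NamedTuple, Optional


class EntityRef(NamedTuple):
    kind: str
    namespace: str
    name: str

    def __str__(self) -> str:
        return f"{self.kind}:{self.namespace}/{self.name}"


def parse_entity_ref(ref: str, default_kind: Optional[str] = None,
                     default_namespace: str = "default") -> EntityRef:
    """Parse '[kind:][namespace/]name', filling in defaults for missing parts."""
    kind, sep, rest = ref.partition(":")
    if not sep:                          # no "kind:" prefix
        kind, rest = default_kind, ref
    if not kind:
        raise ValueError(f"{ref!r} has no kind and no default kind was given")
    namespace, sep, name = rest.partition("/")
    if not sep:                          # no "namespace/" prefix
        namespace, name = default_namespace, rest
    if not name:
        raise ValueError(f"{ref!r} has an empty name")
    return EntityRef(kind, namespace, name)


if __name__ == "__main__":
    for raw in ("resource:production-db", "component:notification-service"):
        print(parse_entity_ref(raw))                      # dependsOn entries carry a kind
    print(parse_entity_ref("payment-api", default_kind="API"))
```

Spelling out the full `kind:namespace/name` form is what lets the catalog draw the dependency graph between `payment-service`, its database and the services it calls.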
Crossplane (Infrastructure as Code)
```yaml
# Crossplane: manage cloud infrastructure with Kubernetes CRDs
# "kubectl apply" to create AWS/Azure/GCP resources!
# (RDSInstance comes from the community provider-aws; Upbound's newer AWS
#  providers expose RDS under a different API group and kind)
apiVersion: database.aws.crossplane.io/v1beta1
kind: RDSInstance
metadata:
  name: production-db
spec:
  forProvider:
    region: us-east-1
    dbInstanceClass: db.r6g.xlarge
    engine: postgres
    engineVersion: "16"
    allocatedStorage: 100
    masterUsername: admin
    masterPasswordSecretRef:
      name: db-master-password
      namespace: crossplane-system
      key: password
    vpcSecurityGroupIds:
      - sg-abc123
    publiclyAccessible: false
  writeConnectionSecretToRef:
    name: production-db-conn
    namespace: production
# Result: Crossplane provisions an actual RDS instance in AWS
# The connection details (endpoint, port, credentials) appear as keys of a
# K8s Secret in your namespace
# Same declarative, reconciled pattern as everything else in K8s
```
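Secret values are base64-encoded in the Kubernetes API. The sketch below decodes such a Secret's `data` map and builds a PostgreSQL URL. The key names (`username`, `password`, `endpoint`, `port`) are assumed here as the ones Crossplane providers commonly write; check the Secret your provider actually produces. The host name and password are made up.

```python
"""Sketch: turn a Crossplane connection Secret's data into a PostgreSQL URL."""
import base64
from typing import Dict
from urllib.parse import quote


def postgres_url(secret_data: Dict[str, str], database: str = "postgres") -> str:
    """secret_data mirrors Secret.data: every value is a base64-encoded string."""
    fields = {key: base64.b64decode(value).decode("utf-8") for key, value in secret_data.items()}
    user = quote(fields["username"], safe="")        # escape characters like '@' or '/'
    password = quote(fields["password"], safe="")
    return f"postgresql://{user}:{password}@{fields['endpoint']}:{fields['port']}/{database}"


def b64(text: str) -> str:
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    data = {  # hypothetical contents of Secret production/production-db-conn
        "username": b64("admin"),
        "password": b64("s3cret@pw/1"),
        "endpoint": b64("production-db.example.us-east-1.rds.amazonaws.com"),
        "port": b64("5432"),
    }
    print(postgres_url(data))
```

Inside a pod you would normally skip the decoding entirely and map each key into an environment variable with `valueFrom.secretKeyRef`; the kubelet hands the container the decoded values.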
Multi-Cluster Management
| Tool | Approach | Best For |
|---|---|---|
| ArgoCD (ApplicationSet) | Central Argo CD instance generates one Application per target cluster | Multi-cluster deployments via Git |
| Cluster API (CAPI) | Manage cluster lifecycle as K8s resources | Provisioning/upgrading clusters |
| Rancher | Centralized management UI | Multi-cloud cluster fleet management |
| Kubefed | Federated resources across clusters | Active-active multi-cluster (deprecated) |
| Liqo / Admiralty | Virtual nodes spanning clusters | Seamless multi-cluster scheduling |
```yaml
# ArgoCD ApplicationSet: deploy to multiple clusters from one definition
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: payment-service
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            env: production
  template:
    metadata:
      name: 'payment-{{name}}'
    spec:
      project: production
      source:
        repoURL: https://github.com/mycompany/k8s-manifests.git
        path: apps/payment-service/overlays/production
        targetRevision: main
      destination:
        server: '{{server}}'
        namespace: production
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```
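The cluster generator yields one parameter set per registered cluster whose labels match the selector, and the controller renders the template once per set. The sketch below reproduces that expansion for the `{{name}}` and `{{server}}` placeholders used above. The cluster list is hypothetical, and the real generator exposes more parameters (cluster labels and annotations, for example) than this sketch substitutes.

```python
"""Sketch of ApplicationSet cluster-generator expansion (not the real controller)."""
import re
from typing import Any, Dict, List

PLACEHOLDER = re.compile(r"\{\{\s*([\w.-]+)\s*\}\}")


def render(node: Any, params: Dict[str, str]) -> Any:
    """Recursively substitute {{key}} placeholders (KeyError if a key is unknown)."""
    if isinstance(node, str):
        return PLACEHOLDER.sub(lambda m: params[m.group(1)], node)
    if isinstance(node, dict):
        return {key: render(value, params) for key, value in node.items()}
    if isinstance(node, list):
        return [render(item, params) for item in node]
    return node


def generate(clusters: List[Dict[str, Any]], match_labels: Dict[str, str],
             template: Dict[str, Any]) -> List[Dict[str, Any]]:
    """One rendered Application per cluster whose labels match the selector."""
    apps = []
    for cluster in clusters:
        if all(cluster["labels"].get(k) == v for k, v in match_labels.items()):
            apps.append(render(template, {"name": cluster["name"], "server": cluster["server"]}))
    return apps


if __name__ == "__main__":
    clusters = [  # hypothetical clusters registered in Argo CD
        {"name": "prod-us", "server": "https://prod-us.example.com", "labels": {"env": "production"}},
        {"name": "prod-eu", "server": "https://prod-eu.example.com", "labels": {"env": "production"}},
        {"name": "staging", "server": "https://staging.example.com", "labels": {"env": "staging"}},
    ]
    template = {
        "metadata": {"name": "payment-{{name}}"},
        "spec": {"destination": {"server": "{{server}}", "namespace": "production"}},
    }
    for app in generate(clusters, {"env": "production"}, template):
        print(app["metadata"]["name"], app["spec"]["destination"]["server"])
```

Adding a third production cluster is then a matter of registering it with the `env: production` label; no new Application manifest has to be written.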
FinOps & Cost Optimization
```bash
# Kubernetes cost optimization strategies:
# 1. Right-size resources (VPA recommendations):
kubectl describe vpa --all-namespaces | grep -A5 "Target:"
#    Many workloads over-request CPU, often by 3-5x
# 2. Kubecost / OpenCost: cost visibility per namespace/team
#    Shows: cost per pod, namespace, label, cluster
#    Identifies: idle resources, over-provisioned workloads
# 3. Spot/Preemptible instances for stateless workloads:
#    AWS: Spot Instances (60-90% cheaper)
#    GCP: Spot VMs (successor to Preemptible VMs)
#    Azure: Spot VMs
#    Use with tolerations + PodDisruptionBudgets
# 4. Cluster Autoscaler + node auto-provisioning:
#    Remove a node when its pods' requests total < 50% of its allocatable
#    capacity (the default threshold) and those pods fit on other nodes
#    Mix instance types for bin-packing efficiency
# 5. Request-based billing alignment:
#    Set requests close to observed usage; capacity is reserved by requests, not limits
#    Unused requests = wasted money in multi-tenant clusters
# 6. Namespace cost allocation:
#    Label all resources with team/project
#    ResourceQuotas per team prevent runaway costs
#    Chargeback: actual usage * per-unit cost → team invoice
```
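Points 5 and 6 come down to simple arithmetic: usage times unit price gives the chargeback figure, and requested-but-unused capacity is the waste that right-sizing recovers. The sketch below uses made-up per-unit prices (real rates depend on provider, instance type and discounts) and hypothetical pod figures; `PodUsage` and `chargeback` are illustrative names.

```python
"""Sketch of team chargeback and idle-cost calculation (hypothetical prices)."""
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable

CPU_PRICE = 0.03    # hypothetical $ per core-hour
MEM_PRICE = 0.004   # hypothetical $ per GiB-hour


@dataclass
class PodUsage:
    team: str            # from the team/project label on the pod
    cpu_request: float   # cores
    cpu_used: float      # average cores actually consumed
    mem_request: float   # GiB
    mem_used: float      # average GiB actually consumed
    hours: float


def chargeback(pods: Iterable[PodUsage]) -> Dict[str, Dict[str, float]]:
    """Per team: 'usage' = cost of consumed resources, 'idle' = requested but unused."""
    report: Dict[str, Dict[str, float]] = defaultdict(lambda: {"usage": 0.0, "idle": 0.0})
    for pod in pods:
        usage = (pod.cpu_used * CPU_PRICE + pod.mem_used * MEM_PRICE) * pod.hours
        idle = (max(pod.cpu_request - pod.cpu_used, 0.0) * CPU_PRICE
                + max(pod.mem_request - pod.mem_used, 0.0) * MEM_PRICE) * pod.hours
        report[pod.team]["usage"] += usage
        report[pod.team]["idle"] += idle
    return {team: {k: round(v, 2) for k, v in costs.items()} for team, costs in report.items()}


if __name__ == "__main__":
    month = 720  # hours
    pods = [
        PodUsage("payments", cpu_request=2, cpu_used=0.5, mem_request=4, mem_used=2, hours=month),
        PodUsage("search", cpu_request=1, cpu_used=0.8, mem_request=2, mem_used=1.5, hours=month),
    ]
    for team, costs in chargeback(pods).items():
        print(team, costs)
```

With these numbers the payments team uses about $16.56 of resources but holds about $38.16 of idle reservation for the month, while search wastes only about $5.76. Lowering the payments CPU request toward its observed 0.5 cores is the right-sizing move from point 1.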
FinOps Quick Wins
Top Cost Optimization Actions
| Action | Typical Savings | Effort |
|---|---|---|
| Right-size CPU requests (VPA) | 20-40% | Low |
| Spot instances for stateless | 60-90% on those nodes | Medium |
| Scale dev/staging to 0 at night | 50% on non-prod | Low |
| Cluster Autoscaler tuning | 15-30% | Low |
| Reserved instances for baseline | 30-50% vs on-demand | Medium |
| Delete unused PVCs and LBs | 5-15% | Low |
CNCF Landscape
The Cloud Native Computing Foundation (CNCF) hosts the projects that form the cloud native ecosystem. Key graduated and incubating projects by category:
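| Category | Graduated Projects | Incubating |
|---|---|---|
| Orchestration | Kubernetes | None listed |
| Runtime | containerd, CRI-O | None listed |
| Networking | Envoy, CoreDNS, Cilium, Istio | CNI |
| Storage | Rook | Longhorn |
| Observability | Prometheus, Fluentd, Jaeger | OpenTelemetry, Thanos, Cortex |
| CI/CD | Argo, Flux | Keptn |
| Security | OPA, Falco, TUF | Notary, cert-manager, Kyverno |
| Serverless | Knative, KEDA, Dapr | None listed |
| Package Mgmt | Helm | Artifact Hub |

Several tools often mentioned alongside these sit outside the two tiers: WasmEdge and OpenEBS are CNCF sandbox projects, Kata Containers is hosted by the OpenInfra Foundation, Tekton by the Continuous Delivery Foundation, and Gateway API is developed by Kubernetes SIG Network.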
Series Conclusion
Over 16 parts, we've journeyed from the theoretical foundations of distributed systems to the practical realities of operating Kubernetes at scale. Here's what we covered:
| Phase | Parts | Core Lessons |
|---|---|---|
| Theory | 1–5 | CAP theorem, consensus (Raft/Paxos), replication, failure modes, resilience patterns |
| Kubernetes Core | 6–10 | Architecture, object model, networking, services/mesh, storage |
| Advanced K8s | 11–12 | Internals (informers, controllers, scheduler), CRDs & operators |
| Production | 13–16 | Operations, security, observability, cloud native ecosystem |
Key Principles to Remember:
- Declarative over imperative — declare desired state, let controllers reconcile
- Immutable infrastructure — replace, don't patch
- Defence in depth — RBAC + network policies + pod security + admission control
- Observe everything — metrics, traces, logs (you can't fix what you can't see)
- Automate operations — GitOps, operators, autoscaling eliminate human error
- Design for failure — retries, circuit breakers, PDBs, multi-AZ, chaos engineering
- Start simple, evolve deliberately — don't adopt service mesh on day one
The distributed systems and Kubernetes landscape continues to evolve rapidly. The fundamentals covered in this series — consensus, fault tolerance, declarative reconciliation, defence in depth — remain constant even as tools change. Master the principles, and new tools become straightforward to adopt.
Related Articles in This Series
Part 1: Foundations of Distributed Systems
Where it all began — the fundamental problems and properties of distributed systems.
Part 6: Kubernetes Architecture
The control plane, worker nodes, and components that make Kubernetes work.
Part 12: CRDs & Operators
Extending Kubernetes — the pattern that powers the entire ecosystem.