This article takes a build-first approach: we'll set up a working Kubernetes cluster, then explore how each component works. By having a live environment, you can verify every concept hands-on as you read. The complete "Hello Kubernetes" exercise that demonstrates all features together is in Part 7, which pairs theory with practice for each object type.
Cluster Setup & Installation
kubeadm
kubeadm is the standard tool for bootstrapping Kubernetes clusters. It handles the complex process of generating certificates, configuring etcd, starting control plane components, and creating the token for worker nodes to join.
# Bootstrap a Kubernetes cluster with kubeadm:
# STEP 1: Install prerequisites (run on ALL nodes — master + workers)
# 1a. Disable swap (required by kubelet)
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
# 1b. Load required kernel modules
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# 1c. Set required sysctl parameters
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
# 1d. Install containerd (container runtime)
sudo apt-get update
sudo apt-get install -y containerd
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
# Enable SystemdCgroup (required for kubelet)
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd
sudo systemctl enable containerd
# 1e. Install kubeadm, kubelet, kubectl
sudo apt-get install -y apt-transport-https ca-certificates curl gpg
sudo mkdir -p /etc/apt/keyrings
# Add Kubernetes signing key + apt repository (v1.30)
# Method A — Import from Release.key URL (works when key matches repo):
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | \
sudo gpg --dearmor --yes -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
# Method B — If Method A fails with "NO_PUBKEY" error during apt-get update,
# the repo signing key has rotated. Import directly from a keyserver instead:
# sudo rm -f /etc/apt/keyrings/kubernetes-apt-keyring.gpg
# sudo gpg --no-default-keyring \
# --keyring gnupg-ring:/etc/apt/keyrings/kubernetes-apt-keyring.gpg \
# --keyserver hkps://keyserver.ubuntu.com \
# --recv-keys 234654DA9A296436
# sudo chmod 644 /etc/apt/keyrings/kubernetes-apt-keyring.gpg
# sudo rm -f /etc/apt/keyrings/kubernetes-apt-keyring.gpg~
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /' | \
sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
If apt-get update fails with NO_PUBKEY 234654DA9A296436, the signing key served at the Release.key URL is stale (Kubernetes periodically rotates keys). The fix is Method B above: import the key directly from Ubuntu's keyserver using the gnupg-ring: prefix, which creates the keyring in the legacy format that apt can read. The chmod 644 is required because GPG creates files with 600 permissions, but apt needs read access.
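Before moving on to STEP 2, it can help to confirm that the sysctl settings from step 1c actually took effect. The following is a minimal sketch, not part of kubeadm: `check_k8s_sysctls` is a hypothetical helper name, and it assumes its input is in the `key = value` format that `sysctl <key>` prints.

```shell
# check_k8s_sysctls (hypothetical helper, not a kubeadm command)
# stdin: "key = value" lines as printed by `sysctl <key> ...`
# Prints "ok" and returns 0 when all three required keys are 1;
# otherwise lists each missing/disabled key and returns 1.
check_k8s_sysctls() {
  awk -F' *= *' '
    BEGIN {
      need["net.bridge.bridge-nf-call-iptables"] = 1
      need["net.bridge.bridge-nf-call-ip6tables"] = 1
      need["net.ipv4.ip_forward"] = 1
    }
    # A key is satisfied once it shows up with the value 1.
    ($1 in need) && $2 == 1 { delete need[$1] }
    END {
      bad = 0
      for (k in need) { print "missing or disabled: " k; bad = 1 }
      if (!bad) print "ok"
      exit bad
    }'
}

# Usage on a node:
#   sysctl net.bridge.bridge-nf-call-iptables \
#          net.bridge.bridge-nf-call-ip6tables \
#          net.ipv4.ip_forward | check_k8s_sysctls
```

kubeadm's own preflight checks cover similar ground during init, so this is only a quick way to catch a typo in /etc/sysctl.d/k8s.conf earlier.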
# STEP 2: Initialise the control plane (master node only)
#
# Option A — Single-node / learning cluster (simplest):
sudo kubeadm init \
--pod-network-cidr=10.244.0.0/16
# Option B: Multi-master HA cluster (production). Run this INSTEAD of Option A, not after it.
# --control-plane-endpoint should point to a load balancer FQDN
# or a DNS name that resolves to your API server(s).
# With no load balancer yet (one control plane you may expand later),
# you can point it at the machine's IP or hostname instead:
# --control-plane-endpoint="$(hostname -I | awk '{print $1}'):6443"
sudo kubeadm init \
--pod-network-cidr=10.244.0.0/16 \
--control-plane-endpoint="k8s-api.example.com:6443" \
--upload-certs
If hostname -f fails with "Name or service not known", add your hostname to /etc/hosts first: echo "127.0.0.1 $(hostname)" | sudo tee -a /etc/hosts (a plain >> redirect fails for non-root users because the redirect is not covered by sudo). kubeadm needs the node's hostname to resolve. The --control-plane-endpoint flag is only needed for HA setups where multiple control planes share a load balancer. For a single-node cluster you never plan to expand, omit it entirely.
# Output from kubeadm init (annotated):
# [init] Using Kubernetes version: v1.30.14
# [preflight] Running pre-flight checks
# [preflight] Pulling images required for setting up a Kubernetes cluster
# [certs] Generating "ca" certificate and key
# [certs] Generating "apiserver" certificate and key
# [certs] apiserver serving cert is signed for DNS names
# [kubernetes kubernetes.default kubernetes.default.svc
# kubernetes.default.svc.cluster.local your-hostname]
# and IPs [10.96.0.1 <your-node-ip>]
# [certs] Generating etcd/ca, etcd/server, etcd/peer, front-proxy certs...
# [kubeconfig] Writing admin.conf, kubelet.conf, controller-manager.conf, scheduler.conf
# [etcd] Creating static Pod manifest for local etcd
# [control-plane] Creating static Pod manifests for apiserver, controller-manager, scheduler
# [kubelet-start] Starting the kubelet
# [kubelet-check] The kubelet is healthy after ~1s
# [api-check] The API server is healthy after ~6s
# [upload-config] Storing configuration in ConfigMap "kubeadm-config"
# [upload-certs] Storing certificates in Secret "kubeadm-certs"
# [mark-control-plane] Adding labels and taints to control-plane node
# [bootstrap-token] Creating token for node joins
# [addons] Applied essential addon: CoreDNS
# [addons] Applied essential addon: kube-proxy
#
# ✅ Your Kubernetes control-plane has initialized successfully!
# STEP 3: Configure kubectl (master node)
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# (Or if root: export KUBECONFIG=/etc/kubernetes/admin.conf)
# STEP 4: Install CNI plugin (networking)
# Without a CNI, pods can't communicate and nodes stay NotReady
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/calico.yaml
# STEP 5: Join worker nodes
# kubeadm init outputs a join command — run it on each worker:
sudo kubeadm join your-master:6443 \
--token <token-from-init-output> \
--discovery-token-ca-cert-hash sha256:<hash-from-init-output>
# For additional control-plane nodes (HA), add --control-plane --certificate-key:
# sudo kubeadm join your-master:6443 --token <token> \
# --discovery-token-ca-cert-hash sha256:<hash> \
# --control-plane --certificate-key <cert-key-from-init-output>
# If you lost the join command, regenerate the token:
kubeadm token create --print-join-command
# Output from kubeadm join (run on each worker node):
# [preflight] Running pre-flight checks
# [preflight] Reading configuration from the cluster...
# [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
# [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
# [kubelet-start] Starting the kubelet
# [kubelet-check] The kubelet is healthy after ~500ms
# [kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap
#
# ✅ This node has joined the cluster:
# * Certificate signing request was sent to apiserver and a response was received.
# * The Kubelet was informed of the new secure connection details.
# Verify cluster (run on master):
kubectl get nodes
# NAME STATUS ROLES AGE VERSION
# master-1 Ready control-plane 55m v1.30.14
# worker-1 Ready <none> 2m v1.30.14
# worker-2 NotReady <none> 30s v1.30.14
#
# Note: Workers show "NotReady" for 30-90 seconds while the CNI
# plugin initialises networking. This is normal — wait and re-check:
kubectl get nodes
# NAME STATUS ROLES AGE VERSION
# master-1 Ready control-plane 55m v1.30.14
# worker-1 Ready <none> 2m v1.30.14
# worker-2 Ready <none> 97s v1.30.14
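Behind the scenes, kubeadm init generates the certificates and kubeconfig files, writes static Pod manifests to /etc/kubernetes/manifests/ for etcd + apiserver + controller-manager + scheduler, starts kubelet which launches those static pods, waits for the API server to be healthy, uploads config to ConfigMaps, creates bootstrap tokens, and installs the CoreDNS + kube-proxy addons. Once the images are pulled, all of this takes roughly 10 seconds.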
Single-Node Alternative
For a single-node learning cluster, remove the control-plane taint so pods can schedule on the master:
# SINGLE-NODE SETUP: Allow pods to run on the control-plane node
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
# Verify the taint is gone:
kubectl describe node | grep Taints
# Taints: <none>
Managed vs Self-Managed Kubernetes
| Aspect | Managed (EKS/GKE/AKS) | Self-Managed (kubeadm/k3s) |
|---|---|---|
| Control plane | Provider manages (HA, upgrades, etcd) | You manage everything |
| etcd | Hidden, auto-backed up | You must back up and maintain |
| Upgrades | One-click or automatic | Manual (kubeadm upgrade) |
| Networking | Pre-integrated CNI | Install and configure CNI yourself |
| Cost | $70–$200/month for control plane | $0 (just node costs) |
| Best for | Production workloads, teams without deep K8s ops expertise | Learning, edge/IoT, air-gapped, extreme customisation |
K3s, MicroK8s, and Kind
For development and edge computing, lightweight distributions strip down Kubernetes:
- K3s (Rancher): Single binary (~60MB), uses SQLite instead of etcd by default (embedded etcd is available for multi-server setups), ideal for edge/IoT and development. Production-ready for small clusters.
- MicroK8s (Canonical): Snap-based, single-node or multi-node, built-in addons (Istio, Prometheus). Great for developers on Ubuntu.
- Kind (Kubernetes in Docker): Runs cluster nodes as Docker containers. Perfect for CI/CD testing and local development. Not for production.
- Minikube: Single-node cluster in a VM or container. Focused on local development with easy addon management.
The Kubernetes Mental Model
Now that you have a running cluster, let's explore the mental model that makes Kubernetes tick. Start by looking at what's already running:
# See every component Kubernetes installed automatically:
kubectl get pods -n kube-system
# NAME READY STATUS AGE
# calico-kube-controllers-... 1/1 Running ...
# calico-node-... 1/1 Running ...
# coredns-... 1/1 Running ...
# etcd-master-1 1/1 Running ...
# kube-apiserver-master-1 1/1 Running ...
# kube-controller-manager-master-1 1/1 Running ...
# kube-proxy-... 1/1 Running ...
# kube-scheduler-master-1 1/1 Running ...
# Every component you see here is explained in the sections below.
Declarative Reconciliation
Every concept from Parts 1–5 — consensus, replication, service discovery, resilience — converges in Kubernetes. But Kubernetes adds one powerful abstraction that makes it all manageable: declarative reconciliation.
flowchart TD
A[User Submits Desired State] --> B[API Server Stores in etcd]
B --> C[Controllers Watch for Changes]
C --> D{Desired == Actual?}
D -->|Yes| E[No action needed]
D -->|No| F[Controller takes corrective action]
F --> G[Actual state moves toward desired]
G --> C
E --> C
This model is fundamentally different from traditional infrastructure management:
| Aspect | Imperative (Traditional) | Declarative (Kubernetes) |
|---|---|---|
| Instructions | "Create VM, install nginx, start service" | "3 nginx pods should be running" |
| Failure handling | Manual detection, manual fix | Auto-detected, auto-fixed |
| Drift | Accumulates silently | Continuously reconciled |
| Scaling | "Create 2 more VMs and configure them" | "Change replicas from 3 to 5" |
| State tracking | CMDB (often outdated) | etcd is single source of truth |
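The loop in the flowchart above can be sketched as a toy shell function. This is only an illustration of the observe, compare, act pattern: `reconcile` is a hypothetical name, the "pods" are just a counter, and nothing here talks to a real cluster.

```shell
# reconcile DESIRED ACTUAL  (toy simulation, no cluster involved)
# Each loop iteration is one controller pass: compare the two counts
# and take a single corrective step toward the desired state.
reconcile() (
  desired=$1
  actual=$2
  while [ "$actual" -ne "$desired" ]; do
    if [ "$actual" -lt "$desired" ]; then
      actual=$((actual + 1))
      echo "actual < desired: create pod (now $actual/$desired)"
    else
      actual=$((actual - 1))
      echo "actual > desired: delete pod (now $actual/$desired)"
    fi
  done
  echo "in sync: $actual/$desired"
)

# Usage:
#   reconcile 3 0   # fresh Deployment: three "create pod" steps
#   reconcile 3 2   # one pod crashed: a single "create pod" step
#   reconcile 3 5   # scaled down: two "delete pod" steps
```

The point of the sketch is that the controller never needs to know why the counts differ; a crash, a scale-up, and a first deployment all go through the same comparison.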
Desired State vs Actual State
This is the single most important concept in Kubernetes. Everything else flows from it:
- apiVersion + kind: what type of resource (e.g., a Deployment)
- metadata: the resource's name and labels
- spec: your desired state (how many replicas, which container image, etc.)
You submit the manifest with kubectl apply -f filename.yaml. Kubernetes reads it, stores it in etcd, and works to make reality match your declaration.
# This YAML is a declaration of DESIRED STATE:
apiVersion: apps/v1
kind: Deployment
metadata:
name: web-server
spec:
replicas: 3 # "I want 3 pods running at all times"
selector:
matchLabels:
app: web
template:
metadata:
labels:
app: web
spec:
containers:
- name: nginx
image: nginx:1.25
ports:
- containerPort: 80
resources:
requests:
memory: "64Mi"
cpu: "100m"
limits:
memory: "128Mi"
cpu: "200m"
# When you apply this:
# 1. API Server stores it in etcd
# 2. Deployment controller sees: desired=3, actual=0
# 3. Creates a ReplicaSet
# 4. ReplicaSet controller sees: desired=3, actual=0
# 5. Creates 3 Pod objects
# 6. Scheduler sees 3 unscheduled pods
# 7. Assigns each to a node
# 8. kubelet on each node starts the container
# If a pod crashes:
# 1. ReplicaSet controller sees: desired=3, actual=2
# 2. Creates 1 new Pod
# 3. Scheduler assigns it
# 4. kubelet starts it
# Total recovery time: ~5-15 seconds
# Don't worry about writing YAML yet — Part 7 teaches every field.
# For now, focus on the PATTERN: declare what you want → Kubernetes makes it happen.
Control Plane Components
The control plane is the "brain" of the cluster. It makes global decisions about scheduling, detects failures, and maintains desired state. In production, control plane components run on dedicated nodes (often 3 or 5 for high availability).
flowchart LR
subgraph CP[Control Plane]
direction TB
ETCD[(etcd<br/>State Store)]
API[API Server<br/>Central Hub]
SCHED[Scheduler<br/>Pod Placement]
CM[Controller Manager<br/>Reconciliation Loops]
CCM[Cloud Controller<br/>Provider Integration]
API <-->|read/write state| ETCD
SCHED -->|watch unscheduled pods| API
CM -->|watch & reconcile| API
CCM -->|cloud resources| API
end
subgraph W1[Worker Node 1]
direction TB
K1[kubelet<br/>Pod Lifecycle]
KP1[kube-proxy<br/>Network Rules]
end
subgraph W2[Worker Node 2]
direction TB
K2[kubelet<br/>Pod Lifecycle]
KP2[kube-proxy<br/>Network Rules]
end
K1 -->|report status & watch pods| API
KP1 -->|watch Services & Endpoints| API
K2 -->|report status & watch pods| API
KP2 -->|watch Services & Endpoints| API
API Server (kube-apiserver)
The API Server is the front door to everything in Kubernetes. Every component — kubectl, controllers, kubelets, external tools — communicates exclusively through the API Server. Nothing talks to etcd directly except the API Server.
# The API Server exposes a RESTful API over HTTPS:
# Every Kubernetes operation is an API call
# List pods (GET request to /api/v1/namespaces/default/pods)
kubectl get pods
# Equivalent (auth flags omitted): curl -k https://api-server:6443/api/v1/namespaces/default/pods
# Create a pod (POST request; kubectl apply POSTs when the object does not exist yet and PATCHes when it does)
kubectl apply -f pod.yaml
# Equivalent: curl -X POST -H 'Content-Type: application/yaml' --data-binary @pod.yaml https://api-server:6443/api/v1/namespaces/default/pods
# Watch for changes (long-lived HTTP connection with chunked responses)
kubectl get pods --watch
# Equivalent: curl 'https://api-server:6443/api/v1/namespaces/default/pods?watch=true'
# Explore the API directly:
kubectl api-resources # List all resource types
kubectl api-versions # List all API versions
kubectl explain deployment # Show schema for a resource type
kubectl explain deployment.spec.template.spec.containers
# ╔══════════════════════════════════════════════════════════╗
# ║ 🔧 TRY IT: Run these on your cluster right now: ║
# ║ kubectl api-resources | head -20 ║
# ║ kubectl get pods -n kube-system -l component=kube-apiserver ║
# ╚══════════════════════════════════════════════════════════╝
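The URLs in the commands above follow a fixed pattern: core resources such as pods live under /api/v1, while resources in a named API group such as apps live under /apis/<group>/<version>. A small sketch of that rule, assuming namespaced resources only; `api_path` is a hypothetical helper, not a kubectl feature.

```shell
# api_path GROUP VERSION NAMESPACE RESOURCE  (hypothetical helper)
# Builds the REST path for a namespaced resource. Pass an empty string
# as GROUP for the core group (pods, services, configmaps, ...).
api_path() {
  if [ -z "$1" ]; then
    printf '/api/%s/namespaces/%s/%s\n' "$2" "$3" "$4"
  else
    printf '/apis/%s/%s/namespaces/%s/%s\n' "$1" "$2" "$3" "$4"
  fi
}

# Usage (kubectl get --raw reuses your kubeconfig credentials, so no
# curl certificate flags are needed):
#   kubectl get --raw "$(api_path "" v1 default pods)"
#   kubectl get --raw "$(api_path apps v1 default deployments)"
```

Cluster-scoped resources such as nodes drop the namespaces/<name> segment, which this sketch deliberately does not handle.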
Key responsibilities of the API Server:
- Authentication: Verifies identity (certificates, tokens, OIDC)
- Authorisation: Checks permissions (RBAC — can this user create pods?)
- Admission Control: Validates and mutates requests (resource quotas, default values, policy enforcement)
- Persistence: Stores validated objects in etcd
- Watch notifications: Notifies controllers of state changes
flowchart TD
A["`**Client Request**
kubectl apply -f deploy.yaml`"] --> B
subgraph AUTH[Identity & Access]
direction LR
B[Authentication<br/>Who are you?] --> C[Authorization<br/>Are you allowed?]
end
C --> D
subgraph ADM[Admission Control]
direction LR
D[Mutating Webhooks<br/>Modify the request] --> E[Schema Validation<br/>Is it well-formed?] --> F[Validating Webhooks<br/>Policy checks]
end
F --> G[Persist to etcd]
G --> H["`**Response to Client**
201 Created`"]
etcd
etcd is the distributed key-value store that holds all cluster state. It's the single source of truth — if etcd is lost and unrecoverable, the cluster is gone. It uses the Raft consensus algorithm (which we studied in Part 2) to maintain consistency across replicas.
# What etcd stores (all Kubernetes objects as key-value pairs):
# Key: /registry/pods/default/my-pod
# Value: Protobuf-encoded Pod object (built-in types are stored as protobuf by default; custom resources are stored as JSON)
# ╔══════════════════════════════════════════════════════════╗
# ║ 🔧 TRY IT: Find YOUR etcd endpoints and certs first: ║
# ╚══════════════════════════════════════════════════════════╝
# Step 1: Get your etcd pod name (it varies per cluster!):
kubectl get pods -n kube-system -l component=etcd
# NAME READY STATUS AGE
# etcd-k8s-master.lab.example.com 1/1 Running 10h
# Step 2: Store the pod name in a variable for reuse:
ETCD_POD=$(kubectl get pods -n kube-system -l component=etcd -o jsonpath='{.items[0].metadata.name}')
echo $ETCD_POD
# etcd-k8s-master.lab.example.com
# Step 3: Extract the advertise URL (your actual etcd endpoint):
kubectl describe pod $ETCD_POD -n kube-system | grep -- --advertise-client-urls
# --advertise-client-urls=https://10.42.38.10:2379
# Step 4: Use kubectl exec to run etcdctl INSIDE the etcd pod
# (certs are already mounted inside — no host path issues):
kubectl exec -n kube-system $ETCD_POD -- etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
endpoint health
# Output:
# https://127.0.0.1:2379 is healthy: successfully committed proposal: took = 11.16ms
# Member list (shows all etcd nodes in an HA cluster):
kubectl exec -n kube-system $ETCD_POD -- etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
member list --write-out=table
# Single-node cluster output:
# +------------------+---------+------------------------------+--------------------------+--------------------------+------------+
# | ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
# +------------------+---------+------------------------------+--------------------------+--------------------------+------------+
# | e179b74d7e9a5155 | started | k8s-master.lab.example.com | https://10.42.38.10:2380 | https://10.42.38.10:2379 | false |
# +------------------+---------+------------------------------+--------------------------+--------------------------+------------+
#
# Multi-node HA cluster output (3 voting members):
# | 8e9e05c52164694d | started | master-1 | https://192.168.1.100:2380 | https://192.168.1.100:2379 | false |
# | 91bc3c398fb3c146 | started | master-2 | https://192.168.1.101:2380 | https://192.168.1.101:2379 | false |
# | fd422379fda50e48 | started | master-3 | https://192.168.1.102:2380 | https://192.168.1.102:2379 | false |
#
# IS LEARNER column:
# false = Full voting member (participates in Raft quorum)
# true = Learner/non-voting member (receives log replication but
# does NOT vote in elections or count toward quorum).
# Used when adding a new node — it catches up on data first,
# then gets promoted to voter with: etcdctl member promote
# This prevents a slow new node from disrupting cluster consensus.
# Backup etcd (CRITICAL for disaster recovery):
kubectl exec -n kube-system $ETCD_POD -- etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
snapshot save /var/lib/etcd/snapshot.db
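# Copy the snapshot off the node. In kubeadm clusters /var/lib/etcd is a hostPath
# volume, so the file is already on the control-plane host (kubectl cp needs tar
# inside the container, which the minimal etcd image may not include).
# Run this on the control-plane node, then scp the copy somewhere safe:
sudo cp /var/lib/etcd/snapshot.db ./etcd-backup.db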
| etcd Property | Value | Why It Matters |
|---|---|---|
| Consensus | Raft (leader-based) | Strong consistency, linearizable reads |
| Quorum (3 nodes) | 2 of 3 must agree | Survives 1 node failure |
| Quorum (5 nodes) | 3 of 5 must agree | Survives 2 node failures |
| Storage limit | 2 GB default quota (8 GB suggested maximum) | Writes fail once the quota is reached; compaction and defragmentation reclaim space |
| Disk requirement | <10ms fsync | Slow disk = slow cluster |
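The quorum rows in the table follow from simple majority arithmetic: a cluster of N members needs floor(N/2) + 1 votes, and tolerates N minus that many failures. A short sketch of the calculation (the function names are illustrative, not etcdctl commands):

```shell
# quorum N: smallest majority of an N-member cluster.
quorum() { echo $(( $1 / 2 + 1 )); }

# fault_tolerance N: members that can fail while a majority survives.
fault_tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }

# quorum_table: print both values for common cluster sizes.
quorum_table() {
  for n in 1 2 3 4 5 6 7; do
    echo "members=$n quorum=$(quorum "$n") tolerates=$(fault_tolerance "$n")"
  done
}

# Usage:
#   quorum_table
#   members=3 quorum=2 tolerates=1
#   members=4 quorum=3 tolerates=1   <- no better than 3 members
#   members=5 quorum=3 tolerates=2
```

This is why etcd clusters are normally sized at 3 or 5: an even member count raises the quorum without raising the number of failures the cluster survives.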
Scheduler (kube-scheduler)
The Scheduler watches for newly created Pods that have no node assigned and selects a suitable node for each one. It doesn't run the pod — it just decides where it should go.
flowchart LR
A[New Pod<br/>No Node Assigned] --> B[Filtering Phase<br/>Which nodes CAN run it?]
B --> C[Scoring Phase<br/>Which node is BEST?]
C --> D[Binding<br/>Assign pod to winner]
B -->|Excludes| E[Insufficient CPU/Memory]
B -->|Excludes| F[Taints not tolerated]
B -->|Excludes| G[Affinity violated]
The scheduler works in three phases every time it sees an unassigned pod:
1. Filtering — Which nodes can run this pod?
Eliminates nodes that violate hard constraints:
- Resources: Does the node have enough free CPU/memory for the pod's requests?
- Ports: Is the required hostPort already taken?
- Node selectors: Does the node match nodeSelector or nodeAffinity rules?
- Taints: Does the pod tolerate the node's taints (e.g., NoSchedule)?
- Volumes: Is the required PersistentVolume available in the node's zone?
2. Scoring — Which surviving node is best?
Ranks each remaining node 0–100 using scoring plugins:
- NodeResourcesFit with the LeastAllocated strategy (formerly LeastRequestedPriority): Prefer nodes with the most free resources
- NodeResourcesBalancedAllocation (formerly BalancedResourceAllocation): Prefer nodes where CPU and memory usage are balanced
- InterPodAffinity: Prefer nodes already running co-located pods
- ImageLocality: Prefer nodes that already pulled the container image (faster start)
- PodTopologySpread: Spread pods evenly across zones/nodes, driven by the pod's topologySpreadConstraints
3. Binding — Assign the pod to the winner
The scheduler updates Pod.spec.nodeName in the API Server. The kubelet on that node detects the assignment and starts the container.
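The filter-then-score flow described above can be imitated in a few lines of shell. This is a deliberately simplified sketch, not the real scheduler: `pick_node` and its input format are made up for illustration, only CPU, memory, and a yes/no taint flag are considered, and the score is a rough stand-in for the least-allocated idea (mean percentage of CPU and memory still free after placing the pod).

```shell
# pick_node CPU_REQ_MILLI MEM_REQ_MI  (toy model of filter + score)
# stdin: one node per line:
#   name cpu_capacity cpu_free mem_capacity mem_free untolerated_taint(yes|no)
# Prints "<node> <score>" for the winner, or "unschedulable" (exit 1).
pick_node() {
  awk -v cpu="$1" -v mem="$2" '
    # Filtering: drop nodes that violate a hard constraint.
    $3 < cpu || $5 < mem { next }   # insufficient CPU or memory
    $6 == "yes"          { next }   # taint the pod does not tolerate
    # Scoring: 0-100, higher means more headroom left after placement.
    {
      score = int((($3 - cpu) / $2 + ($5 - mem) / $4) * 50)
      if (best == "" || score > best_score) { best = $1; best_score = score }
    }
    END {
      if (best == "") { print "unschedulable"; exit 1 }
      print best, best_score
    }'
}

# Usage: place a pod requesting 100m CPU and 64Mi memory.
#   printf '%s\n' \
#     'worker-1 4000 500 8192 4096 no' \
#     'worker-2 4000 3000 8192 6144 no' \
#     'worker-3 4000 3900 8192 8000 yes' | pick_node 100 64
#   worker-2 73
```

worker-3 has the most free resources but is removed in the filtering step, which mirrors the real behaviour: scoring only ever ranks nodes that survived every hard constraint.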
# See why a pod isn't scheduling:
kubectl describe pod stuck-pod
# Events:
# Warning FailedScheduling 0/5 nodes available:
# 2 Insufficient memory, 3 node(s) had taint {dedicated: gpu}
# See successful scheduler decisions:
kubectl get events --field-selector reason=Scheduled
# Successfully assigned default/web-pod to worker-node-2
Controller Manager (kube-controller-manager)
The Controller Manager runs dozens of independent reconciliation loops (controllers), each responsible for a specific resource type. Each controller watches the API Server for changes and takes action to align actual state with desired state.
| Controller | Watches | Reconciles |
|---|---|---|
| Deployment | Deployment objects | Creates/updates ReplicaSets for rollouts |
| ReplicaSet | ReplicaSets + Pods | Maintains desired pod count |
| Node | Node heartbeats | Marks unresponsive nodes NotReady |
| Job | Job objects | Ensures pods run to completion |
| Endpoints | Services + Pods | Updates Service endpoints when pods change |
| Namespace | Namespace deletions | Cleans up all resources in deleted namespaces |
| ServiceAccount | Namespace creation | Creates default ServiceAccount per namespace |
# All controllers run as goroutines within a single binary:
# kube-controller-manager
# Check control plane health (componentstatuses is deprecated since v1.19 and reports component health, not individual controllers):
kubectl get componentstatuses
# NAME STATUS MESSAGE
# controller-manager Healthy ok
# scheduler Healthy ok
# etcd-0 Healthy {"health":"true","reason":""}
# Controller Manager flags (key configuration):
# --controllers=* # Enable all controllers
# --concurrent-deployment-syncs=5 # Parallel deployment reconciliations
# --node-monitor-grace-period=40s # Time before marking node NotReady
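# --pod-eviction-timeout was removed in v1.27; pods are now evicted from a NotReady node when their not-ready/unreachable toleration expires (tolerationSeconds: 300 by default)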
# --cluster-cidr=10.244.0.0/16 # Pod network range
Cloud Controller Manager
The Cloud Controller Manager connects Kubernetes to the underlying cloud provider (AWS, GCP, Azure). It handles cloud-specific operations that Kubernetes itself doesn't need to know about:
- Node Controller: Detects when cloud VMs are deleted, updates node status
- Route Controller: Configures cloud network routes for pod communication
- Service Controller: Creates cloud load balancers for LoadBalancer-type Services
Worker Node Components
Worker nodes are the machines that actually run your application containers. Each worker node runs three core components:
kubelet
The kubelet is the agent on every worker node. It receives pod specifications from the API Server and ensures the described containers are running and healthy. It's the component that actually makes things happen on the physical machine.
# kubelet responsibilities:
# 1. Register the node with the API Server
# 2. Watch API Server for pods assigned to this node
# 3. Pull container images
# 4. Start/stop containers via container runtime (CRI)
# 5. Execute liveness/readiness/startup probes
# 6. Report pod status back to API Server
# 7. Manage volumes (mount/unmount)
# 8. Send node heartbeats (NodeLease)
# Check kubelet status on a node:
systemctl status kubelet
# ● kubelet.service - kubelet: The Kubernetes Node Agent
# Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; preset: enabled)
# Drop-In: /usr/lib/systemd/system/kubelet.service.d
# └─10-kubeadm.conf
# Active: active (running) since Sun 2026-06-07 14:27:41 UTC; 1 week 0 days ago
# Docs: https://kubernetes.io/docs/
# Main PID: 6416 (kubelet)
# Tasks: 14 (limit: 19093)
# Memory: 35.4M (peak: 37.1M)
# CPU: 3h 12min 14.748s
# CGroup: /system.slice/kubelet.service
# └─6416 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/boot...
#
# Jun 14 22:24:46 k8s-worker-1.lab.example.com kubelet[6416]: I0614 22:24:46.9...
# Jun 14 22:24:47 k8s-worker-1.lab.example.com kubelet[6416]: I0614 22:24:47.1...
# kubelet logs (last 20 lines; use -f instead of -n 20 for a live tail):
journalctl -u kubelet -n 20 --no-pager
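# Key kubelet configuration (shown as flags; kubeadm sets the equivalent fields in /var/lib/kubelet/config.yaml, e.g. staticPodPath):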
# --pod-manifest-path=/etc/kubernetes/manifests # Static pods
# --cluster-dns=10.96.0.10 # CoreDNS service IP
# --max-pods=110 # Max pods per node
# --node-status-update-frequency=10s # Heartbeat interval
# --eviction-hard=memory.available<100Mi # Eviction thresholds
# Static pods (managed directly by kubelet, not API Server):
# ON A MASTER NODE:
ls /etc/kubernetes/manifests/
# etcd.yaml
# kube-apiserver.yaml
# kube-controller-manager.yaml
# kube-scheduler.yaml
# These are how control plane components run on master nodes!
# ON A WORKER NODE:
ls /etc/kubernetes/manifests/
# (empty — workers have no static pods, only kubelet + kube-proxy)
# ╔══════════════════════════════════════════════════════════╗
# ║ 🔧 TRY IT: SSH into a worker node and run: ║
# ║ systemctl status kubelet ║
# ║ ls /etc/kubernetes/manifests/ (empty on workers!) ║
# ║ crictl ps (running containers) ║
# ╚══════════════════════════════════════════════════════════╝
kube-proxy
kube-proxy maintains network rules on each node that allow pods to communicate with Services. It implements the Service abstraction — translating a virtual ClusterIP into actual pod IPs.
# kube-proxy modes:
# 1. iptables mode (default):
# Creates iptables rules for each Service → endpoint mapping
# Packets are redirected at kernel level (very fast, no userspace)
iptables -t nat -L KUBE-SERVICES | head -20
# Chain KUBE-SERVICES (2 references)
# target prot opt source destination
# KUBE-SVC-NPX46M4PTMTKRN6Y tcp -- anywhere 10.96.0.1 /* default/kubernetes:https cluster IP */ tcp dpt:https
# KUBE-SVC-ERIFXISQEP7F7OF4 tcp -- anywhere 10.96.0.10 /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:domain
# KUBE-SVC-JD5MR3NA4I4DYORP tcp -- anywhere 10.96.0.10 /* kube-system/kube-dns:metrics cluster IP */ tcp dpt:9153
# KUBE-SVC-TCOU7JCQXEZGVUNU udp -- anywhere 10.96.0.10 /* kube-system/kube-dns:dns cluster IP */ udp dpt:domain
# KUBE-NODEPORTS all -- anywhere anywhere /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL
#
# Each KUBE-SVC-* chain contains the actual load-balancing rules
# that distribute traffic to pod endpoints.
# 2. IPVS mode (better for large clusters):
# Uses Linux IPVS (IP Virtual Server) for load balancing
# Supports more algorithms: round-robin, least-connection, weighted
# Better performance at 10,000+ services
#
# Note: If your cluster uses iptables mode (the default), ipvsadm
# will show empty output — this is normal:
ipvsadm -Ln
# IP Virtual Server version 1.2.1 (size=4096)
# Prot LocalAddress:Port Scheduler Flags
# -> RemoteAddress:Port Forward Weight ActiveConn InActConn
# (empty — cluster is using iptables mode, not IPVS)
#
# With IPVS mode enabled, you'd see:
# TCP 10.96.0.1:443 rr
# -> 192.168.1.100:6443 Masq 1 0 0
# TCP 10.96.0.10:53 rr
# -> 10.244.0.2:53 Masq 1 0 0
# -> 10.244.0.3:53 Masq 1 0 0
# Check kube-proxy mode (run from master):
kubectl get configmap kube-proxy -n kube-system -o yaml | grep mode
# mode: ""
#
# mode: "" (empty string) means kube-proxy uses the DEFAULT mode,
# which is iptables. Possible values:
# "" → iptables (default since Kubernetes 1.2)
# "iptables" → explicit iptables mode
# "ipvs" → IPVS mode (better for 1000+ services)
# "nftables" → nftables mode (alpha in 1.29+, replaces iptables)
#
# Key fields in the kube-proxy ConfigMap (config.conf):
# clusterCIDR: 10.244.0.0/16 # Pod network range
# iptables.syncPeriod: 0s # How often rules are refreshed (0 = default 30s)
# ipvs.scheduler: "" # Load balancing algorithm (rr, lc, wrr, etc.)
# ipvs.strictARP: false # Must be true for MetalLB with IPVS
# conntrack.maxPerCore: null # Connection tracking table size
#
# The ConfigMap also contains kubeconfig.conf pointing to the API server:
# server: https://k8s-master.lab.example.com:6443
#
# Note: kubectl only works on nodes with a valid kubeconfig.
# On worker nodes without ~/.kube/config, you'll get:
# "The connection to the server localhost:8080 was refused"
# Solution: copy admin.conf from master or run kubectl on master.
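In iptables mode, the load-balancing rules inside each KUBE-SVC-* chain use random matching with a per-rule probability. Because rule k is only evaluated when the earlier rules did not match, it needs probability 1/(N-k+1) for every one of N endpoints to receive an equal share. A small sketch of that arithmetic (`svc_probabilities` is an illustrative name, and real chains print the last rule without any probability because it always matches):

```shell
# svc_probabilities N  (illustrative arithmetic only)
# For a Service with N endpoints, print the match probability rule k
# (1-based) needs so that each endpoint gets 1/N of new connections.
svc_probabilities() {
  awk -v n="$1" 'BEGIN {
    for (k = 1; k <= n; k++)
      printf "rule %d: probability %.5f\n", k, 1 / (n - k + 1)
  }'
}

# Usage:
#   svc_probabilities 3
#   rule 1: probability 0.33333
#   rule 2: probability 0.50000
#   rule 3: probability 1.00000
#
# Overall share per endpoint: 1/3, then 2/3 * 1/2 = 1/3, then the rest = 1/3.
```

You can compare this with a real chain by running sudo iptables -t nat -L on one of the KUBE-SVC-* chain names from the listing above and looking at the statistic mode random entries.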
Container Runtime
The container runtime is responsible for pulling images, creating containers, and managing their lifecycle. Kubernetes communicates with it through the Container Runtime Interface (CRI) — an abstraction that allows different runtimes to be plugged in.
| Runtime | CRI Compatible | Use Case | Notes |
|---|---|---|---|
| containerd | Yes (native) | Standard production runtime | Default for most distributions |
| CRI-O | Yes (native) | Lightweight, OCI-focused | Popular with OpenShift |
| Docker Engine | Only via the external cri-dockerd adapter (built-in dockershim removed in 1.24) | Development, legacy setups | Not supported natively by the kubelet |
| gVisor (runsc) | Yes (via containerd) | Security sandbox | Kernel syscall interception |
| Kata Containers | Yes (via containerd) | VM-level isolation | Lightweight VMs per pod |
# Check which container runtime a cluster is using:
kubectl get nodes -o wide
# NAME STATUS ROLES VERSION CONTAINER-RUNTIME
# worker-1 Ready <none> v1.30.0 containerd://1.7.13
# worker-2 Ready <none> v1.30.0 containerd://1.7.13
# crictl (CLI for any CRI-compatible runtime, including containerd):
crictl ps # List running containers
crictl images # List images on node
crictl inspect <container-id> # Container details
crictl logs <container-id> # Container logs
# Check containerd status:
systemctl status containerd
crictl info | head -20
Component Communication Flow
Pod Creation Lifecycle
When you run kubectl apply -f deployment.yaml, here's the complete sequence of events across all components:
sequenceDiagram
participant U as User (kubectl)
participant API as API Server
participant ETCD as etcd
participant DC as Deployment Controller
participant RC as ReplicaSet Controller
participant S as Scheduler
participant KL as kubelet (Worker)
participant CR as Container Runtime
U->>API: POST /apis/apps/v1/namespaces/default/deployments
API->>API: Authenticate + Authorise + Admit
API->>ETCD: Store Deployment object
API->>U: 201 Created
DC->>API: Watch detects new Deployment
DC->>API: Create ReplicaSet
API->>ETCD: Store ReplicaSet
RC->>API: Watch detects new ReplicaSet
RC->>API: Create Pod (nodeName empty)
API->>ETCD: Store Pod
S->>API: Watch detects unscheduled Pod
S->>S: Filter + Score nodes
S->>API: Bind Pod to worker-2
API->>ETCD: Update Pod.spec.nodeName
KL->>API: Watch detects Pod assigned to me
KL->>CR: Pull image + Create container
CR->>KL: Container started
KL->>API: Update Pod status: Running
API->>ETCD: Store updated status
Watch Mechanism
Controllers don't poll the API Server — they establish long-lived watch connections. When any object changes, the API Server pushes the update to all watchers. This is efficient and enables near-instant reactions:
# How watches work:
# 1. Controller opens HTTP connection: GET /api/v1/pods?watch=true
# 2. API Server keeps connection open
# 3. When a pod changes, API Server sends event over the connection:
# {"type": "MODIFIED", "object": {"kind": "Pod", ...}}
# 4. Controller processes the event and reconciles
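# Watch event types:
# ADDED: new object created
# MODIFIED: existing object updated
# DELETED: object removed
# BOOKMARK: marker carrying only a fresh resourceVersion (sent if the client asks for bookmarks)
# ERROR: the watch failed, e.g. the requested resourceVersion is too old
# Resource versions ensure no events are missed:
# If the connection drops, controller reconnects with last resourceVersion
# API Server replays all changes since that version; if that history is no
# longer available it answers 410 Gone and the controller re-lists from scratch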
# See watches in action:
kubectl get pods --watch -v=7
# I0514 10:23:45.123456 GET https://api:6443/api/v1/pods?watch=true
# I0514 10:23:45.234567 Response Status: 200 OK
# (stream of events follows...)
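The resume-after-disconnect behaviour described above can be modelled with two tiny functions, one playing the API server and one playing the client. This is a toy model under stated assumptions: the names `watch_since` and `last_rv` and the three-column event log are invented for illustration, and comparing versions numerically is a simplification, since real clients treat resourceVersion as an opaque string and let the server decide what is newer.

```shell
# watch_since RV  (toy stand-in for the API server)
# stdin: event log, one "<resourceVersion> <TYPE> <name>" per line.
# Replays only the events newer than RV.
watch_since() {
  awk -v rv="$1" '$1 > rv'
}

# last_rv  (toy stand-in for the client's bookkeeping)
# Remembers the newest resourceVersion seen on a stream of events.
last_rv() {
  awk '{ rv = $1 } END { print rv }'
}

# Usage: the client saw two events, then the connection dropped.
#   log='101 ADDED web-1
#   102 ADDED web-2
#   103 MODIFIED web-1
#   104 DELETED web-2'
#   seen=$(printf '%s\n' "$log" | head -n 2 | last_rv)    # 102
#   printf '%s\n' "$log" | watch_since "$seen"            # replays 103 and 104
```

Against a real cluster, kubectl get pods --watch --output-watch-events prints the event type next to each object, which makes the same stream visible without raising the log verbosity.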
Exercises
Using kubectl get pods -n kube-system, identify every control plane component running in your cluster. For each, explain: (a) what it does, (b) what happens if it fails, and (c) how it recovers.
Create a Deployment, then use kubectl get events --sort-by=.metadata.creationTimestamp to trace the complete creation flow. Map each event to the component that generated it (scheduler, kubelet, controller, etc.).
Conclusion
Kubernetes architecture implements every distributed systems principle we've covered:
- Consensus (Part 2): etcd uses Raft for consistent state storage
- CAP (Part 3): Kubernetes favours consistency — the API Server provides linearizable reads from etcd
- Service Discovery (Part 4): CoreDNS + Services provide automatic discovery
- Self-Healing (Part 5): Controllers continuously reconcile desired vs actual state
In Part 7, we'll explore the Kubernetes Object Model — Pods, ReplicaSets, Deployments, Services, ConfigMaps, and Secrets — with hands-on exercises for each object type so you can practice every concept on the cluster you just built.