Worker Node Components
A Kubernetes worker node is a machine (physical or virtual) that runs containerized workloads. While the control plane decides WHAT should run WHERE, the data plane (worker nodes) is responsible for actually RUNNING it. Each node has three essential components:
- kubelet — the primary node agent; watches the API Server and ensures pods are running
- Container runtime — actually creates and manages containers (containerd, CRI-O)
- kube-proxy — implements Service networking (load balancing traffic to pods)
kubelet — Pod Lifecycle Manager
The kubelet is the most important data plane component. It watches the API Server for Pods assigned to its node and drives the container runtime to match desired state:
flowchart TB
API["API Server"] -->|"Watch: Pods on this node"| KL["kubelet"]
KL --> SYNC["SyncLoop\n(periodic reconciliation)"]
SYNC --> ADMIT["Admission\n(resource check, node conditions)"]
ADMIT --> CRI2["CRI Calls\n(create/start/stop containers)"]
CRI2 --> RT["Container Runtime\n(containerd / CRI-O)"]
KL --> PROBE["Health Probes\n(liveness, readiness, startup)"]
KL --> STATUS["Status Reporter\n(→ API Server)"]
PROBE -->|"Failure"| CRI2
RT --> CONTAINERS["Running Containers"]
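The reconciliation idea behind the SyncLoop can be illustrated with a minimal sketch. It is not kubelet code: `FakeRuntime` and `sync_once` are hypothetical names, and pods are reduced to plain strings. The point is only that each pass compares desired state with actual state and issues the difference.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRuntime:
    """Hypothetical stand-in for a container runtime; tracks running pod names."""
    running: set = field(default_factory=set)

    def start(self, pod: str) -> None:
        self.running.add(pod)

    def stop(self, pod: str) -> None:
        self.running.discard(pod)


def sync_once(desired: set, runtime: FakeRuntime) -> list:
    """One reconciliation pass: start missing pods, stop pods no longer desired."""
    actions = []
    for pod in sorted(desired - runtime.running):
        runtime.start(pod)
        actions.append(("start", pod))
    for pod in sorted(runtime.running - desired):
        runtime.stop(pod)
        actions.append(("stop", pod))
    return actions


runtime = FakeRuntime(running={"old-job"})
first = sync_once({"web", "cache"}, runtime)   # converges toward desired state
second = sync_once({"web", "cache"}, runtime)  # already converged, nothing to do
print(first)
print(second)
```

Running the same pass twice shows why a periodic loop is safe: once actual state matches desired state, the second pass produces no actions.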
Key kubelet Responsibilities
- Pod lifecycle — create, start, restart, and kill containers based on pod spec
- Health checking — execute liveness/readiness/startup probes, restart failing containers
- Resource management — enforce CPU/memory limits via cgroups, evict pods when node is under pressure
- Volume mounting — attach and mount persistent volumes into containers
- Node status — report node conditions (Ready, MemoryPressure, DiskPressure, PIDPressure) to API Server
- Image management — pull container images, garbage collect unused images
- Logging/metrics — expose container logs via API, serve metrics on :10250
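As a rough illustration of the health-checking responsibility, the sketch below models how consecutive liveness probe failures could lead to a restart. The function name `probe_decisions` is hypothetical; the threshold of 3 matches the default of the Pod spec's `failureThreshold` field, and the real kubelet logic is more involved.

```python
def probe_decisions(results, failure_threshold=3):
    """Map a sequence of probe results (True = success) to kubelet-style decisions.

    A restart is triggered only after `failure_threshold` consecutive failures;
    any success resets the counter.
    """
    consecutive_failures = 0
    decisions = []
    for ok in results:
        if ok:
            consecutive_failures = 0
            decisions.append("ok")
            continue
        consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            decisions.append("restart")
            consecutive_failures = 0  # the restarted container starts fresh
        else:
            decisions.append("tolerate")
    return decisions


decisions = probe_decisions([True, False, False, True, False, False, False])
print(decisions)
```

The single success in the middle resets the counter, so only the final run of three failures results in a restart.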
# Check kubelet status on a node
systemctl status kubelet
# View the last 50 kubelet log lines for debugging
# (combining -f with a pipe to tail never prints, because follow mode does not exit)
journalctl -u kubelet --no-pager -n 50
# Check node conditions reported by kubelet
kubectl describe node worker-1 | grep -A 10 "Conditions:"
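# View kubelet configuration (through the API Server's node proxy)
kubectl proxy &
sleep 1  # give the proxy a moment to start listening
curl -s http://localhost:8001/api/v1/nodes/worker-1/proxy/configz | \
python3 -m json.tool | head -40
kill %1  # stop the background kubectl proxy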
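# Check kubelet health endpoint (run on the node itself)
# By default healthz is served over plain HTTP on localhost:10248;
# port 10250 is the authenticated kubelet API and usually rejects anonymous requests
curl -s http://localhost:10248/healthz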
Container Runtime Interface (CRI)
The kubelet doesn't create containers directly — it uses the Container Runtime Interface (CRI), a gRPC API that abstracts the runtime. This decouples kubelet from specific container implementations.
sequenceDiagram
participant KL as kubelet
participant CRI as CRI (gRPC)
participant RT as containerd
participant OCI as runc (OCI)
KL->>CRI: RunPodSandbox(PodSandboxConfig)
CRI->>RT: Create sandbox (pause container)
RT->>OCI: Create namespace + cgroups
OCI-->>RT: Sandbox running
RT-->>KL: PodSandboxID
KL->>CRI: PullImage(ImageSpec)
CRI->>RT: Pull from registry
RT-->>KL: ImageRef
KL->>CRI: CreateContainer(PodSandboxID, ContainerConfig)
CRI->>RT: Create container in sandbox
RT->>OCI: runc create
OCI-->>RT: Container created
KL->>CRI: StartContainer(ContainerID)
CRI->>RT: Start container process
RT->>OCI: runc start
OCI-->>RT: Process running
RT-->>KL: Started
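The call order in the sequence above can be sketched with an in-memory stand-in. `FakeCRI` and `start_pod` are hypothetical names for illustration only. The recorded call names mirror the CRI RPCs shown in the diagram, but nothing here talks gRPC or to a real runtime.

```python
import itertools


class FakeCRI:
    """Hypothetical in-memory stand-in for a CRI runtime (not a real client)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.calls = []        # RPC names in the order they were invoked
        self.sandboxes = {}    # sandbox id -> pod name
        self.containers = {}   # container id -> metadata

    def run_pod_sandbox(self, pod_name):
        sandbox_id = f"sandbox-{next(self._ids)}"
        self.sandboxes[sandbox_id] = pod_name
        self.calls.append("RunPodSandbox")
        return sandbox_id

    def pull_image(self, image):
        self.calls.append("PullImage")
        return image  # a real runtime would return a resolved image reference

    def create_container(self, sandbox_id, name, image_ref):
        if sandbox_id not in self.sandboxes:
            raise KeyError(f"unknown sandbox: {sandbox_id}")
        container_id = f"container-{next(self._ids)}"
        self.containers[container_id] = {
            "name": name, "image": image_ref, "state": "created",
        }
        self.calls.append("CreateContainer")
        return container_id

    def start_container(self, container_id):
        self.containers[container_id]["state"] = "running"
        self.calls.append("StartContainer")


def start_pod(cri, pod_name, containers):
    """Sandbox first, then pull, create, and start each container inside it."""
    sandbox_id = cri.run_pod_sandbox(pod_name)
    container_ids = []
    for name, image in containers:
        image_ref = cri.pull_image(image)
        container_id = cri.create_container(sandbox_id, name, image_ref)
        cri.start_container(container_id)
        container_ids.append(container_id)
    return sandbox_id, container_ids


cri = FakeCRI()
sandbox_id, container_ids = start_pod(cri, "web", [("app", "myapp:v2.1")])
print(cri.calls)
```

The sandbox must exist before any container is created because every container in the pod joins the sandbox's namespaces.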
containerd
The industry-standard runtime (used by Docker, Kubernetes, cloud providers). It handles image pull/push, container lifecycle, storage (snapshots), and interfaces with the low-level OCI runtime (runc) for actual container creation.
CRI-O
Purpose-built for Kubernetes — implements CRI and nothing else. Lighter weight than containerd, no Docker compatibility layer. Favored by OpenShift/Red Hat deployments.
kube-proxy — Service Routing
kube-proxy implements the Kubernetes Service abstraction, mapping a stable ClusterIP to ephemeral pod IPs. It watches the API Server for Service and EndpointSlice objects (Endpoints in older releases), then programs the node's network stack to route traffic correctly.
kube-proxy Modes
flowchart TB
subgraph IPT["iptables Mode"]
PKT1["Packet → ClusterIP"] --> CHAIN["iptables KUBE-SERVICES chain"]
CHAIN --> RULE["Random DNAT rule\n(statistic module)"]
RULE --> POD1["Pod IP:Port"]
end
subgraph IPVS2["IPVS Mode"]
PKT2["Packet → ClusterIP"] --> VIP["IPVS Virtual Server"]
VIP --> ALGO["LB Algorithm\n(rr, lc, sh, dh)"]
ALGO --> POD2["Pod IP:Port"]
end
subgraph EBPF2["eBPF Mode (Cilium)"]
PKT3["Packet → ClusterIP"] --> BPF["eBPF Program\n(socket-level or TC)"]
BPF --> MAP["BPF Map Lookup\n(Service → backends)"]
MAP --> POD3["Pod IP:Port"]
end
iptables mode (default): Creates one iptables rule per Service endpoint. Uses the statistic module for random load balancing. Simple but O(n) rule evaluation — degrades at thousands of Services.
IPVS mode: Uses Linux IPVS (IP Virtual Server) kernel module. O(1) lookup via hash tables. Supports multiple load balancing algorithms (round-robin, least connections, source hash). Recommended for clusters with 1000+ Services.
eBPF mode (Cilium): Strictly a replacement for kube-proxy rather than one of its modes. Service routing is done in eBPF programs attached to socket or TC hooks, which avoids both iptables and IPVS overhead. Socket-level load balancing can also skip much of the network stack traversal, which typically makes this the fastest of the three options.
# View iptables rules created by kube-proxy for a Service
# Each Service gets a KUBE-SVC-* chain with one jump per endpoint to a KUBE-SEP-* chain, where the DNAT happens
iptables -t nat -L KUBE-SERVICES -n | grep my-service
# List that Service's chain in rule-spec format (substitute the chain name found above)
iptables -t nat -S KUBE-SVC-XXXXX
# Example output showing random load balancing:
# -A KUBE-SVC-XXXXX -m statistic --mode random --probability 0.333
# -j KUBE-SEP-AAAAA
# -A KUBE-SVC-XXXXX -m statistic --mode random --probability 0.500
# -j KUBE-SEP-BBBBB
# -A KUBE-SVC-XXXXX -j KUBE-SEP-CCCCC
# Check IPVS virtual servers (if using IPVS mode)
ipvsadm -Ln
# Check kube-proxy mode
kubectl logs -n kube-system -l k8s-app=kube-proxy | grep "Using"
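The probabilities 0.333 and 0.500 in the example output above look uneven, but because rules are evaluated in order they yield an equal share per endpoint. The sketch below works through that arithmetic with exact fractions. The function names are illustrative and not part of kube-proxy.

```python
from fractions import Fraction


def rule_probabilities(n):
    """Match probability for rules 1..n-1; the final rule always matches."""
    return [Fraction(1, n - i) for i in range(n - 1)]


def effective_share(n):
    """Overall chance that each of n endpoints is picked, evaluating rules in order."""
    shares = []
    remaining = Fraction(1)  # probability that no earlier rule matched
    for p in rule_probabilities(n):
        shares.append(remaining * p)
        remaining *= 1 - p
    shares.append(remaining)  # the unconditional last rule takes what is left
    return shares


probs = rule_probabilities(3)
shares = effective_share(3)
print(probs)
print(shares)
```

For three endpoints the rule probabilities are 1/3 and 1/2, and each endpoint ends up with an overall share of exactly 1/3.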
Service Types and Traffic Flow
- ClusterIP — internal-only virtual IP. kube-proxy DNATs packets destined for ClusterIP → pod IP
- NodePort — opens a port (default range 30000-32767) on every node. Traffic to any node's IP:NodePort → pod
- LoadBalancer — provisions external cloud LB that routes to NodePorts (or directly to pods with cloud provider integration)
- ExternalName — DNS CNAME alias, no proxying involved
Pod Networking Model
Kubernetes mandates a flat networking model with three fundamental rules:
- Every pod gets its own IP address
- Pods can communicate with all other pods without NAT
- Agents on a node can communicate with all pods on that node
The CNI (Container Network Interface) plugin is responsible for implementing these guarantees. When kubelet creates a pod sandbox, it calls the CNI plugin to configure networking for the new namespace.
flowchart LR
subgraph Node1["Node 1 (10.0.1.0/24)"]
PA["Pod A\n10.0.1.5"] --> BR1["Bridge/veth\n(cbr0)"]
PB["Pod B\n10.0.1.6"] --> BR1
end
subgraph Node2["Node 2 (10.0.2.0/24)"]
PC["Pod C\n10.0.2.3"] --> BR2["Bridge/veth\n(cbr0)"]
PD["Pod D\n10.0.2.4"] --> BR2
end
BR1 -->|"Same node:\nbridge forwarding"| BR1
BR1 -->|"Cross-node:\nencap or routing"| BR2
Same-node communication: Pods on the same node communicate via a Linux bridge or direct veth-pair routing. Each pod has its own network namespace, but the traffic between them is forwarded in the host's network stack and never leaves the node.
Cross-node communication: Depends on CNI implementation — either overlay (VXLAN encapsulation), direct routing (BGP announces pod CIDRs), or cloud-native (AWS ENI, Azure CNI assigning VNet IPs to pods).
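To make the addressing in the diagram concrete, the sketch below carves per-node pod CIDRs out of a cluster range and decides whether two pod IPs share a node. The cluster CIDR `10.0.0.0/16` and the node names are assumptions chosen to match the diagram's addresses. Real CNI plugins and IPAM are considerably more involved.

```python
import ipaddress

# Assumed cluster pod range, split into one /24 per node as in the diagram.
cluster_cidr = ipaddress.ip_network("10.0.0.0/16")
subnets = list(cluster_cidr.subnets(new_prefix=24))
node_cidrs = {
    "node-1": subnets[1],  # 10.0.1.0/24
    "node-2": subnets[2],  # 10.0.2.0/24
}


def node_for(pod_ip):
    """Return the node whose pod CIDR contains the given pod IP."""
    addr = ipaddress.ip_address(pod_ip)
    for node, cidr in node_cidrs.items():
        if addr in cidr:
            return node
    raise ValueError(f"{pod_ip} is not in any known pod CIDR")


def path(src_ip, dst_ip):
    """Classify traffic as same-node (bridge/veth) or cross-node (encap or routing)."""
    return "same-node" if node_for(src_ip) == node_for(dst_ip) else "cross-node"


print(path("10.0.1.5", "10.0.1.6"))
print(path("10.0.1.5", "10.0.2.3"))
```

Because each node owns a distinct CIDR, the destination IP alone is enough to tell whether a packet can stay on the local bridge or must be encapsulated or routed to another node.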
Node Resource Management
The kubelet enforces resource boundaries using Linux cgroups and monitors node health for eviction decisions.
QoS Classes
Kubernetes assigns a Quality of Service class to each pod based on its resource specifications:
- Guaranteed — every container sets both CPU and memory limits, and requests == limits for each (highest priority, last evicted)
- Burstable — not Guaranteed, but at least one container sets a CPU or memory request or limit (medium priority)
- BestEffort — no container sets any requests or limits (lowest priority, first evicted)
# Pod with Guaranteed QoS (requests == limits)
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-pod
  namespace: production
spec:
  containers:
  - name: app
    image: myapp:v2.1
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "256Mi"
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 5
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 3
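The QoS classification can be sketched as a small function. This is a simplification, not the kubelet's implementation: `qos_class` is a hypothetical name, containers are plain dicts, and quantities are compared as strings, so "500m" and "0.5" would wrongly count as different.

```python
def qos_class(containers):
    """Classify a pod from its containers' resource settings.

    containers: list of dicts with optional "requests" and "limits" dicts,
    each keyed by "cpu" and/or "memory".
    """
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # An omitted request defaults to the limit.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# Resource settings taken from the guaranteed-pod manifest above.
guaranteed_pod = [{
    "requests": {"cpu": "500m", "memory": "256Mi"},
    "limits": {"cpu": "500m", "memory": "256Mi"},
}]
print(qos_class(guaranteed_pod))
print(qos_class([{"requests": {"memory": "128Mi"}}]))
print(qos_class([{}]))
```

A pod that sets only a memory request lands in Burstable: it is not BestEffort because something is set, and not Guaranteed because limits are missing.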
Eviction
When node resources run low, kubelet evicts pods to protect node stability:
- Soft eviction — threshold crossed for grace period (e.g., memory.available < 500Mi for 2 minutes)
- Hard eviction — immediate eviction at threshold (e.g., memory.available < 100Mi)
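- Eviction order: pods whose usage exceeds their requests go first (all BestEffort pods, plus Burstable pods over their requests), ranked by Pod priority and then by how far usage exceeds requests. Guaranteed pods and Burstable pods within their requests are evicted last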
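The threshold and ordering logic can be approximated as follows. The threshold values are the examples from the list above, and all names are hypothetical. The ranking is deliberately simplified to QoS class plus usage over requests; the actual kubelet also takes Pod priority into account.

```python
# Example thresholds from the text (MiB of memory.available).
SOFT_THRESHOLD_MI = 500
SOFT_GRACE_SECONDS = 120
HARD_THRESHOLD_MI = 100

QOS_RANK = {"BestEffort": 0, "Burstable": 1, "Guaranteed": 2}


def eviction_signal(available_mi, seconds_below_soft):
    """Return "hard", "soft", or None for the current memory.available reading."""
    if available_mi < HARD_THRESHOLD_MI:
        return "hard"  # evict immediately, no grace period
    if available_mi < SOFT_THRESHOLD_MI and seconds_below_soft >= SOFT_GRACE_SECONDS:
        return "soft"  # threshold has been crossed for the whole grace period
    return None


def eviction_order(pods):
    """pods: list of (name, qos_class, usage_over_request_mi) tuples.

    Simplified ranking: lowest QoS class first, then largest overage first.
    """
    ranked = sorted(pods, key=lambda pod: (QOS_RANK[pod[1]], -pod[2]))
    return [name for name, _, _ in ranked]


order = eviction_order([
    ("db", "Guaranteed", 0),
    ("batch", "BestEffort", 300),
    ("api", "Burstable", 50),
    ("cache", "Burstable", 200),
])
print(eviction_signal(400, 30), eviction_signal(400, 120), eviction_signal(50, 0))
print(order)
```

A reading of 400Mi only triggers soft eviction after it has persisted for the full two minutes, while anything under 100Mi triggers eviction at once.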
Data Plane Extensions
The Kubernetes data plane is extensible through several plugin interfaces:
Device Plugins
Expose specialized hardware to pods (GPUs, FPGAs, smart NICs, TPUs). The device plugin registers with kubelet via gRPC, advertises available devices, and allocates them to containers at runtime.
CSI Drivers (Container Storage Interface)
Provide persistent storage to pods. CSI separates storage provisioning (control plane: create volume) from mounting (data plane: attach volume to node, mount into container).
# CSI StorageClass — control plane configuration
# that drives data plane volume mounting
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "5000"
  throughput: "250"
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
allowVolumeExpansion: true
Service Mesh Sidecars
Inject proxy containers (Envoy, Linkerd-proxy) alongside application containers. The sidecar intercepts all network traffic, adding mTLS, retries, circuit breaking, and observability — extending the data plane with service-level networking features without application changes.
The Data Plane Keeps Getting Richer
Originally, the Kubernetes data plane was just "run containers and route packets." Today it encompasses GPUs, persistent storage, service meshes, security policies (seccomp, AppArmor, SELinux), network policies (CNI firewalling), and runtime security (Falco, Tetragon). Each extension adds data plane functionality without changing the control plane API. This is the power of the separation: the control plane remains stable while the data plane evolves rapidly.
Control Plane Decides, Data Plane Executes
The Kubernetes data plane embodies the same principle as a network switch in SDN: it receives instructions from the control plane and executes them locally at maximum efficiency. kubelet is the local agent, the container runtime is the forwarding engine, kube-proxy is the service routing table, and CNI is the network fabric. All are optimized for execution speed and reliability, while intelligence and decision-making remain centralized in the control plane.