
Kubernetes Data Plane — Nodes, kubelet, Container Runtime & Networking

May 15, 2026 Wasil Zafar 24 min read

"If the control plane is the brain, the data plane is the body — worker nodes that convert declared intent into running containers, routed packets, and served requests." — The Kubernetes data plane is where abstraction meets reality.

Table of Contents

  1. Worker Node Components
  2. kubelet — Pod Lifecycle Manager
  3. Container Runtime Interface (CRI)
  4. kube-proxy — Service Routing
  5. Pod Networking Model
  6. Node Resource Management
  7. Data Plane Extensions

Worker Node Components

A Kubernetes worker node is a machine (physical or virtual) that runs containerized workloads. While the control plane decides WHAT should run WHERE, the data plane (worker nodes) is responsible for actually RUNNING it. Each node has three essential components:

  • kubelet — the primary node agent; watches the API Server and ensures pods are running
  • Container runtime — actually creates and manages containers (containerd, CRI-O)
  • kube-proxy — implements Service networking (load balancing traffic to pods)
Data Plane Analogy: Think of each worker node as a network switch in SDN. The switch (node) has a forwarding table (pod specs) programmed by the controller (API Server). It executes forwarding decisions (runs containers) at "line rate" without consulting the control plane for every packet (request). The kubelet is the node's local agent — like the switch's OpenFlow agent that receives and applies flow rules.

kubelet — Pod Lifecycle Manager

The kubelet is the most important data plane component. It watches the API Server for Pods assigned to its node and drives the container runtime to match desired state:

kubelet Pod Lifecycle Management
flowchart TB
    API["API Server"] -->|"Watch: Pods on this node"| KL["kubelet"]
    KL --> SYNC["SyncLoop\n(periodic reconciliation)"]
    SYNC --> ADMIT["Admission\n(resource check, node conditions)"]
    ADMIT --> CRI2["CRI Calls\n(create/start/stop containers)"]
    CRI2 --> RT["Container Runtime\n(containerd / CRI-O)"]
    KL --> PROBE["Health Probes\n(liveness, readiness, startup)"]
    KL --> STATUS["Status Reporter\n(→ API Server)"]
    PROBE -->|"Failure"| CRI2
    RT --> CONTAINERS["Running Containers"]
                            

Key kubelet Responsibilities

  • Pod lifecycle — create, start, restart, and kill containers based on pod spec
  • Health checking — execute liveness/readiness/startup probes, restart failing containers
  • Resource management — enforce CPU/memory limits via cgroups, evict pods when node is under pressure
  • Volume mounting — attach and mount persistent volumes into containers
  • Node status — report node conditions (Ready, MemoryPressure, DiskPressure, PIDPressure) to API Server
  • Image management — pull container images, garbage collect unused images
  • Logging/metrics — expose container logs via API, serve metrics on :10250
# Check kubelet status on a node
systemctl status kubelet

# View the most recent kubelet logs for debugging
# (no -f here: a followed stream never ends, so a piped tail would print nothing)
journalctl -u kubelet --no-pager -n 50

# Check node conditions reported by kubelet
kubectl describe node worker-1 | grep -A 10 "Conditions:"

# View kubelet configuration
kubectl proxy &
curl -s http://localhost:8001/api/v1/nodes/worker-1/proxy/configz | \
  python3 -m json.tool | head -40

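# Check kubelet health endpoint
# (the unauthenticated healthz server listens on localhost:10248 by default;
#  port 10250 is the authenticated kubelet API and usually rejects anonymous requests)
curl -s http://localhost:10248/healthz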
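
The reconciliation that the kubelet's SyncLoop performs can be illustrated with a minimal sketch. This is not kubelet code: `FakeRuntime` and `sync_once` are hypothetical names, and the "runtime" is just an in-memory set. The point is the shape of the loop: compare desired state with actual state, then act on the difference.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRuntime:
    """Hypothetical in-memory stand-in for a container runtime."""
    running: set = field(default_factory=set)

    def start(self, pod: str) -> None:
        self.running.add(pod)

    def stop(self, pod: str) -> None:
        self.running.discard(pod)


def sync_once(desired: set, runtime: FakeRuntime) -> dict:
    """One reconciliation pass: start what is missing, stop what is unwanted."""
    to_start = sorted(desired - runtime.running)
    to_stop = sorted(runtime.running - desired)
    for pod in to_start:
        runtime.start(pod)
    for pod in to_stop:
        runtime.stop(pod)
    return {"started": to_start, "stopped": to_stop}


runtime = FakeRuntime(running={"old-job"})
actions = sync_once({"web-1", "web-2"}, runtime)
print(actions)
# A second pass with the same desired state finds nothing left to do.
print(sync_once({"web-1", "web-2"}, runtime))
```

Because each pass only acts on the difference, running it repeatedly is safe, which is what makes periodic reconciliation a reasonable design.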

Container Runtime Interface (CRI)

The kubelet doesn't create containers directly — it uses the Container Runtime Interface (CRI), a gRPC API that abstracts the runtime. This decouples kubelet from specific container implementations.

CRI: How Pods Become Containers
sequenceDiagram
    participant KL as kubelet
    participant CRI as CRI (gRPC)
    participant RT as containerd
    participant OCI as runc (OCI)

    KL->>CRI: RunPodSandbox(PodSandboxConfig)
    CRI->>RT: Create sandbox (pause container)
    RT->>OCI: Create namespace + cgroups
    OCI-->>RT: Sandbox running
    RT-->>KL: PodSandboxID

    KL->>CRI: PullImage(ImageSpec)
    CRI->>RT: Pull from registry
    RT-->>KL: ImageRef

    KL->>CRI: CreateContainer(PodSandboxID, ContainerConfig)
    CRI->>RT: Create container in sandbox
    RT->>OCI: runc create
    OCI-->>RT: Container created

    KL->>CRI: StartContainer(ContainerID)
    CRI->>RT: Start container process
    RT->>OCI: runc start
    OCI-->>RT: Process running
    RT-->>KL: Started
                            

containerd

The industry-standard runtime (used by Docker, Kubernetes, cloud providers). It handles image pull/push, container lifecycle, storage (snapshots), and interfaces with the low-level OCI runtime (runc) for actual container creation.

CRI-O

Purpose-built for Kubernetes: it implements the CRI and little else. Its scope is narrower than containerd's, and it has no Docker compatibility layer. Favored by OpenShift/Red Hat deployments.

The Sandbox Model: Every pod gets a "pause" container (sandbox) that holds the network namespace. Application containers join this existing namespace. This is why containers in the same pod share localhost — they literally share the same network stack created by the sandbox.
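
The call order from the sequence diagram can be sketched as follows. `FakeCRI` is a hypothetical recorder, not a real CRI client: it only logs which RPC would be issued and hands back made-up IDs. The method names mirror the CRI RPCs named above; the sidecar image name is also a placeholder.

```python
class FakeCRI:
    """Hypothetical stand-in that records calls; it does not talk to a runtime."""

    def __init__(self):
        self.calls = []
        self._next_id = 0

    def _new_id(self, prefix):
        self._next_id += 1
        return f"{prefix}-{self._next_id}"

    def run_pod_sandbox(self, pod_name):
        self.calls.append("RunPodSandbox")
        return self._new_id("sandbox")

    def pull_image(self, image):
        self.calls.append("PullImage")
        return image  # a real runtime returns an image reference

    def create_container(self, sandbox_id, name, image_ref):
        self.calls.append("CreateContainer")
        return self._new_id("container")

    def start_container(self, container_id):
        self.calls.append("StartContainer")


def start_pod(cri, pod_name, containers):
    """One sandbox per pod, then pull/create/start for each container."""
    sandbox_id = cri.run_pod_sandbox(pod_name)
    container_ids = []
    for name, image in containers:
        image_ref = cri.pull_image(image)
        container_id = cri.create_container(sandbox_id, name, image_ref)
        cri.start_container(container_id)
        container_ids.append(container_id)
    return sandbox_id, container_ids


cri = FakeCRI()
sandbox_id, container_ids = start_pod(
    cri, "web", [("app", "myapp:v2.1"), ("sidecar", "example-proxy:latest")]
)
print(cri.calls)
```

The sandbox is created once and every container is created inside it, which is the structural reason containers in a pod share one network namespace.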

kube-proxy — Service Routing

kube-proxy implements the Kubernetes Service abstraction by mapping a stable ClusterIP to ephemeral pod IPs. It watches the API Server for Service and EndpointSlice objects (the successor to the older Endpoints API), then programs the node's network stack to route traffic correctly.

kube-proxy Modes

kube-proxy Modes — iptables vs IPVS vs eBPF
flowchart TB
    subgraph IPT["iptables Mode"]
        PKT1["Packet → ClusterIP"] --> CHAIN["iptables KUBE-SERVICES chain"]
        CHAIN --> RULE["Random DNAT rule\n(statistic module)"]
        RULE --> POD1["Pod IP:Port"]
    end
    subgraph IPVS2["IPVS Mode"]
        PKT2["Packet → ClusterIP"] --> VIP["IPVS Virtual Server"]
        VIP --> ALGO["LB Algorithm\n(rr, lc, sh, dh)"]
        ALGO --> POD2["Pod IP:Port"]
    end
    subgraph EBPF2["eBPF Mode (Cilium)"]
        PKT3["Packet → ClusterIP"] --> BPF["eBPF Program\n(socket-level or TC)"]
        BPF --> MAP["BPF Map Lookup\n(Service → backends)"]
        MAP --> POD3["Pod IP:Port"]
    end
                            

iptables mode (default): Creates one iptables rule per Service endpoint. Uses the statistic module for random load balancing. Simple but O(n) rule evaluation — degrades at thousands of Services.

IPVS mode: Uses Linux IPVS (IP Virtual Server) kernel module. O(1) lookup via hash tables. Supports multiple load balancing algorithms (round-robin, least connections, source hash). Recommended for clusters with 1000+ Services.

eBPF mode (Cilium): Replaces kube-proxy entirely. Service routing is done in eBPF programs attached to socket or TC hooks. This is typically the lowest-overhead option: it avoids both iptables and IPVS, and socket-level hooks let it skip much of the network stack traversal.

# View iptables rules created by kube-proxy for a Service
# Each Service gets a chain with DNAT rules for each endpoint
iptables -t nat -L KUBE-SERVICES -n | grep my-service

# Example output showing random load balancing:
# -A KUBE-SVC-XXXXX -m statistic --mode random --probability 0.333
#   -j KUBE-SEP-AAAAA
# -A KUBE-SVC-XXXXX -m statistic --mode random --probability 0.500
#   -j KUBE-SEP-BBBBB
# -A KUBE-SVC-XXXXX -j KUBE-SEP-CCCCC

# Check IPVS virtual servers (if using IPVS mode)
ipvsadm -Ln

# Check kube-proxy mode
kubectl logs -n kube-system -l k8s-app=kube-proxy | grep "Using"
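
The probabilities in the example output (0.333, then 0.500, then an unconditional rule) look uneven, but they produce an even split because the rules are tried in order. A short sketch of the arithmetic, using hypothetical helper names and exact fractions:

```python
from fractions import Fraction


def rule_probabilities(n):
    """Per-rule match probability for n endpoints; the last rule always matches."""
    return [Fraction(1, n - i) for i in range(n)]


def effective_shares(n):
    """Overall chance each endpoint is chosen when rules are evaluated in order."""
    shares = []
    remaining = Fraction(1)  # probability that no earlier rule matched
    for p in rule_probabilities(n):
        shares.append(remaining * p)
        remaining *= 1 - p
    return shares


print([round(float(p), 3) for p in rule_probabilities(3)])
print(effective_shares(3))
```

For three endpoints, the second rule is only reached two thirds of the time, so its 1/2 probability yields an overall share of 1/3, the same as the first and last rules.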

Service Types and Traffic Flow

  • ClusterIP — internal-only virtual IP. kube-proxy DNATs packets destined for ClusterIP → pod IP
  • NodePort — opens a port (30000-32767) on every node. Traffic to any node's IP:NodePort → pod
  • LoadBalancer — provisions external cloud LB that routes to NodePorts (or directly to pods with cloud provider integration)
  • ExternalName — DNS CNAME alias, no proxying involved

Pod Networking Model

Kubernetes mandates a flat networking model with three fundamental rules:

  1. Every pod gets its own IP address
  2. Pods can communicate with all other pods without NAT
  3. Agents on a node can communicate with all pods on that node

The CNI (Container Network Interface) plugin is responsible for implementing these guarantees. When the kubelet asks the container runtime to create a pod sandbox, the runtime invokes the CNI plugin to configure networking for the new network namespace.

Pod-to-Pod Networking — Same Node vs Cross-Node
flowchart LR
    subgraph Node1["Node 1 (10.0.1.0/24)"]
        PA["Pod A\n10.0.1.5"] --> BR1["Bridge/veth\n(cbr0)"]
        PB["Pod B\n10.0.1.6"] --> BR1
    end
    subgraph Node2["Node 2 (10.0.2.0/24)"]
        PC["Pod C\n10.0.2.3"] --> BR2["Bridge/veth\n(cbr0)"]
        PD["Pod D\n10.0.2.4"] --> BR2
    end
    BR1 -->|"Same node:\nbridge forwarding"| BR1
    BR1 -->|"Cross-node:\nencap or routing"| BR2
                            

Same-node communication: Pods on the same node communicate via a Linux bridge or direct veth-pair routing. Traffic passes through the host's network stack and never leaves the node.

Cross-node communication: Depends on CNI implementation — either overlay (VXLAN encapsulation), direct routing (BGP announces pod CIDRs), or cloud-native (AWS ENI, Azure CNI assigning VNet IPs to pods).
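
Whether a packet takes the same-node or cross-node path comes down to which node's pod CIDR contains the destination IP. The sketch below assumes the per-node CIDRs from the diagram; the node names and function names are hypothetical, and a real CNI plugin makes this decision with routes in the kernel rather than in application code.

```python
import ipaddress

# Pod CIDRs per node, taken from the diagram above (node names are illustrative).
NODE_POD_CIDRS = {
    "node-1": ipaddress.ip_network("10.0.1.0/24"),
    "node-2": ipaddress.ip_network("10.0.2.0/24"),
}


def node_for(pod_ip):
    """Return the node whose pod CIDR contains pod_ip, or None."""
    ip = ipaddress.ip_address(pod_ip)
    for node, cidr in NODE_POD_CIDRS.items():
        if ip in cidr:
            return node
    return None


def path(src_ip, dst_ip):
    """Classify a pod-to-pod flow as same-node, cross-node, or unknown."""
    src, dst = node_for(src_ip), node_for(dst_ip)
    if src is None or dst is None:
        return "unknown"
    return "same-node" if src == dst else "cross-node"


print(path("10.0.1.5", "10.0.1.6"))  # Pod A to Pod B
print(path("10.0.1.5", "10.0.2.3"))  # Pod A to Pod C
```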

Node Resource Management

The kubelet enforces resource boundaries using Linux cgroups and monitors node health for eviction decisions.

QoS Classes

Kubernetes assigns a Quality of Service class to each pod based on its resource specifications:

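  • Guaranteed: every container sets both CPU and memory requests and limits, with requests == limits (highest priority, last evicted)
  • Burstable: the pod does not meet the Guaranteed criteria, but at least one container sets a CPU or memory request or limit (medium priority)
  • BestEffort: no container sets any CPU or memory request or limit (lowest priority, first evicted)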
# Pod with Guaranteed QoS (requests == limits)
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-pod
  namespace: production
spec:
  containers:
    - name: app
      image: myapp:v2.1
      resources:
        requests:
          cpu: "500m"
          memory: "256Mi"
        limits:
          cpu: "500m"
          memory: "256Mi"
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 5
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 3
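
The QoS classification rules can be sketched as a small function. This is an illustration, not the kubelet's implementation: `qos_class` is a hypothetical name, containers are plain dicts, and quantities are compared as strings (so "500m" and "0.5" would wrongly count as different, which a real implementation avoids by parsing quantities).

```python
RESOURCES = ("cpu", "memory")


def qos_class(containers):
    """Classify a pod from a list of {'requests': {...}, 'limits': {...}} dicts."""
    any_set = False
    all_guaranteed = True
    for c in containers:
        limits = c.get("limits", {})
        # When only a limit is given, Kubernetes defaults the request to the limit.
        requests = {**limits, **c.get("requests", {})}
        for res in RESOURCES:
            req, lim = requests.get(res), limits.get(res)
            if req is not None or lim is not None:
                any_set = True
            if lim is None or req != lim:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"


guaranteed_pod = [{
    "requests": {"cpu": "500m", "memory": "256Mi"},
    "limits": {"cpu": "500m", "memory": "256Mi"},
}]
print(qos_class(guaranteed_pod))
print(qos_class([{"requests": {"cpu": "100m"}}]))
print(qos_class([{}]))
```

The first call mirrors the resources of the manifest above; the second sets only a CPU request and lands in Burstable; the third sets nothing and is BestEffort.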

Eviction

When node resources run low, kubelet evicts pods to protect node stability:

  • Soft eviction — threshold crossed for grace period (e.g., memory.available < 500Mi for 2 minutes)
  • Hard eviction — immediate eviction at threshold (e.g., memory.available < 100Mi)
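  • Eviction order: pods whose usage exceeds their requests go first (BestEffort pods always qualify, followed by Burstable pods over their requests), ranked by pod priority and then by how far usage exceeds requests; Guaranteed pods and Burstable pods within their requests are evicted last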
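
A simplified sketch of the two decisions involved: whether a threshold has been crossed, and which pod goes first. The threshold values reuse the examples from the list above; the function names and pod fields (`usage_mi`, `request_mi`, `priority`) are hypothetical, and the real kubelet tracks more signals than memory.

```python
def eviction_decision(available_mi, seconds_below_soft,
                      soft_mi=500, soft_grace_s=120, hard_mi=100):
    """Return 'hard', 'soft', or 'none' for a memory.available reading."""
    if available_mi < hard_mi:
        return "hard"  # act immediately, no grace period
    if available_mi < soft_mi and seconds_below_soft >= soft_grace_s:
        return "soft"  # threshold held for the whole grace period
    return "none"


def eviction_order(pods):
    """Rank pods: over-request first, then lower priority, then largest overage."""
    def key(p):
        over = p["usage_mi"] - p["request_mi"]
        return (0 if over > 0 else 1, p["priority"], -over)
    return [p["name"] for p in sorted(pods, key=key)]


pods = [
    {"name": "guaranteed", "usage_mi": 200, "request_mi": 256, "priority": 0},
    {"name": "burstable-over", "usage_mi": 400, "request_mi": 256, "priority": 0},
    {"name": "best-effort", "usage_mi": 300, "request_mi": 0, "priority": 0},
]
order = eviction_order(pods)
print(eviction_decision(400, 30), eviction_decision(400, 120), eviction_decision(80, 0))
print(order)
```

A BestEffort pod has no requests, so any usage counts as exceeding them, which is why it sorts to the front.
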
Resource Limits Are Enforced by the Data Plane: CPU limits use CFS throttling (cpu.cfs_quota_us on cgroup v1, cpu.max on cgroup v2), so the kernel stops scheduling the container once it has used its quota for the current period. Memory limits use memory.limit_in_bytes on cgroup v1 or memory.max on cgroup v2, and the kernel OOM-kills a process in the container if usage exceeds the limit. These are hard enforcements by the Linux kernel, not suggestions. A container exceeding its memory limit WILL be killed, regardless of what the control plane thinks.
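
To make the CPU side concrete, here is a sketch of how a CPU limit maps to a CFS quota, assuming the default 100 ms CFS period. The helper names are hypothetical, and only the plain "500m" and decimal forms of CPU quantities are handled.

```python
CFS_PERIOD_US = 100_000  # default CFS period: 100 ms


def parse_cpu_millis(cpu):
    """Convert '500m' or '2' style CPU quantities to millicores."""
    if cpu.endswith("m"):
        return int(cpu[:-1])
    return int(round(float(cpu) * 1000))


def cfs_quota_us(cpu_limit, period_us=CFS_PERIOD_US):
    """Microseconds of CPU time the container may use per period."""
    return parse_cpu_millis(cpu_limit) * period_us // 1000


def cpu_max_v2(cpu_limit, period_us=CFS_PERIOD_US):
    """cgroup v2 writes quota and period as one 'QUOTA PERIOD' line in cpu.max."""
    return f"{cfs_quota_us(cpu_limit, period_us)} {period_us}"


print(cfs_quota_us("500m"))
print(cpu_max_v2("2"))
```

A limit of 500m therefore allows 50 ms of CPU time in every 100 ms window; once that is used, the container's threads wait for the next period even if the node has idle cores.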

Data Plane Extensions

The Kubernetes data plane is extensible through several plugin interfaces:

Device Plugins

Expose specialized hardware to pods (GPUs, FPGAs, smart NICs, TPUs). The device plugin registers with kubelet via gRPC, advertises available devices, and allocates them to containers at runtime.

CSI Drivers (Container Storage Interface)

Provide persistent storage to pods. CSI separates storage provisioning (control plane: create volume) from mounting (data plane: attach volume to node, mount into container).

# CSI StorageClass — control plane configuration
# that drives data plane volume mounting
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "5000"
  throughput: "250"
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
allowVolumeExpansion: true

Service Mesh Sidecars

Inject proxy containers (Envoy, Linkerd-proxy) alongside application containers. The sidecar intercepts all network traffic, adding mTLS, retries, circuit breaking, and observability — extending the data plane with service-level networking features without application changes.

Architecture Pattern
The Data Plane Keeps Getting Richer

Originally, the Kubernetes data plane was just "run containers and route packets." Today it encompasses GPUs, persistent storage, service meshes, security policies (seccomp, AppArmor, SELinux), network policies (CNI firewalling), and runtime security (Falco, Tetragon). Each extension adds data plane functionality without changing the control plane API. This is the power of the separation: the control plane remains stable while the data plane evolves rapidly.

Extensibility · Plugins · Evolution
Key Takeaway
Control Plane Decides, Data Plane Executes

The Kubernetes data plane embodies the same principle as a network switch in SDN: it receives instructions from the control plane and executes them locally at maximum efficiency. kubelet is the local agent, the container runtime is the forwarding engine, kube-proxy is the service routing table, and CNI is the network fabric. All are optimized for execution speed and reliability, while intelligence and decision-making remain centralized in the control plane.

Architecture · Separation · Pattern