
Storage Systems Control & Data Planes

May 15, 2026 Wasil Zafar 22 min read

In distributed storage, the control plane decides WHERE data lives and HOW it's protected, while the data plane handles the actual bytes — reading, writing, and transferring data. This separation enables systems to scale metadata management independently from data throughput.

Table of Contents

  1. Storage Control vs Data Plane
  2. Ceph Architecture
  3. HDFS Architecture
  4. S3 Internals
  5. Database Replication
  6. CSI in Kubernetes
  7. Performance — Metadata as Bottleneck
  8. Cross-Cutting Patterns

Storage Control vs Data Plane

Every distributed storage system faces the same fundamental challenge: managing the metadata (where things are, how they're replicated, their consistency state) separately from the data operations (actual reads and writes of bytes). This maps cleanly to the control/data plane pattern.

Storage Control Plane: Metadata management, placement decisions, replication orchestration, failure detection, data rebalancing, namespace management, access control policy. It answers: "Where should this data go? How many copies? What happens when a node fails?"
Storage Data Plane: Actual reads/writes, data transfer between nodes, checksum verification, compression/decompression, encryption at rest, disk I/O operations. It answers: "Read these bytes from disk. Write these bytes. Verify integrity. Transfer to replica."
Storage Control Plane vs Data Plane — Universal Pattern
flowchart TB
    subgraph CP["Storage Control Plane"]
        META["Metadata Service\n(namespace, directory tree)"]
        PLACE["Placement Engine\n(where to store)"]
        REPL["Replication Manager\n(copy orchestration)"]
        HEALTH["Health Monitor\n(failure detection)"]
        META --> PLACE
        PLACE --> REPL
        HEALTH --> PLACE
    end
    subgraph DP["Storage Data Plane"]
        WRITE["Write Path\n(client → storage nodes)"]
        READ["Read Path\n(storage nodes → client)"]
        XFER["Data Transfer\n(replication traffic)"]
        VERIFY["Integrity Check\n(checksums)"]
    end
    CP -->|"Placement map"| DP
    DP -->|"Health status"| CP
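As a rough illustration of the split described above, the sketch below models a control plane that only decides placement and a data plane that only moves and verifies bytes. All names here (`ControlPlane`, `StorageNode`, `put`, `get`) are hypothetical and not taken from any real storage system; the hash-ranking placement is an assumption chosen for brevity.

```python
import hashlib


class StorageNode:
    """Data plane: stores bytes and verifies checksums. Knows nothing about placement."""

    def __init__(self, name):
        self.name = name
        self.blobs = {}  # key -> (payload, sha256 hex digest)

    def write(self, key, payload):
        self.blobs[key] = (payload, hashlib.sha256(payload).hexdigest())

    def read(self, key):
        payload, digest = self.blobs[key]
        if hashlib.sha256(payload).hexdigest() != digest:
            raise IOError(f"checksum mismatch for {key} on {self.name}")
        return payload


class ControlPlane:
    """Control plane: decides where data goes. Never touches payload bytes."""

    def __init__(self, node_names, replicas=2):
        self.node_names = sorted(node_names)
        self.replicas = replicas
        self.healthy = set(self.node_names)

    def mark_failed(self, name):
        self.healthy.discard(name)

    def placement(self, key):
        # Rank healthy nodes by a per-key hash and take the first `replicas`.
        candidates = [n for n in self.node_names if n in self.healthy]
        ranked = sorted(
            candidates,
            key=lambda n: hashlib.sha256(f"{key}:{n}".encode()).hexdigest(),
        )
        return ranked[: self.replicas]


def put(control, nodes, key, payload):
    targets = control.placement(key)
    for name in targets:
        nodes[name].write(key, payload)
    return targets


def get(control, nodes, key):
    for name in control.placement(key):
        if key in nodes[name].blobs:
            return nodes[name].read(key)
    raise KeyError(key)


nodes = {n: StorageNode(n) for n in ("node-a", "node-b", "node-c")}
control = ControlPlane(nodes.keys(), replicas=2)
payload = b"id,value\n1,42\n"
written_to = put(control, nodes, "report.csv", payload)
control.mark_failed(written_to[0])          # health status flows back to the control plane
recovered = get(control, nodes, "report.csv")
print(written_to, recovered == payload)
```

The read still succeeds after the first replica is marked failed because the control plane recomputes placement over the healthy nodes, while the surviving storage node serves the bytes unchanged.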
                            

Ceph Architecture

Ceph is the canonical example of control/data plane separation in open-source storage. Its control plane (MON + MGR + MDS) handles cluster state and metadata, while its data plane (OSDs) handles actual object storage and retrieval.

Ceph Control Plane Components

  • MON (Monitor) — maintains cluster map (OSD map, MON map, PG map, CRUSH map); Paxos consensus for consistency
  • MGR (Manager) — provides monitoring, orchestration, and cluster management interfaces
  • MDS (Metadata Server) — manages filesystem namespace for CephFS (POSIX metadata: directories, permissions, timestamps)
  • CRUSH algorithm — deterministic placement algorithm; clients can compute object location without querying a central directory

Ceph Data Plane Components

  • OSD (Object Storage Daemon) — one per physical disk; handles actual reads/writes, replication, recovery, rebalancing
  • Placement Groups (PGs) — logical grouping of objects mapped to OSDs via CRUSH; enables efficient rebalancing
Ceph Architecture — MON/MGR (Control) + OSD (Data)
flowchart TB
    CLIENT["Client\n(librbd / CephFS / RGW)"]
    subgraph CP["Control Plane"]
        MON1["MON 1"]
        MON2["MON 2"]
        MON3["MON 3"]
        MGR["MGR\n(Dashboard, Metrics)"]
        MDS["MDS\n(CephFS metadata)"]
        MON1 <--> MON2
        MON2 <--> MON3
    end
    subgraph DP["Data Plane (OSDs)"]
        OSD1["OSD.0\n/dev/sda"]
        OSD2["OSD.1\n/dev/sdb"]
        OSD3["OSD.2\n/dev/sdc"]
        OSD4["OSD.3\n/dev/sdd"]
    end
    CLIENT -->|"1. Get cluster map"| MON1
    CLIENT -->|"2. CRUSH compute"| CLIENT
    CLIENT -->|"3. Direct I/O"| OSD1
    OSD1 -->|"Replicate"| OSD2
    OSD1 -->|"Replicate"| OSD3
    OSD1 -.->|"Heartbeat"| MON1
                            
Key Insight
CRUSH — The Algorithm That Eliminates the Metadata Bottleneck

Ceph's CRUSH (Controlled Replication Under Scalable Hashing) algorithm is a deterministic placement function: given an object name and the cluster map, ANY client can independently compute which OSDs store that object. This means the control plane (MONs) doesn't need to be consulted for every I/O operation. Clients fetch the cluster map from a MON, refresh it only when the map changes, and otherwise go directly to OSDs. This is why Ceph scales: the data plane operates independently of the control plane for normal operations.
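To make the idea of deterministic, client-side placement concrete, here is a minimal sketch. It is NOT the CRUSH algorithm: it swaps in rendezvous (highest-random-weight) hashing, ignores weights and failure domains, and uses a hypothetical `cluster_map` dictionary rather than Ceph's real map format. It only shows that two clients holding the same map compute the same answer without asking anyone.

```python
import hashlib


def _hash64(text):
    """Stable 64-bit hash of a string (sha256 truncated), same on every client."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


def pg_for_object(object_name, pg_num):
    """Map an object name to a placement group id (simplified: hash mod pg_num)."""
    return _hash64(object_name) % pg_num


def osds_for_pg(pg_id, up_osds, replicas):
    """Pick `replicas` OSDs for a PG by rendezvous hashing (a stand-in for CRUSH)."""
    ranked = sorted(up_osds, key=lambda osd: _hash64(f"{pg_id}:{osd}"), reverse=True)
    return ranked[:replicas]


def locate(object_name, cluster_map):
    """Pure function of (object name, cluster map): no lookup service involved."""
    pg_id = pg_for_object(object_name, cluster_map["pg_num"])
    return osds_for_pg(pg_id, cluster_map["up_osds"], cluster_map["replicas"])


# Hypothetical cluster map; real Ceph maps carry far more detail.
cluster_map = {"epoch": 1, "pg_num": 128, "up_osds": [0, 1, 2, 3], "replicas": 3}

client_a = locate("rbd_data.1234", cluster_map)
client_b = locate("rbd_data.1234", dict(cluster_map))  # a second client with its own copy
print(client_a, client_a == client_b)

# Removing an OSD that does not hold this object leaves its placement untouched.
unused = (set(cluster_map["up_osds"]) - set(client_a)).pop()
smaller_map = {
    **cluster_map,
    "epoch": 2,
    "up_osds": [o for o in cluster_map["up_osds"] if o != unused],
}
after_change = locate("rbd_data.1234", smaller_map)
print(after_change == client_a)
```

The second check hints at why hash-based placement keeps rebalancing cheap: a map change only moves the data whose computed location actually changed.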

# Check Ceph cluster health (control plane status)
ceph status

# View the CRUSH map (placement rules — control plane)
ceph osd crush dump | head -50

# Check OSD status (data plane nodes)
ceph osd tree

# View placement group distribution
ceph pg stat

# Monitor OSD performance (data plane metrics)
ceph osd perf

HDFS Architecture

Hadoop Distributed File System (HDFS) has one of the clearest control/data plane separations of any storage system: the NameNode IS the control plane (filesystem namespace + block locations), and the DataNodes ARE the data plane (block storage + retrieval).

NameNode (Control Plane)

  • Stores the entire filesystem namespace in memory (directory tree, file→block mapping, block→DataNode mapping)
  • Handles all metadata operations: open, close, rename, mkdir, ls
  • Manages block replication: decides which DataNodes get copies
  • Processes DataNode heartbeats and block reports
  • Single point of failure (mitigated by HA with standby NameNode + JournalNodes)

DataNode (Data Plane)

  • Stores actual data blocks on local disks (128 MB blocks by default)
  • Serves read/write requests directly to clients
  • Performs block replication on NameNode instructions
  • Reports block inventory to NameNode via periodic block reports
  • Sends heartbeats to NameNode every 3 seconds
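The heartbeat interval above feeds directly into failure detection. A minimal sketch follows, assuming the stock defaults of `dfs.heartbeat.interval` (3 s) and `dfs.namenode.heartbeat.recheck-interval` (300 s) and the commonly documented rule that a DataNode is declared dead after 2 × recheck + 10 × heartbeat. The `HeartbeatTracker` class is hypothetical and only mimics the bookkeeping, not the NameNode's real implementation.

```python
HEARTBEAT_INTERVAL_S = 3   # assumed default of dfs.heartbeat.interval
RECHECK_INTERVAL_S = 300   # assumed default of dfs.namenode.heartbeat.recheck-interval


def dead_timeout_s(heartbeat_s=HEARTBEAT_INTERVAL_S, recheck_s=RECHECK_INTERVAL_S):
    """Seconds of silence after which a DataNode is considered dead."""
    return 2 * recheck_s + 10 * heartbeat_s


class HeartbeatTracker:
    """Hypothetical control-plane bookkeeping: last heartbeat time per DataNode."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}

    def heartbeat(self, node, now_s):
        self.last_seen[node] = now_s

    def dead_nodes(self, now_s):
        return sorted(
            node for node, seen in self.last_seen.items()
            if now_s - seen > self.timeout_s
        )


tracker = HeartbeatTracker(dead_timeout_s())
tracker.heartbeat("dn1", now_s=0)    # dn1 goes silent after this
tracker.heartbeat("dn2", now_s=0)
tracker.heartbeat("dn2", now_s=600)  # dn2 keeps reporting

print(dead_timeout_s())              # 630 seconds, i.e. 10.5 minutes
print(tracker.dead_nodes(now_s=631))
```

With these defaults the control plane waits about ten and a half minutes before it starts re-replicating a silent node's blocks, which trades slower recovery for fewer false alarms.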
HDFS NameNode (Control) / DataNode (Data) Architecture
sequenceDiagram
    participant C as Client
    participant NN as NameNode (Control)
    participant DN1 as DataNode 1 (Data)
    participant DN2 as DataNode 2 (Data)
    participant DN3 as DataNode 3 (Data)

    Note over C,DN3: Write Path
    C->>NN: Create file /data/log.txt
    NN->>C: Block locations [DN1, DN2, DN3]
    C->>DN1: Write Block 1
    DN1->>DN2: Pipeline replicate
    DN2->>DN3: Pipeline replicate
    DN3->>DN2: ACK
    DN2->>DN1: ACK
    DN1->>C: ACK (all replicas written)
    C->>NN: Complete file

    Note over C,DN3: Read Path
    C->>NN: Open file /data/log.txt
    NN->>C: Block locations [DN1, DN2, DN3]
    C->>DN1: Read Block 1 (nearest replica)
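The write path can be sketched as a chain: each DataNode stores its copy, forwards the block downstream, and passes the acknowledgement back upstream toward the client. This is a simplified model under stated assumptions: real HDFS streams a block as small packets and overlaps forwarding with local writes, and the function and variable names below are hypothetical.

```python
def pipeline_write(block, pipeline, stores):
    """Forward `block` down the pipeline and return the order in which ACKs come back.

    pipeline: DataNode names in the order chosen by the NameNode.
    stores:   dict mapping DataNode name -> list of blocks it holds.
    """
    if not pipeline:
        return []
    head, rest = pipeline[0], pipeline[1:]
    stores[head].append(block)                             # this DataNode keeps a copy
    downstream_acks = pipeline_write(block, rest, stores)  # forward to the next DataNode
    return downstream_acks + [head]                        # tail ACKs first, head ACKs last


stores = {"dn1": [], "dn2": [], "dn3": []}
acks = pipeline_write(b"block-1", ["dn1", "dn2", "dn3"], stores)
print(acks)
```

The client only talks to the first DataNode, so its outbound bandwidth is spent once per block regardless of the replication factor.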
                            
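# Cluster summary as seen by the NameNode: capacity, usage, live/dead DataNodes
hdfs dfsadmin -report

# List part of the filesystem namespace (served from NameNode metadata)
hdfs dfs -ls /user/hadoop/

# Print the rack topology of DataNodes as known to the NameNode
hdfs dfsadmin -printTopology

# View block distribution for a file
hdfs fsck /user/hadoop/data.csv -files -blocks -locations

# Ask a DataNode to send a full block report to the NameNode.
# The argument is the DataNode's IPC address (port 9867 by default in Hadoop 3.x),
# not its data transfer port (9866).
hdfs dfsadmin -triggerBlockReport localhost:9867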
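The NameNode Bottleneck: Because ALL metadata operations go through the NameNode, it becomes the scalability bottleneck in HDFS. A single NameNode can handle on the order of 100K–500K file/directory operations per second (an approximate figure that varies with hardware and operation mix), and it holds the whole namespace in memory, so the number of files and blocks is also bounded by one machine's heap. For clusters with billions of small files, this control plane becomes the limiting factor, not the data plane (DataNodes scale roughly linearly). This led to HDFS Federation, which partitions the namespace across multiple independent NameNodes.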
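One way to picture namespace partitioning is a client-side mount table that routes each path to the NameNode owning the longest matching prefix. The sketch below is an illustration of that idea only; the mount points and NameNode names are invented, and it is not the actual ViewFs or Router implementation.

```python
# Hypothetical mount table: path prefix -> NameNode that owns that subtree.
MOUNT_TABLE = {
    "/": "nn-default",
    "/user": "nn-users",
    "/data": "nn-data",
    "/data/logs": "nn-logs",
}


def namenode_for(path, mount_table=MOUNT_TABLE):
    """Return the NameNode responsible for `path` using longest-prefix matching."""
    best = None
    for prefix in mount_table:
        matches = (
            prefix == "/"
            or path == prefix
            or path.startswith(prefix + "/")  # match whole path components only
        )
        if matches and (best is None or len(prefix) > len(best)):
            best = prefix
    return mount_table[best]


for p in ("/data/logs/app.log", "/data/raw/part-0", "/database/x", "/user/hadoop"):
    print(p, "->", namenode_for(p))
```

Each NameNode then serves metadata for its own subtree only, so metadata throughput and heap usage are spread across several machines, at the cost of operations such as cross-subtree renames no longer being simple.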

S3 Internals

Amazon S3 is one of the world's largest object storage systems. While its internals are proprietary, AWS has revealed enough of the architecture to understand the control/data plane separation.

S3 Control Plane

  • Metadata service — stores object keys, versions, ACLs, storage class, encryption metadata
  • Placement service — decides which physical storage nodes hold object data
  • Consistency layer — since 2020, provides strong read-after-write consistency (previously eventual)
  • Lifecycle manager — transitions objects between storage classes, handles expiration

S3 Data Plane

  • Storage nodes — actual disk arrays storing object data chunks
  • Erasure coding — data is split and coded across multiple disks/AZs for durability
  • Transfer acceleration — edge locations for upload/download optimization
  • Multipart upload — parallel data ingestion for large objects
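To show what "split and coded" means for the erasure coding item above, here is the simplest possible erasure code: k data shards plus one XOR parity shard, which survives the loss of any single shard. This is an illustrative assumption, not S3's actual scheme, which is not public in detail and would need to tolerate more than one loss.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data, k):
    """Split `data` into k equal shards (zero-padded) and append one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = shards[0]
    for shard in shards[1:]:
        parity = xor_bytes(parity, shard)
    return shards + [parity]


def recover(shards):
    """Rebuild the single missing shard (marked None) by XOR-ing all the others."""
    missing = [i for i, shard in enumerate(shards) if shard is None]
    if len(missing) != 1:
        raise ValueError("single-parity XOR can rebuild exactly one missing shard")
    present = [shard for shard in shards if shard is not None]
    rebuilt = present[0]
    for shard in present[1:]:
        rebuilt = xor_bytes(rebuilt, shard)
    repaired = list(shards)
    repaired[missing[0]] = rebuilt
    return repaired


data = b"object payload bytes"
k = 4
shards = encode(data, k)          # 4 data shards + 1 parity shard
damaged = list(shards)
damaged[2] = None                 # simulate losing one disk
repaired = recover(damaged)
restored = b"".join(repaired[:k])[:len(data)]
print(restored == data)
```

This layout stores 5 shards for 4 shards of data (1.25x overhead) instead of the 2x or 3x that full replication would need for comparable single-failure protection.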
S3 Request Flow — Control Plane (Metadata) + Data Plane (Storage)
flowchart LR
    CLIENT["Client"] --> LB["Load Balancer"]
    LB --> FE["Front-End\n(Auth + Routing)"]
    FE --> META["Metadata Service\n(Control Plane)"]
    FE --> STORE["Storage Nodes\n(Data Plane)"]
    META -->|"Object location"| FE
    subgraph STORAGE["Data Plane — Storage Layer"]
        STORE --> AZ1["AZ-1 Shards"]
        STORE --> AZ2["AZ-2 Shards"]
        STORE --> AZ3["AZ-3 Shards"]
    end
                            
Architecture Insight
How S3 Achieved Strong Consistency

For years, S3 provided only eventual consistency for overwrites and deletes. In December 2020, AWS announced strong read-after-write consistency at no extra cost. The key was redesigning the control plane's metadata layer — they built a new witness system that ensures the metadata service always reflects the latest write before responding to reads. The data plane didn't need to change; only the control plane logic for tracking "which version is current" was upgraded. This is a perfect example of improving system behavior by modifying only the control plane.
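The general idea of "check that metadata is current before answering a read" can be sketched with a toy model. Everything below is hypothetical: the `Witness`, `MetadataStore`, and `CachedMetadata` classes are invented for illustration and do not describe how S3 is actually built.

```python
class Witness:
    """Toy witness: remembers the latest version number written for each key."""

    def __init__(self):
        self.latest = {}

    def record_write(self, key, version):
        self.latest[key] = version

    def is_current(self, key, version):
        return self.latest.get(key) == version


class MetadataStore:
    """Toy persistent metadata tier: key -> (version, data location)."""

    def __init__(self):
        self.entries = {}


class CachedMetadata:
    """Metadata front end with a cache that is validated against the witness."""

    def __init__(self, store, witness):
        self.store = store
        self.witness = witness
        self.cache = {}

    def put(self, key, location):
        version = self.store.entries.get(key, (0, None))[0] + 1
        self.store.entries[key] = (version, location)
        self.witness.record_write(key, version)
        # The cache is deliberately NOT updated here, to simulate a stale cache node.

    def get(self, key):
        cached = self.cache.get(key)
        if cached is not None and self.witness.is_current(key, cached[0]):
            return cached[1]              # cache hit, confirmed fresh
        fresh = self.store.entries[key]   # stale or missing: reload from the store
        self.cache[key] = fresh
        return fresh[1]


meta = CachedMetadata(MetadataStore(), Witness())
meta.put("photo.jpg", "shard-set-1")
first = meta.get("photo.jpg")      # populates the cache
meta.put("photo.jpg", "shard-set-2")
second = meta.get("photo.jpg")     # cache holds version 1, witness says version 2
print(first, second)
```

Note that the data locations are just opaque strings here: the change is confined to how metadata is tracked and validated, which mirrors the point that only control plane logic had to change.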


Database Replication as Control/Data Plane

Database replication maps naturally to the control/data plane pattern. A coordinator decides replication topology and consistency guarantees (control plane), while replicas execute actual data operations (data plane).

  • Primary/Replica topology — primary decides write ordering (control), replicas apply the write log (data)
  • Consensus protocols — Raft/Paxos leader decides commit order (control), followers persist entries (data)
  • Sharding coordinator — decides which shard owns a key range (control), shards serve reads/writes (data)
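The leader/follower split in the list above can be sketched as a leader that assigns log positions and decides when an entry is committed, while followers only persist and apply what they are told. This is a deliberately reduced model, not Raft or Paxos: it has no terms, elections, or retries, and the `Leader` and `Follower` classes are hypothetical.

```python
class Follower:
    """Data-plane role: persist entries in order, apply them once committed."""

    def __init__(self):
        self.log = []
        self.state = {}
        self.applied = 0

    def append(self, index, entry):
        if index != len(self.log):   # reject gaps or duplicates
            return False
        self.log.append(entry)
        return True

    def apply_up_to(self, commit_index):
        while self.applied < min(commit_index, len(self.log)):
            key, value = self.log[self.applied]
            self.state[key] = value
            self.applied += 1


class Leader:
    """Control-plane role: choose the write order and decide what is committed."""

    def __init__(self, followers):
        self.followers = followers
        self.log = []
        self.commit_index = 0

    def write(self, key, value):
        index = len(self.log)
        self.log.append((key, value))
        acks = 1 + sum(f.append(index, (key, value)) for f in self.followers)
        cluster_size = len(self.followers) + 1
        if acks * 2 > cluster_size:              # majority has persisted the entry
            self.commit_index = index + 1
            for follower in self.followers:
                follower.apply_up_to(self.commit_index)
            return True
        return False


followers = [Follower(), Follower()]
leader = Leader(followers)
results = [leader.write("x", 1), leader.write("x", 2), leader.write("y", 7)]
print(results, [f.state for f in followers])
```

Because every follower applies the same entries in the same order, they converge on the same state without ever deciding anything themselves.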

CSI in Kubernetes

The Container Storage Interface (CSI) brings storage control/data plane separation into Kubernetes. The CSI driver splits into a controller plugin (control plane: provisioning, attaching) and a node plugin (data plane: mounting, formatting).

CSI Driver Architecture — Controller (Control) vs Node (Data)
flowchart TB
    subgraph K8S_CP["Kubernetes Control Plane"]
        PVC["PersistentVolumeClaim"]
        SC["StorageClass"]
        PV["PersistentVolume"]
    end
    subgraph CSI_CP["CSI Control Plane"]
        PROV["Provisioner\n(CreateVolume)"]
        ATTACH["Attacher\n(ControllerPublish)"]
    end
    subgraph CSI_DP["CSI Data Plane (per Node)"]
        STAGE["NodeStageVolume\n(format + mount to global)"]
        PUB["NodePublishVolume\n(bind mount to pod)"]
    end
    PVC --> SC
    SC --> PROV
    PROV -->|"Create disk"| STORAGE["Cloud Storage API"]
    ATTACH -->|"Attach to node"| STORAGE
    STAGE -->|"Format + mount"| DISK["Block Device"]
    PUB -->|"Bind to pod"| POD["Pod Filesystem"]
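The order of calls in the diagram can be written down as a small state machine. The RPC names come from the CSI specification, but the `VolumeLifecycle` class is a hypothetical model: a real driver may skip `ControllerPublishVolume` or `NodeStageVolume` if it does not advertise those capabilities, and the calls are issued by Kubernetes sidecars and the kubelet rather than by one object.

```python
# RPC name -> which CSI plugin serves it.
PLANE = {
    "CreateVolume": "controller",             # provision the disk
    "ControllerPublishVolume": "controller",  # attach the disk to a node
    "NodeStageVolume": "node",                # format + mount to a global path
    "NodePublishVolume": "node",              # bind mount into the pod
}
ORDER = list(PLANE)  # dicts preserve insertion order


class VolumeLifecycle:
    """Hypothetical tracker that enforces the happy-path ordering of CSI calls."""

    def __init__(self):
        self.completed = []

    def call(self, rpc):
        expected = ORDER[len(self.completed)] if len(self.completed) < len(ORDER) else None
        if rpc != expected:
            raise RuntimeError(f"{rpc} called out of order; expected {expected}")
        self.completed.append(rpc)
        return PLANE[rpc]


volume = VolumeLifecycle()
planes = [volume.call(rpc) for rpc in ORDER]
print(list(zip(ORDER, planes)))

out_of_order = VolumeLifecycle()
out_of_order.call("CreateVolume")
try:
    out_of_order.call("NodePublishVolume")   # skipping attach and stage
    rejected = False
except RuntimeError:
    rejected = True
print(rejected)
```

The first two calls can run anywhere in the cluster because they only talk to the storage backend's API, while the last two must run on the node where the pod is scheduled.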
                            
# StorageClass — tells CSI control plane HOW to provision
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com  # CSI controller plugin
parameters:
  type: gp3
  iops: "5000"
  throughput: "250"
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
---
# PersistentVolumeClaim — requests storage from control plane
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 100Gi

# CSI driver deployment, split into controller and node components.
# Abridged skeleton: container args, volumes, and RBAC are omitted.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ebs-csi-controller  # Control plane: a few replicas per cluster, typically with leader election
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ebs-csi-controller
  template:
    metadata:
      labels:
        app: ebs-csi-controller  # must match spec.selector.matchLabels
    spec:
      containers:
        - name: csi-provisioner   # Watches PVCs, calls CreateVolume
          image: registry.k8s.io/sig-storage/csi-provisioner:v3.6.0
        - name: csi-attacher      # Watches VolumeAttachments, calls ControllerPublishVolume
          image: registry.k8s.io/sig-storage/csi-attacher:v4.4.0
        - name: ebs-plugin        # Talks to AWS EC2 API
          image: amazon/aws-ebs-csi-driver:v1.25.0
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ebs-csi-node  # Data plane: runs on EVERY node
spec:
  selector:
    matchLabels:
      app: ebs-csi-node
  template:
    metadata:
      labels:
        app: ebs-csi-node  # must match spec.selector.matchLabels
    spec:
      containers:
        - name: ebs-plugin  # Handles NodeStageVolume + NodePublishVolume (mount operations)
          image: amazon/aws-ebs-csi-driver:v1.25.0
        - name: node-driver-registrar  # Registers the driver's socket with the kubelet
          image: registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.9.0

Performance — Metadata as Bottleneck

In storage systems, the control plane (metadata operations) almost always becomes the bottleneck before the data plane (raw I/O). This is because:

  • Metadata is centralized — a finite number of metadata servers handle all namespace operations
  • Data is distributed — adding more storage nodes linearly increases data throughput
  • Metadata operations require consistency — must be serialized or use consensus
  • Data operations can be parallel — different objects on different nodes are independent
The Metadata Tax: As a rough illustration, suppose creating a file in HDFS costs ~5 ms of NameNode processing and opening a file for read costs ~2 ms, while the actual data transfer runs at line rate (10+ Gbps). For workloads with millions of small files, you then spend more time on metadata than on data, and the control plane dominates latency. This is one reason object stores (S3, Ceph RGW) tend to scale better than hierarchical file systems for workloads with very large object counts: a flat key namespace is a simpler metadata model than a directory tree with POSIX semantics.
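The arithmetic behind the metadata tax is easy to check. The sketch below takes the illustrative figures from the text (about 2 ms to open a file, 10 Gbps line rate) as assumptions and computes what share of a single-file read is spent on metadata; the function names are hypothetical.

```python
OPEN_MS = 2.0            # assumed NameNode cost to open a file, from the text
LINE_RATE_GBPS = 10.0    # assumed network line rate, from the text


def transfer_ms(size_bytes, gbps=LINE_RATE_GBPS):
    """Milliseconds to move `size_bytes` at `gbps` gigabits per second."""
    return size_bytes * 8 / (gbps * 1e9) * 1e3


def metadata_share(size_bytes, metadata_ms=OPEN_MS):
    """Fraction of total read time spent on the metadata operation."""
    return metadata_ms / (metadata_ms + transfer_ms(size_bytes))


for label, size in [("10 KB", 10_000), ("1 MB", 1_000_000), ("128 MB", 128_000_000)]:
    print(f"{label:>7}: transfer {transfer_ms(size):8.3f} ms, "
          f"metadata share {metadata_share(size):6.1%}")
```

Under these assumptions a 10 KB read is about 99.6% metadata time, a 1 MB read about 71%, and a full 128 MB block under 2%. The break-even point, where transfer time equals the 2 ms open cost, is around 2.5 MB.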

Cross-Cutting Patterns

Pattern Summary
Universal Storage Control/Data Plane Patterns
Pattern | Control Plane | Data Plane
Placement | Decides location (CRUSH, hash ring) | Stores at location
Replication | Decides replica count & placement | Copies bytes between nodes
Recovery | Detects failure, plans re-replication | Reads surviving copies, writes new ones
Rebalancing | Computes new placement map | Migrates data to new locations
Consistency | Defines consistency model (strong/eventual) | Implements read/write quorums
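The consistency row can be illustrated with the classic quorum rule: with N replicas, a write quorum W and a read quorum R are guaranteed to overlap when R + W > N. The sketch below is a minimal worst-case model with hypothetical helper names; it picks the write and read sets to be as disjoint as possible and ignores failures, retries, and read repair.

```python
def quorums_overlap(n, w, r):
    """True when every read quorum must intersect every write quorum."""
    return r + w > n


def write(replicas, w, key, value, version):
    """Worst case for the reader: only the FIRST w replicas receive the write."""
    for replica in replicas[:w]:
        replica[key] = (version, value)


def read(replicas, r, key):
    """Worst case for the reader: ask only the LAST r replicas, keep the newest version."""
    answers = [replica[key] for replica in replicas[-r:] if key in replica]
    if not answers:
        return None
    return max(answers, key=lambda answer: answer[0])[1]


n = 3
replicas = [{} for _ in range(n)]
write(replicas, 2, "k", "v1", version=1)
write(replicas, 2, "k", "v2", version=2)

strong = read(replicas, 2, "k")   # R=2, W=2, N=3: quorums overlap
weak = read(replicas, 1, "k")     # R=1, W=2, N=3: the read can miss the write
print(quorums_overlap(n, 2, 2), strong)
print(quorums_overlap(n, 2, 1), weak)
```

The control plane's job here is choosing N, W, and R; the data plane simply contacts that many replicas on each operation.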