Containers & Runtime Environments Mastery Series

Part 12: Storage & Data Persistence

May 14, 2026 Wasil Zafar 24 min read

Containers are ephemeral by design: when a container is removed, its writable filesystem layer is deleted along with everything written to it. This is a feature, not a bug, because it guarantees reproducibility and clean state. But applications need persistent data, such as databases, uploaded files, and configuration. Docker provides three mechanisms to bridge this gap: volumes, bind mounts, and tmpfs. Mastering when to use each is the key to running stateful workloads in containers.

Table of Contents

  1. The Ephemeral Problem
  2. Three Storage Types
  3. Docker Volumes
  4. Bind Mounts
  5. tmpfs Mounts
  6. Volume Sharing Between Containers
  7. Storage Drivers
  8. Backup & Restore Strategies
  9. Stateful Application Design
  10. Docker Storage Quotas
  11. Exercises
  12. Conclusion & Next Steps

The Ephemeral Problem

Every container gets its own writable layer on top of the read-only image layers. When the container is removed, that writable layer is deleted — along with everything written to it during the container's lifetime.

# Demonstrate ephemeral storage
docker run --name test alpine sh -c "echo 'important data' > /data.txt && cat /data.txt"
# => important data

# Stop and remove the container
docker rm test

# Start a new container from the same image
docker run --rm alpine cat /data.txt
# => cat: can't open '/data.txt': No such file or directory
# Data is GONE
Why Ephemeral Is a Feature: Every new container created from an image starts from the same clean state, with no leftover temp files, corrupted caches, or configuration drift. This makes containers reproducible, scalable (any replica is identical), and more secure (changes an attacker makes to a container's filesystem disappear when the container is replaced). The challenge is making exceptions for data that must persist.

The writable layer also has performance issues: it uses the storage driver's copy-on-write mechanism, which adds latency for write-heavy workloads compared to direct filesystem access.

Three Storage Types

Docker provides three mechanisms for data that needs to outlive a container:

Type | Location on Host | Managed by Docker | Survives Container Removal | Best For
Volumes | /var/lib/docker/volumes/ | Yes | Yes (unless pruned) | Production persistent data, databases
Bind Mounts | Anywhere on host filesystem | No | Yes (host file persists) | Development, config files, host-managed data
tmpfs | Host memory (RAM) | Yes | No (memory only) | Secrets, sensitive temp data, performance

Docker Storage Architecture (diagram): where each kind of container path is stored on the host
  • /app/data (volume mount point) → /var/lib/docker/volumes/mydata/_data (Docker-managed volume)
  • /app/src (bind mount point) → /home/user/project/src (host directory)
  • /run/secrets (tmpfs mount point) → host RAM (tmpfs, in-memory)
  • Container filesystem (ephemeral writable layer) → read-only image layers + writable layer, via the storage driver

Docker Volumes

Volumes are Docker's preferred mechanism for persistent data. They're stored in Docker's managed area (/var/lib/docker/volumes/) and their lifecycle is independent of any container.

# Create a named volume
docker volume create mydata

# List all volumes
docker volume ls
# => DRIVER    VOLUME NAME
# => local     mydata

# Inspect volume details
docker volume inspect mydata
# => [{"CreatedAt": "2026-05-14T10:00:00Z",
# =>   "Driver": "local",
# =>   "Mountpoint": "/var/lib/docker/volumes/mydata/_data",
# =>   "Name": "mydata", "Scope": "local"}]

# Use a named volume with a container
docker run -d --name db \
  -v mydata:/var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=secret \
  postgres:16-alpine

# Data persists after container removal
docker rm -f db
docker run --rm -v mydata:/data alpine ls /data
# => PostgreSQL data files still present!

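# Anonymous volume (Docker generates a random name)
docker run -d -v /var/lib/mysql -e MYSQL_ROOT_PASSWORD=secret --name temp-db mysql:8

# Remove unused anonymous volumes (not referenced by any container).
# On Docker Engine 23.0+, add --all to also remove unused named volumes.
docker volume prune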

# Remove a specific volume
docker volume rm mydata

Volume Drivers

The default local driver stores data on the host filesystem. Third-party drivers enable remote storage:

Driver | Storage Backend | Use Case
local (default) | Host filesystem | Single-host persistent data
nfs | NFS server | Shared storage across hosts
rexray/ebs | AWS EBS | Cloud-native block storage
azure_file | Azure File Storage | Azure container instances
flocker (no longer maintained) | Portable volumes | Volume migration between hosts

# Create an NFS-backed volume using the built-in local driver's mount options
docker volume create --driver local \
  --opt type=nfs \
  --opt o=addr=192.168.1.100,rw \
  --opt device=:/shared/data \
  nfs-data

# Use the NFS volume
docker run -d -v nfs-data:/app/shared --name worker my-app

Bind Mounts

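Bind mounts map a host directory or file directly into the container. Unlike volumes, Docker doesn't manage their lifecycle, and the container gets direct access to host files. With --mount the host path must already exist (Docker returns an error otherwise); the older -v syntax instead creates a missing host path as an empty directory.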

# Bind mount the project directory into the container
# (assumes ./package.json defines a "dev" script)
docker run -d --name dev-server \
  -v "$(pwd)":/app \
  -v "$(pwd)/config":/app/config:ro \
  -w /app \
  -p 3000:3000 \
  node:20-alpine npm run dev

# :ro makes the mount read-only (container can't modify host files)

# Modern syntax using --mount (more explicit, recommended).
# Same container name as above, so remove the first one before running this.
docker rm -f dev-server
docker run -d --name dev-server \
  --mount "type=bind,source=$(pwd),target=/app" \
  --mount "type=bind,source=$(pwd)/config,target=/app/config,readonly" \
  -w /app \
  -p 3000:3000 \
  node:20-alpine npm run dev

# Bind mount a single file (e.g., custom nginx config)
docker run -d --name web \
  -v "$(pwd)/nginx.conf":/etc/nginx/nginx.conf:ro \
  -p 80:80 \
  nginx:alpine

# Development workflow: hot-reload with bind mount
# Changes to ./src on the host are immediately visible in the container
echo "console.log('live update');" >> src/index.js
# => The dev server reloads, provided its "dev" script runs a file watcher such as nodemon
Security Warning: Bind mounts give containers direct access to host files. A container with -v /:/host can read/write the entire host filesystem. Never bind-mount sensitive host paths (/etc, /root, /var/run/docker.sock) unless absolutely necessary and the container is trusted.
Aspect | Volumes | Bind Mounts
Path control | Docker manages path | You specify exact host path
Pre-population | Image content copied into an empty volume | Host content hides the image content at that path
Portability | Works on any Docker host | Requires specific host path to exist
Backup | Temporary container running tar (no built-in command) | Standard filesystem tools
Performance (Linux) | Native filesystem speed | Native filesystem speed
Performance (macOS/Windows) | Fast (inside the Docker Desktop VM) | Slower (host file-sharing overhead)

tmpfs Mounts

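tmpfs mounts store data in the host's memory (RAM). The data is never written to the container's writable layer or to a host filesystem, and it disappears when the container stops. This is ideal for sensitive data that should never persist. (If the host has swap enabled, tmpfs pages can still be swapped out to disk.)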

# Create a tmpfs mount for secrets
docker run -d --name secure-app \
  --tmpfs /run/secrets:rw,noexec,nosuid,size=64m \
  my-secure-app

# Similar mount using --mount syntax (more explicit).
# Uses a different name so it can run alongside the container above.
docker run -d --name secure-app-mount \
  --mount type=tmpfs,destination=/run/secrets,tmpfs-size=67108864,tmpfs-mode=1770 \
  my-secure-app

# tmpfs options:
# size / tmpfs-size : maximum size (--mount takes bytes: 67108864 = 64 MiB);
#                     if unset, the limit defaults to 50% of host RAM
# mode / tmpfs-mode : file permissions (octal)
# noexec            : cannot execute binaries from this mount
# nosuid            : ignore setuid/setgid bits

# Verify it's in memory
docker exec secure-app df -h /run/secrets
# => Filesystem  Size  Used  Avail  Use%  Mounted on
# => tmpfs        64M     0   64M    0%   /run/secrets

# Write a secret — it's in RAM only
docker exec secure-app sh -c 'echo "api_key=sk-abc123" > /run/secrets/api.env'

# Stop container — data is gone forever
docker stop secure-app && docker start secure-app
docker exec secure-app cat /run/secrets/api.env
# => cat: can't open '/run/secrets/api.env': No such file or directory
When to Use tmpfs: API keys, database passwords, session tokens, temporary computation results, or any data that must never touch disk. Combined with Docker secrets (Swarm mode), tmpfs provides a secure secret delivery mechanism.
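To confirm from inside a container that a path really is RAM-backed, you can read the kernel's mount table. The sketch below assumes a Linux `/proc/mounts` (whitespace-separated fields: device, mount point, filesystem type, options, ...); `mount_fstype` is a hypothetical helper name, and the optional second argument exists only so the function can be tried against a sample file.

```shell
# mount_fstype PATH [MOUNTS_FILE]
# Prints the filesystem type mounted exactly at PATH and returns 1 if PATH
# is not a mount point. If the same path is mounted more than once, the
# last entry wins, because later mounts hide earlier ones.
mount_fstype() {
  target=$1
  mounts=${2:-/proc/mounts}
  awk -v t="$target" '
    $2 == t { fs = $3 }                 # field 2 = mount point, field 3 = type
    END { if (fs == "") exit 1; print fs }
  ' "$mounts"
}
```

Run inside the secure-app container (BusyBox `sh` and `awk` are enough), `mount_fstype /run/secrets` should print `tmpfs`, and the matching `/proc/mounts` line also lists the size and `noexec`/`nosuid` options that were applied.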

Volume Sharing Between Containers

Multiple containers can mount the same volume simultaneously. This enables patterns like sidecar containers, log aggregators, and shared caches.

# Pattern 1: Writer + Reader
# Web server writes logs, log shipper reads them
docker volume create app-logs

docker run -d --name web \
  -v app-logs:/var/log/nginx \
  nginx:alpine

docker run -d --name log-shipper \
  -v app-logs:/logs:ro \
  my-fluentd-image

# Pattern 2: Shared data processing pipeline
docker volume create pipeline-data

# Producer writes data
docker run --rm -v pipeline-data:/output alpine \
  sh -c 'echo "processed data" > /output/result.csv'

# Consumer reads data
docker run --rm -v pipeline-data:/input:ro alpine \
  cat /input/result.csv
# => processed data

# Pattern 3: Read-only volume from another container (--volumes-from)
docker run -d --name data-container \
  -v /app/config \
  alpine sh -c 'echo "config=production" > /app/config/app.conf && sleep 3600'

docker run --rm --volumes-from data-container:ro alpine \
  cat /app/config/app.conf
# => config=production
Concurrency Warning: Docker volumes have no built-in locking mechanism. If multiple containers write to the same files simultaneously, you'll get data corruption. Use application-level coordination (database locks, file locks, message queues) or ensure only one writer at a time.
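For file-based coordination, an advisory lock is the simplest option. The sketch below assumes the `flock` command (util-linux) is available in every writer's image and that all writers agree on a lock file inside the shared volume; `increment_counter`, `counter`, and `counter.lock` are hypothetical names. It guards a read-modify-write update, which is exactly the kind of multi-step write that loses data when two containers run it at once.

```shell
# increment_counter DIR
# Adds 1 to the number stored in DIR/counter. The read and the write both
# happen while holding an exclusive lock on DIR/counter.lock, so concurrent
# callers cannot lose each other's updates.
increment_counter() {
  dir=$1
  (
    flock -x 9 || exit 1                         # block until we own the lock
    n=$(cat "$dir/counter" 2>/dev/null || echo 0)  # missing file counts as 0
    echo $((n + 1)) > "$dir/counter"
  ) 9>>"$dir/counter.lock"                       # fd 9 (and the lock) lives as long as the subshell
}
```

Each container would call it against its own mount point of the shared volume (for example `increment_counter /output`). flock locks are advisory: they protect only writers that also take the lock, and their behavior on network-backed volumes such as NFS depends on the client and server.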

Storage Drivers

Storage drivers power the container's layered filesystem (image layers + writable layer). They're different from volume drivers — storage drivers handle the ephemeral container filesystem, not persistent volumes.

Driver | Backing Filesystem | Copy-on-Write | Performance | Status
overlay2 | xfs, ext4 | File-level CoW | Excellent | Default (recommended)
btrfs | btrfs | Block-level CoW | Good (snapshots are fast) | Supported
zfs | ZFS | Block-level CoW | Good (data integrity) | Supported
devicemapper | Block devices | Block-level CoW | Moderate | Deprecated (use overlay2)
fuse-overlayfs | Any (FUSE) | File-level CoW | Moderate | Used for rootless Docker

# Check current storage driver
docker info | grep "Storage Driver"
# => Storage Driver: overlay2

# Check backing filesystem
docker info | grep "Backing Filesystem"
# => Backing Filesystem: extfs

# See where layers are stored
ls /var/lib/docker/overlay2/
# => 8a3b2c1d4e5f...  (each directory = one layer)
# => l/                (symlinks for shorter paths)

# Inspect a container's layer structure
docker inspect --format='{{.GraphDriver.Data}}' my-container
# => map[LowerDir:/.../diff MergedDir:/.../merged UpperDir:/.../diff WorkDir:/.../work]

# overlay2 mount structure:
# LowerDir  = read-only image layers (stacked)
# UpperDir  = writable container layer
# MergedDir = unified view (what the container sees)
# WorkDir   = internal overlay2 working directory
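When exploring these directories (as in Exercise 3), a small helper can print them one per line. This is a sketch assuming the overlay2 driver, so that `.GraphDriver.Data` exposes `UpperDir` and a colon-separated `LowerDir`; `show_layers` is a hypothetical name, and actually listing the directories usually needs root.

```shell
# show_layers CONTAINER
# Prints the writable layer and the read-only layers of a container that
# uses the overlay2 storage driver.
show_layers() {
  container=$1
  upper=$(docker inspect --format '{{.GraphDriver.Data.UpperDir}}' "$container") || return 1
  lower=$(docker inspect --format '{{.GraphDriver.Data.LowerDir}}' "$container") || return 1
  echo "UpperDir (writable): $upper"
  echo "LowerDir (read-only, uppermost first):"
  # LowerDir is the colon-separated list handed to the overlay mount
  printf '%s\n' "$lower" | tr ':' '\n' | sed 's/^/  /'
}
```

After changing a file inside the container, `sudo ls -R` on the printed UpperDir should show the copied-up file, while the LowerDir entries stay untouched.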

Backup & Restore Strategies

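Volumes don't have a built-in backup command, but there are reliable patterns using temporary containers to archive volume data. For databases, a file-level archive of a running server can capture an inconsistent state, so stop the container first or use an application-aware dump such as pg_dump.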

# Backup a volume to a tar archive
docker run --rm \
  -v mydata:/source:ro \
  -v $(pwd)/backups:/backup \
  alpine tar czf /backup/mydata-$(date +%Y%m%d).tar.gz -C /source .

# Restore a volume from backup
docker volume create mydata-restored
docker run --rm \
  -v mydata-restored:/target \
  -v $(pwd)/backups:/backup:ro \
  alpine tar xzf /backup/mydata-20260514.tar.gz -C /target

# Backup PostgreSQL with pg_dump (application-aware backup)
docker exec postgres pg_dump -U postgres mydb > backup.sql

# Restore PostgreSQL
docker exec -i postgres psql -U postgres mydb < backup.sql

# Copy files between host and container (quick one-off)
docker cp my-container:/var/log/app.log ./app.log
docker cp ./config.yaml my-container:/app/config.yaml
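Before relying on an archive, it is worth proving that it restores. The sketch below does a round trip with plain tar, using ordinary directories as stand-ins for a volume's mount point; `backup_dir` and `verify_backup` are hypothetical helper names, and it assumes GNU or BusyBox `tar` plus `diff -r`.

```shell
# backup_dir SOURCE ARCHIVE
# Archives the contents of SOURCE ("-C SOURCE ."), like the volume backup above.
backup_dir() {
  tar czf "$2" -C "$1" .
}

# verify_backup SOURCE ARCHIVE
# Extracts ARCHIVE into a scratch directory and compares it with SOURCE.
# Returns 0 only if every file matches.
verify_backup() {
  scratch=$(mktemp -d) || return 1
  if tar xzf "$2" -C "$scratch" && diff -r "$1" "$scratch" >/dev/null; then
    rc=0
  else
    rc=1
  fi
  rm -rf "$scratch"
  return "$rc"
}
```

Against a real volume, the same functions can run inside the throwaway alpine container from the backup command above. Compare while the volume is quiescent; a live database will already differ by the time the diff runs.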

Automated Backup Container

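# Create the build context for an automated backup sidecar
mkdir -p backup-sidecar

# backup-sidecar/Dockerfile
cat > backup-sidecar/Dockerfile <<'EOF'
FROM alpine:3.19
RUN apk add --no-cache tar gzip
COPY backup.sh /usr/local/bin/backup.sh
RUN chmod +x /usr/local/bin/backup.sh
ENTRYPOINT ["/usr/local/bin/backup.sh"]
EOF

# backup-sidecar/backup.sh: backs up /data every hour
cat > backup-sidecar/backup.sh <<'EOF'
#!/bin/sh
BACKUP_DIR="/backups"
SOURCE_DIR="/data"
RETENTION_DAYS=7

# The script runs as PID 1, which ignores SIGTERM unless a handler is set;
# without this trap, docker stop waits for its timeout and then kills it.
trap 'exit 0' TERM INT

mkdir -p "${BACKUP_DIR}"
while true; do
    TIMESTAMP=$(date +%Y%m%d_%H%M%S)
    if tar czf "${BACKUP_DIR}/backup_${TIMESTAMP}.tar.gz" -C "${SOURCE_DIR}" .; then
        echo "[$(date)] Backup created: backup_${TIMESTAMP}.tar.gz"
        # Remove backups older than the retention period (only after a successful backup)
        find "${BACKUP_DIR}" -name "backup_*.tar.gz" -mtime +${RETENTION_DAYS} -delete
    else
        echo "[$(date)] Backup FAILED" >&2
        rm -f "${BACKUP_DIR}/backup_${TIMESTAMP}.tar.gz"
    fi

    # Sleep in the background and wait, so the TERM trap runs immediately
    sleep 3600 &
    wait $!
done
EOF

# Build the image, then deploy it alongside your application
docker build -t backup-sidecar ./backup-sidecar
docker run -d --name backup \
  -v mydata:/data:ro \
  -v "$(pwd)/backups":/backups \
  backup-sidecar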
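The script above prunes by age, which has a failure mode worth knowing: whenever pruning keeps running while no fresh archives are being added, everything older than the window eventually disappears. A count-based policy avoids that by always keeping the newest N archives. This is a swapped-in alternative, not part of the original script; `prune_keep_newest` is a hypothetical name, and it relies on the `backup_YYYYmmdd_HHMMSS` naming so that name order matches age order.

```shell
# prune_keep_newest DIR KEEP
# Deletes all but the KEEP newest backup_*.tar.gz archives in DIR.
# Timestamped names make the glob's sorted order oldest-first.
prune_keep_newest() {
  dir=$1
  keep=$2
  set -- "$dir"/backup_*.tar.gz
  [ -e "$1" ] || return 0          # glob matched nothing: no archives yet
  excess=$(( $# - keep ))
  while [ "$excess" -gt 0 ]; do
    rm -f -- "$1"                  # $1 is always the oldest remaining archive
    shift
    excess=$(( excess - 1 ))
  done
}
```

In backup.sh this could stand in for the find line, for example `prune_keep_newest "$BACKUP_DIR" 168` to keep one week of hourly archives.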

Stateful Application Design

Running databases and other stateful applications in containers requires careful volume design. The key principle: separate state from compute.

Reference Architecture

PostgreSQL with Proper Volume Design

# Create dedicated volumes for different data types
docker volume create pg-data        # Database files
docker volume create pg-wal         # Write-ahead logs (performance)
docker volume create pg-backups     # Backup archives

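# Run PostgreSQL with separated volumes on a user-defined network.
# POSTGRES_INITDB_WALDIR makes the image's initdb step place the WAL in the
# pg-wal volume (it only takes effect when the database is first created).
docker network create app-net
docker run -d --name postgres \
  --network app-net \
  -v pg-data:/var/lib/postgresql/data \
  -v pg-wal:/var/lib/postgresql/wal \
  -v pg-backups:/backups \
  -e POSTGRES_PASSWORD=secret \
  -e POSTGRES_DB=myapp \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -e POSTGRES_INITDB_WALDIR=/var/lib/postgresql/wal \
  postgres:16-alpine

# Wait for the server to accept TCP connections; the image's temporary
# first-run initialization server listens only on the Unix socket
until docker exec postgres pg_isready -h 127.0.0.1 -U postgres >/dev/null 2>&1; do sleep 1; done

# Initialize with seed data
docker exec -i postgres psql -U postgres myapp <<'SQL'
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);
INSERT INTO users (name, email) VALUES
    ('Alice', 'alice@example.com'),
    ('Bob', 'bob@example.com');
SQL

# Verify data persists across container recreation.
# pg-wal must be mounted at the same path, because PGDATA/pg_wal is a symlink to it.
docker rm -f postgres
docker run -d --name postgres \
  --network app-net \
  -v pg-data:/var/lib/postgresql/data \
  -v pg-wal:/var/lib/postgresql/wal \
  -e POSTGRES_PASSWORD=secret \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  postgres:16-alpine

until docker exec postgres pg_isready -h 127.0.0.1 -U postgres >/dev/null 2>&1; do sleep 1; done
docker exec postgres psql -U postgres myapp -c "SELECT * FROM users;"
# => Alice and Bob still present!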
Stateful Container Best Practices:
  • Always use named volumes (never anonymous) for production data
  • Keep data, logs, and WAL files on separate volumes so each can be sized and backed up independently; a performance gain comes only when those volumes sit on different disks
  • Use :ro mounts where containers only need read access
  • Set health checks so dependent services can wait until the database is ready to accept connections
  • Include automated backup containers as sidecars
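The health-check advice can be sketched as follows, assuming Docker's standard `--health-*` run flags and the `pg_isready` client that ships in the postgres image. The `wait_for_healthy` helper is hypothetical: it polls `docker inspect` for `.State.Health.Status` until the container reports `healthy`, gives up on `unhealthy`, and times out after a fixed number of tries.

```shell
# Example: start PostgreSQL with a health check
#   docker run -d --name postgres \
#     --health-cmd='pg_isready -U postgres -h 127.0.0.1' \
#     --health-interval=5s --health-timeout=3s --health-retries=5 \
#     -v pg-data:/var/lib/postgresql/data \
#     -e POSTGRES_PASSWORD=secret \
#     postgres:16-alpine

# wait_for_healthy CONTAINER [MAX_TRIES] [INTERVAL_SECONDS]
# Returns 0 once the container is healthy, 1 if it is unhealthy, has no
# health check, or does not become healthy within MAX_TRIES polls.
wait_for_healthy() {
  container=$1
  tries=${2:-30}
  interval=${3:-2}
  while [ "$tries" -gt 0 ]; do
    status=$(docker inspect --format '{{.State.Health.Status}}' "$container") || return 1
    case "$status" in
      healthy)   return 0 ;;
      unhealthy) echo "$container is unhealthy" >&2; return 1 ;;
    esac
    tries=$(( tries - 1 ))
    sleep "$interval"
  done
  echo "timed out waiting for $container" >&2
  return 1
}
```

A script can then chain `wait_for_healthy postgres && docker exec -i postgres psql ...`; in Compose, `depends_on` with `condition: service_healthy` expresses the same dependency declaratively.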

Docker Storage Quotas

Without quotas, a single container can fill the host's disk, and once the filesystem holding /var/lib/docker is full, writes fail for every container that uses it. Docker provides tools to monitor and limit storage consumption.

# Check Docker disk usage overview
docker system df
# => TYPE            TOTAL   ACTIVE  SIZE      RECLAIMABLE
# => Images          15      5       4.2GB     2.8GB (66%)
# => Containers      8       3       1.1GB     800MB (72%)
# => Local Volumes   12      4       3.5GB     2.1GB (60%)
# => Build Cache     0       0       0B        0B

# Detailed breakdown (shows individual items)
docker system df -v

# Check individual volume sizes
docker system df -v | grep -A 100 "Local Volumes"

# Limit container's writable layer size (requires overlay2 + xfs with pquota)
docker run -d --name limited \
  --storage-opt size=2G \
  nginx:alpine

# Inside the container, writes beyond 2GB will fail
docker exec limited dd if=/dev/zero of=/bigfile bs=1M count=2048
# => dd: error writing '/bigfile': No space left on device (at 2GB)

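# Prune everything unused: stopped containers, unused networks, all images
# not used by a container, build cache, and (with --volumes) unused volumes
docker system prune -a --volumes
# WARNING: This removes ALL unused data

# Safer: remove stopped containers, unused networks, dangling (untagged)
# images, and dangling build cache; volumes are left alone
docker system prune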

# Monitor Docker directory size on host
du -sh /var/lib/docker/
# => 12G    /var/lib/docker/

du -sh /var/lib/docker/volumes/
# => 3.5G   /var/lib/docker/volumes/

du -sh /var/lib/docker/overlay2/
# => 7.2G   /var/lib/docker/overlay2/
Production Alert: Set up monitoring for /var/lib/docker disk usage. When the filesystem holding Docker's storage directory fills up, writes fail for every container on it, so many services can break at once. Configure alerting at 80% capacity and automated cleanup policies for unused images and dangling volumes.
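A minimal version of that alert is a small check you can schedule. This sketch assumes POSIX `df -P` output (field 5 is the use percentage) and takes the 80% default threshold from the text; `check_disk_usage` is a hypothetical name.

```shell
# check_disk_usage [PATH] [THRESHOLD_PERCENT]
# Exits 0 when the filesystem holding PATH is below the threshold,
# 1 when it is at or above it, and 2 if usage cannot be read.
check_disk_usage() {
  path=${1:-/var/lib/docker}
  threshold=${2:-80}
  # -P forces one line per filesystem; field 5 is the use percentage, e.g. "42%"
  used=$(df -P "$path" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
  if [ -z "$used" ]; then
    echo "cannot read disk usage for $path" >&2
    return 2
  fi
  if [ "$used" -ge "$threshold" ]; then
    echo "ALERT: $path is at ${used}% (threshold ${threshold}%)" >&2
    return 1
  fi
  echo "OK: $path is at ${used}%"
}
```

Run from cron or a systemd timer, a non-zero exit status can feed whatever alerting you already use; the cleanup half of the advice maps to `docker image prune` and `docker volume prune`.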

Exercises

Exercise 1

Volume Lifecycle Management

Create a named volume, run a PostgreSQL container that writes data to it, remove the container, then start a new PostgreSQL container with the same volume and verify data persistence. Then practice backup (tar) and restore to a new volume. Finally, prune unused volumes and verify your active volume isn't removed.

Exercise 2

Development Workflow with Bind Mounts

Create a Node.js project with a simple Express server. Run it in a container with the source code bind-mounted. Edit a file on the host and verify the change is immediately visible inside the container. Set up nodemon to auto-restart on file changes. Measure the latency between saving a file and the server restarting.

Exercise 3

Storage Driver Investigation

Run docker info to identify your storage driver. Navigate to /var/lib/docker/overlay2/ and identify the layers for a running container. Modify a file inside the container and locate the changed file in the UpperDir on the host. Explain the Copy-on-Write process you just observed.

Exercise 4

Multi-Container Volume Sharing

Create a pipeline: Container A writes timestamped logs to a shared volume every second. Container B (read-only access) tails the log file and prints new entries. Container C runs hourly to compress old logs into archives. Verify all three work correctly with the same volume.

Conclusion & Next Steps

Docker storage is a spectrum from fully ephemeral to fully persistent. The key decisions:

  • Volumes for production data: Docker-managed, survive the container lifecycle, support drivers for remote storage
  • Bind mounts for development: direct host path access, live code editing, but not portable
  • tmpfs for secrets: in-memory only, never written to the writable layer or a host filesystem, well suited to sensitive credentials
  • overlay2 for the container filesystem: efficient CoW for the ephemeral writable layer, but put write-heavy paths on volumes instead
  • Back up volumes regularly: there is no built-in command, so use sidecar containers with tar archives
  • Monitor disk usage: docker system df plus alerting on /var/lib/docker capacity

With networking (Parts 10-11) and storage (Part 12) mastered, you understand the two critical infrastructure pillars for container workloads. Next, we'll look at the standards that make containers portable across any runtime — the OCI specifications.

Next in the Series

In Part 13: OCI Standards & Specifications, we explore the Open Container Initiative's runtime, image, and distribution specifications — the open standards that ensure your containers run identically on Docker, Podman, containerd, CRI-O, or any compliant runtime.