Part 12: Storage & Data Persistence

The Ephemeral Problem

Every container gets its own writable layer on top of the read-only image layers. When the container is removed, that writable layer is deleted — along with everything written to it during the container's lifetime.

# Demonstrate ephemeral storage
docker run --name test alpine sh -c "echo 'important data' > /data.txt && cat /data.txt"
# => important data

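# Remove the container (it has already exited); its writable layer goes with it
docker rm test

# Start a new container from the same image
docker run --rm alpine cat /data.txt
# => cat: can't open '/data.txt': No such file or directory
# Data is GONE: every container starts with a fresh writable layer,
# and the old one was deleted together with the "test" container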

                            
Why Ephemeral Is a Feature: Ephemeral containers start from a clean state every time, with no leftover temp files, no corrupted caches, and no configuration drift. This makes containers reproducible, scalable (any replica is identical), and more secure (a compromised filesystem doesn't persist). The challenge is making exceptions for data that must persist.
                        

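The writable layer also carries a performance cost: writes go through the storage driver's copy-on-write mechanism (with overlay2, the first write to a file that comes from an image layer copies the whole file up into the writable layer), which adds latency for write-heavy workloads compared with writing to a volume or bind mount.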

Three Storage Types

Docker provides three mechanisms for data that needs to outlive a container:

Type	Location on Host	Managed by Docker	Survives Container Removal	Best For
Volumes	`/var/lib/docker/volumes/` (Linux)	Yes	Yes (until explicitly removed or pruned)	Production persistent data, databases
Bind Mounts	Anywhere on host filesystem	No	Yes (host files persist)	Development, config files, host-managed data
tmpfs	Host memory (RAM), Linux only	Yes	No (lost as soon as the container stops)	Secrets, sensitive temp data, fast scratch space
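All three types can also be requested through the single `--mount` flag, where only `type=` and the presence of a `source` change. The helper below is a hypothetical convenience function (not part of Docker) that just prints the flag for each type, to make the three shapes easy to compare side by side; it never calls Docker itself.

```shell
#!/bin/sh
# mount_flag TYPE SOURCE TARGET
# Hypothetical helper (not part of Docker): prints the --mount flag for one of
# the three storage types. tmpfs has no host source, so SOURCE is ignored.
mount_flag() {
    case "$1" in
        volume|bind)
            printf '%s\n' "--mount type=$1,source=$2,target=$3"
            ;;
        tmpfs)
            printf '%s\n' "--mount type=tmpfs,target=$3"
            ;;
        *)
            echo "unknown mount type: $1" >&2
            return 1
            ;;
    esac
}

mount_flag volume mydata /app/data               # Docker-managed volume
mount_flag bind /home/user/project/src /app/src  # host directory
mount_flag tmpfs "" /run/secrets                 # in-memory, no source
```

The printed strings are what you would pass to `docker run`; the volume form names a volume, the bind form names an absolute host path, and the tmpfs form has nothing on the host side at all.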

Docker Storage Architecture

flowchart TD
    subgraph CONTAINER["Container"]
        FS["Container Filesystem
(ephemeral writable layer)"]
        VM["/app/data
(volume mount point)"]
        BM["/app/src
(bind mount point)"]
        TM["/run/secrets
(tmpfs mount point)"]
    end
    subgraph HOST["Docker Host"]
        VOL["/var/lib/docker/volumes/mydata/_data
Docker-managed volume"]
        BIND["/home/user/project/src
Host directory"]
        RAM["Host RAM
tmpfs (in-memory)"]
        LAYERS["Image Layers (read-only)
+ Writable Layer (ephemeral)"]
    end

    VM -->|"docker volume"| VOL
    BM -->|"bind mount"| BIND
    TM -->|"tmpfs"| RAM
    FS -->|"storage driver"| LAYERS

Docker Volumes

Volumes are Docker's preferred mechanism for persistent data. They're stored in Docker's managed area (/var/lib/docker/volumes/) and their lifecycle is independent of any container.

# Create a named volume
docker volume create mydata

# List all volumes
docker volume ls
# => DRIVER    VOLUME NAME
# => local     mydata

# Inspect volume details
docker volume inspect mydata
# => [{"CreatedAt": "2026-05-14T10:00:00Z",
# =>   "Driver": "local",
# =>   "Mountpoint": "/var/lib/docker/volumes/mydata/_data",
# =>   "Name": "mydata", "Scope": "local"}]

# Use a named volume with a container
docker run -d --name db \
  -v mydata:/var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=secret \
  postgres:16-alpine

# Data persists after container removal
docker rm -f db
docker run --rm -v mydata:/data alpine ls /data
# => PostgreSQL data files still present!

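# Anonymous volume (Docker generates a random name); the mysql image refuses
# to start without a root password setting such as MYSQL_ROOT_PASSWORD
docker run -d -v /var/lib/mysql -e MYSQL_ROOT_PASSWORD=secret --name temp-db mysql:8

# Remove unused volumes (not referenced by any container, running or stopped)
docker volume prune
# On Docker Engine 23.0 and later this removes only anonymous volumes;
# add --all to remove unused named volumes as well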

# Remove a specific volume
docker volume rm mydata
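Anonymous volumes are easy to lose track of because their names carry no meaning: Docker generates a 64-character hexadecimal name for each one, and `docker volume ls --filter dangling=true` does not separate them from unused named volumes. A minimal sketch, assuming that naming pattern (a volume you named yourself with 64 hex characters would be misclassified, so treat it as a heuristic), of a filter that reads volume names and prints the anonymous ones; the function names are hypothetical:

```shell
#!/bin/sh
# is_anonymous_volume NAME
# Heuristic: Docker names anonymous volumes with 64 random lowercase hex
# characters, so a name of exactly that shape is treated as anonymous.
is_anonymous_volume() {
    case "$1" in
        *[!0-9a-f]*) return 1 ;;   # any non-hex character: a chosen name
    esac
    [ "${#1}" -eq 64 ]
}

# filter_anonymous: reads one volume name per line, prints the anonymous ones.
filter_anonymous() {
    while IFS= read -r name; do
        if is_anonymous_volume "$name"; then
            printf '%s\n' "$name"
        fi
    done
}

# Canned input instead of a live daemon; on a Docker host you would pipe in
# the output of `docker volume ls -q` instead.
hex=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
printf '%s\n' mydata "$hex" pg-data | filter_anonymous
```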

Volume Drivers

The default local driver stores data on the host filesystem and can also mount network filesystems such as NFS through mount options. Third-party volume plugins add other remote backends:

Driver	Storage Backend	Use Case
`local` (default)	Host filesystem	Single-host persistent data
`local` with `--opt type=nfs`	NFS server	Shared storage across hosts
`rexray/ebs` (plugin)	AWS EBS	Cloud-native block storage
`azure_file` (plugin)	Azure File Storage	Shared file storage on Azure
`flocker` (discontinued)	Portable volumes	Volume migration between hosts

# Create an NFS-backed volume with the built-in local driver (no plugin needed)
docker volume create --driver local \
  --opt type=nfs \
  --opt o=addr=192.168.1.100,rw \
  --opt device=:/shared/data \
  nfs-data

# Use the NFS volume
docker run -d -v nfs-data:/app/shared --name worker my-app

Bind Mounts

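Bind mounts map a host directory or file directly into the container. Unlike volumes, Docker doesn't manage their lifecycle, and the container gets direct access to host files. With --mount the host path must already exist; the short -v syntax instead silently creates a missing path as an empty, root-owned directory.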

# Bind mount the project directory into the container
# (assumes package.json, node_modules and src/ live in the current directory)
docker run -d --name dev-server \
  -v "$(pwd)":/app \
  -v "$(pwd)/config":/app/config:ro \
  -w /app \
  -p 3000:3000 \
  node:20-alpine npm run dev

# :ro makes the mount read-only (container can't modify host files)

# Modern syntax using --mount (more explicit, recommended);
# an alternative to the command above, so remove that container first
docker run -d --name dev-server \
  --mount type=bind,source="$(pwd)",target=/app \
  --mount type=bind,source="$(pwd)/config",target=/app/config,readonly \
  -w /app \
  -p 3000:3000 \
  node:20-alpine npm run dev

# Bind mount a single file (e.g., custom nginx config)
docker run -d --name web \
  -v $(pwd)/nginx.conf:/etc/nginx/nginx.conf:ro \
  -p 80:80 \
  nginx:alpine

# Development workflow: hot-reload with bind mount
# Changes to ./src on host immediately reflected in container
echo "console.log('live update');" >> src/index.js
# => If the dev server watches files (e.g., via nodemon), it detects the change and reloads
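Because the short `-v` syntax creates a missing host path as an empty directory, a typo in a bind source can start a container against an empty mount instead of failing. A minimal sketch of a guard that checks the path first and only then prints a `--mount` flag (the name `bind_flag` is hypothetical, and paths containing commas are not handled since `--mount` uses commas as separators):

```shell
#!/bin/sh
# bind_flag HOST_PATH TARGET [ro]
# Hypothetical guard: prints a --mount bind flag only if HOST_PATH exists,
# so a typo fails loudly instead of mounting a freshly created empty directory.
bind_flag() {
    if [ ! -e "$1" ]; then
        echo "bind source does not exist: $1" >&2
        return 1
    fi
    # Resolve to an absolute, symlink-free path so the source is unambiguous.
    if [ -d "$1" ]; then
        src=$(CDPATH= cd -- "$1" && pwd -P) || return 1
    else
        dir=$(CDPATH= cd -- "$(dirname "$1")" && pwd -P) || return 1
        src="$dir/$(basename "$1")"
    fi
    flag="--mount type=bind,source=$src,target=$2"
    if [ "${3:-}" = ro ]; then
        flag="$flag,readonly"
    fi
    printf '%s\n' "$flag"
}

# Example: guard the read-only config mount used by the dev server.
if flag=$(bind_flag ./config /app/config ro); then
    echo "would run: docker run ... $flag ..."
else
    echo "fix the path before starting the container"
fi
```

In a script, checking the helper's exit status before `docker run` turns a silent empty mount into an explicit error.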

                            
Security Warning: Bind mounts give containers direct access to host files. A container started with -v /:/host can read and write the entire host filesystem. Never bind-mount sensitive host paths (/etc, /root, /var/run/docker.sock) unless absolutely necessary and the container is trusted; access to the Docker socket is effectively root access on the host.
                        

Aspect	Volumes	Bind Mounts
Path control	Docker manages path	You specify exact host path
Pre-population	Image content copied into an empty volume	Host content hides image content at the mount point
Portability	Works on any Docker host	Requires specific host path to exist
Backup	Temporary container + tar (no built-in command)	Standard filesystem tools
Performance (Linux)	Native filesystem speed	Native filesystem speed
Performance (macOS/Windows)	Fast (inside the VM filesystem)	Slower (host-to-VM file sharing overhead)

tmpfs Mounts

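tmpfs mounts store data in the host's memory (RAM) and are available only on Linux. The data is never written to the container's writable layer or to a volume, and it disappears when the container stops. This makes tmpfs a good fit for sensitive data that should not persist (note that on a host with swap enabled, tmpfs pages can still be swapped out to disk).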

# Create a tmpfs mount for secrets
docker run -d --name secure-app \
  --tmpfs /run/secrets:rw,noexec,nosuid,size=64m \
  my-secure-app

# Using --mount syntax (more explicit; an alternative to the command above,
# so remove the first secure-app container before running it)
docker run -d --name secure-app \
  --mount type=tmpfs,destination=/run/secrets,tmpfs-size=67108864,tmpfs-mode=1770 \
  my-secure-app

# tmpfs options (--tmpfs form / --mount form):
# size=64m / tmpfs-size=<bytes>  maximum size; if omitted, the kernel's tmpfs
#                                 default of half the host RAM applies
# mode=1770 / tmpfs-mode=1770     file permissions (octal)
# noexec                          cannot execute binaries from this mount
# nosuid                          ignore setuid/setgid bits

# Verify it's in memory
docker exec secure-app df -h /run/secrets
# => Filesystem  Size  Used  Avail  Use%  Mounted on
# => tmpfs        64M     0   64M    0%   /run/secrets

# Write a secret — it's in RAM only
docker exec secure-app sh -c 'echo "api_key=sk-abc123" > /run/secrets/api.env'

# Stop container — data is gone forever
docker stop secure-app && docker start secure-app
docker exec secure-app cat /run/secrets/api.env
# => cat: can't open '/run/secrets/api.env': No such file or directory
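`--tmpfs` accepts human-readable sizes such as `64m`, while the `--mount` example above passes `tmpfs-size` as a plain byte count (67108864 is 64 × 1024 × 1024). A small hypothetical helper for that conversion, assuming 1024-based units (which is how the kernel's tmpfs reads k, m and g suffixes) and plain decimal numbers without leading zeros:

```shell
#!/bin/sh
# to_bytes SIZE: prints SIZE in bytes. SIZE is an integer with an optional
# k, m or g suffix (either case), read as powers of 1024 like tmpfs does.
to_bytes() {
    num=${1%[kKmMgG]}     # drop one trailing unit letter, if present
    unit=${1#"$num"}      # the dropped letter (empty if there was none)
    case "$num" in
        ''|*[!0-9]*) echo "invalid size: $1" >&2; return 1 ;;
    esac
    case "$unit" in
        '')   echo "$num" ;;
        [kK]) echo $((num * 1024)) ;;
        [mM]) echo $((num * 1024 * 1024)) ;;
        [gG]) echo $((num * 1024 * 1024 * 1024)) ;;
    esac
}

to_bytes 64m   # 67108864, the tmpfs-size value used in the --mount example
```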

                            
When to Use tmpfs: API keys, database passwords, session tokens, temporary computation results, or any data that should not be persisted to disk. Docker secrets in Swarm mode follow the same idea: each secret is mounted as a file under /run/secrets on an in-memory filesystem.
                        

Volume Sharing Between Containers

Multiple containers can mount the same volume simultaneously. This enables patterns like sidecar containers, log aggregators, and shared caches.

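# Pattern 1: Writer + Reader
# Web server writes logs, log shipper reads them
# Note: the official nginx image symlinks access.log and error.log to
# stdout/stderr, so configure nginx (e.g., with a custom nginx.conf) to log
# to regular files if the shipper should read them from the volume
docker volume create app-logs

docker run -d --name web \
  -v app-logs:/var/log/nginx \
  nginx:alpine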

docker run -d --name log-shipper \
  -v app-logs:/logs:ro \
  my-fluentd-image

# Pattern 2: Shared data processing pipeline
docker volume create pipeline-data

# Producer writes data
docker run --rm -v pipeline-data:/output alpine \
  sh -c 'echo "processed data" > /output/result.csv'

# Consumer reads data
docker run --rm -v pipeline-data:/input:ro alpine \
  cat /input/result.csv
# => processed data

# Pattern 3: Read-only volume from another container (--volumes-from)
docker run -d --name data-container \
  -v /app/config \
  alpine sh -c 'echo "config=production" > /app/config/app.conf && sleep 3600'

docker run --rm --volumes-from data-container:ro alpine \
  cat /app/config/app.conf
# => config=production

                            
Concurrency Warning: Docker volumes have no built-in locking mechanism. If multiple containers write to the same files simultaneously, you risk data corruption. Use application-level coordination (database locks, file locks, message queues) or make sure there is only one writer at a time.
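A file lock is one form of application-level coordination. A minimal sketch, assuming every writer runs the same code against the same shared path: `mkdir` either creates a directory or fails because it already exists, so a lock directory works as a mutex on local filesystems (check the behavior before relying on it over NFS). Function names are hypothetical, and this simple scheme does not clean up a lock left behind by a crashed writer; `flock(1)` from util-linux is a more robust alternative where available.

```shell
#!/bin/sh
# with_lock LOCKDIR COMMAND [ARGS...]
# Runs COMMAND while holding LOCKDIR. mkdir either creates the directory or
# fails because it already exists, so only one caller at a time gets past it.
with_lock() {
    lockdir=$1; shift
    until mkdir "$lockdir" 2>/dev/null; do
        sleep 0.1          # lock is held elsewhere; retry shortly
    done
    "$@"
    status=$?
    rmdir "$lockdir"       # release the lock
    return "$status"
}

# increment FILE: read-modify-write of a counter. Without the lock, two
# writers can read the same value and one update is lost.
increment() {
    n=$(cat "$1")
    echo $((n + 1)) > "$1"
}

# Demo: four concurrent writers share one file, standing in for containers
# that share a volume. Each performs 25 locked increments.
shared=$(mktemp -d)
echo 0 > "$shared/counter"
for writer in 1 2 3 4; do
    (
        i=0
        while [ "$i" -lt 25 ]; do
            with_lock "$shared/counter.lock" increment "$shared/counter"
            i=$((i + 1))
        done
    ) &
done
wait
result=$(cat "$shared/counter")
rm -rf "$shared"
echo "$result"   # 100: no increment was lost
```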
                        

Storage Drivers

Storage drivers power the container's layered filesystem (image layers + writable layer). They're different from volume drivers — storage drivers handle the ephemeral container filesystem, not persistent volumes.

Driver	Backing Filesystem	Copy-on-Write	Performance	Status
`overlay2`	xfs (with ftype=1), ext4	File-level CoW	Excellent	Default (recommended)
`btrfs`	btrfs	Block-level CoW	Good (snapshots are fast)	Supported
`zfs`	ZFS	Block-level CoW	Good (data integrity)	Supported
`devicemapper`	Block devices	Block-level CoW	Moderate	Deprecated and removed in recent Engine releases (use overlay2)
`fuse-overlayfs`	Any (FUSE)	File-level CoW	Moderate	Rootless Docker on older kernels (overlay2 works rootless on Linux 5.11+)

# Check current storage driver
docker info | grep "Storage Driver"
# => Storage Driver: overlay2

# Check backing filesystem
docker info | grep "Backing Filesystem"
# => Backing Filesystem: extfs

# See where layers are stored (root access required)
sudo ls /var/lib/docker/overlay2/
# => 8a3b2c1d4e5f...  (each directory = one layer)
# => l/                (symlinks for shorter paths)

# Inspect a container's layer structure
docker inspect --format='{{.GraphDriver.Data}}' my-container
# => map[LowerDir:/.../diff MergedDir:/.../merged UpperDir:/.../diff WorkDir:/.../work]

# overlay2 mount structure:
# LowerDir  = read-only image layers (stacked)
# UpperDir  = writable container layer
# MergedDir = unified view (what the container sees)
# WorkDir   = internal overlay2 working directory
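The MergedDir view can be modelled as a lookup that checks UpperDir first and then each LowerDir from top to bottom, returning the first match; copy-on-write then means copying a file up into UpperDir before its first modification. The sketch below is a deliberately simplified model over ordinary directories (it ignores whiteouts, opaque directories and directory merging, which the kernel's overlayfs handles), meant only to make the precedence rule concrete; `overlay_resolve` and `copy_up` are hypothetical names.

```shell
#!/bin/sh
# overlay_resolve RELPATH UPPER LOWER1 [LOWER2 ...]
# Simplified model of overlay2 lookup: the first layer (UpperDir, then the
# LowerDirs from top to bottom) that contains RELPATH supplies the file.
# Real overlayfs also handles whiteouts and opaque directories; this does not.
overlay_resolve() {
    rel=$1; shift
    for layer in "$@"; do
        if [ -e "$layer/$rel" ]; then
            printf '%s\n' "$layer/$rel"
            return 0
        fi
    done
    return 1
}

# copy_up RELPATH UPPER LOWER1 [LOWER2 ...]
# Model of copy-on-write: before the first write, copy the file from the
# lower layer that currently provides it into the upper (writable) layer.
copy_up() {
    rel=$1; upper=$2; shift 2
    [ -e "$upper/$rel" ] && return 0          # already copied up
    src=$(overlay_resolve "$rel" "$@") || return 1
    mkdir -p "$(dirname "$upper/$rel")"
    cp -p "$src" "$upper/$rel"
}

root=$(mktemp -d)
mkdir -p "$root/lower2/etc" "$root/lower1/etc" "$root/upper"
echo "from base image layer" > "$root/lower2/etc/app.conf"
echo "from app image layer"  > "$root/lower1/etc/app.conf"

# The higher image layer shadows the base layer:
overlay_resolve etc/app.conf "$root/upper" "$root/lower1" "$root/lower2"

# A write inside the container first copies the file up, then modifies the copy:
copy_up etc/app.conf "$root/upper" "$root/lower1" "$root/lower2"
echo "edited in container" > "$root/upper/etc/app.conf"
overlay_resolve etc/app.conf "$root/upper" "$root/lower1" "$root/lower2"
cat "$root/lower1/etc/app.conf"   # image layer content is untouched
```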

Backup & Restore Strategies

Volumes don't have a built-in backup command, but there are reliable patterns using temporary containers to archive volume data.

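# Backup a volume to a tar archive
# (stop containers that write to the volume first, or the archive may be
#  inconsistent; for databases prefer the application-aware dump shown below)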
docker run --rm \
  -v mydata:/source:ro \
  -v $(pwd)/backups:/backup \
  alpine tar czf /backup/mydata-$(date +%Y%m%d).tar.gz -C /source .

# Restore a volume from backup
docker volume create mydata-restored
docker run --rm \
  -v mydata-restored:/target \
  -v $(pwd)/backups:/backup:ro \
  alpine tar xzf /backup/mydata-20260514.tar.gz -C /target

# Backup PostgreSQL with pg_dump (application-aware backup)
docker exec postgres pg_dump -U postgres mydb > backup.sql

# Restore PostgreSQL
docker exec -i postgres psql -U postgres mydb < backup.sql

# Copy files between host and container (quick one-off)
docker cp my-container:/var/log/app.log ./app.log
docker cp ./config.yaml my-container:/app/config.yaml
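A backup is only proven once it has been restored. A minimal sketch that runs the same `tar czf ... -C source .` and `tar xzf ... -C target` round trip used above on ordinary directories (stand-ins for volume mountpoints) and compares the result with `diff -r`; the function name `verify_roundtrip` is hypothetical:

```shell
#!/bin/sh
# verify_roundtrip SOURCE_DIR WORK_DIR
# Archives SOURCE_DIR the same way as the volume backup above, restores the
# archive into WORK_DIR/restored, and succeeds only if both trees are identical.
verify_roundtrip() {
    src=$1; work=$2
    archive="$work/backup.tar.gz"
    restored="$work/restored"
    tar czf "$archive" -C "$src" . || return 1
    mkdir -p "$restored"
    tar xzf "$archive" -C "$restored" || return 1
    diff -r "$src" "$restored" >/dev/null
}

# Plain directories stand in for the volume mountpoints.
demo=$(mktemp -d)
mkdir -p "$demo/data/sub" "$demo/work"
echo "id,name" > "$demo/data/table.csv"
echo "nested file" > "$demo/data/sub/notes.txt"
if verify_roundtrip "$demo/data" "$demo/work"; then
    echo "backup verified"
else
    echo "backup FAILED verification"
fi
```

Against a real volume, the same commands can run inside a throwaway container that mounts the volume read-only next to the backup directory.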

Automated Backup Container

# Automated backup sidecar: write the Dockerfile and script into backup-sidecar/
mkdir -p backup-sidecar

cat > backup-sidecar/Dockerfile <<'EOF'
FROM alpine:3.19
RUN apk add --no-cache tar gzip
COPY backup.sh /usr/local/bin/backup.sh
RUN chmod +x /usr/local/bin/backup.sh
ENTRYPOINT ["/usr/local/bin/backup.sh"]
EOF

# backup.sh - runs on schedule
cat > backup-sidecar/backup.sh <<'EOF'
#!/bin/sh
BACKUP_DIR="/backups"
SOURCE_DIR="/data"
RETENTION_DAYS=7

while true; do
    TIMESTAMP=$(date +%Y%m%d_%H%M%S)
    tar czf "${BACKUP_DIR}/backup_${TIMESTAMP}.tar.gz" -C "${SOURCE_DIR}" .
    echo "[$(date)] Backup created: backup_${TIMESTAMP}.tar.gz"

    # Remove backups older than retention period
    find "${BACKUP_DIR}" -name "backup_*.tar.gz" -mtime +${RETENTION_DAYS} -delete

    sleep 3600  # Backup every hour
done
EOF

# Build the image
docker build -t backup-sidecar backup-sidecar/

# Deploy alongside your application
docker run -d --name backup \
  -v mydata:/data:ro \
  -v "$(pwd)/backups":/backups \
  backup-sidecar
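The loop in backup.sh has two weak spots: an interrupted `tar` leaves a truncated `backup_*.tar.gz` that looks like a valid archive, and pruning runs even when the backup just failed. A minimal sketch, with a hypothetical function name, of a single hardened iteration that writes under a temporary name, renames only on success (a rename within one filesystem is atomic), and prunes only afterwards:

```shell
#!/bin/sh
# backup_once SOURCE_DIR BACKUP_DIR RETENTION_DAYS
# One hardened iteration of the backup loop:
#  1. write the archive under a temporary .partial name,
#  2. rename it into place only if tar succeeded,
#  3. prune old archives only after a successful backup.
backup_once() {
    src=$1; dest=$2; keep_days=$3
    final="$dest/backup_$(date +%Y%m%d_%H%M%S).tar.gz"
    tmp="$final.partial"

    if ! tar czf "$tmp" -C "$src" .; then
        rm -f "$tmp"
        echo "[$(date)] Backup FAILED; existing archives kept" >&2
        return 1
    fi
    mv "$tmp" "$final"
    echo "[$(date)] Backup created: $(basename "$final")"

    # -mtime +N matches files whose age, rounded down to whole days, exceeds N;
    # .partial files never match the name pattern, so they are not pruned here.
    find "$dest" -name 'backup_*.tar.gz' -mtime +"$keep_days" -delete
}

# Demo on plain directories: an archive dated 2000-01-01 plays the role of
# one that is past retention.
root=$(mktemp -d)
mkdir -p "$root/data" "$root/backups"
echo "payload" > "$root/data/file.txt"
touch -t 200001010000 "$root/backups/backup_20000101_000000.tar.gz"
backup_once "$root/data" "$root/backups" 7
ls "$root/backups"
```

Inside backup.sh, the loop body would then shrink to `backup_once /data /backups 7` followed by `sleep 3600`.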

Stateful Application Design

Running databases and other stateful applications in containers requires careful volume design. The key principle: separate state from compute.

PostgreSQL with Proper Volume Design

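# Create dedicated volumes for different data types
docker volume create pg-data        # Database files
docker volume create pg-wal         # Write-ahead log (a speedup only if on a separate disk)
docker volume create pg-backups     # Backup archives
docker network create app-net       # User-defined network (skip if it already exists)

# Run PostgreSQL with separated volumes.
# PostgreSQL has no runtime setting for the WAL location; the official image's
# POSTGRES_INITDB_WALDIR tells initdb where to place it on first start.
docker run -d --name postgres \
  --network app-net \
  -v pg-data:/var/lib/postgresql/data \
  -v pg-wal:/var/lib/postgresql/wal \
  -v pg-backups:/backups \
  -e POSTGRES_PASSWORD=secret \
  -e POSTGRES_DB=myapp \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -e POSTGRES_INITDB_WALDIR=/var/lib/postgresql/wal \
  postgres:16-alpine

# Wait until the server accepts TCP connections (the first start runs initdb)
until docker exec postgres pg_isready -h 127.0.0.1 -U postgres >/dev/null 2>&1; do
  sleep 1
done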

# Initialize with seed data
docker exec -i postgres psql -U postgres myapp <<SQL
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);
INSERT INTO users (name, email) VALUES
    ('Alice', 'alice@example.com'),
    ('Bob', 'bob@example.com');
SQL

# Verify data persists across container recreation
docker rm -f postgres
docker run -d --name postgres \
  --network app-net \
  -v pg-data:/var/lib/postgresql/data \
  -v pg-wal:/var/lib/postgresql/wal \
  -e POSTGRES_PASSWORD=secret \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
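  postgres:16-alpine

# Wait for startup again before querying
until docker exec postgres pg_isready -h 127.0.0.1 -U postgres >/dev/null 2>&1; do
  sleep 1
done

docker exec postgres psql -U postgres myapp -c "SELECT * FROM users;"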
# => Alice and Bob still present!
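A script that starts a database with `docker run -d` must wait before sending queries, because the server needs a few seconds to come up (longer on first start, when initdb runs). A minimal, hypothetical `wait_for` helper that retries any command until it succeeds or a limit is reached:

```shell
#!/bin/sh
# wait_for TIMEOUT COMMAND [ARGS...]
# Re-runs COMMAND once per second until it exits 0; gives up (status 1)
# after TIMEOUT failed attempts.
wait_for() {
    timeout=$1; shift
    attempt=0
    until "$@" >/dev/null 2>&1; do
        attempt=$((attempt + 1))
        if [ "$attempt" -ge "$timeout" ]; then
            echo "gave up after $timeout attempts: $*" >&2
            return 1
        fi
        sleep 1
    done
}

# Demo without Docker: a background job creates a file after two seconds,
# standing in for a database that finishes starting up.
flag="${TMPDIR:-/tmp}/ready.$$"
rm -f "$flag"
( sleep 2; touch "$flag" ) &
if wait_for 10 test -e "$flag"; then
    echo "service ready"
fi
rm -f "$flag"
```

For the PostgreSQL container the call would be `wait_for 60 docker exec postgres pg_isready -h 127.0.0.1 -U postgres`. Checking over TCP matters because the official image's entrypoint runs a temporary, socket-only server while initializing a fresh data directory, so a socket check can report "ready" too early.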

                            
Stateful Container Best Practices:
Always use named volumes (never anonymous) for production data
Separate data, logs, and WAL onto different volumes, backed by different disks when the goal is performance
Use :ro mounts where containers only need read access
Define health checks (e.g., pg_isready) so dependent services wait until the database is ready
Include automated backup containers as sidecars

                        

Docker Storage Quotas

Without limits, a single container can fill the disk that holds /var/lib/docker and cause failures in every other container on the host. Docker provides tools to monitor storage consumption and, on supported storage setups, to cap it.

# Check Docker disk usage overview
docker system df
# => TYPE            TOTAL   ACTIVE  SIZE      RECLAIMABLE
# => Images          15      5       4.2GB     2.8GB (66%)
# => Containers      8       3       1.1GB     800MB (72%)
# => Local Volumes   12      4       3.5GB     2.1GB (60%)
# => Build Cache     0       0       0B        0B

# Detailed breakdown (shows individual items)
docker system df -v

# Check individual volume sizes
docker system df -v | grep -A 100 "Local Volumes"

# Limit container's writable layer size (requires overlay2 + xfs with pquota)
docker run -d --name limited \
  --storage-opt size=2G \
  nginx:alpine

# Inside the container, writes beyond 2GB will fail
docker exec limited dd if=/dev/zero of=/bigfile bs=1M count=2048
# => dd: error writing '/bigfile': No space left on device (at 2GB)

# Prune everything unused (images, containers, volumes, networks, build cache)
docker system prune -a --volumes
# WARNING: removes every image not used by a container, not just dangling ones

# Safer: removes stopped containers, unused networks, dangling (untagged) images,
# and unused build cache; volumes are left alone
docker system prune

# Monitor Docker directory size on host (root access required)
sudo du -sh /var/lib/docker/
# => 12G    /var/lib/docker/

sudo du -sh /var/lib/docker/volumes/
# => 3.5G   /var/lib/docker/volumes/

sudo du -sh /var/lib/docker/overlay2/
# => 7.2G   /var/lib/docker/overlay2/

                            
Production Alert: Set up monitoring for /var/lib/docker disk usage. When the Docker storage directory fills up, every container that writes to it can fail at the same time, and new containers cannot be created. Configure alerting at 80% capacity and automated cleanup policies for unused images and dangling volumes.
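The 80% alert can start as a small shell check run from cron or a monitoring agent. A minimal sketch, with hypothetical function names and the threshold as a parameter, that parses the POSIX `df -P` output format so it behaves the same with GNU, BusyBox and BSD `df`:

```shell
#!/bin/sh
# usage_percent PATH: prints the used percentage of the filesystem holding PATH.
# df -P (POSIX format) prints one line per filesystem; column 5 is e.g. "42%".
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_disk PATH THRESHOLD
# Returns 0 below THRESHOLD, 1 (with an alert on stderr) at or above it,
# and 2 if the usage could not be read.
check_disk() {
    used=$(usage_percent "$1")
    [ -n "$used" ] || return 2
    if [ "$used" -ge "$2" ]; then
        echo "ALERT: $1 is at ${used}% (threshold $2%)" >&2
        return 1
    fi
    echo "OK: $1 is at ${used}%"
}

# On a Docker host the path would be /var/lib/docker; /tmp keeps the demo portable.
if ! check_disk /tmp 80; then
    echo "an alert hook (mail, webhook, pager) would run here"
fi
```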
                        

Exercises

Exercise 1

Volume Lifecycle Management

Create a named volume, run a PostgreSQL container that writes data to it, remove the container, then start a new PostgreSQL container with the same volume and verify data persistence. Then practice backup (tar) and restore to a new volume. Finally, prune unused volumes and verify your active volume isn't removed.

Exercise 2

Development Workflow with Bind Mounts

Create a Node.js project with a simple Express server. Run it in a container with the source code bind-mounted. Edit a file on the host and verify the change is immediately visible inside the container. Set up nodemon to auto-restart on file changes. Measure the latency between saving a file and the server restarting.

Exercise 3

Storage Driver Investigation

Run docker info to identify your storage driver. Navigate to /var/lib/docker/overlay2/ and identify the layers for a running container. Modify a file inside the container and locate the changed file in the UpperDir on the host. Explain the Copy-on-Write process you just observed.

Exercise 4

Multi-Container Volume Sharing

Create a pipeline: Container A writes timestamped logs to a shared volume every second. Container B (read-only access) tails the log file and prints new entries. Container C runs hourly to compress old logs into archives. Verify all three work correctly with the same volume.

Conclusion & Next Steps

Docker storage is a spectrum from fully ephemeral to fully persistent. The key decisions:

Volumes for production data — Docker-managed, survives container lifecycle, supports drivers for remote storage
Bind mounts for development — Direct host path access, live code editing, but not portable
tmpfs for secrets — In-memory only, never written to the container layer or a volume, well suited to sensitive credentials
overlay2 for container filesystem — Efficient CoW for the ephemeral writable layer, but keep write-heavy data in volumes
Backup volumes regularly — No built-in solution; use sidecar containers with tar archives
Monitor disk usage — docker system df and alerting on /var/lib/docker capacity

With networking (Parts 10-11) and storage (Part 12) mastered, you understand the two critical infrastructure pillars for container workloads. Next, we'll look at the standards that make containers portable across any runtime — the OCI specifications.

Next in the Series

In Part 13: OCI Standards & Specifications, we explore the Open Container Initiative's runtime, image, and distribution specifications — the open standards that ensure your containers run identically on Docker, Podman, containerd, CRI-O, or any compliant runtime.
