
Complete Database Mastery Part 10: Database Administration & Migrations

January 31, 2026 · Wasil Zafar · 38 min read

Master database administration and migrations. Learn backup strategies, disaster recovery, schema versioning with Flyway/Liquibase, zero-downtime migrations, and monitoring best practices.

Table of Contents

  1. Introduction
  2. Backup Strategies
  3. Disaster Recovery
  4. Schema Migrations
  5. Zero-Downtime Migrations
  6. Routine Maintenance
  7. Monitoring & Alerting
  8. Conclusion & Next Steps

Introduction: DBA Fundamentals

Database administration is the discipline of keeping databases secure, performant, and available. Whether you're a developer managing your own databases or a dedicated DBA (database administrator), these skills are essential.

Series Context: This is Part 10 of 15 in the Complete Database Mastery series. We're covering the operational side of database management.

Think of a DBA like a building superintendent: while developers design and build the apartments (schemas), the superintendent keeps the lights on, repairs issues, and ensures the building doesn't burn down (data loss).

Core DBA Responsibilities

| Area | Tasks | Frequency |
| --- | --- | --- |
| Availability | Backups, failover, monitoring | Continuous |
| Performance | Query tuning, indexing, resource allocation | Weekly/As needed |
| Security | Access control, encryption, auditing | Continuous |
| Migrations | Schema changes, version control | Per release |
| Maintenance | VACUUM, ANALYZE, index rebuilds | Daily/Weekly |

Backup Strategies

Backups are your insurance policy. Without them, a single failure can destroy years of business data.

Full Backups

Complete copy of the entire database. Simple but storage-intensive.

# PostgreSQL - pg_dump (logical backup)
pg_dump -h localhost -U postgres -d myapp > backup_2024-01-15.sql
pg_dump -Fc myapp > backup_2024-01-15.dump  # Custom format (compressed)

# PostgreSQL - pg_basebackup (physical backup)
pg_basebackup -h localhost -D /backup/base -Fp -Xs -P

# MySQL - mysqldump
mysqldump -u root -p --all-databases > full_backup.sql
mysqldump -u root -p --single-transaction myapp > myapp_backup.sql

# MongoDB - mongodump
mongodump --db myapp --out /backup/mongo/2024-01-15

Best Practice: Use --single-transaction (MySQL) or take backups from a replica to avoid locking production.

Incremental & Differential Backups

Only back up what changed since the last backup—saves storage and time.

# PostgreSQL - WAL archiving for incremental
# postgresql.conf
archive_mode = on
archive_command = 'cp %p /backup/wal/%f'
wal_level = replica

# MySQL - Binary log for incremental
# my.cnf
log_bin = /var/log/mysql/mysql-bin
expire_logs_days = 7  # deprecated in MySQL 8.0; use binlog_expire_logs_seconds = 604800
binlog_format = ROW

# Backup binlogs (incremental)
mysqlbinlog /var/log/mysql/mysql-bin.000001 > binlog_backup.sql

Backup Strategy Example

| Backup Type | Frequency | Retention |
| --- | --- | --- |
| Full backup | Weekly (Sunday) | 4 weeks |
| Differential | Daily | 1 week |
| Transaction logs | Every 15 minutes | 7 days |

Point-in-Time Recovery (PITR)

Restore to any moment in time—essential for recovering from accidental deletes or corruption.

# PostgreSQL PITR
# 1. Restore the physical base backup (taken with pg_basebackup) into the data directory
#    (WAL replay requires a physical backup; a logical pg_dump backup won't work here)
cp -r /backup/base/* /var/lib/postgresql/data/

# 2. Apply WAL logs up to a specific time
# recovery.conf (PG 11 and earlier) or postgresql.conf + a recovery.signal file (PG 12+)
restore_command = 'cp /backup/wal/%f %p'
recovery_target_time = '2024-01-15 14:30:00'
recovery_target_action = 'promote'

# MySQL PITR with binary logs
# 1. Restore full backup
mysql -u root -p < full_backup.sql

# 2. Apply binlogs up to a point
mysqlbinlog --stop-datetime="2024-01-15 14:30:00" \
  mysql-bin.000001 mysql-bin.000002 | mysql -u root -p

Critical Rule: Test your restores regularly! A backup you can't restore from is worthless.

Disaster Recovery Planning

Plan for the worst: hardware failures, data center outages, ransomware attacks.

Key Metrics

  • RTO (Recovery Time Objective): How long can you be down? (e.g., 4 hours)
  • RPO (Recovery Point Objective): How much data can you lose? (e.g., 15 minutes)
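These targets only matter if you check them continuously. A minimal sketch of an RPO check follows; the 15-minute target mirrors the example above, and feeding in the last-backup timestamp as a plain value is an assumption (in practice it would come from your WAL-archive or binlog-shipping metadata):

```python
from datetime import datetime, timedelta

RPO = timedelta(minutes=15)  # max tolerable data loss, per the example above

def rpo_breached(last_backup_at: datetime, now: datetime) -> bool:
    """True if the newest recoverable point is older than the RPO allows."""
    return now - last_backup_at > RPO
```

Wire this into your alerting so an RPO breach pages someone before a disaster happens, not after.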
# Disaster Recovery Checklist

# 1. Off-site backups (3-2-1 rule)
#    - 3 copies of data
#    - 2 different storage types
#    - 1 off-site location
aws s3 cp backup.dump s3://company-dr-backups/postgres/

# 2. Automated backup verification
pg_restore --list backup.dump > /dev/null && echo "Backup valid"

# 3. Documented runbook
# - Step-by-step restore procedures
# - Contact information
# - Escalation paths

# 4. Regular DR drills
# - Quarterly restore tests
# - Measure actual RTO/RPO

Schema Migrations

Version control for your database schema. Track, apply, and rollback changes safely.
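All of the tools below share the same core mechanism: record applied versions in a tracking table, then run only the pending scripts, in order. A minimal sketch of that loop, using SQLite for illustration (the `schema_migrations` table name is an assumption; Flyway, for instance, uses `flyway_schema_history`):

```python
import sqlite3

def migrate(conn: sqlite3.Connection, migrations: dict[str, str]) -> list[str]:
    """Apply pending migrations in version order; return the versions applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version TEXT PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    ran = []
    for version in sorted(migrations):
        if version in applied:
            continue  # already recorded -> never re-run
        conn.executescript(migrations[version])
        conn.execute("INSERT INTO schema_migrations VALUES (?)", (version,))
        ran.append(version)
    conn.commit()
    return ran
```

Because applied versions are recorded, running the tool is idempotent: a second invocation with the same scripts does nothing, which is exactly what makes migrations safe to run on every deploy.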

Flyway

SQL-based migrations with simple versioning. Great for straightforward schema changes.

# Project structure
migrations/
├── V1__Create_users_table.sql
├── V2__Add_email_to_users.sql
├── V3__Create_orders_table.sql
└── V4__Add_index_on_orders.sql
-- V1__Create_users_table.sql
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(50) NOT NULL UNIQUE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- V2__Add_email_to_users.sql
ALTER TABLE users ADD COLUMN email VARCHAR(255);
CREATE INDEX idx_users_email ON users(email);
# Run migrations
flyway -url=jdbc:postgresql://localhost/myapp migrate

# Check status
flyway info

Liquibase

XML/YAML/JSON changelogs with rollback support and database diffing.

<!-- changelog.xml -->
<databaseChangeLog>
    <changeSet id="1" author="alice">
        <createTable tableName="users">
            <column name="id" type="int" autoIncrement="true">
                <constraints primaryKey="true"/>
            </column>
            <column name="username" type="varchar(50)">
                <constraints nullable="false" unique="true"/>
            </column>
        </createTable>
        <rollback>
            <dropTable tableName="users"/>
        </rollback>
    </changeSet>
</databaseChangeLog>
# Apply changes
liquibase update

# Rollback last change
liquibase rollbackCount 1

# Generate diff between databases
liquibase diff --referenceUrl=jdbc:postgresql://localhost/dev

Alembic (Python/SQLAlchemy)

# migrations/versions/001_create_users.py
from alembic import op
import sqlalchemy as sa

revision = '001'
down_revision = None

def upgrade():
    op.create_table(
        'users',
        sa.Column('id', sa.Integer(), primary_key=True),
        sa.Column('username', sa.String(50), nullable=False, unique=True),
        sa.Column('email', sa.String(255)),
        sa.Column('created_at', sa.DateTime(), server_default=sa.func.now())
    )
    op.create_index('idx_users_email', 'users', ['email'])

def downgrade():
    op.drop_index('idx_users_email')
    op.drop_table('users')
# Generate migration from model changes
alembic revision --autogenerate -m "Add users table"

# Apply migrations
alembic upgrade head

# Rollback
alembic downgrade -1

Zero-Downtime Migrations

Production databases can't afford downtime. Use these patterns for safe schema changes.

Expand-Contract Pattern

  1. Expand: Add new column/table (old code ignores it)
  2. Migrate: Copy/backfill data to new structure
  3. Switch: Update application to use new structure
  4. Contract: Remove old column/table
-- Example: Rename column (email → email_address)

-- Step 1: EXPAND - Add new column
ALTER TABLE users ADD COLUMN email_address VARCHAR(255);

-- Step 2: MIGRATE - Backfill data
UPDATE users SET email_address = email WHERE email_address IS NULL;

-- Step 3: SWITCH - Deploy app using email_address
-- (Application code change)

-- Step 4: CONTRACT - Remove old column (after app fully deployed)
ALTER TABLE users DROP COLUMN email;

Online DDL Tools

# pt-online-schema-change (Percona Toolkit for MySQL)
# Creates shadow table, copies data, swaps atomically
pt-online-schema-change --alter "ADD COLUMN status VARCHAR(20)" \
  --execute D=myapp,t=orders

# gh-ost (GitHub's tool for MySQL)
gh-ost --alter="ADD COLUMN status VARCHAR(20)" \
  --database=myapp --table=orders --execute

# PostgreSQL - Many ALTERs are online by default,
# but adding a column with a DEFAULT rewrote the whole table before PG 11
ALTER TABLE users ADD COLUMN status VARCHAR(20) DEFAULT 'active';
-- In PG 11+, a constant default is stored as metadata: instant, no table rewrite

Dangerous Operations: Avoid these in production without careful planning: DROP COLUMN, changing a column's type, and adding a NOT NULL constraint to existing data.

Routine Maintenance

VACUUM & ANALYZE (PostgreSQL)

-- VACUUM: Reclaim dead tuple space
VACUUM users;              -- Standard vacuum
VACUUM FULL users;         -- Rewrites table (locks!)
VACUUM ANALYZE users;      -- Vacuum + update statistics

-- ANALYZE: Update query planner statistics
ANALYZE users;
ANALYZE users(email);      -- Specific column

-- Check bloat
SELECT schemaname, relname, n_live_tup, n_dead_tup,
       round(n_dead_tup * 100.0 / nullif(n_live_tup, 0), 2) as dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

-- Autovacuum settings (postgresql.conf)
autovacuum = on
autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.2

Index Maintenance

-- PostgreSQL: Rebuild bloated index (online in PG12+)
REINDEX INDEX CONCURRENTLY idx_users_email;

-- Check index usage
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan DESC;

-- Find unused indexes
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND indexrelname NOT LIKE 'pg_%';

-- MySQL: Optimize table (rebuilds indexes)
OPTIMIZE TABLE users;

-- MySQL: Analyze table (update statistics)
ANALYZE TABLE users;

Monitoring & Alerting

-- PostgreSQL: Active queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;

-- Long-running queries (kill if needed)
SELECT pg_cancel_backend(pid);  -- Graceful
SELECT pg_terminate_backend(pid);  -- Force

-- Connection stats
SELECT count(*), state FROM pg_stat_activity GROUP BY state;

-- Database size
SELECT pg_size_pretty(pg_database_size('myapp'));

-- Table sizes
SELECT relname, pg_size_pretty(pg_total_relation_size(relid))
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;

Key Metrics to Monitor

| Metric | Warning | Critical |
| --- | --- | --- |
| Connection count | >80% of max | >95% of max |
| Query latency (p99) | >500ms | >2000ms |
| Disk usage | >70% | >85% |
| Replication lag | >1 minute | >5 minutes |
| Dead tuples | >10% of live | >30% of live |

Recommended Tools: pgAdmin, Datadog, Grafana + Prometheus, AWS CloudWatch, Azure Monitor, pgHero, Percona Monitoring and Management (PMM).
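Whatever tool you choose, the alerting logic reduces to comparing each reading against its warning and critical thresholds. A minimal sketch, with threshold values mirroring the table above (the metric names and units are assumptions for illustration):

```python
THRESHOLDS = {
    # metric: (warning, critical) -- values mirror the table above
    "query_latency_p99_ms": (500, 2000),
    "disk_usage_pct": (70, 85),
    "replication_lag_s": (60, 300),
}

def severity(metric: str, value: float) -> str:
    """Classify a metric reading as ok, warning, or critical."""
    warn, crit = THRESHOLDS[metric]
    if value >= crit:
        return "critical"
    if value >= warn:
        return "warning"
    return "ok"
```

A monitoring agent would evaluate this on every scrape interval and route "warning" to a dashboard and "critical" to a pager.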

Conclusion & Next Steps

Database administration is the unsung hero of reliable systems. Mastering backups, migrations, and maintenance ensures your data survives disasters and evolves with your applications.

Next in the Series

In Part 11: Scaling & Distributed Systems, we'll explore replication architectures, sharding strategies, and the CAP theorem.
