Introduction: Capstone Overview
Capstone projects are your opportunity to synthesize everything you've learned across this series. These four projects demonstrate real-world database design, implementation, and optimization skills that impress employers and clients.
Series Finale: This is Part 15 of 15 in the Complete Database Mastery series. We're bringing together all the concepts from SQL to distributed systems into portfolio-ready implementations.
The Complete Database Mastery Series:

1. SQL Fundamentals & Syntax: database basics, CRUD operations, joins, constraints
2. Advanced SQL & Query Mastery: CTEs, window functions, stored procedures
3. PostgreSQL Deep Dive: advanced types, indexing, extensions, tuning
4. MySQL & MariaDB: storage engines, replication, optimization
5. Transactions & Concurrency: ACID, isolation levels, locking, MVCC
6. Query Optimization & Indexing: EXPLAIN plans, index design, performance
7. Data Modeling & Normalization: ERDs, normal forms, schema design
8. MongoDB & Document Databases: NoSQL, aggregation, sharding
9. Redis & Caching Strategies: data structures, caching patterns, pub/sub
10. Database Administration & Migrations: backup, versioning, maintenance
11. Scaling & Distributed Systems: replication, sharding, CAP theorem
12. Cloud Databases & Managed Services: AWS, Azure, GCP database offerings
13. Database Security & Governance: encryption, access control, compliance
14. Data Warehousing & Analytics: OLAP, star schemas, columnar DBs
15. Capstone Projects: portfolio-ready database implementations (you are here)
Four Capstone Projects
| Project | Skills Demonstrated | Difficulty |
|---------|---------------------|------------|
| E-Commerce Platform | Data modeling, transactions, indexing | ⭐⭐⭐ |
| Real-Time Analytics | Redis, time-series, dashboards | ⭐⭐⭐⭐ |
| Multi-Tenant SaaS | Security, RLS, tenant isolation | ⭐⭐⭐⭐ |
| Global Distributed DB | Replication, sharding, CAP trade-offs | ⭐⭐⭐⭐⭐ |
Project 1: E-Commerce Platform Database
Requirements
E-Commerce Database Requirements:
Functional:
- User registration and authentication
- Product catalog with categories and variants
- Shopping cart and order management
- Inventory tracking
- Reviews and ratings
- Payment processing records
Non-Functional:
- Sub-100ms product queries
- Handle 10,000+ concurrent users
- 99.9% uptime
- PCI-DSS compliance for payment data
Schema Design
```sql
-- Users & Authentication
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE addresses (
    id SERIAL PRIMARY KEY,
    user_id INT REFERENCES users(id),
    address_type VARCHAR(20),  -- 'shipping', 'billing'
    street_line1 VARCHAR(255),
    street_line2 VARCHAR(255),
    city VARCHAR(100),
    state VARCHAR(100),
    postal_code VARCHAR(20),
    country VARCHAR(3),        -- ISO 3166-1 alpha-3 code
    is_default BOOLEAN DEFAULT false
);

-- Product Catalog (the LTREE column requires: CREATE EXTENSION ltree;)
CREATE TABLE categories (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    parent_id INT REFERENCES categories(id),
    slug VARCHAR(100) UNIQUE,
    path LTREE  -- PostgreSQL hierarchical path
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    category_id INT REFERENCES categories(id),
    base_price DECIMAL(10,2) NOT NULL,
    sku VARCHAR(50) UNIQUE,
    status VARCHAR(20) DEFAULT 'active',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE product_variants (
    id SERIAL PRIMARY KEY,
    product_id INT REFERENCES products(id),
    sku VARCHAR(50) UNIQUE,
    attributes JSONB,  -- {size: 'L', color: 'blue'}
    price_adjustment DECIMAL(10,2) DEFAULT 0,
    inventory_count INT DEFAULT 0,
    low_stock_threshold INT DEFAULT 10
);

-- Orders & Transactions
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INT REFERENCES users(id),
    status VARCHAR(20) DEFAULT 'pending',
    shipping_address_id INT REFERENCES addresses(id),
    billing_address_id INT REFERENCES addresses(id),
    subtotal DECIMAL(12,2),
    tax_amount DECIMAL(10,2),
    shipping_amount DECIMAL(10,2),
    total_amount DECIMAL(12,2),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE order_items (
    id SERIAL PRIMARY KEY,
    order_id INT REFERENCES orders(id),
    variant_id INT REFERENCES product_variants(id),
    quantity INT NOT NULL,
    unit_price DECIMAL(10,2) NOT NULL,
    total_price DECIMAL(12,2) NOT NULL
);

-- Indexes for performance
CREATE INDEX idx_products_category ON products(category_id);
CREATE INDEX idx_products_status ON products(status);
CREATE INDEX idx_variants_product ON product_variants(product_id);
CREATE INDEX idx_variants_attrs ON product_variants USING GIN(attributes);
CREATE INDEX idx_orders_user ON orders(user_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_created ON orders(created_at DESC);
```
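The GIN index on `attributes` serves JSONB containment queries (`@>`). As a quick illustration of how application code might build such a query, here is a hedged sketch (the helper name is made up; the parameter style assumes a psycopg2-like driver):

```python
import json

def variant_filter_query(attrs: dict) -> tuple:
    """Build a JSONB containment query against product_variants.

    The @> operator checks that `attributes` contains the given JSON
    document, which the GIN index idx_variants_attrs can serve.
    """
    sql = (
        "SELECT id, sku, price_adjustment "
        "FROM product_variants WHERE attributes @> %s::jsonb"
    )
    return sql, (json.dumps(attrs),)

# Example: find all large, blue variants
sql, params = variant_filter_query({"size": "L", "color": "blue"})
```

The same pattern extends to key-existence (`?`) and path queries; all of them stay index-assisted as long as the operator is GIN-supported.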
Implementation
```sql
-- Checkout transaction with inventory management
CREATE OR REPLACE FUNCTION process_checkout(
    p_user_id INT,
    p_cart_items JSONB,  -- [{variant_id, quantity}, ...]
    p_shipping_address_id INT,
    p_billing_address_id INT
) RETURNS INT AS $$
DECLARE
    v_order_id INT;
    v_item JSONB;
    v_variant RECORD;
    v_qty INT;
    v_unit_price DECIMAL(10,2);
    v_subtotal DECIMAL(12,2) := 0;
BEGIN
    -- Create the order shell first so items can reference it
    INSERT INTO orders (user_id, shipping_address_id, billing_address_id, status)
    VALUES (p_user_id, p_shipping_address_id, p_billing_address_id, 'processing')
    RETURNING id INTO v_order_id;

    -- Process each cart item
    FOR v_item IN SELECT * FROM jsonb_array_elements(p_cart_items)
    LOOP
        v_qty := (v_item->>'quantity')::INT;

        -- Lock the variant row to prevent overselling
        SELECT * INTO v_variant
        FROM product_variants
        WHERE id = (v_item->>'variant_id')::INT
        FOR UPDATE;

        IF NOT FOUND THEN
            RAISE EXCEPTION 'Unknown variant %', v_item->>'variant_id';
        END IF;

        IF v_variant.inventory_count < v_qty THEN
            RAISE EXCEPTION 'Insufficient inventory for variant %', v_variant.id;
        END IF;

        -- Compute the unit price once: base price plus variant adjustment
        SELECT base_price + v_variant.price_adjustment
        INTO v_unit_price
        FROM products
        WHERE id = v_variant.product_id;

        -- Deduct inventory
        UPDATE product_variants
        SET inventory_count = inventory_count - v_qty
        WHERE id = v_variant.id;

        -- Add order item
        INSERT INTO order_items (order_id, variant_id, quantity, unit_price, total_price)
        VALUES (v_order_id, v_variant.id, v_qty, v_unit_price, v_qty * v_unit_price);

        v_subtotal := v_subtotal + v_qty * v_unit_price;
    END LOOP;

    -- Update order totals (flat 8% tax for demo purposes; shipping omitted)
    UPDATE orders SET
        subtotal = v_subtotal,
        tax_amount = v_subtotal * 0.08,
        total_amount = v_subtotal * 1.08
    WHERE id = v_order_id;

    RETURN v_order_id;
END;
$$ LANGUAGE plpgsql;
```
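Because `process_checkout` takes row locks with `FOR UPDATE`, concurrent checkouts touching the same variants can deadlock or fail under stricter isolation levels, and application code typically retries. A minimal, generic retry helper, sketched with illustrative names and backoff values (in real code you would narrow `retryable` to the driver's serialization/deadlock error classes):

```python
import random
import time

def with_retries(fn, *, attempts=3, base_delay=0.05, retryable=(Exception,)):
    """Call fn(), retrying failed attempts with jittered exponential backoff.

    retryable should be narrowed to transient errors in practice,
    e.g. psycopg2's serialization-failure / deadlock exception classes.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            # Exponential backoff with jitter to avoid thundering herds
            time.sleep(base_delay * (2 ** attempt) * random.random())

# Usage sketch: with_retries(lambda: run_checkout(conn, user_id, cart))
```

Each retry must re-run the whole transaction (the function aborts atomically on `RAISE EXCEPTION`), so the helper wraps the entire checkout call, not individual statements.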
Project 2: Real-Time Analytics Pipeline
Requirements
Real-Time Analytics Requirements:
Functional:
- Track website events (page views, clicks, conversions)
- Real-time dashboards with sub-second updates
- Historical trend analysis
- Funnel analysis and cohort tracking
Non-Functional:
- Ingest 100,000+ events/second
- Dashboard queries < 100ms
- 1 year data retention
- 99.9% uptime
Architecture
```
[Web/Mobile Apps]
        ↓
[Event Collectors] → [Message Queue (Kafka)]
                              ↓
                     [Stream Processor]
                       ↓            ↓
                   [Redis]    [TimescaleDB]
                 (real-time)  (historical)
                       ↓            ↓
                  [Dashboard API Gateway]
                              ↓
                     [React Dashboard]
```
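The stream processor between Kafka and the stores typically micro-batches writes rather than inserting one row per event, which is what makes the 100,000 events/second target feasible. The buffering logic can be sketched in isolation (the `sink` callable stands in for a bulk `INSERT` into the events hypertable):

```python
import time

class MicroBatcher:
    """Buffer events and flush when a size or age threshold is hit."""

    def __init__(self, sink, max_size=500, max_age_s=1.0, clock=time.monotonic):
        self.sink = sink            # e.g. a bulk INSERT / COPY into TimescaleDB
        self.max_size = max_size
        self.max_age_s = max_age_s
        self.clock = clock          # injectable for testing
        self.buffer = []
        self.first_at = None        # timestamp of the oldest buffered event

    def add(self, event: dict):
        if self.first_at is None:
            self.first_at = self.clock()
        self.buffer.append(event)
        if (len(self.buffer) >= self.max_size
                or self.clock() - self.first_at >= self.max_age_s):
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(self.buffer)  # one round-trip for the whole batch
            self.buffer = []
            self.first_at = None
```

In production the same loop would also call `flush()` on consumer rebalance and shutdown so no buffered events are lost; exactly-once delivery needs idempotent inserts or Kafka offsets committed only after a successful flush.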
Implementation
```sql
-- TimescaleDB: hypertable for events
CREATE TABLE events (
    time TIMESTAMPTZ NOT NULL,
    session_id UUID,
    user_id INT,
    event_type VARCHAR(50),
    page_url TEXT,
    referrer TEXT,
    properties JSONB,
    device_type VARCHAR(20),
    country VARCHAR(3)
);

-- Convert to hypertable
SELECT create_hypertable('events', 'time');

-- Compress older data
ALTER TABLE events SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'event_type, device_type'
);
SELECT add_compression_policy('events', INTERVAL '7 days');

-- Continuous aggregate for dashboards.
-- Note: COUNT(DISTINCT ...) is not supported in continuous aggregates,
-- so unique users/sessions are kept as HyperLogLog sketches from the
-- timescaledb_toolkit extension; query them with distinct_count().
CREATE MATERIALIZED VIEW events_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS bucket,
    event_type,
    device_type,
    country,
    COUNT(*) AS event_count,
    hyperloglog(8192, user_id) AS user_hll,       -- approximate uniques
    hyperloglog(8192, session_id) AS session_hll
FROM events
GROUP BY 1, 2, 3, 4
WITH NO DATA;

-- Refresh policy
SELECT add_continuous_aggregate_policy('events_hourly',
    start_offset => INTERVAL '3 hours',
    end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour'
);
```
```python
# Redis for real-time counters
import redis
from datetime import datetime, timezone

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

def track_event(event_type: str, user_id: int, properties: dict):
    now = datetime.now(timezone.utc)
    minute_bucket = now.strftime('%Y%m%d%H%M')
    pipe = r.pipeline()

    # Real-time counters
    pipe.incr(f'events:{event_type}:total')
    pipe.incr(f'events:{event_type}:minute:{minute_bucket}')
    pipe.expire(f'events:{event_type}:minute:{minute_bucket}', 3600)

    # Unique users (HyperLogLog for approximate counting)
    pipe.pfadd(f'events:{event_type}:users:{minute_bucket}', user_id)

    # Recent events stream, capped at 10,000 entries
    pipe.xadd('stream:events', {'type': event_type, 'user': user_id}, maxlen=10000)

    pipe.execute()

def get_realtime_stats(event_type: str):
    minute_bucket = datetime.now(timezone.utc).strftime('%Y%m%d%H%M')
    return {
        'total': r.get(f'events:{event_type}:total'),
        'this_minute': r.get(f'events:{event_type}:minute:{minute_bucket}'),
        'unique_users': r.pfcount(f'events:{event_type}:users:{minute_bucket}')
    }
```
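The funnel-analysis requirement can be prototyped in plain Python before being pushed down into SQL or continuous aggregates. Here is a small sketch that counts how many sessions reach each step of an ordered funnel (function and step names are illustrative):

```python
from collections import defaultdict

def funnel_counts(events, steps):
    """Count sessions that reach each funnel step, in order.

    events: iterable of (session_id, event_type) pairs in time order.
    steps:  ordered event types, e.g. ['view', 'add_to_cart', 'purchase'].
    A session only advances when it performs the *next* expected step.
    """
    progress = defaultdict(int)  # session_id -> index of next step to match
    for session_id, event_type in events:
        i = progress[session_id]
        if i < len(steps) and event_type == steps[i]:
            progress[session_id] = i + 1

    # A session that reached step k is counted in steps 0..k-1
    counts = [0] * len(steps)
    for reached in progress.values():
        for step in range(reached):
            counts[step] += 1
    return counts
```

The SQL equivalent is usually written with window functions over session-ordered events; the Python version is handy for validating that query against small fixtures.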
Project 3: Multi-Tenant SaaS Platform
Requirements
Multi-Tenant SaaS Requirements:
Functional:
- Multiple organizations (tenants)
- User management per organization
- Subscription tiers with feature limits
- Data isolation between tenants
Non-Functional:
- Complete data isolation (GDPR/SOC2)
- Per-tenant query performance SLAs
- Scalable to 10,000+ tenants
- Support enterprise single-tenant option
Tenancy Models
Tenancy Model Comparison:
1. Shared Schema (Row-Level Isolation)
✅ Efficient resource usage
✅ Easy to manage/deploy
❌ Limited isolation
Best for: SMB SaaS, cost-sensitive
2. Schema-per-Tenant
✅ Better isolation
✅ Per-tenant backups
❌ Schema migration complexity
Best for: Mid-market
3. Database-per-Tenant
✅ Maximum isolation
✅ Easy compliance
❌ Highest cost
Best for: Enterprise, regulated industries
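Whichever tenancy model you pick, every request must be mapped to a tenant before any query runs, typically from the subdomain. A minimal resolver sketch (host parsing only; the in-memory lookup table stands in for the `tenants` table):

```python
def resolve_tenant(host: str, tenants: dict, base_domain: str = "example.com"):
    """Extract the tenant subdomain from a Host header and look it up.

    tenants: maps subdomain -> tenant id (in production, a DB lookup).
    Returns None for unknown tenants, nested labels, or foreign domains,
    so spoofed Host headers fail closed.
    """
    host = host.split(":")[0].lower()       # strip port, normalize case
    suffix = "." + base_domain
    if not host.endswith(suffix):
        return None
    subdomain = host[: -len(suffix)]
    if not subdomain or "." in subdomain:   # reject empty or nested labels
        return None
    return tenants.get(subdomain)

tenants = {"acme": "t-001", "globex": "t-002"}
```

Under the shared-schema model the resolved id becomes the `app.current_tenant` setting; under database-per-tenant it selects a connection pool instead.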
Implementation
```sql
-- Shared schema with Row-Level Security
CREATE TABLE tenants (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(100) NOT NULL,
    subdomain VARCHAR(50) UNIQUE,
    plan VARCHAR(20) DEFAULT 'free',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID REFERENCES tenants(id) NOT NULL,
    email VARCHAR(255) NOT NULL,
    role VARCHAR(20) DEFAULT 'member',
    UNIQUE(tenant_id, email)
);

CREATE TABLE projects (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID REFERENCES tenants(id) NOT NULL,
    name VARCHAR(100) NOT NULL,
    created_by UUID REFERENCES users(id)
);

-- Enable RLS. Note: table owners and superusers bypass RLS unless you
-- also run ALTER TABLE ... FORCE ROW LEVEL SECURITY, so the application
-- should connect as a separate, non-owner role.
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;

-- Tenant isolation policies. current_setting(..., true) returns NULL
-- (matching no rows) instead of raising an error when the setting is unset.
CREATE POLICY tenant_isolation_users ON users
    USING (tenant_id = current_setting('app.current_tenant', true)::uuid);

CREATE POLICY tenant_isolation_projects ON projects
    USING (tenant_id = current_setting('app.current_tenant', true)::uuid);

-- The application sets the tenant context per request/transaction:
-- SET app.current_tenant = 'tenant-uuid-here';
```
```python
# Python middleware that sets the tenant context for RLS
from functools import wraps
import psycopg2

def tenant_context(f):
    @wraps(f)
    def wrapper(*args, **kwargs):
        tenant_id = get_tenant_from_request()  # from subdomain or header
        conn = get_db_connection()
        cursor = conn.cursor()
        # Set the tenant for this session; set_config() lets us pass the
        # value as a bound parameter instead of splicing it into SET
        cursor.execute(
            "SELECT set_config('app.current_tenant', %s, false)",
            (str(tenant_id),)
        )
        try:
            return f(*args, **kwargs)
        finally:
            cursor.execute("RESET app.current_tenant")
            conn.commit()
    return wrapper

@tenant_context
def get_projects():
    cursor = get_db_cursor()
    # RLS filters by tenant automatically; no WHERE tenant_id needed
    cursor.execute("SELECT * FROM projects")
    return cursor.fetchall()
```
Project 4: Global Distributed Database
Requirements
Global Distributed Database Requirements:
Functional:
- Users access from NA, EU, APAC regions
- Low-latency reads in each region
- Cross-region writes with conflict resolution
- Data residency compliance (GDPR)
Non-Functional:
- Sub-50ms reads in each region
- Handle network partitions gracefully
- RPO < 1 second, RTO < 30 seconds
- Support for both strong and eventual consistency
Design
Multi-Region Architecture:

```
    [NA Region]            [EU Region]           [APAC Region]
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   App Servers   │    │   App Servers   │    │   App Servers   │
└────────┬────────┘    └────────┬────────┘    └────────┬────────┘
         │                      │                      │
┌────────┴────────┐    ┌────────┴────────┐    ┌────────┴────────┐
│   CockroachDB   │◄──►│   CockroachDB   │◄──►│   CockroachDB   │
│    (Primary)    │    │    (Replica)    │    │    (Replica)    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

Data placement:
- User profiles: pinned to the user's home region
- Global data: replicated across all regions
- EU user data: must stay in the EU (GDPR)
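The placement rules above reduce to a small routing function: pin data to the user's home region, but force EU residency for EU residents regardless of where the request landed. A sketch with illustrative region names and a deliberately partial country list:

```python
# Partial set of EU member countries, ISO 3166-1 alpha-3 (illustrative)
EU_COUNTRIES = {"DEU", "FRA", "IRL", "NLD", "ESP", "ITA"}

def placement_region(country: str, request_region: str) -> str:
    """Decide which region a user's data is pinned to.

    GDPR rule: an EU resident's data lives in the EU region even if
    their request arrived via NA or APAC.
    """
    if country.upper() in EU_COUNTRIES:
        return "eu"
    if request_region in ("na", "eu", "apac"):
        return request_region
    return "na"  # unknown origin: fall back to the default home region
```

The chosen value would be written to the `region` column below, which the zone constraints and table localities then act on.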
Implementation
```sql
-- CockroachDB: multi-region table configuration
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE,
    region VARCHAR(10) NOT NULL,  -- 'na', 'eu', 'apac'
    profile JSONB,
    created_at TIMESTAMP DEFAULT now()
);

-- Replica placement: one replica per region, leaseholder preference in NA
ALTER TABLE users CONFIGURE ZONE USING
    num_replicas = 3,
    constraints = '{"+region=na": 1, "+region=eu": 1, "+region=apac": 1}',
    lease_preferences = '[[+region=na]]';

-- Regional pinning for EU compliance. Table localities require a
-- multi-region database, i.e. the cluster regions were added with
-- ALTER DATABASE myapp SET PRIMARY REGION ... / ADD REGION ... first.
CREATE TABLE eu_user_data (
    user_id UUID REFERENCES users(id),
    sensitive_data JSONB
) LOCALITY REGIONAL BY TABLE IN "eu-west-1";
```
```python
# Python: region-aware connection routing
from typing import Optional
import os
import psycopg2

REGION_ENDPOINTS = {
    'na': 'cockroach-na.example.com:26257',
    'eu': 'cockroach-eu.example.com:26257',
    'apac': 'cockroach-apac.example.com:26257'
}

def get_regional_connection(user_region: Optional[str] = None):
    """Route to the nearest region for low-latency reads."""
    region = user_region or detect_client_region()
    # Unknown regions fall back to NA
    endpoint = REGION_ENDPOINTS.get(region, REGION_ENDPOINTS['na'])
    host, port = endpoint.split(':')
    return psycopg2.connect(
        host=host,
        port=int(port),
        database='myapp',
        user='app_user',
        sslmode='require'
    )

def detect_client_region() -> str:
    """Detect the caller's region from request headers or IP.

    In practice: use CDN geolocation headers (CloudFlare, CloudFront)
    or an IP geolocation lookup. The prefix hack below maps e.g.
    'eu-west-1' to 'eu'; anything unrecognized falls back to 'na'
    inside get_regional_connection().
    """
    return os.environ.get('AWS_REGION', 'na')[:2]
```
Building Your Portfolio
Portfolio Tips:
- GitHub Repository: Create a well-documented repo with README, schema diagrams, and setup instructions
- Docker Compose: Make projects easy to run locally with docker-compose
- Performance Benchmarks: Include load test results showing your optimization work
- Blog Post: Write about challenges faced and solutions implemented
- Live Demo: Deploy a read-only version for interviewers to explore
Recommended Portfolio Structure:

```
database-capstone-projects/
├── README.md              # Overview of all projects
├── docker-compose.yml     # Run everything locally
├── 01-ecommerce/
│   ├── README.md
│   ├── schema/
│   │   └── migrations/
│   ├── seed-data/
│   ├── queries/
│   └── benchmarks/
├── 02-realtime-analytics/
│   ├── README.md
│   ├── timescaledb/
│   ├── redis/
│   └── dashboard/
├── 03-multi-tenant-saas/
│   ├── README.md
│   ├── schema/
│   ├── rls-policies/
│   └── api-examples/
└── 04-global-distributed/
    ├── README.md
    ├── cockroachdb-config/
    ├── region-routing/
    └── chaos-testing/
```
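The root `docker-compose.yml` might look like this minimal sketch. Image tags and credentials are illustrative (plaintext passwords are for local development only), and TimescaleDB ships as a PostgreSQL image variant:

```yaml
services:
  postgres:             # Projects 1 & 3 (PostgreSQL with RLS)
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: devonly
    ports: ["5432:5432"]
    volumes:
      - ./01-ecommerce/schema:/docker-entrypoint-initdb.d

  timescaledb:          # Project 2 (historical event store)
    image: timescale/timescaledb:latest-pg16
    environment:
      POSTGRES_PASSWORD: devonly
    ports: ["5433:5432"]

  redis:                # Project 2 (real-time counters)
    image: redis:7
    ports: ["6379:6379"]

  cockroach:            # Project 4 (single node for local dev)
    image: cockroachdb/cockroach:latest
    command: start-single-node --insecure
    ports: ["26257:26257"]
```

A single `docker compose up` then brings up every datastore an interviewer needs to run your projects.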
Series Conclusion: Your Database Mastery Journey
Congratulations on completing the Complete Database Mastery Series! You've journeyed from SQL fundamentals through advanced query optimization, explored both relational and NoSQL databases, understood distributed systems and cloud offerings, and applied security and analytics best practices.
Series Complete! You now have the knowledge to design, implement, optimize, secure, and scale databases for any application. These capstone projects demonstrate your mastery—build them, deploy them, and showcase them in your portfolio.
What's Next?
- Build the capstone projects — Start with the e-commerce project and progress through all four
- Get certified — Pursue PostgreSQL, AWS Database, or MongoDB certifications
- Contribute to open source — Apply your skills to database-related open source projects
- Teach others — Solidify your knowledge by helping others learn databases
Review the Database Mastery Series
- Part 1: SQL Fundamentals & Syntax. Start from the beginning of your database journey.
- Part 7: Data Modeling & Normalization. Review schema design principles for your projects.
- Part 11: Scaling & Distributed Systems. Apply distributed concepts to your capstone projects.