Time Series Databases
A time series database (TSDB) is a database optimised for storing and querying time-indexed data — sequences of (timestamp, value) pairs. Unlike general-purpose databases, TSDBs make aggressive trade-offs: they are optimised for high write throughput of sequential data and time-range queries, at the cost of flexibility in querying arbitrary dimensions.
Core TSDB Concepts
Understanding these concepts is essential for reasoning about metrics at scale:
| Concept | Definition | Implication |
|---|---|---|
| Time Series | A sequence of (timestamp, value) pairs with a unique metric name + label set | Each unique label combination is a separate series; cardinality = series count |
| Sample | A single (timestamp, value) data point in a time series | At a 15s scrape interval (a common setting; Prometheus's built-in default is 1m) each series gains 4 samples/minute |
| Chunk | A compressed block of consecutive samples for one series | Prometheus uses Gorilla-style compression (delta-of-delta for timestamps, XOR for values): ~1.37 bytes/sample on typical data |
| Block | A time-bounded directory of chunks + index (default 2h) | Compaction merges blocks; read operations scan blocks in range |
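The figures in the table can be combined into a back-of-envelope storage estimate. The sketch below is only an approximation: `estimate_disk_bytes` is a hypothetical helper, the 1.37 bytes/sample default is the figure quoted above, and index, WAL and compaction overhead are ignored.

```python
def estimate_disk_bytes(series: int, scrape_interval_s: float,
                        retention_days: float, bytes_per_sample: float = 1.37) -> float:
    """Rough size of compressed chunk data; ignores index and WAL overhead."""
    samples_per_series = retention_days * 86_400 / scrape_interval_s
    return series * samples_per_series * bytes_per_sample


if __name__ == "__main__":
    # 100,000 series scraped every 15s and kept for 15 days
    size = estimate_disk_bytes(100_000, 15, 15)
    print(f"{size / 1e9:.1f} GB")  # prints "11.8 GB"
```

Because the estimate is linear in the series count, doubling cardinality doubles storage, which is why label cardinality dominates capacity planning.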
Sampling, Retention & Downsampling
Every metrics system must make decisions about how long to keep data at what resolution. The trade-off: finer resolution consumes more storage and memory; coarser resolution loses detail.
Downsampling strategies:
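- Prometheus + long-term storage (Thanos/Mimir): Keep full-resolution data in Prometheus and ship it to a long-term store (for example via remote_write); stores that support downsampling, such as Thanos, can additionally keep 5m and 1h resolutions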
- Recording rules: Pre-aggregate high-cardinality queries into lower-cardinality pre-computed metrics
- Retention policies: Prometheus default is 15 days; reduce for cost control
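To make the resolution trade-off concrete, here is a minimal downsampling sketch. It is not how any particular backend implements it; `downsample` and `Aggregate` are hypothetical names. Keeping count, sum, min and max per window (rather than a single average) preserves enough information to answer most coarse-grained queries later.

```python
from collections import defaultdict
from typing import NamedTuple


class Aggregate(NamedTuple):
    count: int
    total: float
    minimum: float
    maximum: float


def downsample(samples: list[tuple[int, float]], window_s: int) -> dict[int, Aggregate]:
    """Group (unix_ts, value) samples into fixed windows keyed by window start."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % window_s].append(value)
    return {
        start: Aggregate(len(vals), sum(vals), min(vals), max(vals))
        for start, vals in sorted(buckets.items())
    }


if __name__ == "__main__":
    raw = [(t, float(t // 15)) for t in range(0, 600, 15)]  # 40 samples at 15s
    for start, agg in downsample(raw, 300).items():
        print(start, agg.count, agg.total / agg.count, agg.minimum, agg.maximum)
```

Here 40 raw samples collapse into two 5-minute aggregates, a 20x reduction in stored points at the cost of sub-window detail.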
Prometheus Architecture
flowchart TD
A[Targets\nApps & Infrastructure] -->|/metrics endpoint| B[Prometheus Server\nScrape Engine]
B -->|stores| C[TSDB\nLocal Storage]
B -->|evaluates| D[Alerting Rules]
D -->|sends alerts| E[Alertmanager]
E -->|routes| F[PagerDuty / Slack / Email]
C -->|PromQL queries| G[Grafana]
C -->|remote_write| H[Long-Term Storage\nThanos / Mimir / Cortex]
I[Service Discovery\nK8s / Consul / EC2] -->|target list| B
The Pull Model — Why Prometheus Scrapes
Most legacy monitoring systems use a push model: applications send metrics to the monitoring system. Prometheus uses the opposite — a pull model: Prometheus periodically scrapes (HTTP GETs) a /metrics endpoint on each target.
Advantages of the pull model:
- Simple target health check: If a scrape fails, Prometheus records up == 0 for that target, so no separate health check is needed
- Config in one place: Scrape targets are defined in Prometheus config, not scattered across every application
- No backpressure: Prometheus controls the scrape rate; a misbehaving application cannot flood the metrics backend
- Debugging: You can manually curl the /metrics endpoint to see exactly what data Prometheus is collecting
# Manually curl a Prometheus metrics endpoint
curl http://localhost:8080/metrics
# Example output:
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 42
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 2.84
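The text format shown above is simple enough to parse by hand, which is part of why the pull model is easy to debug. The following is a rough sketch covering only a subset of the format: `parse_metrics` is a hypothetical helper, and timestamps and escaped quotes in label values are deliberately not handled.

```python
import re

# metric_name, optional {labels}, then the value.
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse a simplified subset of the Prometheus text exposition format."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore lines this sketch does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    body = """\
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 42
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 2.84
"""
    for sample in parse_metrics(body):
        print(sample)
```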
TSDB Storage Internals
Prometheus's TSDB is a custom storage engine with excellent write performance. Key design decisions:
- In-memory head block: Recent 2 hours of data kept in RAM for fast writes and reads
- WAL (Write-Ahead Log): New samples written to WAL first for crash recovery
- Chunk compression: Delta-of-delta encoding for timestamps and XOR encoding for values achieve ~1.37 bytes/sample (vs 16 bytes raw: an 8-byte timestamp plus an 8-byte float)
- Compaction: Every 2 hours, head block flushed to disk; compaction merges smaller blocks into larger ones
- Inverted index: Label-to-series mapping enables fast label-based queries
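The chunk compression idea can be illustrated without the bit-packing. The sketch below only computes the intermediate values that a Gorilla-style encoder would then write with variable-length bit codes; it is not Prometheus's actual encoder, and `delta_of_delta` and `xor_values` are hypothetical helper names.

```python
import struct


def delta_of_delta(timestamps: list[int]) -> list[int]:
    """First timestamp, first delta, then the change between successive deltas."""
    if len(timestamps) < 2:
        return list(timestamps)
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [timestamps[0], deltas[0]] + [b - a for a, b in zip(deltas, deltas[1:])]


def xor_values(values: list[float]) -> list[int]:
    """XOR each float's 64-bit pattern with the previous one's (first is kept raw)."""
    bits = [struct.unpack(">Q", struct.pack(">d", v))[0] for v in values]
    return bits[:1] + [b ^ a for a, b in zip(bits, bits[1:])]


if __name__ == "__main__":
    ts = [1_700_000_000, 1_700_000_015, 1_700_000_030, 1_700_000_046, 1_700_000_060]
    print(delta_of_delta(ts))  # [1700000000, 15, 0, 1, -2]
    print(xor_values([42.0, 42.0, 42.0, 43.0])[1:])
```

With a steady scrape interval most delta-of-delta values are zero or close to it, and an unchanged gauge XORs to zero, so both streams can be stored in very few bits per sample.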
Service Discovery
In dynamic environments (Kubernetes, cloud), targets come and go constantly. Prometheus supports automatic service discovery from many sources:
# prometheus.yml: Kubernetes service discovery
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Only scrape pods with the annotation prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Use the annotation prometheus.io/port if specified, keeping the pod's host part
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      # Add namespace and pod name as labels
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
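Relabeling is easier to reason about once you see that each rule joins the source label values with `;`, matches the result against a fully anchored regex, and then keeps, drops or rewrites. The following is a simplified simulation of the `keep` and `replace` actions only; `relabel` is a hypothetical function, Python's `re` stands in for RE2, and the pod address and port are made-up example values.

```python
import re
from typing import Optional


def relabel(labels: dict[str, str], rule: dict) -> Optional[dict[str, str]]:
    """Apply one simplified relabel rule; returns None if the target is dropped."""
    separator = rule.get("separator", ";")
    source = separator.join(labels.get(name, "") for name in rule["source_labels"])
    match = re.fullmatch(rule.get("regex", "(.*)"), source)  # regexes are anchored
    action = rule.get("action", "replace")
    if action == "keep":
        return labels if match else None
    if action == "replace":
        if match is None:
            return labels  # no match: labels are left untouched
        # Translate $1 / ${1} references into Python's \g<1> syntax.
        template = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", rule.get("replacement", "$1"))
        return {**labels, rule["target_label"]: match.expand(template)}
    raise ValueError(f"action {action!r} is not covered by this sketch")


if __name__ == "__main__":
    pod = {
        "__address__": "10.0.0.7:8080",
        "__meta_kubernetes_pod_annotation_prometheus_io_scrape": "true",
        "__meta_kubernetes_pod_annotation_prometheus_io_port": "9102",
    }
    rules = [
        {"source_labels": ["__meta_kubernetes_pod_annotation_prometheus_io_scrape"],
         "action": "keep", "regex": "true"},
        {"source_labels": ["__address__", "__meta_kubernetes_pod_annotation_prometheus_io_port"],
         "regex": r"([^:]+)(?::\d+)?;(\d+)", "replacement": "$1:$2",
         "target_label": "__address__"},
    ]
    for rule in rules:
        pod = relabel(pod, rule)
        if pod is None:
            break
    print(pod and pod["__address__"])  # prints "10.0.0.7:9102"
```

Note that writing only the port into `__address__` would lose the host part, which is why the second rule reads `__address__` as well as the port annotation.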
Prometheus Configuration
Scrape Configuration
A complete Prometheus configuration file structure:
# prometheus.yml: complete configuration example
global:
  scrape_interval: 15s      # Default scrape interval
  evaluation_interval: 15s  # How often to evaluate rules
  external_labels:
    environment: production
    region: us-east-1

# Alertmanager targets
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

# Rule files (recording rules + alerting rules)
rule_files:
  - "recording_rules.yml"
  - "alerting_rules.yml"

scrape_configs:
  # Scrape Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Scrape Node Exporter on all servers
  - job_name: 'node'
    static_configs:
      - targets:
          - 'server1:9100'
          - 'server2:9100'
          - 'server3:9100'
    scrape_interval: 30s  # Override global for this job

  # Scrape an application with custom path
  - job_name: 'myapp'
    metrics_path: /internal/metrics
    scheme: https
    static_configs:
      - targets: ['myapp.example.com:443']
    tls_config:
      insecure_skip_verify: false
Recording Rules
Recording rules pre-compute expensive queries and store the result as a new time series. This is essential for:
- Dashboard queries that would otherwise be too slow
- Reducing cardinality of frequently-queried aggregations
- Creating "summary" metrics that cross multiple jobs or services
# recording_rules.yml
groups:
  - name: http_request_rates
    interval: 30s
    rules:
      # Pre-compute 5-minute request rate by job and status
      - record: job:http_requests:rate5m
        expr: sum(rate(http_requests_total[5m])) by (job, status_code)

      # Pre-compute p99 latency per job
      - record: job:http_request_duration_p99:rate5m
        expr: |
          histogram_quantile(0.99,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (job, le)
          )

      # Pre-compute error rate per job
      - record: job:http_error_rate:rate5m
        expr: |
          sum(rate(http_requests_total{status_code=~"5.."}[5m])) by (job)
          /
          sum(rate(http_requests_total[5m])) by (job)
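The p99 rule above depends on how a quantile is estimated from cumulative `le` buckets: find the bucket containing the target rank, then interpolate linearly inside it. The sketch below is a simplified illustration, not Prometheus's implementation. It assumes 0 <= q <= 1, at least two buckets with the last one at +Inf, and a lower bound of 0 for the first bucket; the bucket boundaries and counts in the example are invented.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets."""
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]
    if math.isinf(upper):
        return buckets[-2][0]  # rank falls in +Inf: report the highest finite bound
    lower, below = buckets[index - 1] if index > 0 else (0.0, 0.0)
    if count == below:
        return upper  # empty bucket: nothing to interpolate over
    return lower + (upper - lower) * (rank - below) / (count - below)


if __name__ == "__main__":
    buckets = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 99), (math.inf, 100)]
    print(histogram_quantile(0.9, buckets))
```

Because of the interpolation, the result can never be more precise than the bucket layout: a p99 that lands in a wide bucket is only an estimate within that bucket's bounds.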
Alerting Rules
Alerting rules evaluate PromQL expressions and fire alerts to Alertmanager when conditions are met:
# alerting_rules.yml
groups:
  - name: slo_alerts
    rules:
      # High error rate alert
      - alert: HighErrorRate
        expr: job:http_error_rate:rate5m > 0.05
        for: 5m
        labels:
          severity: critical
          team: platform
        annotations:
          summary: "High error rate on {{ $labels.job }}"
          description: "Error rate is {{ $value | humanizePercentage }} on {{ $labels.job }} (threshold: 5%)"
          runbook_url: "https://runbooks.example.com/high-error-rate"

      # High latency alert
      - alert: HighP99Latency
        expr: job:http_request_duration_p99:rate5m > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High p99 latency on {{ $labels.job }}"
          description: "p99 latency is {{ $value | humanizeDuration }} on {{ $labels.job }}"

      # Instance down alert
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.job }} on {{ $labels.instance }} has been down for more than 1 minute"
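The `for:` clause is what keeps a brief blip from paging anyone: the expression must hold on every evaluation for the whole duration before the alert moves from pending to firing. A minimal sketch of that state machine follows; `AlertState` is a hypothetical class, and details such as resolved notifications are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    for_seconds: float
    active_since: Optional[float] = None  # time the condition first held

    def evaluate(self, condition_true: bool, now: float) -> str:
        """Return 'inactive', 'pending' or 'firing' for this evaluation."""
        if not condition_true:
            self.active_since = None  # any false evaluation resets the timer
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        return "firing" if now - self.active_since >= self.for_seconds else "pending"


if __name__ == "__main__":
    alert = AlertState(for_seconds=60)  # like "for: 1m"
    results = [True, True, False, True, True, True, True, True]
    for step, is_down in enumerate(results):
        now = step * 15  # one evaluation every 15s
        print(now, alert.evaluate(is_down, now))
```

In this run the single healthy evaluation at t=30 resets the timer, so the alert only fires at t=105, a full 60 seconds after the condition started holding again at t=45.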
PromQL Deep Dive
PromQL (Prometheus Query Language) is a functional query language designed specifically for time series data. Understanding it deeply is what separates Prometheus beginners from practitioners.
Selectors & Matchers
Selectors specify which time series to query. Matchers filter by label values:
# Exact match — select the metric with this exact label
http_requests_total{job="api", status_code="200"}
# Regex match — select all 5xx status codes
http_requests_total{status_code=~"5.."}
# Negative match — exclude 2xx codes
http_requests_total{status_code!~"2.."}
# Range vector selector — get data over last 5 minutes
http_requests_total{job="api"}[5m]
# Instant vector with offset — query value 1 hour ago
http_requests_total offset 1h
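The four matcher operators can be modelled as predicates over a series' label set. This is an illustrative sketch rather than how the TSDB resolves selectors (it uses the inverted index, not a linear scan); `select` and `OPERATORS` are hypothetical names. Two behaviours are worth noticing: regex matchers are fully anchored, and a missing label behaves like an empty string.

```python
import re
from typing import Callable

Series = dict[str, str]  # label name -> value; "__name__" holds the metric name

OPERATORS: dict[str, Callable[[str, str], bool]] = {
    "=": lambda value, want: value == want,
    "!=": lambda value, want: value != want,
    "=~": lambda value, pattern: re.fullmatch(pattern, value) is not None,
    "!~": lambda value, pattern: re.fullmatch(pattern, value) is None,
}


def select(series: list[Series], matchers: list[tuple[str, str, str]]) -> list[Series]:
    """Keep series whose labels satisfy every (label, operator, operand) matcher."""
    return [
        s for s in series
        if all(OPERATORS[op](s.get(label, ""), operand) for label, op, operand in matchers)
    ]


if __name__ == "__main__":
    data = [
        {"__name__": "http_requests_total", "job": "api", "status_code": "200"},
        {"__name__": "http_requests_total", "job": "api", "status_code": "503"},
        {"__name__": "http_requests_total", "job": "web", "status_code": "404"},
    ]
    print(select(data, [("status_code", "=~", "5..")]))
    print(select(data, [("job", "=", "api"), ("status_code", "!~", "2..")]))
```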
Key PromQL Functions
# rate() — per-second rate of increase for counters
# Use for counters. Handles counter resets.
rate(http_requests_total[5m])
# irate() — instant rate (last two samples only)
# More responsive but noisier than rate()
irate(http_requests_total[5m])
# increase() — total increase over a time range
# Useful for "requests in the last hour"
increase(http_requests_total[1h])
# delta() — change in value over range (for gauges)
delta(node_memory_Active_bytes[1h])
# avg_over_time() — average of a gauge over range
avg_over_time(node_cpu_utilization[30m])
# max_over_time() / min_over_time()
max_over_time(node_memory_Active_bytes[1h])
# histogram_quantile() — calculate percentiles from histograms
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
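To see what "handles counter resets" means for rate() and increase(), consider the simplified sketch below. It treats any drop in value as a restart from zero and divides by the time between the first and last sample. Real PromQL additionally extrapolates towards the window boundaries, which this sketch omits, so the numbers will differ slightly from Prometheus's. The function names mirror PromQL for readability only.

```python
def increase(samples: list[tuple[float, float]]) -> float:
    """Total counter increase over (timestamp, value) samples, reset-aware."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so count `cur` itself.
        total += cur - prev if cur >= prev else cur
    return total


def rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase between the first and last sample in the window."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return increase(samples) / (samples[-1][0] - samples[0][0])


if __name__ == "__main__":
    # The process restarts between t=30 and t=45, so the counter drops to 10.
    window = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
    print(increase(window))  # 100.0
    print(rate(window))
```

Taking the raw difference between the last and first value here would give -60, which is why rate() and increase() should only be applied to counters and why gauges use delta() instead.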
Aggregation Operators
Aggregations reduce the dimensionality of your query results by grouping and combining series:
# sum() — add up values across dimensions
# Total requests per second across all instances
sum(rate(http_requests_total[5m]))
# sum by() — sum but keep specified labels
# Requests per second, broken down by job
sum by (job) (rate(http_requests_total[5m]))
# sum without() — sum but drop specified labels
# Sum everything except the instance label
sum without (instance) (rate(http_requests_total[5m]))
# avg() — average across dimensions
avg(node_cpu_utilization) by (datacenter)
# max() and min()
max by (job) (http_request_duration_p99)
# count() — count number of time series
count(up == 1) by (job) # How many instances are up per job?
# topk() and bottomk() — top/bottom N series
topk(5, sum by (job) (rate(http_requests_total[5m]))) # Top 5 highest-traffic jobs (without the sum, topk ranks individual series)
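The difference between `by` and `without` comes down to which labels form the grouping key. The sketch below models an instant vector as a list of (labels, value) pairs; `sum_by` and `sum_without` are hypothetical helpers, and the handling of the metric name is ignored for brevity.

```python
from collections import defaultdict

Sample = tuple[dict[str, str], float]  # (labels, value) at one instant


def sum_by(vector: list[Sample], keep: list[str]) -> dict[tuple, float]:
    """sum by (keep...): group on the listed labels only."""
    out: dict[tuple, float] = defaultdict(float)
    for labels, value in vector:
        out[tuple((k, labels.get(k, "")) for k in keep)] += value
    return dict(out)


def sum_without(vector: list[Sample], drop: list[str]) -> dict[tuple, float]:
    """sum without (drop...): group on every label except the listed ones."""
    out: dict[tuple, float] = defaultdict(float)
    for labels, value in vector:
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in drop))
        out[key] += value
    return dict(out)


if __name__ == "__main__":
    vector = [
        ({"job": "api", "instance": "a", "status_code": "200"}, 1.0),
        ({"job": "api", "instance": "b", "status_code": "200"}, 2.0),
        ({"job": "db", "instance": "a", "status_code": "200"}, 4.0),
    ]
    print(sum_by(vector, ["job"]))
    print(sum_without(vector, ["instance"]))
```

`sum by (job)` discards `status_code` along with `instance`, while `sum without (instance)` keeps it; `without` is often the safer choice when new labels may be added later.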
Practical Query Patterns
Real-world PromQL queries you will use in production:
# SLO compliance: is error rate below 1%?
sum(rate(http_requests_total{status_code=~"5.."}[30m]))
/
sum(rate(http_requests_total[30m]))
< 0.01
# Availability: what % of requests succeeded?
sum(rate(http_requests_total{status_code!~"5.."}[1h]))
/
sum(rate(http_requests_total[1h]))
* 100
# CPU utilisation (%) per instance, a common proxy for saturation
100 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100
# Memory pressure — warn when available is low
node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 20
# Rate of change in error rate (is it getting worse?)
deriv(rate(http_requests_total{status_code=~"5.."}[10m])[30m:1m])
# Comparing current vs 1 week ago (week-over-week traffic)
rate(http_requests_total[5m])
/
rate(http_requests_total[5m] offset 7d)
Alertmanager — Routing, Grouping & Silencing
Alertmanager receives alerts from Prometheus, applies routing logic, and delivers notifications to the appropriate channels. Its three key features:
1. Grouping
Multiple related alerts are bundled into a single notification. Without grouping, a single database outage could fire 500 separate "instance down" alerts. With grouping, they are delivered as one notification: "500 instances in us-east-1 are down."
2. Inhibition
A critical alert can suppress related lower-priority alerts. If a cluster node is down (critical), Alertmanager inhibits all the "service not responding" alerts from pods on that node — since the root cause is known.
3. Silencing
During planned maintenance, silence alerts matching specific labels for a defined window. This prevents false alarm fatigue during expected downtime.
# alertmanager.yml: routing configuration
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s       # Wait 30s to collect related alerts before sending
  group_interval: 5m    # How long to wait before sending updates
  repeat_interval: 12h  # How long before re-alerting
  receiver: 'slack-default'
  routes:
    # Critical alerts go to PagerDuty immediately
    - match:
        severity: critical
      receiver: 'pagerduty'
      group_wait: 0s
    # Platform team alerts to their Slack channel
    - match:
        team: platform
      receiver: 'slack-platform'

receivers:
  - name: 'slack-default'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/...'
        channel: '#alerts'
        title: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
  # Every receiver named in a route must be defined; the channel name here is a placeholder
  - name: 'slack-platform'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/...'
        channel: '#platform-alerts'
  - name: 'pagerduty'
    pagerduty_configs:
      - routing_key: 'your-pagerduty-routing-key'
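The effect of `group_by` can be sketched as bucketing alerts by the values of the listed labels. This is only an illustration of the grouping key, not of Alertmanager's timing logic (`group_wait`, `group_interval`); `group_alerts` is a hypothetical helper and the alert labels are invented.

```python
from collections import defaultdict


def group_alerts(alerts: list[dict[str, str]],
                 group_by: list[str]) -> dict[tuple, list[dict[str, str]]]:
    """Bucket alert label sets by the values of the group_by labels."""
    groups: dict[tuple, list[dict[str, str]]] = defaultdict(list)
    for labels in alerts:
        groups[tuple(labels.get(name, "") for name in group_by)].append(labels)
    return dict(groups)


if __name__ == "__main__":
    alerts = [
        {"alertname": "InstanceDown", "cluster": "us-east-1", "instance": f"node{i}"}
        for i in range(500)
    ]
    for key, members in group_alerts(alerts, ["alertname", "cluster", "service"]).items():
        print(key, len(members))  # ('InstanceDown', 'us-east-1', '') 500
```

Because `instance` is not part of the grouping key, all 500 alerts fall into one group and can be delivered as a single notification.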
Conclusion & Next Steps
Prometheus and PromQL form the bedrock of modern metrics systems. You now understand:
- How time series databases store data efficiently using compression and block-based storage
- Prometheus architecture: pull model, TSDB, Alertmanager, service discovery
- How to configure Prometheus scrape jobs, recording rules, and alerting rules
- PromQL fundamentals: selectors, functions (rate, histogram_quantile), aggregations
- Practical query patterns for SLO compliance, availability, saturation, and anomaly detection