Thanos Overview & Philosophy
Thanos is a CNCF Incubating project that extends Prometheus with long-term storage and global querying capabilities. Unlike Mimir or VictoriaMetrics, which ingest samples into their own storage engines, Thanos doesn't replace Prometheus's storage: it augments existing Prometheus deployments by:
- Uploading TSDB blocks from Prometheus to object storage (S3/GCS/Azure) via a sidecar
- Querying across multiple Prometheus instances transparently through a single endpoint
- Downsampling historical data automatically (5m and 1h resolutions)
- Deduplicating HA pair data at query time
Architecture
```mermaid
flowchart TD
    subgraph Cluster1["Cluster: US-East"]
        P1[Prometheus + Sidecar]
        P2[Prometheus + Sidecar]
    end
    subgraph Cluster2["Cluster: EU-West"]
        P3[Prometheus + Sidecar]
        P4[Prometheus + Sidecar]
    end
    subgraph Thanos["Thanos Global Layer"]
        TQ[Thanos Query]
        SG[Store Gateway]
        TC[Compactor]
        TR[Ruler]
    end
    subgraph Storage["Object Store"]
        S3[(S3 Bucket)]
    end
    P1 & P2 -->|"upload blocks"| S3
    P3 & P4 -->|"upload blocks"| S3
    P1 & P2 & P3 & P4 -->|"StoreAPI gRPC"| TQ
    SG -->|"serves historical"| TQ
    SG --> S3
    TC -->|"compact + downsample"| S3
    TR --> TQ
    GF[Grafana] --> TQ
```
Thanos Sidecar
The Sidecar runs alongside each Prometheus instance as a container in the same pod. It has two responsibilities:
- Block Upload: Watches Prometheus’s data directory and uploads completed TSDB blocks to object storage
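- StoreAPI: Exposes a gRPC StoreAPI endpoint that Thanos Query connects to for data Prometheus still holds locally. This includes the in-memory head block that has not been uploaded yet; the sidecar fetches it through Prometheus's remote-read API.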
```yaml
# Thanos Sidecar container, added to the Prometheus pod spec
containers:
  - name: thanos-sidecar
    image: quay.io/thanos/thanos:v0.35.1
    args:
      - sidecar
      - --tsdb.path=/prometheus
      - --prometheus.url=http://localhost:9090
      - --objstore.config-file=/etc/thanos/objstore.yaml
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:10902
      # Blocks are uploaded automatically as Prometheus cuts them (every 2h).
      # --shipper.upload-compacted is only for a one-time migration of blocks
      # Prometheus compacted before local compaction was disabled.
      # - --shipper.upload-compacted
    ports:
      - containerPort: 10901
        name: grpc
      - containerPort: 10902
        name: http
    volumeMounts:
      - name: prometheus-storage
        mountPath: /prometheus
      - name: thanos-objstore-config
        mountPath: /etc/thanos
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        memory: 512Mi
```
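Prometheus must run with --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration set to the same value (2h). This disables Prometheus's internal compaction, so Thanos Compactor is the only component that compacts blocks. By default the sidecar uploads only uncompacted blocks. If Prometheus keeps compacting locally, it and Thanos Compactor end up working on overlapping data, which can corrupt the blocks in the bucket.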
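The following Python sketch models why this matters. It is a simplified model, not the sidecar's code. The `Block` record and `blocks_to_upload` function are hypothetical. The set of already-shipped IDs stands in for the state the real shipper keeps in the TSDB directory. The sketch shows that only level-1 (freshly cut, uncompacted) blocks are eligible by default. Data that Prometheus compacts locally is therefore never shipped unless --shipper.upload-compacted is set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical summary of one local TSDB block directory."""
    ulid: str              # block ID (illustrative values below, not real ULIDs)
    compaction_level: int  # 1 = freshly cut from the head, >1 = compacted locally


def blocks_to_upload(blocks, already_uploaded, upload_compacted=False):
    """Return the blocks a sidecar-style shipper would upload next (simplified).

    Level-1 blocks are always eligible; compacted blocks are skipped unless
    upload_compacted is True, mirroring the intent of --shipper.upload-compacted.
    """
    pending = []
    for block in blocks:
        if block.ulid in already_uploaded:
            continue  # shipped on an earlier pass
        if block.compaction_level > 1 and not upload_compacted:
            continue  # silently left behind: this is the data-loss risk
        pending.append(block)
    return pending


local_blocks = [
    Block("A1", 1),
    Block("A2", 1),
    Block("C1", 2),  # produced by Prometheus's own compaction
]
print([b.ulid for b in blocks_to_upload(local_blocks, {"A1"})])  # -> ['A2']
```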
Thanos Query
Thanos Query implements the Prometheus HTTP API and fans out queries to multiple StoreAPI endpoints (sidecars, store gateways, other query instances). It merges results, deduplicates HA replicas, and returns a unified response:
```yaml
# Thanos Query deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-query
  namespace: monitoring
spec:
  replicas: 2
  selector:
    matchLabels:
      app: thanos-query
  template:
    metadata:
      labels:
        app: thanos-query
    spec:
      containers:
        - name: thanos-query
          image: quay.io/thanos/thanos:v0.35.1
          args:
            - query
            - --log.level=info
            - --query.replica-label=__replica__
            - --query.replica-label=prometheus_replica
            - --query.auto-downsampling
            - --query.max-concurrent=20
            - --query.timeout=2m
            # Discover stores via DNS SRV
            # (--store is deprecated in favor of --endpoint but still accepted)
            - --store=dnssrv+_grpc._tcp.thanos-sidecar.monitoring.svc
            - --store=dnssrv+_grpc._tcp.thanos-store-gateway.monitoring.svc
            # Or use static endpoints instead (don't register the same stores both ways)
            # - --store=thanos-sidecar-us-east:10901
            # - --store=thanos-sidecar-eu-west:10901
            # - --store=thanos-store-gateway:10901
          ports:
            - containerPort: 10902
              name: http
            - containerPort: 10901
              name: grpc
```
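The fan-out and merge step can be illustrated with a minimal Python sketch. Each store is a hypothetical callable that returns `(labels, samples)` pairs, standing in for a real gRPC StoreAPI client. Series with identical label sets are merged. An unreachable store either becomes a warning (partial response) or fails the whole query. Real Thanos Query also strips replica labels for deduplication and streams data; this sketch leaves both out.

```python
from concurrent.futures import ThreadPoolExecutor


class StoreUnavailable(Exception):
    """Raised by a hypothetical store client when its endpoint cannot be reached."""


def fan_out(stores, query, partial_response=True):
    """Send `query` to every store concurrently and merge series by label set.

    `stores` maps an endpoint name to a callable returning a list of
    (labels_tuple, {timestamp: value}) pairs. Returns (merged_series, warnings).
    """
    merged, warnings = {}, []
    with ThreadPoolExecutor(max_workers=max(1, len(stores))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in stores.items()}
        for name, future in futures.items():
            try:
                results = future.result()
            except StoreUnavailable as err:
                if not partial_response:
                    raise  # strict mode: one failed store fails the query
                warnings.append(f"{name}: {err}")
                continue
            for labels, samples in results:
                # Identical label sets from different stores become one series
                merged.setdefault(labels, {}).update(samples)
    return {labels: sorted(s.items()) for labels, s in merged.items()}, warnings
```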
Store Gateway
The Store Gateway serves historical TSDB blocks from object storage through the StoreAPI. It keeps a small index-header for each block on local disk, caches index lookups in memory, and fetches chunk data from the bucket on demand:
```yaml
# Thanos Store Gateway: serves historical data from object storage
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-store-gateway
  namespace: monitoring
spec:
  serviceName: thanos-store-gateway
  replicas: 3
  selector:
    matchLabels:
      app: thanos-store-gateway
  template:
    metadata:
      labels:
        app: thanos-store-gateway
    spec:
      containers:
        - name: thanos-store
          image: quay.io/thanos/thanos:v0.35.1
          args:
            - store
            - --data-dir=/data
            - --objstore.config-file=/etc/thanos/objstore.yaml
            - --index-cache-size=2GB
            - --chunk-pool-size=4GB
            - --grpc-address=0.0.0.0:10901
            - --http-address=0.0.0.0:10902
            # Time-based partitioning (optional for very large stores)
            # - --min-time=-720h  # Only serve the last 30 days
            # - --max-time=-2h    # Skip very recent data (sidecars serve it)
          ports:
            - containerPort: 10901
              name: grpc
            - containerPort: 10902
              name: http
          volumeMounts:
            - name: data
              mountPath: /data
            - name: objstore-config
              mountPath: /etc/thanos
          resources:
            requests:
              cpu: "1"
              memory: 4Gi
            limits:
              memory: 8Gi
      volumes:
        - name: objstore-config
          secret:
            secretName: thanos-objstore-config  # illustrative Secret holding objstore.yaml
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi  # Index-headers and block metadata
```
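The commented-out --min-time/--max-time flags split work between store gateways by time range. That split can be sketched as a range-overlap check. This Python sketch only understands single-unit relative durations such as `-720h`; Thanos also accepts RFC3339 timestamps and compound durations. The store names and helper functions are hypothetical.

```python
import re

_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000,
            "d": 86_400_000, "w": 604_800_000}


def resolve_relative(spec, now_ms):
    """Turn a relative flag value such as '-720h' into an absolute ms timestamp."""
    m = re.fullmatch(r"(-?)(\d+)(ms|s|m|h|d|w)", spec)
    if not m:
        raise ValueError(f"unsupported duration: {spec!r}")
    sign = -1 if m.group(1) else 1
    return now_ms + sign * int(m.group(2)) * _UNIT_MS[m.group(3)]


def stores_for_range(stores, q_start, q_end, now_ms):
    """Return store names whose (min_time, max_time) window overlaps the query.

    A missing bound (None) means the store is unbounded on that side.
    """
    chosen = []
    for name, (min_spec, max_spec) in stores.items():
        lo = resolve_relative(min_spec, now_ms) if min_spec else float("-inf")
        hi = resolve_relative(max_spec, now_ms) if max_spec else float("inf")
        if q_start <= hi and q_end >= lo:  # the two intervals intersect
            chosen.append(name)
    return chosen


HOUR_MS = 3_600_000
partitions = {"recent": ("-720h", "-2h"), "archive": (None, "-720h")}
```

With these partitions, a query over the last hour touches neither gateway, because the sidecars cover it. A query over the last 40 days needs both.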
Compactor & Downsampling
The Compactor runs as a singleton, continuously processing blocks in object storage:
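- Compaction: Merges small 2h blocks into progressively larger blocks (8h, 2d, then 14d by default), reducing object count and improving query performance
- Downsampling: Creates a 5-minute resolution copy of raw blocks that span more than 40 hours, and a 1-hour copy of 5-minute blocks that span more than 10 days (these thresholds are built in, not configurable)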
- Retention: Deletes blocks exceeding the retention period
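The grouping behind compaction can be sketched as bucketing blocks into aligned time windows. This is a simplified, hypothetical planner, not Thanos's own. It only proposes a merge when non-overlapping blocks fully cover a window. The real compactor also handles overlaps and incomplete windows, and it groups blocks by external labels and resolution.

```python
HOUR_MS = 3_600_000
# Default Thanos compaction ranges: 2h -> 8h -> 2d -> 14d
COMPACTION_RANGES_MS = [2 * HOUR_MS, 8 * HOUR_MS, 48 * HOUR_MS, 14 * 24 * HOUR_MS]


def plan_compaction(blocks, range_ms):
    """Propose merges of (min_time, max_time) blocks into aligned windows.

    Simplified: a window is proposed only when it holds more than one block
    and those blocks, assumed non-overlapping, cover the whole window.
    """
    windows = {}
    for mint, maxt in blocks:
        start = mint - mint % range_ms  # align to the window boundary
        if maxt <= start + range_ms:    # block fits inside this window
            windows.setdefault(start, []).append((mint, maxt))
    plan = []
    for start, group in sorted(windows.items()):
        group.sort()
        covered = sum(maxt - mint for mint, maxt in group)
        if len(group) > 1 and covered == range_ms:
            plan.append((start, group))
    return plan


# Six consecutive 2h blocks: the first four fill an 8h window, the last two do not
two_hour_blocks = [(i * 2 * HOUR_MS, (i + 1) * 2 * HOUR_MS) for i in range(6)]
for start, group in plan_compaction(two_hour_blocks, COMPACTION_RANGES_MS[1]):
    print(f"merge {len(group)} blocks starting at {start // HOUR_MS}h")
```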
```yaml
# Thanos Compactor: singleton deployment
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-compactor
  namespace: monitoring
spec:
  serviceName: thanos-compactor
  replicas: 1  # MUST be 1: one compactor per bucket
  selector:
    matchLabels:
      app: thanos-compactor
  template:
    metadata:
      labels:
        app: thanos-compactor
    spec:
      containers:
        - name: thanos-compact
          image: quay.io/thanos/thanos:v0.35.1
          args:
            - compact
            - --data-dir=/data
            - --objstore.config-file=/etc/thanos/objstore.yaml
            - --http-address=0.0.0.0:10902
            - --wait                           # Run continuously (not one-shot)
            - --wait-interval=5m
            # Retention configuration
            - --retention.resolution-raw=90d   # Keep raw data 90 days
            - --retention.resolution-5m=365d   # Keep 5m downsampled data 1 year
            - --retention.resolution-1h=1825d  # Keep 1h downsampled data 5 years
            # Downsampling
            - --downsample.concurrency=4
            # Compaction
            - --compact.concurrency=2
          volumeMounts:
            - name: data
              mountPath: /data
            - name: objstore-config
              mountPath: /etc/thanos
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
            limits:
              memory: 8Gi
      volumes:
        - name: objstore-config
          secret:
            secretName: thanos-objstore-config  # illustrative Secret holding objstore.yaml
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi  # Scratch space for compaction
```
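The following rough sketch shows what a 5-minute downsampling pass computes for one series. It assumes windows aligned to multiples of the resolution. It keeps count, sum, min and max per window. Thanos additionally stores a counter aggregate so that rate() stays correct across counter resets; that aggregate is omitted here.

```python
from collections import defaultdict

FIVE_MINUTES_MS = 5 * 60 * 1000


def downsample(samples, window_ms=FIVE_MINUTES_MS):
    """Aggregate raw (timestamp_ms, value) samples into aligned windows.

    Returns {window_start_ms: {"count", "sum", "min", "max"}}, ordered by time.
    The counter aggregate Thanos also keeps is left out of this sketch.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % window_ms].append(value)
    return {
        start: {"count": len(vals), "sum": sum(vals), "min": min(vals), "max": max(vals)}
        for start, vals in sorted(buckets.items())
    }


# Ten minutes of a series scraped every 15s, with value = sample index
raw = [(i * 15_000, float(i)) for i in range(40)]
for start, agg in downsample(raw).items():
    print(start, agg)
```

At a 15s scrape interval, each 5-minute window holds 20 raw samples, which collapse into a handful of aggregates. An average over the window is then sum divided by count.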
Thanos Ruler
Thanos Ruler evaluates recording and alerting rules against the global Thanos Query endpoint, enabling rules that span multiple Prometheus instances:
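```yaml
# Thanos Ruler: evaluates rules against the global view
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-ruler
spec:
  serviceName: thanos-ruler
  replicas: 2  # HA pair
  selector:
    matchLabels:
      app: thanos-ruler
  template:
    metadata:
      labels:
        app: thanos-ruler
    spec:
      containers:
        - name: thanos-rule
          image: quay.io/thanos/thanos:v0.35.1
          env:
            - name: POD_NAME
              valueFrom: {fieldRef: {fieldPath: metadata.name}}
          args:
            - rule
            - --data-dir=/data
            - --objstore.config-file=/etc/thanos/objstore.yaml
            - --rule-file=/etc/thanos-rules/*.yaml
            - --query=dnssrv+_http._tcp.thanos-query.monitoring.svc
            - --alertmanagers.url=http://alertmanager:9093
            - --label=__replica__="$(POD_NAME)"  # distinct per replica, dropped from alerts
            - --alert.label-drop=__replica__
            - --label=ruler_cluster="global"
            - --grpc-address=0.0.0.0:10901
            - --http-address=0.0.0.0:10902
          volumeMounts:
            - name: objstore-config
              mountPath: /etc/thanos
            - name: rules
              mountPath: /etc/thanos-rules
      volumes:
        - name: objstore-config
          secret:
            secretName: thanos-objstore-config  # illustrative name
        - name: rules
          configMap:
            name: thanos-rules  # illustrative ConfigMap of rule files
```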
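The following Python sketch shows why the replica label is dropped from alerts. Alertmanager identifies an alert by its label set. Two ruler replicas firing the same rule therefore only collapse into one alert once their differing replica labels are removed. The helper names are hypothetical, and the fingerprint here is a sorted tuple rather than Alertmanager's actual hash.

```python
def drop_labels(labels, drop):
    """Return the alert's labels without the ones listed in `drop`."""
    return {k: v for k, v in labels.items() if k not in drop}


def fingerprint(labels):
    """Order-independent identity for a label set (stand-in for a real hash)."""
    return tuple(sorted(labels.items()))


def dedupe_alerts(alerts, drop=("__replica__",)):
    """Collapse alerts that become identical once replica labels are dropped."""
    seen = {}
    for labels in alerts:
        cleaned = drop_labels(labels, drop)
        seen.setdefault(fingerprint(cleaned), cleaned)
    return list(seen.values())


fired = [
    {"alertname": "HighErrorRate", "severity": "page", "__replica__": "thanos-ruler-0"},
    {"alertname": "HighErrorRate", "severity": "page", "__replica__": "thanos-ruler-1"},
    {"alertname": "DiskFull", "__replica__": "thanos-ruler-0"},
]
print(dedupe_alerts(fired))  # two distinct alerts remain
```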
Production Deployment
Object Store Configuration
```yaml
# objstore.yaml: S3 configuration
type: S3
config:
  bucket: thanos-metrics-prod
  endpoint: s3.us-east-1.amazonaws.com
  region: us-east-1
  # Use an IAM role (IRSA) rather than static credentials
  # access_key: ""
  # secret_key: ""
  insecure: false
  signature_version2: false
  http_config:
    idle_conn_timeout: 90s
    response_header_timeout: 2m
    tls_config:
      insecure_skip_verify: false
```

For GCS, use this alternative instead; an objstore.yaml file describes exactly one bucket type:

```yaml
# objstore.yaml: GCS configuration alternative
type: GCS
config:
  bucket: thanos-metrics-prod
  # Empty: uses workload identity or GOOGLE_APPLICATION_CREDENTIALS
  service_account: ""
```
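Whichever provider backs the bucket, Thanos stores one directory per block. The sketch below assumes the usual `<block-id>/meta.json`, `<block-id>/index`, `<block-id>/chunks/...` layout. It also assumes that uploaders write meta.json last, so a directory without it is treated as a partial upload. The 26-character length check is a stand-in for proper ULID validation, and `classify_blocks` is a hypothetical helper.

```python
from collections import defaultdict


def classify_blocks(object_keys):
    """Split bucket keys into complete and partial block IDs (simplified).

    Keys outside 26-character block directories (e.g. debug/) are ignored.
    """
    files = defaultdict(set)
    for key in object_keys:
        block_id, _, rest = key.partition("/")
        if len(block_id) != 26 or not rest:
            continue  # not a block directory
        files[block_id].add(rest.split("/", 1)[0])  # top-level entry: meta.json, index, chunks
    complete = sorted(b for b, entries in files.items() if "meta.json" in entries)
    partial = sorted(b for b, entries in files.items() if "meta.json" not in entries)
    return complete, partial
```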
Sidecar Deployment
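```yaml
# Complete Prometheus + Thanos Sidecar pod
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceName: prometheus
  replicas: 2  # HA pair
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
        thanos-store-api: "true"
    spec:
      containers:
        # Prometheus container
        - name: prometheus
          image: prom/prometheus:v2.53.0
          env:
            # Pod name (prometheus-0, prometheus-1) for a per-replica external label
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          args:
            - --config.file=/etc/prometheus/prometheus.yml
            - --storage.tsdb.path=/prometheus
            - --storage.tsdb.retention.time=48h     # Short local retention
            - --storage.tsdb.min-block-duration=2h  # Required for Thanos
            - --storage.tsdb.max-block-duration=2h  # Required for Thanos
            # Lets prometheus.yml use ${POD_NAME} as the __replica__ external label
            - --enable-feature=expand-external-labels
            - --web.enable-lifecycle
            - --web.enable-admin-api
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: storage
              mountPath: /prometheus
            - name: config
              mountPath: /etc/prometheus
        # Thanos Sidecar container
        - name: thanos-sidecar
          image: quay.io/thanos/thanos:v0.35.1
          args:
            - sidecar
            - --tsdb.path=/prometheus
            - --prometheus.url=http://localhost:9090
            - --objstore.config-file=/etc/thanos/objstore.yaml
            - --grpc-address=0.0.0.0:10901
            - --http-address=0.0.0.0:10902
          ports:
            - containerPort: 10901
              name: grpc
            - containerPort: 10902
              name: http
          volumeMounts:
            - name: storage
              mountPath: /prometheus
              readOnly: false
            - name: thanos-config
              mountPath: /etc/thanos
      volumes:
        - name: config
          configMap:
            name: prometheus-config  # illustrative ConfigMap holding prometheus.yml
        - name: thanos-config
          secret:
            secretName: thanos-objstore-config  # illustrative Secret holding objstore.yaml
  volumeClaimTemplates:
    - metadata:
        name: storage
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi  # illustrative size
```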
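A pre-deploy check of the Prometheus flags might look like the following Python sketch. The function is hypothetical. It enforces equal min and max block durations, comparing them in minutes so that `120m` and `2h` count as equal. It also flags local retention shorter than three block durations. That multiplier is an assumed conservative heuristic, meant to give the sidecar time to upload a block before Prometheus deletes it; treat the exact value as an assumption.

```python
import re

_UNIT_MINUTES = {"m": 1, "h": 60, "d": 1_440, "w": 10_080}


def to_minutes(value):
    """Parse simple single-unit durations such as '120m', '2h' or '7d'."""
    m = re.fullmatch(r"(\d+)([mhdw])", value)
    if not m:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(m.group(1)) * _UNIT_MINUTES[m.group(2)]


def check_thanos_ready(args):
    """Return a list of problems found in a Prometheus argument list."""
    flags = dict(a.removeprefix("--").split("=", 1) for a in args if "=" in a)
    problems = []
    min_b = flags.get("storage.tsdb.min-block-duration")
    max_b = flags.get("storage.tsdb.max-block-duration")
    if min_b is None or max_b is None or to_minutes(min_b) != to_minutes(max_b):
        problems.append("min-block-duration and max-block-duration must be set and equal")
    retention = flags.get("storage.tsdb.retention.time")
    # Assumed heuristic: keep several blocks locally so uploads can finish
    if min_b and retention and to_minutes(retention) < 3 * to_minutes(min_b):
        problems.append("local retention shorter than 3x the block duration")
    return problems
```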
Query Layer
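Thanos Query discovers StoreAPI endpoints through static --store flags, DNS SRV records (dnssrv+), or file-based SD (--store.sd-files). On Kubernetes, DNS SRV is the cleanest approach. Create a headless Service that selects pods with the thanos-store-api: "true" label, and give its port a name (grpc), because the SRV record is derived from that port name.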
```yaml
# Headless Service for Store API discovery
apiVersion: v1
kind: Service
metadata:
  name: thanos-sidecar
  namespace: monitoring
spec:
  type: ClusterIP
  clusterIP: None
  ports:
    - name: grpc
      port: 10901
      targetPort: grpc
  selector:
    thanos-store-api: "true"
```
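The SRV name in a `dnssrv+` address relates to this Service through simple string rules, sketched below. Kubernetes publishes `_<port-name>._<protocol>.<service>.<namespace>.svc` for each named port of a headless Service. The helper names and mode labels are hypothetical, and only the `dns+` and `dnssrv+` prefixes are modeled.

```python
def srv_name(port_name, service, namespace, protocol="tcp"):
    """Build the SRV record name Kubernetes publishes for a named Service port."""
    return f"_{port_name}._{protocol}.{service}.{namespace}.svc"


def parse_store_spec(spec):
    """Split a --store value into (lookup mode, target). Mode labels are illustrative."""
    for prefix, mode in (("dnssrv+", "srv"), ("dns+", "a")):
        if spec.startswith(prefix):
            return mode, spec[len(prefix):]
    return "static", spec  # plain host:port


sidecar_srv = srv_name("grpc", "thanos-sidecar", "monitoring")
print(parse_store_spec("dnssrv+" + sidecar_srv))
```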
Store Gateway Deployment
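Store Gateway memory is governed mainly by two flags. --index-cache-size sets the in-memory cache for index lookups, and --chunk-pool-size caps the memory pooled for chunk data. Keep their sum comfortably below the container memory limit. Local disk holds the index-headers built from each block's index, so a restarted pod can reuse them instead of fetching and rebuilding them from object storage.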
Compactor Deployment
Compactor Retention Strategy
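| Resolution | Retention | Use Case | Data Stored per Series |
|---|---|---|---|
| Raw | 90 days | Recent dashboards, debugging | Every scraped sample (~1.5 bytes/sample) |
| 5-minute | 1 year | Monthly reports, trend analysis | 5 aggregates (count, sum, min, max, counter) per 5m window, vs 20 raw samples at a 15s scrape interval |
| 1-hour | 5 years | Yearly capacity planning | 5 aggregates per hour, vs 240 raw samples at a 15s scrape interval |

Downsampling is primarily a query-speed feature. The Thanos documentation notes that it adds two extra blocks for each raw block, and these are only somewhat smaller than the raw block. Long-term storage savings come from expiring raw data earlier than the downsampled resolutions.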
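Per-resolution retention can be sketched as a filter over block metadata. The sketch assumes that a block expires once its newest sample (max time) is older than the retention configured for its resolution. The retention values mirror the compactor flags, and the block IDs are illustrative.

```python
DAY_MS = 86_400_000
# Mirrors --retention.resolution-raw / -5m / -1h from the compactor example
RETENTION_DAYS = {"raw": 90, "5m": 365, "1h": 1825}


def expired_blocks(blocks, now_ms, retention_days=RETENTION_DAYS):
    """Return IDs of blocks whose max time is older than their resolution's retention.

    `blocks` is a list of (block_id, resolution, max_time_ms) tuples.
    """
    return [
        block_id
        for block_id, resolution, max_time in blocks
        if now_ms - max_time > retention_days[resolution] * DAY_MS
    ]


now = 2000 * DAY_MS
inventory = [
    ("r1", "raw", now - 100 * DAY_MS),   # past 90 days: deleted
    ("r2", "raw", now - 10 * DAY_MS),
    ("m1", "5m", now - 100 * DAY_MS),    # same age, but 5m data is kept for a year
    ("h1", "1h", now - 1900 * DAY_MS),   # past 5 years: deleted
]
print(expired_blocks(inventory, now))  # -> ['r1', 'h1']
```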
Multi-Cluster Architecture
Cross-Cluster Querying
```mermaid
flowchart TD
    subgraph US["US-East Cluster"]
        PUS["Prometheus HA Pair<br/>+ Sidecar"]
    end
    subgraph EU["EU-West Cluster"]
        PEU["Prometheus HA Pair<br/>+ Sidecar"]
    end
    subgraph AP["AP-Southeast Cluster"]
        PAP["Prometheus HA Pair<br/>+ Sidecar"]
    end
    subgraph Global["Global Observability (Central)"]
        TQ["Thanos Query<br/>Global"]
        SG[Store Gateway]
        TC[Compactor]
    end
    subgraph OBJ["Object Store"]
        S3[(Shared S3 Bucket)]
    end
    PUS & PEU & PAP -->|"upload blocks"| S3
    PUS & PEU & PAP -->|"StoreAPI<br/>(cross-cluster gRPC)"| TQ
    SG --> S3
    SG -->|"StoreAPI"| TQ
    TC --> S3
    GF[Grafana] --> TQ
```
```yaml
# Prometheus external_labels: MUST be unique per cluster + replica
global:
  external_labels:
    cluster: us-east-1   # Unique per cluster
    region: us-east
    __replica__: prom-0  # Unique per HA replica
```
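A fleet-level sanity check for these labels could look like the following Python sketch, which uses a hypothetical helper. Every full external label set must be unique. Instances whose label sets match once the replica labels are removed form the HA groups that Thanos Query deduplicates.

```python
def check_external_labels(instances, replica_labels=("__replica__", "prometheus_replica")):
    """Validate external labels for a fleet of Prometheus instances.

    `instances` maps an instance name to its external_labels dict.
    Returns (duplicate_pairs, ha_groups).
    """
    seen, duplicates, groups = {}, [], {}
    for name, labels in sorted(instances.items()):
        full_key = tuple(sorted(labels.items()))
        if full_key in seen:
            duplicates.append((seen[full_key], name))  # two instances look identical
        else:
            seen[full_key] = name
        # Without replica labels, HA replicas of the same cluster collapse together
        dedup_key = tuple(sorted((k, v) for k, v in labels.items() if k not in replica_labels))
        groups.setdefault(dedup_key, []).append(name)
    return duplicates, list(groups.values())


fleet = {
    "us-0": {"cluster": "us-east-1", "region": "us-east", "__replica__": "prom-0"},
    "us-1": {"cluster": "us-east-1", "region": "us-east", "__replica__": "prom-1"},
    "eu-0": {"cluster": "eu-west-1", "region": "eu-west", "__replica__": "prom-0"},
    "eu-1": {"cluster": "eu-west-1", "region": "eu-west", "__replica__": "prom-0"},  # copy-paste mistake
}
print(check_external_labels(fleet))
```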
Deduplication Strategies
```bash
# Thanos Query deduplicates by replica labels.
# Both HA replicas produce nearly identical data; Query merges them into one series.
# Configure replica labels (can specify multiple):
thanos query \
  --query.replica-label=__replica__ \
  --query.replica-label=prometheus_replica

# Dedup algorithm (default, penalty-based): Query follows one replica's samples
# and switches to another only when the current replica shows a gap longer than
# roughly two scrape intervals, then sticks with the new replica.

# Partial response handling, if one sidecar is down:
#   --query.partial-response (enabled by default; --no-query.partial-response disables it)
# Returns the available data with warnings in the response instead of failing.
```
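The penalty-based selection can be sketched for two replicas. This is a simplified model, not Thanos's iterator. It assumes sorted, strictly increasing timestamps per replica. It uses a 5s penalty before any interval is known, an assumed value modeled on Thanos's implementation, and twice the last emitted interval afterwards. Counter handling and more than two replicas are ignored.

```python
import math

INITIAL_PENALTY_MS = 5_000  # assumed penalty before any sample interval is known


def _seek(samples, idx, t):
    """Advance idx to the first sample with timestamp >= t; never moves backwards."""
    while idx < len(samples) and samples[idx][0] < t:
        idx += 1
    return idx


def dedup_penalty(a, b):
    """Merge two HA replicas' sorted (timestamp_ms, value) lists into one series."""
    ia = ib = 0
    last_t = -math.inf
    pen_a = pen_b = 0
    out = []
    while True:
        # Each replica may only contribute samples beyond last_t plus its penalty
        ia = _seek(a, ia, last_t + 1 + pen_a)
        ib = _seek(b, ib, last_t + 1 + pen_b)
        a_done, b_done = ia >= len(a), ib >= len(b)
        if a_done and b_done:
            return out
        use_a = not a_done and (b_done or a[ia][0] <= b[ib][0])
        t, v = a[ia] if use_a else b[ib]
        # Penalize the replica we did not pick so a sample landing just after
        # this one cannot make it win and double the sampling rate
        penalty = INITIAL_PENALTY_MS if last_t == -math.inf else 2 * (t - last_t)
        if use_a:
            pen_a, pen_b = 0, penalty
        else:
            pen_a, pen_b = penalty, 0
        out.append((t, v))
        last_t = t


replica_a = [(0, "a"), (15_000, "a"), (30_000, "a"), (105_000, "a"), (120_000, "a")]
replica_b = [(t, "b") for t in range(1_000, 122_000, 15_000)]
print(dedup_penalty(replica_a, replica_b))
```

In this example replica a stops reporting after 30s. The merged series switches to b at 61s and stays on b even after a recovers at 105s.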
Thanos vs Mimir
Thanos vs Grafana Mimir
| Aspect | Thanos | Grafana Mimir |
|---|---|---|
| Data flow | Sidecar uploads TSDB blocks | remote_write pushes samples |
| Prometheus changes | Add sidecar + disable compaction | Add remote_write config only |
| Recent data path | StoreAPI from sidecar (real-time) | Ingester (near real-time) |
| Multi-tenancy | By external_labels (manual) | Native per-request header |
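| Operational model | Sidecar per Prometheus plus central Query, Store Gateway, Compactor | Central distributor, ingester, querier, store-gateway, compactor (or monolithic mode) |
| HA dedup | Query-time (replica label) | Ingest-time (distributor HA tracker on cluster + replica labels) |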
| Maturity | CNCF Incubating, proven at scale | Grafana-backed, Cortex successor |
| License | Apache 2.0 | AGPLv3 |
When to Choose Thanos
- You want to keep Prometheus local TSDB as the primary store (sidecar model)
- You need cross-cluster querying over gRPC without a central ingest path
- Apache 2.0 licensing is required
- You prefer block-based object storage over stream-based ingestion
- Downsampling with configurable resolution retention is important
- You’re already familiar with the Thanos ecosystem
Conclusion
- Thanos is additive: it extends Prometheus without replacing any component
- Sidecar is the bridge: it uploads blocks and serves recent data via StoreAPI
- Disable Prometheus compaction: set min and max block duration to the same value (2h) when the sidecar uploads blocks
- Compactor is a singleton: never run more than one instance per bucket
- Downsampling speeds up long-range queries: 5m and 1h resolutions keep long-term dashboards fast, while storage savings come from expiring raw data earlier with per-resolution retention
- External labels matter: they're how Thanos identifies clusters, replicas, and tenants
- Partial responses are acceptable: it's better to show available data than to fail entirely