Prometheus Deep Dive Part 11: Extending Prometheus with Thanos

Thanos Overview & Philosophy

Thanos is a CNCF Incubating project that extends Prometheus with long-term storage and global querying capabilities. Unlike Mimir or VictoriaMetrics, Thanos doesn’t replace Prometheus — it augments existing Prometheus deployments by:

Uploading TSDB blocks from Prometheus to object storage (S3/GCS/Azure) via a sidecar
Querying across multiple Prometheus instances transparently through a single endpoint
Downsampling historical data automatically (5m and 1h resolutions)
Deduplicating HA pair data at query time

                            
Core Philosophy: Thanos treats Prometheus as the source of truth for recent data. It layers on top of existing Prometheus instances without changing your scrape configuration or alerting rules; the only Prometheus-side changes are unique external labels and fixed 2h block durations. Prometheus keeps scraping, alerting, and serving local queries as before, and Thanos adds capabilities around it.
                        

Architecture

Thanos Component Architecture

flowchart TD
    subgraph Cluster1["Cluster: US-East"]
        P1[Prometheus + Sidecar]
        P2[Prometheus + Sidecar]
    end

    subgraph Cluster2["Cluster: EU-West"]
        P3[Prometheus + Sidecar]
        P4[Prometheus + Sidecar]
    end

    subgraph Thanos["Thanos Global Layer"]
        TQ[Thanos Query]
        SG[Store Gateway]
        TC[Compactor]
        TR[Ruler]
    end

    subgraph Storage["Object Store"]
        S3[(S3 Bucket)]
    end

    P1 & P2 -->|"upload blocks"| S3
    P3 & P4 -->|"upload blocks"| S3
    P1 & P2 & P3 & P4 -->|"StoreAPI gRPC"| TQ
    SG -->|"serves historical"| TQ
    SG --> S3
    TC -->|"compact + downsample"| S3
    TR --> TQ
    GF[Grafana] --> TQ

Thanos Sidecar

The Sidecar runs alongside each Prometheus instance as a container in the same pod. It has two responsibilities:

Block Upload: Watches Prometheus’s data directory and uploads completed TSDB blocks to object storage
StoreAPI: Exposes a gRPC StoreAPI endpoint that Thanos Query can connect to for real-time data from Prometheus’s head block

# Thanos Sidecar container — added to Prometheus pod
containers:
  - name: thanos-sidecar
    image: quay.io/thanos/thanos:v0.35.1
    args:
      - sidecar
      - --tsdb.path=/prometheus
      - --prometheus.url=http://localhost:9090
      - --objstore.config-file=/etc/thanos/objstore.yaml
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:10902
      # The sidecar ships each completed 2h block on its own. This flag additionally
      # uploads blocks Prometheus had already compacted and is meant for one-time migration.
      - --shipper.upload-compacted
    ports:
      - containerPort: 10901
        name: grpc
      - containerPort: 10902
        name: http
    volumeMounts:
      - name: prometheus-storage
        mountPath: /prometheus
      - name: thanos-objstore-config
        mountPath: /etc/thanos
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        memory: 512Mi

                            
Critical Requirement: When using the sidecar, Prometheus must run with --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration set to the same value (2h). This disables Prometheus’s local compaction so that Thanos Compactor is the only process compacting blocks. Otherwise the sidecar can upload blocks that Prometheus later rewrites locally, leaving overlapping blocks in object storage.
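The requirement above can be checked mechanically before a rollout. The following is a minimal sketch, assuming the Prometheus arguments are available as a list of `--flag=value` strings (as in the manifests in this article); the function name `block_durations_match` is hypothetical and not part of Prometheus or Thanos.

```python
def block_durations_match(prometheus_args):
    """Return True when both block-duration flags are set to the same value."""
    flags = {}
    for arg in prometheus_args:
        # Only consider arguments written as --name=value.
        if arg.startswith("--") and "=" in arg:
            name, value = arg[2:].split("=", 1)
            flags[name] = value
    min_d = flags.get("storage.tsdb.min-block-duration")
    max_d = flags.get("storage.tsdb.max-block-duration")
    # Both flags must be present and equal for local compaction to be disabled.
    return min_d is not None and min_d == max_d


args = [
    "--storage.tsdb.path=/prometheus",
    "--storage.tsdb.min-block-duration=2h",
    "--storage.tsdb.max-block-duration=2h",
]
print(block_durations_match(args))
```

A check like this only compares the literal strings, so `2h` and `120m` would be treated as different values; that is a deliberate simplification.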
                        

Thanos Query

Thanos Query implements the Prometheus HTTP API and fans out queries to multiple StoreAPI endpoints (sidecars, store gateways, other query instances). It merges results, deduplicates HA replicas, and returns a unified response:

# Thanos Query deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-query
  namespace: monitoring
spec:
  replicas: 2
  template:
    spec:
      containers:
        - name: thanos-query
          image: quay.io/thanos/thanos:v0.35.1
          args:
            - query
            - --log.level=info
            - --query.replica-label=__replica__
            - --query.replica-label=prometheus_replica
            - --query.auto-downsampling
            - --query.max-concurrent=20
            - --query.timeout=2m
            # Discover stores via DNS SRV records
            - --store=dnssrv+_grpc._tcp.thanos-sidecar.monitoring.svc
            - --store=dnssrv+_grpc._tcp.thanos-store-gateway.monitoring.svc
            # Or list static endpoints instead (pick one approach, not both,
            # to avoid querying the same store twice)
            # - --store=thanos-sidecar-us-east:10901
            # - --store=thanos-sidecar-eu-west:10901
            # - --store=thanos-store-gateway:10901
          ports:
            - containerPort: 10902
              name: http
            - containerPort: 10901
              name: grpc

Store Gateway

The Store Gateway serves historical TSDB blocks from object storage. It indexes block metadata and serves queries against them through the StoreAPI. It caches index data locally for fast lookups:

# Thanos Store Gateway — serves historical data from object store
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-store-gateway
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: thanos-store
          image: quay.io/thanos/thanos:v0.35.1
          args:
            - store
            - --data-dir=/data
            - --objstore.config-file=/etc/thanos/objstore.yaml
            - --index-cache-size=2GB
            - --chunk-pool-size=4GB
            - --grpc-address=0.0.0.0:10901
            - --http-address=0.0.0.0:10902
            # Time-based partitioning (optional for very large stores)
            # - --min-time=-720h  # Only serve last 30 days
            # - --max-time=-2h    # Don't serve very recent (sidecar handles it)
          volumeMounts:
            - name: data
              mountPath: /data
            - name: objstore-config
              mountPath: /etc/thanos
          resources:
            requests:
              cpu: "1"
              memory: 4Gi
            limits:
              memory: 8Gi
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi    # Downloaded index data and block metadata

Compactor & Downsampling

The Compactor runs as a singleton, continuously processing blocks in object storage:

Compaction: Merges small 2h blocks into larger blocks (up to the configured max), reducing object count and improving query performance
Downsampling: Creates 5-minute and 1-hour resolution versions of older data (by default, 5m for blocks older than roughly 40 hours and 1h for blocks older than roughly 10 days)
Retention: Deletes blocks exceeding the retention period
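To make the downsampling step concrete, here is a simplified sketch of how raw samples can be folded into fixed windows. It keeps count, sum, min, and max per window; this is an illustration of the idea only, not the Compactor's actual code or on-disk format, and the function name `downsample` is hypothetical.

```python
from collections import defaultdict


def downsample(samples, window_seconds=300):
    """Aggregate (timestamp_seconds, value) pairs into fixed-size windows."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each sample to the start of its window.
        window_start = ts - (ts % window_seconds)
        buckets[window_start].append(value)
    return {
        start: {
            "count": len(values),
            "sum": sum(values),
            "min": min(values),
            "max": max(values),
        }
        for start, values in sorted(buckets.items())
    }


raw = [(0, 1.0), (15, 3.0), (299, 2.0), (300, 5.0)]
for start, aggregates in downsample(raw).items():
    print(start, aggregates)
```

Keeping several aggregates per window is what lets functions such as `avg_over_time` or `max_over_time` still return sensible results on downsampled data.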

# Thanos Compactor — singleton deployment
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-compactor
spec:
  replicas: 1    # MUST be 1 — singleton
  template:
    spec:
      containers:
        - name: thanos-compact
          image: quay.io/thanos/thanos:v0.35.1
          args:
            - compact
            - --data-dir=/data
            - --objstore.config-file=/etc/thanos/objstore.yaml
            - --http-address=0.0.0.0:10902
            - --wait                    # Run continuously (not one-shot)
            - --wait-interval=5m
            # Retention configuration
            - --retention.resolution-raw=90d      # Keep raw data 90 days
            - --retention.resolution-5m=365d      # Keep 5m downsampled 1 year
            - --retention.resolution-1h=1825d     # Keep 1h downsampled 5 years
            # Downsampling
            - --downsample.concurrency=4
            # Compaction
            - --compact.concurrency=2
          volumeMounts:
            - name: data
              mountPath: /data
            - name: objstore-config
              mountPath: /etc/thanos
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
            limits:
              memory: 8Gi
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi    # Scratch space for compaction

Thanos Ruler

Thanos Ruler evaluates recording and alerting rules against the global Thanos Query endpoint, enabling rules that span multiple Prometheus instances:

# Thanos Ruler — evaluates rules against global view
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-ruler
spec:
  replicas: 2    # HA pair
  template:
    spec:
      containers:
        - name: thanos-rule
          image: quay.io/thanos/thanos:v0.35.1
          args:
            - rule
            - --data-dir=/data
            - --objstore.config-file=/etc/thanos/objstore.yaml
            - --rule-file=/etc/thanos-rules/*.yaml
            - --query=dnssrv+_http._tcp.thanos-query.monitoring.svc
            - --alertmanagers.url=http://alertmanager:9093
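            # Each HA replica needs its own replica label; dropping it from alerts
            # lets Alertmanager deduplicate them. Assumes POD_NAME is injected
            # into the container via the downward API.
            - --label=rule_replica="$(POD_NAME)"
            - --alert.label-drop=rule_replica
            - --label=ruler_cluster="global"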
            - --grpc-address=0.0.0.0:10901
            - --http-address=0.0.0.0:10902

Production Deployment

Object Store Configuration

# objstore.yaml — S3 configuration
type: S3
config:
  bucket: thanos-metrics-prod
  endpoint: s3.us-east-1.amazonaws.com
  region: us-east-1
  # Use IAM role (IRSA) rather than static credentials
  # access_key: ""
  # secret_key: ""
  insecure: false
  signature_version2: false
  http_config:
    idle_conn_timeout: 90s
    response_header_timeout: 2m
    tls_config:
      insecure_skip_verify: false

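# objstore.yaml (alternative): GCS configuration
# Use this in place of the S3 document above; one file holds a single type/config pair
type: GCS
config:
  bucket: thanos-metrics-prod
  # Leave service_account empty to use workload identity or GOOGLE_APPLICATION_CREDENTIALS
  service_account: ""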

Sidecar Deployment

# Prometheus + Thanos Sidecar StatefulSet (abridged: selector, serviceName, and volume definitions omitted)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 2    # HA pair
  template:
    metadata:
      labels:
        app: prometheus
        thanos-store-api: "true"
    spec:
      containers:
        # Prometheus container
        - name: prometheus
          image: prom/prometheus:v2.53.0
          args:
            - --config.file=/etc/prometheus/prometheus.yml
            - --storage.tsdb.path=/prometheus
            - --storage.tsdb.retention.time=48h     # Short local retention
            - --storage.tsdb.min-block-duration=2h  # Required for Thanos
            - --storage.tsdb.max-block-duration=2h  # Required for Thanos
            - --web.enable-lifecycle
            - --web.enable-admin-api   # Optional: not needed by the sidecar; exposes delete and snapshot endpoints
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: storage
              mountPath: /prometheus
            - name: config
              mountPath: /etc/prometheus

        # Thanos Sidecar container
        - name: thanos-sidecar
          image: quay.io/thanos/thanos:v0.35.1
          args:
            - sidecar
            - --tsdb.path=/prometheus
            - --prometheus.url=http://localhost:9090
            - --objstore.config-file=/etc/thanos/objstore.yaml
            - --grpc-address=0.0.0.0:10901
            - --http-address=0.0.0.0:10902
          ports:
            - containerPort: 10901
              name: grpc
            - containerPort: 10902
              name: http
          volumeMounts:
            - name: storage
              mountPath: /prometheus
              readOnly: false
            - name: thanos-config
              mountPath: /etc/thanos

Query Layer

                            
Query Fanout: Thanos Query discovers stores via three mechanisms: static --store flags, DNS SRV records (dnssrv+), or file-based SD (--store.sd-files). For Kubernetes, DNS SRV is the cleanest approach: create a headless Service that selects pods carrying the thanos-store-api: "true" label.
                        

# Headless Service for Store API discovery
apiVersion: v1
kind: Service
metadata:
  name: thanos-sidecar
  namespace: monitoring
spec:
  type: ClusterIP
  clusterIP: None
  ports:
    - name: grpc
      port: 10901
      targetPort: grpc
  selector:
    thanos-store-api: "true"
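The SRV name that Thanos Query resolves is derived from the Service above: Kubernetes publishes SRV records for named ports in the form `_<port-name>._<protocol>.<service>.<namespace>.svc`. The small sketch below assembles the matching `--store` value; the helper name `dnssrv_store_flag` is hypothetical and exists only to show how the pieces line up.

```python
def dnssrv_store_flag(port_name, service, namespace, protocol="tcp"):
    """Compose a --store flag that points at a headless Service's SRV record."""
    srv_name = f"_{port_name}._{protocol}.{service}.{namespace}.svc"
    return f"--store=dnssrv+{srv_name}"


# Matches the Service defined above: port "grpc", name "thanos-sidecar", namespace "monitoring".
print(dnssrv_store_flag("grpc", "thanos-sidecar", "monitoring"))
```

If the port in the Service were unnamed or named differently, the SRV lookup used by the Query flags earlier in this article would not resolve.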

Store Gateway Deployment

                            
Store Gateway Sizing: Memory scales with index size (the number of series and labels), not with the volume of sample data. With 10M unique series in object storage, expect roughly 2–4 GiB of memory for index caching. The --index-cache-size and --chunk-pool-size flags cap the in-memory index cache and chunk pool. Local disk holds downloaded index data so restarts do not have to fetch it again.
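As a rough planning aid, the 2–4 GiB per 10M series figure can be scaled to other series counts. The sketch below assumes memory grows linearly with series count, which is a simplifying assumption and not a guarantee from Thanos; label cardinality and query patterns will shift real numbers. The function name is hypothetical.

```python
def index_memory_range_gib(series_count, low_gib_per_10m=2.0, high_gib_per_10m=4.0):
    """Scale the rule-of-thumb memory range linearly with the number of series."""
    scale = series_count / 10_000_000
    return (low_gib_per_10m * scale, high_gib_per_10m * scale)


low, high = index_memory_range_gib(25_000_000)
print(f"Plan for roughly {low:.1f} to {high:.1f} GiB of index cache memory")
```

Treat the result as a starting point for `--index-cache-size` and the pod memory request, then adjust from observed usage.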
                        

Compactor Deployment

Operations

Compactor Retention Strategy

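Resolution	Retention	Use Case	Storage Impact
Raw	90 days	Recent dashboards, debugging	~1.5 bytes/sample
5-minute	1 year	Monthly reports, trend analysis	~1/20th of the raw sample count at a 15s scrape interval; each point stores several aggregates
1-hour	5 years	Yearly capacity planning	~1/240th of the raw sample count at a 15s scrape interval; each point stores several aggregates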
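The ratios in the table follow from simple sample-count arithmetic. The sketch below assumes a 15-second scrape interval (an assumption; the table does not state one) and counts samples per series for each retention tier. It deliberately ignores compression and the extra aggregates stored per downsampled point, so it estimates sample counts, not bytes.

```python
SECONDS_PER_DAY = 86_400


def samples_per_series(retention_days, resolution_seconds):
    """Number of samples one series holds over the retention window."""
    return retention_days * SECONDS_PER_DAY // resolution_seconds


def reduction_factor(resolution_seconds, scrape_interval_seconds=15):
    """How many raw samples collapse into one downsampled point."""
    return resolution_seconds // scrape_interval_seconds


print("raw, 90d:", samples_per_series(90, 15))
print("5m, 1y:  ", samples_per_series(365, 300))
print("1h, 5y:  ", samples_per_series(1825, 3600))
print("5m reduction:", reduction_factor(300), "1h reduction:", reduction_factor(3600))
```

With a different scrape interval the reduction factors change accordingly, for example 10x and 120x at 30 seconds.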


Multi-Cluster Architecture

Cross-Cluster Querying

Multi-Cluster Thanos Architecture

flowchart TD
    subgraph US["US-East Cluster"]
        PUS["Prometheus HA Pair<br/>+ Sidecar"]
    end

    subgraph EU["EU-West Cluster"]
        PEU["Prometheus HA Pair<br/>+ Sidecar"]
    end

    subgraph AP["AP-Southeast Cluster"]
        PAP["Prometheus HA Pair<br/>+ Sidecar"]
    end

    subgraph Global["Global Observability (Central)"]
        TQ["Thanos Query<br/>Global"]
        SG[Store Gateway]
        TC[Compactor]
    end

    subgraph OBJ["Object Store"]
        S3[(Shared S3 Bucket)]
    end

    PUS & PEU & PAP -->|"upload blocks"| S3
    PUS & PEU & PAP -->|"StoreAPI<br/>(cross-cluster gRPC)"| TQ
    SG --> S3
    SG -->|"StoreAPI"| TQ
    TC --> S3
    GF[Grafana] --> TQ

# Prometheus external_labels — MUST be unique per cluster + replica
global:
  external_labels:
    cluster: us-east-1       # Unique per cluster
    region: us-east
    __replica__: prom-0      # Unique per HA replica
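A quick way to sanity-check external labels across a fleet is to verify two properties: no two Prometheus instances share an identical label set, and instances that differ only in the replica label group together as one HA pair. The sketch below assumes the label sets have been collected into a dictionary keyed by instance name; the instance names and both function names are hypothetical.

```python
def find_label_collisions(instances):
    """Return pairs of instance names whose external label sets are identical."""
    seen = {}
    collisions = []
    for name, labels in instances.items():
        key = tuple(sorted(labels.items()))
        if key in seen:
            collisions.append((seen[key], name))
        else:
            seen[key] = name
    return collisions


def ha_groups(instances, replica_labels=("__replica__",)):
    """Group instances that match once the replica labels are ignored."""
    groups = {}
    for name, labels in instances.items():
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in replica_labels))
        groups.setdefault(key, []).append(name)
    return list(groups.values())


fleet = {
    "us-0": {"cluster": "us-east-1", "__replica__": "prom-0"},
    "us-1": {"cluster": "us-east-1", "__replica__": "prom-1"},
    "eu-0": {"cluster": "eu-west-1", "__replica__": "prom-0"},
}
print(find_label_collisions(fleet))
print(ha_groups(fleet))
```

An empty collision list together with the expected grouping suggests the labels are set up the way query-time deduplication needs them.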

Deduplication Strategies

# Thanos Query deduplicates by replica labels
# Both HA replicas produce nearly identical data — Query picks one

# Configure replica labels (can specify multiple)
thanos query \
  --query.replica-label=__replica__ \
  --query.replica-label=prometheus_replica

# Dedup algorithm (penalty-based): Query follows one replica and
# switches to the other only when the current one shows a gap
# (missed scrapes), then backs off before switching back so the
# output does not flap between replicas.

# Partial response handling, for example when one sidecar is down:
# --query.partial-response   (enabled by default)
# Returns the available data with warnings attached to the response instead of failing the query
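The replica-switching idea can be illustrated with a deliberately simplified merge of two replicas' samples. This sketch follows one replica and switches only when that replica's next sample trails the other replica's by at least `max_gap` seconds; it is not Thanos's actual penalty algorithm, and the function name and threshold are assumptions chosen for illustration.

```python
def dedup_replicas(replica_a, replica_b, max_gap):
    """Merge two sorted lists of (timestamp, value), preferring the current replica."""
    series = [replica_a, replica_b]
    pos = [0, 0]
    current = 0
    last_ts = None
    merged = []
    while True:
        # Skip samples at or before the last emitted timestamp in both replicas.
        for i in (0, 1):
            while (pos[i] < len(series[i]) and last_ts is not None
                   and series[i][pos[i]][0] <= last_ts):
                pos[i] += 1
        other = 1 - current
        cur_next = series[current][pos[current]] if pos[current] < len(series[current]) else None
        other_next = series[other][pos[other]] if pos[other] < len(series[other]) else None
        if cur_next is None and other_next is None:
            break
        # Switch when the current replica is exhausted or has a gap the other one fills.
        if cur_next is None or (other_next is not None
                                and cur_next[0] - other_next[0] >= max_gap):
            current, cur_next = other, other_next
        merged.append(cur_next)
        last_ts = cur_next[0]
        pos[current] += 1
    return merged


a = [(0, 1.0), (15, 1.0), (30, 1.0), (75, 1.0), (90, 1.0)]  # missed scrapes at 45 and 60
b = [(5, 1.0), (20, 1.0), (35, 1.0), (50, 1.0), (65, 1.0), (80, 1.0), (95, 1.0)]
print([ts for ts, _ in dedup_replicas(a, b, max_gap=30)])
```

In this example the merge follows replica `a` until its gap after 30 seconds, then stays on replica `b`, so the output has no hole where `a` missed scrapes.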

Thanos vs Mimir

Decision Guide

Thanos vs Grafana Mimir

Aspect	Thanos	Grafana Mimir
Data flow	Sidecar uploads TSDB blocks	remote_write pushes samples
Prometheus changes	Add sidecar + disable compaction	Add remote_write config only
Recent data path	StoreAPI from sidecar (real-time)	Ingester (near real-time)
Multi-tenancy	By external_labels (manual)	Native per-request header
Operational model	Distributed components you deploy	Distributed components you deploy
HA dedup	Query-time (replica label)	Ingest-time (distributor HA tracker)
Maturity	CNCF Incubating, proven at scale	Grafana-backed, Cortex successor
License	Apache 2.0	AGPLv3


When to Choose Thanos

                            
Choose Thanos when:
You want to keep Prometheus local TSDB as the primary store (sidecar model)
You need cross-cluster querying over gRPC without a central ingest path
Apache 2.0 licensing is required
You prefer block-based object storage over stream-based ingestion
Downsampling with configurable resolution retention is important
You’re already familiar with the Thanos ecosystem

                        

Conclusion

                            
Key Takeaways:
Thanos is additive: it extends Prometheus without replacing any component
Sidecar is the bridge: it uploads blocks and serves real-time data via StoreAPI
Disable Prometheus compaction: set min and max block duration to 2h when using the sidecar
Compactor is a singleton: never run more than one instance per bucket
Downsampling keeps long-range queries cheap: 5m and 1h resolutions hold far fewer samples, and storage shrinks once raw data ages out under a shorter retention
External labels matter: they are how Thanos identifies clusters, replicas, and tenants
Partial responses are acceptable: showing the available data is better than failing entirely

                        
