Prometheus Deep Dive Part 10: Remote Storage Systems

June 15, 2026 Wasil Zafar 32 min read

Prometheus’s local TSDB is limited to a single server’s disk. Remote storage systems — particularly Grafana Mimir and VictoriaMetrics — extend Prometheus with virtually unlimited retention, global query views, multi-tenancy, and horizontal scalability. Learn when to adopt each and how to configure production-grade remote write pipelines.

Table of Contents

  1. Remote Storage Overview
  2. Grafana Mimir
  3. VictoriaMetrics
  4. Remote Write Configuration
  5. Mimir vs VictoriaMetrics
  6. Conclusion

Remote Storage Overview

Why Remote Storage?

Prometheus’s local TSDB is optimized for recent data queries. It excels at last-few-hours dashboards but has inherent limitations for enterprise use cases:

Limitations of Local Storage:
  • Single-node capacity — limited by one server’s disk and memory
  • No global view — each Prometheus instance sees only its own data
  • No multi-tenancy — cannot isolate teams’ data or apply per-tenant limits
  • Retention limited — practically 30–90 days before disk/compaction issues
  • No deduplication — HA pairs write duplicate data locally
  • Backup complexity — snapshots are large and node-specific
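
To make the single-node capacity limit concrete, here is a rough sizing sketch. It uses the rule of thumb from the Prometheus storage documentation (disk space ≈ retention in seconds × ingested samples per second × bytes per sample, at roughly 1 to 2 bytes per sample). The function name and the example workload (1M active series, 15s scrape interval, 1.5 bytes per sample) are illustrative assumptions, not measurements.

```python
def estimate_disk_bytes(active_series: int,
                        scrape_interval_s: float,
                        retention_days: float,
                        bytes_per_sample: float = 1.5) -> float:
    """Rough local TSDB disk estimate; ignores WAL and compaction overhead."""
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 86_400
    return retention_seconds * samples_per_second * bytes_per_sample


# Assumed workload: 1M active series scraped every 15 seconds.
for days in (30, 90, 365):
    gb = estimate_disk_bytes(1_000_000, 15, days) / 1e9
    print(f"{days:>3} days of retention: ~{gb:,.0f} GB")
```

The estimate grows linearly with retention, which is why long retention on a single disk gets expensive quickly.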

Remote Write vs Remote Read

Remote Write vs Remote Read Data Flow
flowchart LR
    subgraph Prometheus
        TSDB[Local TSDB]
        WAL[WAL]
    end

    subgraph RemoteWrite["Remote Write (push)"]
        RW["Samples pushed<br/>as they're ingested"]
    end

    subgraph RemoteRead["Remote Read (pull)"]
        RR["Queries proxied<br/>at query time"]
    end

    subgraph LTS["Long-Term Store"]
        MIMIR[Grafana Mimir]
    end

    WAL -->|"continuous push"| RW --> MIMIR
    TSDB -.->|"query-time proxy"| RR -.-> MIMIR

Remote Write vs Remote Read

| Aspect | Remote Write | Remote Read |
|---|---|---|
| Direction | Prometheus pushes to backend | Prometheus queries backend at query time |
| Latency | Near real-time (seconds) | Adds query latency (network round-trip) |
| Data scope | All ingested samples (or filtered) | Only what's queried |
| Local retention | Can reduce to hours (data is in backend) | Still need local data for recent queries |
| Query path | Query backend directly (Grafana→Mimir) | Query Prometheus (merges local + remote) |
| Recommended | Yes (standard pattern) | Rarely (adds complexity with little benefit) |

Solution Landscape

Prometheus-Compatible Remote Storage Solutions

| Solution | License | Backed By | Key Differentiator |
|---|---|---|---|
| Grafana Mimir | AGPLv3 | Grafana Labs | Multi-tenant, object-store native, Cortex successor |
| VictoriaMetrics | Apache 2.0 | VictoriaMetrics Inc | High compression, MetricsQL, simple operations |
| Thanos | Apache 2.0 | CNCF | Sidecar model, uses existing Prometheus TSDB blocks |
| Cortex | Apache 2.0 | CNCF | Mimir's predecessor (Mimir began as a Cortex fork); consider Mimir instead |
| M3DB | Apache 2.0 | Uber | Originally for Uber scale; declining community |

Grafana Mimir

Architecture & Components

Grafana Mimir Architecture
flowchart TD
    subgraph Write["Write Path"]
        DIST[Distributor]
        ING["Ingester<br/>x3 replicas"]
    end

    subgraph Read["Read Path"]
        QF[Query Frontend]
        QS[Query Scheduler]
        Q[Querier]
    end

    subgraph Storage["Storage"]
        OBJ[("Object Store<br/>S3/GCS/Azure")]
        SC[Store Gateway]
    end

    subgraph Ops["Operations"]
        COMP[Compactor]
        RUL[Ruler]
    end

    P["Prometheus<br/>remote_write"] -->|"push"| DIST
    DIST -->|"hash ring"| ING
    ING -->|"flush blocks"| OBJ
    GF[Grafana] --> QF --> QS --> Q
    Q --> ING
    Q --> SC --> OBJ
    COMP --> OBJ
    RUL --> Q

Deployment (Monolithic & Microservices)

# Mimir monolithic mode: simplest deployment, suggested here for <1M active series
# All components run in a single binary
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mimir
  namespace: monitoring
spec:
  replicas: 3
  serviceName: mimir              # Headless Service that gives pods stable DNS names
  selector:
    matchLabels:
      app: mimir                  # Required; must match the pod template labels
  template:
    metadata:
      labels:
        app: mimir
    spec:
      containers:
        - name: mimir
          image: grafana/mimir:2.13.0
          args:
            - -target=all
            - -config.file=/etc/mimir/mimir.yaml
          ports:
            - containerPort: 8080    # HTTP API
            - containerPort: 9095    # gRPC (internal)
            - containerPort: 7946    # Memberlist gossip
          volumeMounts:
            - name: config
              mountPath: /etc/mimir
            - name: data
              mountPath: /data
          resources:
            requests:
              cpu: "2"
              memory: 8Gi
            limits:
              memory: 12Gi
      volumes:
        - name: config
          configMap:
            name: mimir-config    # Assumed ConfigMap name; it holds mimir.yaml
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi

# mimir.yaml — monolithic configuration
multitenancy_enabled: true

server:
  http_listen_port: 8080
  grpc_listen_port: 9095

distributor:
  ring:
    kvstore:
      store: memberlist

ingester:
  ring:
    kvstore:
      store: memberlist
    replication_factor: 3

blocks_storage:
  backend: s3
  s3:
    endpoint: s3.us-east-1.amazonaws.com
    bucket_name: mimir-blocks-prod
    region: us-east-1
  tsdb:
    dir: /data/tsdb
  bucket_store:
    sync_dir: /data/tsdb-sync

compactor:
  data_dir: /data/compactor
  sharding_ring:
    kvstore:
      store: memberlist

store_gateway:
  sharding_ring:
    kvstore:
      store: memberlist

ruler:
  alertmanager_url: http://alertmanager:9093
  rule_path: /data/rules

limits:
  # Per-tenant limits
  ingestion_rate: 100000                    # samples/sec per tenant
  ingestion_burst_size: 200000
  max_global_series_per_user: 5000000       # 5M series per tenant
  max_global_series_per_metric: 100000
  compactor_blocks_retention_period: 365d   # 1 year retention

memberlist:
  join_members:
    - mimir-0.mimir:7946
    - mimir-1.mimir:7946
    - mimir-2.mimir:7946
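
The replication_factor: 3 setting determines how many ingester failures a write can survive. A minimal sketch, assuming the majority-quorum rule that the Mimir documentation describes for ingester replication (a write succeeds once floor(RF/2) + 1 replicas acknowledge it); the function names are illustrative only.

```python
def write_quorum(replication_factor: int) -> int:
    """Replicas that must acknowledge a write under majority quorum."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be unavailable while writes still succeed."""
    return replication_factor - write_quorum(replication_factor)


for rf in (1, 3, 5):
    print(f"RF={rf}: quorum={write_quorum(rf)}, "
          f"tolerates {tolerated_failures(rf)} failed replica(s)")
```

Under this rule the three-replica StatefulSet above can lose one ingester and keep accepting writes.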

Multi-Tenancy

# Prometheus remote_write with tenant header
remote_write:
  - url: https://mimir.internal/api/v1/push
    headers:
      X-Scope-OrgID: payments-team    # Tenant identifier
    queue_config:
      max_samples_per_send: 2000
      batch_send_deadline: 5s

# Grafana datasource configuration per tenant
# Each team queries only their own data
apiVersion: 1
datasources:
  - name: Mimir-Payments
    type: prometheus
    url: https://mimir.internal/prometheus
    jsonData:
      httpHeaderName1: X-Scope-OrgID
    secureJsonData:
      httpHeaderValue1: payments-team
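
The same header scopes ad hoc queries outside Grafana. The sketch below only builds the request pieces for an instant query and does not send anything; the base URL is taken from the datasource above, while the helper name and the example query are assumptions for illustration.

```python
from urllib.parse import urlencode

MIMIR_PROMETHEUS_URL = "https://mimir.internal/prometheus"  # from the datasource above


def build_tenant_query(tenant: str, promql: str) -> tuple[str, dict[str, str]]:
    """Return (url, headers) for an instant query scoped to one tenant."""
    url = f"{MIMIR_PROMETHEUS_URL}/api/v1/query?{urlencode({'query': promql})}"
    headers = {"X-Scope-OrgID": tenant}  # Mimir reads the tenant ID from this header
    return url, headers


url, headers = build_tenant_query("payments-team", "up")
print(url)
print(headers)
```

Any HTTP client can send the resulting URL and headers; a request carrying a different X-Scope-OrgID value is answered from that tenant's data instead.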

Configuration Deep Dive

# Per-tenant overrides (runtime configuration)
# File: /etc/mimir/runtime.yaml, loaded with -runtime-config.file and
# re-read periodically, so changes apply without a restart
#
# Keys under `overrides` are tenant IDs. Defaults for all tenants are not
# set here; they come from the `limits` block in mimir.yaml, for example:
#   ingestion_rate: 50000
#   max_global_series_per_user: 2000000
#   compactor_blocks_retention_period: 90d
overrides:
  # Override for high-volume tenant
  platform-team:
    ingestion_rate: 200000
    max_global_series_per_user: 10000000
    compactor_blocks_retention_period: 365d

  # Restricted tenant (dev environment)
  dev-team:
    ingestion_rate: 10000
    max_global_series_per_user: 500000
    compactor_blocks_retention_period: 14d
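
How a tenant's effective limits are resolved can be sketched as a dictionary merge. This is a simplified model, assuming that any field a tenant does not override falls back to the `limits` block of mimir.yaml; the dictionaries mirror values shown in this article and the function name is hypothetical.

```python
# Defaults as set in the `limits` block of mimir.yaml earlier in this article.
DEFAULT_LIMITS = {
    "ingestion_rate": 100_000,
    "max_global_series_per_user": 5_000_000,
    "compactor_blocks_retention_period": "365d",
}

# Per-tenant entries from the runtime overrides file.
OVERRIDES = {
    "platform-team": {
        "ingestion_rate": 200_000,
        "max_global_series_per_user": 10_000_000,
        "compactor_blocks_retention_period": "365d",
    },
    "dev-team": {
        "ingestion_rate": 10_000,
        "max_global_series_per_user": 500_000,
        "compactor_blocks_retention_period": "14d",
    },
}


def effective_limits(tenant: str) -> dict:
    """Tenant-specific values win; everything else falls back to defaults."""
    return {**DEFAULT_LIMITS, **OVERRIDES.get(tenant, {})}


print(effective_limits("dev-team"))
print(effective_limits("some-new-team"))  # no override: defaults apply
```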

VictoriaMetrics

Architecture (Single & Cluster)

VictoriaMetrics Cluster Architecture
flowchart TD
    subgraph Insert["Insert Path"]
        VMI["vminsert<br/>Stateless"]
    end

    subgraph Store["Storage"]
        VMS1[vmstorage-0]
        VMS2[vmstorage-1]
        VMS3[vmstorage-2]
    end

    subgraph Query["Query Path"]
        VMSL["vmselect<br/>Stateless"]
    end

    P["Prometheus<br/>remote_write"] --> VMI
    VMI -->|"consistent hashing"| VMS1 & VMS2 & VMS3
    GF[Grafana] --> VMSL
    VMSL --> VMS1 & VMS2 & VMS3
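
The "consistent hashing" edge in the diagram means each series is mapped to a storage node by hashing its identity, so that adding or removing a node moves only a fraction of the series. The sketch below illustrates the idea with rendezvous (highest-random-weight) hashing, one simple consistent-hashing scheme. It is not vminsert's actual implementation; the node names and series keys are assumptions.

```python
import hashlib


def pick_node(series_key: str, nodes: list[str]) -> str:
    """Rendezvous hashing: score each node for this key, pick the highest."""
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}|{series_key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)


nodes = ["vmstorage-0", "vmstorage-1", "vmstorage-2"]
series = [f'http_requests_total{{pod="api-{i}"}}' for i in range(1000)]

before = {s: pick_node(s, nodes) for s in series}
after = {s: pick_node(s, nodes[:2]) for s in series}  # vmstorage-2 removed

moved = [s for s in series if before[s] != after[s]]
print(f"{len(moved)} of {len(series)} series moved after removing one node")
```

Only series that lived on the removed node are reassigned; series on the surviving nodes stay where they were.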

Deployment

# VictoriaMetrics single-node, sized here for up to ~10M active series
# (actual capacity depends on hardware and query load)
# Simplest possible deployment
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: victoriametrics
  namespace: monitoring
spec:
  replicas: 1
  serviceName: victoriametrics    # Required for StatefulSets
  selector:
    matchLabels:
      app: victoriametrics        # Required; must match the pod template labels
  template:
    metadata:
      labels:
        app: victoriametrics
    spec:
      containers:
        - name: victoriametrics
          image: victoriametrics/victoria-metrics:v1.101.0
          args:
            - -storageDataPath=/storage
            - -retentionPeriod=12     # 12 months (a bare number means months)
            - -httpListenAddr=:8428
            - -search.maxUniqueTimeseries=10000000
            - -dedup.minScrapeInterval=15s    # Dedup HA pairs
          ports:
            - containerPort: 8428
          volumeMounts:
            - name: storage
              mountPath: /storage
          resources:
            requests:
              cpu: "4"
              memory: 16Gi
  volumeClaimTemplates:
    - metadata:
        name: storage
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 500Gi

# Prometheus remote_write to VictoriaMetrics
remote_write:
  - url: http://victoriametrics:8428/api/v1/write
    queue_config:
      max_samples_per_send: 10000
      capacity: 20000
      max_shards: 30

Unique Features (MetricsQL, Downsampling)

# MetricsQL: VictoriaMetrics' PromQL superset
# Supports additional functions not in standard PromQL

# rate() and increase() use the same syntax as PromQL, but MetricsQL does not
# extrapolate results and takes the last sample before the window into account
rate(http_requests_total[5m])
increase(http_requests_total[5m])

# Rollup functions take a range selector; rollup_rate returns min, max and avg rates
rollup_rate(http_requests_total[5m])

# Label manipulation
label_set(metric, "env", "prod")
label_del(metric, "instance")
label_copy(metric, "pod", "pod_name")

# Running aggregations across the queried time range
running_sum(rate(requests_total[1m]))
running_avg(cpu_usage)

# Coarser resolution at query time through a subquery step
# Note: this is computed per query. Persistent downsampling of stored data
# (-downsampling.period) is a VictoriaMetrics Enterprise feature.
avg_over_time(cpu_usage[1y:1h])    # average over 1 year, evaluated at 1-hour steps

Remote Write Configuration

Queue Tuning & Reliability

# Production remote_write configuration
remote_write:
  - url: https://mimir.internal/api/v1/push
    headers:
      X-Scope-OrgID: production

    # Queue tuning for high-volume (>500K samples/sec)
    # Defaults noted below are those of recent Prometheus 2.x releases; older
    # releases shipped different values, so check the docs for your version
    queue_config:
      capacity: 10000              # Per-shard buffer (default: 10000)
      max_shards: 50               # Max parallel writes (default: 50)
      min_shards: 10               # Min parallel writes (default: 1; higher means faster startup)
      max_samples_per_send: 5000   # Batch size (default: 2000)
      batch_send_deadline: 5s      # Max wait before partial batch send
      min_backoff: 30ms            # Initial retry backoff
      max_backoff: 5s              # Max retry backoff
      retry_on_http_429: true      # Retry on rate limiting (default: false)

    # Metadata configuration
    metadata_config:
      send: true
      send_interval: 5m

    # TLS for encrypted transport
    tls_config:
      cert_file: /etc/certs/client.crt
      key_file: /etc/certs/client.key
      ca_file: /etc/certs/ca.crt
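
A back-of-the-envelope check helps confirm that a queue configuration can reach the target rate. The sketch assumes every shard sends full batches back to back and that each send takes a fixed round-trip time (100 ms here, an assumed figure); real throughput is lower, and Prometheus adjusts the shard count dynamically between min_shards and max_shards. The function names are illustrative.

```python
def max_throughput(max_shards: int,
                   max_samples_per_send: int,
                   send_latency_ms: int) -> float:
    """Upper bound in samples/sec if every shard sends full batches back to back."""
    sends_per_second_per_shard = 1000 / send_latency_ms
    return max_shards * max_samples_per_send * sends_per_second_per_shard


def max_buffered_samples(max_shards: int, capacity: int) -> int:
    """Samples that can wait in shard queues before the WAL reader is blocked."""
    return max_shards * capacity


# Values from the queue_config above, with an assumed 100 ms send latency.
print(f"{max_throughput(50, 5000, 100):,.0f} samples/sec upper bound")
print(f"{max_buffered_samples(50, 10000):,} samples buffered at most")
```

If the bound comes out below the ingestion rate, raising max_shards or max_samples_per_send, or reducing latency to the backend, are the available levers.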

Write-Path Relabeling

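# Selective remote write: reduce cost by filtering
# Rules run in order, and each regex must match the whole label value
remote_write:
  - url: https://mimir.internal/api/v1/push
    write_relabel_configs:
      # Only send recording rules (names with these level prefixes).
      # To also ship specific raw metrics, add their names to this regex.
      - source_labels: [__name__]
        regex: '(namespace|job|cluster|slo):.*'
        action: keep

      # Drop high-cardinality debug metrics. The keep rule above already
      # excludes them; this rule matters once raw metrics are added to it.
      - source_labels: [__name__]
        regex: 'go_(gc|memstats)_.*'
        action: drop

      # Remove labels that only matter locally. Keep the replica label if the
      # backend deduplicates HA pairs by it (for example Mimir's HA tracker).
      - regex: '__replica__|prometheus_replica'
        action: labeldrop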
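
The effect of ordered keep and drop rules can be checked offline before deploying them. The following is a simplified model of the two `__name__` rules above: it uses Python's `re.fullmatch` to mimic Prometheus's fully anchored regexes and ignores labeldrop. The metric names in the sample list are made up for illustration.

```python
import re

# (action, regex) pairs in the same order as write_relabel_configs above.
RULES = [
    ("keep", r"(namespace|job|cluster|slo):.*"),
    ("drop", r"go_(gc|memstats)_.*"),
]


def survives(metric_name: str) -> bool:
    """Apply the rules in order; return False as soon as the series is dropped."""
    for action, pattern in RULES:
        matched = re.fullmatch(pattern, metric_name) is not None
        if action == "keep" and not matched:
            return False
        if action == "drop" and matched:
            return False
    return True


names = [
    "job:http_requests:rate5m",
    "slo:availability:ratio",
    "go_gc_duration_seconds",
    "http_requests_total",
]
sent = [n for n in names if survives(n)]
print(sent)
```

Running candidate metric names through a model like this shows which rule removes each series, and makes it visible that raw metrics never reach the drop rule while the keep rule is this narrow.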

Exemplars Support

# Enable exemplars in remote write (links metrics to traces)
# Prometheus must run with --enable-feature=exemplar-storage to store exemplars
remote_write:
  - url: https://mimir.internal/api/v1/push
    send_exemplars: true    # Forward exemplars to backend

# Mimir configuration to accept exemplars (the default of 0 disables them)
limits:
  max_global_exemplars_per_user: 100000    # Per-tenant exemplar limit

Mimir vs VictoriaMetrics

Feature Comparison

Grafana Mimir vs VictoriaMetrics

| Feature | Grafana Mimir | VictoriaMetrics |
|---|---|---|
| Multi-tenancy | Native (X-Scope-OrgID header per request) | Cluster version (tenant ID in the URL path); none in single-node |
| Storage backend | Object store (S3/GCS/Azure) | Local disk (cluster mode distributes) |
| Query language | Standard PromQL | MetricsQL (PromQL superset) |
| HA deduplication | Ingest-time in the distributor (HA tracker using cluster and replica labels) | -dedup.minScrapeInterval (applied to stored data and at query time) |
| Downsampling | None built in (the compactor merges and deduplicates blocks) | Enterprise only (-downsampling.period); queries can still use coarser steps |
| Compression ratio | ~1.5 bytes/sample | ~0.7 bytes/sample (workload-dependent) |
| Min viable deployment | 3 replicas (monolithic, replication factor 3) | 1 instance (single-node) |
| Grafana integration | Native (same company) | Full PromQL datasource compatible |
| License | AGPLv3 | Apache 2.0 (enterprise features paid) |

Operational Comparison

Operational Reality:
  • Mimir: More components to manage, but designed for very large scale (Grafana Labs has reported tests at 1 billion active series). Requires an object store (S3/GCS). Better for large organizations with multi-tenant requirements.
  • VictoriaMetrics: Operationally simpler (single binary possible). Better compression means less storage cost. Better for single-tenant or small-team deployments where simplicity matters.

Decision Guide

Choose Grafana Mimir when:
  • You need native multi-tenancy with per-tenant limits
  • You already use Grafana Cloud or the LGTM stack
  • Object storage (S3/GCS) is your preferred backend
  • You need built-in alerting rules evaluation (Ruler component)
  • Total scale reaches 100M+ active series across many teams
Choose VictoriaMetrics when:
  • Operational simplicity is the top priority
  • Storage cost optimization matters (best compression)
  • Single-tenant or separated-instance multi-tenancy is acceptable
  • You want MetricsQL’s extended query capabilities
  • Local disk is preferred over object store
  • Query performance on a single-node deployment is a priority

Conclusion

Key Takeaways:
  • Remote write is the standard — use it for all production Prometheus deployments
  • Filter at write time — use write_relabel_configs to reduce storage costs
  • Mimir for multi-tenant enterprises — native isolation, object store, Grafana ecosystem
  • VictoriaMetrics for simplicity — best compression, single binary, MetricsQL
  • Thanos for existing deployments — if you already have TSDB blocks on disk (covered in Part 11)
  • Monitor your remote write — track pending samples, lag, and failed sends
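
For the last takeaway, the three signals map onto metrics that Prometheus exposes about its own remote write queues. The sketch lists PromQL expressions for them and a small helper for judging lag. The metric names are those used by Prometheus 2.x and should be verified against your server's /metrics output; the 120-second threshold and the helper name are assumptions.

```python
# PromQL expressions for the three signals (metric names as of Prometheus 2.x).
QUERIES = {
    "pending_samples": "prometheus_remote_storage_samples_pending",
    "failed_per_second": "rate(prometheus_remote_storage_samples_failed_total[5m])",
    "lag_seconds": (
        "prometheus_remote_storage_highest_timestamp_in_seconds"
        " - ignoring(remote_name, url) group_right"
        " prometheus_remote_storage_queue_highest_sent_timestamp_seconds"
    ),
}


def is_falling_behind(highest_ingested_ts: float,
                      highest_sent_ts: float,
                      max_lag_s: float = 120.0) -> bool:
    """True when the newest sent sample trails the newest ingested one too far."""
    return (highest_ingested_ts - highest_sent_ts) > max_lag_s


for name, expr in QUERIES.items():
    print(f"{name}: {expr}")
print(is_falling_behind(1_700_000_300.0, 1_700_000_000.0))
```

A steadily growing lag or pending count usually means the queue cannot keep up with ingestion, which is the cue to revisit the queue_config settings or the backend's ingestion limits.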