Grafana Deep Dive Part 6: Tracing with Grafana Tempo & TraceQL

Introducing Tempo & TraceQL

Distributed tracing is the observability signal that reveals how a request flows through your system — which services it touches, where latency accumulates, and where errors originate. While metrics tell you what is happening and logs tell you why, traces tell you the complete journey of every request across service boundaries.

Grafana Tempo is an open-source, high-volume distributed tracing backend that stores and queries traces with minimal operational complexity. Combined with TraceQL — Tempo’s purpose-built query language — it provides a powerful system for understanding request latency, error propagation, and service dependencies in microservices architectures.

What Is Grafana Tempo?

Tempo is a horizontally-scalable distributed tracing backend designed for high-throughput trace ingestion and low-cost long-term storage. Unlike traditional tracing systems (Jaeger with Elasticsearch, Zipkin with Cassandra), Tempo requires only object storage (S3, GCS, Azure Blob) as its backend — no additional databases, no index nodes, no complex cluster management.

                            
Key Insight: Tempo’s design philosophy is radically simple: store all traces in cheap object storage without a per-attribute search index, then use TraceQL and service graphs to find interesting traces. This removes the operational burden of maintaining search indexes and lets teams keep 100% of traces at a fraction of the cost.
                        

Tempo accepts traces in multiple formats — OpenTelemetry (OTLP), Jaeger, Zipkin, and OpenCensus — making it a drop-in backend regardless of your existing instrumentation. It integrates deeply with Grafana for visualization, providing flame graphs, service graphs, and seamless cross-signal navigation.

The No-Index Design

Traditional tracing backends index every span attribute to enable search. This creates enormous operational overhead: index storage often exceeds trace data itself, write amplification degrades performance, and teams are forced to sample aggressively (keeping only 1–5% of traces) to control costs.

Tempo takes a fundamentally different approach:

No per-span indexing — Traces are stored as-is in columnar Parquet blocks in object storage
Trace ID lookup — Finding a trace by ID is a direct object storage read (O(1) per block)
TraceQL search — Columnar format enables efficient scanning without traditional indexes
Bloom filters — Optional probabilistic data structures accelerate trace ID lookups across blocks
Dedicated attribute columns — Frequently-queried attributes can be promoted to dedicated columns for faster search

The Parquet columnar format is crucial: when TraceQL queries filter on span.http.status_code, Tempo reads only the status_code column from each block — not the entire span. This makes full-text-search-style queries feasible over object storage without traditional indexes.
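
To make the Bloom filter idea from the list above concrete, here is a minimal sketch of how a per-block filter can rule out blocks before any trace data is read. The `BloomFilter` class, its sizes, and the block names are illustrative assumptions, not Tempo's actual implementation or on-disk format.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def candidate_blocks(blocks: dict, trace_id: str) -> list:
    """Return the names of blocks whose filter says the trace ID may be present."""
    return [name for name, bloom in blocks.items() if bloom.might_contain(trace_id)]


# Two hypothetical blocks, each holding one trace ID.
block_a, block_b = BloomFilter(), BloomFilter()
block_a.add("4bf92f3577b34da6a3ce929d0e0e4736")
block_b.add("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")
blocks = {"block-a": block_a, "block-b": block_b}

print(candidate_blocks(blocks, "4bf92f3577b34da6a3ce929d0e0e4736"))
```

A block that actually contains the trace ID is always returned. A block that does not contain it is usually skipped, which is what keeps trace ID lookups cheap as the number of blocks grows.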

Cost Benefits at Scale

Cost Comparison: Tracing 10,000 spans/second, 30-day retention

Traditional Backend (Jaeger + Elasticsearch)

Elasticsearch cluster: 3 data nodes × 500GB SSD = $2,400/month
Index overhead: ~60% of data volume goes to indexing
Forced sampling at 5% to control costs — losing 95% of traces
Operational burden: index management, shard rebalancing, version upgrades

Tempo with Object Storage

Object storage (S3): ~2.5 TB compressed at $0.023/GB = $58/month
No index overhead — pure trace data
Store 100% of traces — no sampling required
Operational burden: configure bucket lifecycle policies

40x Cost Reduction | 100% Trace Retention | Minimal Ops

The cost difference becomes even more dramatic at higher volumes. Organizations ingesting 100,000+ spans/second can save hundreds of thousands of dollars annually by switching from indexed tracing backends to Tempo’s object-storage-only architecture.
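
The arithmetic behind the comparison can be checked with a few lines of Python. The sketch below uses only the figures quoted above (the $2,400/month cluster, 2.5 TB stored, $0.023 per GB-month, 10,000 spans/second for 30 days). The decimal TB-to-GB conversion and the derived bytes-per-span figure are assumptions of this sketch.

```python
GB_PER_TB = 1000  # decimal units, as used in storage pricing (assumption)

# Figures quoted in the comparison.
elasticsearch_monthly_usd = 2400.0
stored_tb = 2.5
s3_usd_per_gb_month = 0.023
spans_per_second = 10_000
retention_days = 30

tempo_monthly_usd = stored_tb * GB_PER_TB * s3_usd_per_gb_month
cost_ratio = elasticsearch_monthly_usd / tempo_monthly_usd

# Implied compressed size per span if all spans in the window fit in 2.5 TB.
spans_retained = spans_per_second * 86_400 * retention_days
bytes_per_span = stored_tb * 1e12 / spans_retained

print(f"Tempo storage: ${tempo_monthly_usd:.2f}/month")
print(f"Cost ratio: {cost_ratio:.1f}x")
print(f"Implied size: {bytes_per_span:.0f} bytes per span")
```

The storage bill comes to about $57.50 per month, a ratio of roughly 41.7x, which matches the rounded 40x headline. The quoted volume also implies a little under 100 bytes per span after compression, a useful sanity check when you plug in your own numbers.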

Exploring Tempo Features

Beyond basic trace storage and retrieval, Tempo provides several features that transform raw traces into actionable observability insights.

Trace Discovery

Finding interesting traces without indexes requires smart discovery mechanisms. Tempo provides multiple paths to discover traces:

TraceQL search — Query spans by attributes, duration, status, and structural relationships
Service graph — Visual topology map showing service-to-service call patterns and error rates
Span metrics — Pre-computed RED metrics (Rate, Errors, Duration) generated from traces, queryable via PromQL
Exemplars — Clickable trace links embedded in metric time-series panels
Log correlation — Jump from a log line containing a trace_id directly to the full trace

The combination of these mechanisms means you rarely need to "search" for traces in the traditional sense. Instead, you navigate to traces from other signals — a latency spike in a metric panel leads to an exemplar, which opens the offending trace.

Service Graphs

Tempo’s metrics-generator component analyzes incoming spans to build a near-real-time topology of your services. The service graph shows:

Nodes — Each service, sized by request volume
Edges — Service-to-service calls, colored by error rate
Latency — P50/P95/P99 latency on each edge
Request rate — Calls per second between services

Service graphs are stored as Prometheus metrics (using the traces_service_graph_* metric family), meaning you can alert on topology changes, build dashboards showing dependency health, and detect when new services appear or existing connections break.

# Tempo configuration — enable metrics generator for service graphs
metrics_generator:
  processor:
    service_graphs:
      enabled: true
      dimensions:
        - http.method
        - http.status_code
      peer_attributes:
        - db.system
        - messaging.system
      enable_client_server_prefix: true
      # How long to wait for the matching client/server span before expiring an edge
      wait: 10s
      # Maximum items in the store
      max_items: 10000
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://mimir:9009/api/v1/push
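
The edge-building idea behind service graphs can be sketched in a few lines: a client span in one service is paired with the server span whose parent is that client span, and each pair counts as one request on the edge between the two services. The `Span` type and `build_edges` function are hypothetical names for illustration; the real metrics-generator also handles expiry, messaging spans, and peer attributes.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_id: str
    service: str
    kind: str  # "client" or "server"
    error: bool = False


def build_edges(spans):
    """Pair client spans with the server spans they caused; count requests per edge."""
    clients = {s.span_id: s for s in spans if s.kind == "client"}
    requests, failures = Counter(), Counter()
    for server in (s for s in spans if s.kind == "server"):
        client = clients.get(server.parent_id)
        if client is None:
            continue  # no matching client span arrived, so no edge is recorded
        edge = (client.service, server.service)
        requests[edge] += 1
        if client.error or server.error:
            failures[edge] += 1
    return requests, failures


spans = [
    Span("a1", "", "api-gateway", "client"),
    Span("b1", "a1", "checkout", "server"),
    Span("a2", "", "api-gateway", "client", error=True),
    Span("b2", "a2", "checkout", "server", error=True),
    Span("c1", "missing", "inventory", "server"),  # client span never seen
]
requests, failures = build_edges(spans)
print(dict(requests), dict(failures))
```

In this toy input the api-gateway to checkout edge records two requests, one of them failed, and the unpaired inventory span produces no edge.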

Span Metrics

Span metrics transform trace data into time-series metrics, so you get metric-style dashboards and alerts from trace instrumentation. For every span it processes, Tempo updates the following metrics, labeled with your configured dimensions:

traces_spanmetrics_latency_bucket — Histogram of span durations
traces_spanmetrics_calls_total — Counter of span completions (rate = throughput)
traces_spanmetrics_size_total — Counter of span sizes in bytes

# Span metrics configuration
metrics_generator:
  processor:
    span_metrics:
      enabled: true
      dimensions:
        - service.name
        - span.name
        - http.method
        - http.status_code
        - db.system
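      # Also emit the traces_target_info metric, which carries resource attributes as labels
      # (exemplars are enabled separately with send_exemplars: true on the remote_write entry)
      enable_target_info: true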
      # Histogram buckets for latency
      histogram_buckets: [0.002, 0.004, 0.008, 0.016, 0.032, 0.064, 0.128, 0.256, 0.512, 1.024, 2.048, 4.096, 8.192, 16.384]

This means you can build PromQL dashboards and alerts from trace data — detecting latency regressions, error rate spikes, and throughput changes — without separate application-level metrics instrumentation.
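
As a rough illustration of what the span metrics processor does, the sketch below updates a call counter and a cumulative latency histogram for each span, keyed by its dimension labels. The shortened bucket list, the `status` label, and the function names are assumptions of this sketch rather than Tempo's internal data model.

```python
from collections import defaultdict

# A shortened bucket list in seconds; the configuration above uses a longer one.
BUCKETS = [0.128, 0.256, 0.512, 1.024]


def new_series():
    # One counter plus cumulative histogram buckets (the last slot is +Inf).
    return {"calls_total": 0, "latency_bucket": [0] * (len(BUCKETS) + 1)}


def observe(series, labels: dict, duration_s: float) -> None:
    """Update the series identified by the span's dimension labels."""
    entry = series[tuple(sorted(labels.items()))]
    entry["calls_total"] += 1
    for i, upper_bound in enumerate(BUCKETS):
        if duration_s <= upper_bound:
            entry["latency_bucket"][i] += 1
    entry["latency_bucket"][-1] += 1  # the +Inf bucket counts every observation


series = defaultdict(new_series)
observe(series, {"service": "checkout", "status": "ok"}, 0.2)
observe(series, {"service": "checkout", "status": "ok"}, 0.9)
observe(series, {"service": "checkout", "status": "error"}, 2.0)

ok_key = (("service", "checkout"), ("status", "ok"))
error_key = (("service", "checkout"), ("status", "error"))
print(series[ok_key])
```

Each distinct label combination becomes its own time series, which is why high-cardinality dimensions (user IDs, full URLs) should be kept out of the dimensions list.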

Cross-Signal Links

Tempo supports bidirectional navigation between all observability signals:

Trace-to-Logs — From any span, jump to correlated logs in Loki filtered by trace_id and time range
Trace-to-Metrics — From a span, see the corresponding service metrics at that exact timestamp
Trace-to-Profiles — From a slow span, jump to a continuous profile in Pyroscope showing CPU/memory usage during that span’s execution
Logs-to-Traces — From a log line containing a trace_id field, jump directly to the full trace
Metrics-to-Traces — From a metric panel with exemplars enabled, click a data point to see the trace that produced it

# Grafana datasource configuration for cross-signal correlation
datasources:
  - name: Tempo
    type: tempo
    url: http://tempo:3200
    jsonData:
      tracesToLogsV2:
        datasourceUid: loki
        spanStartTimeShift: "-1h"
        spanEndTimeShift: "1h"
        filterByTraceID: true
        filterBySpanID: false
        customQuery: true
        query: '{$${__tags}} | trace_id="$${__trace.traceId}"'
      tracesToMetrics:
        datasourceUid: mimir
        spanStartTimeShift: "-5m"
        spanEndTimeShift: "5m"
        tags:
          - key: service.name
            value: service
        queries:
          - name: "Request Rate"
            query: "sum(rate(http_server_request_duration_seconds_count{$$__tags}[5m]))"
          - name: "Error Rate"
            query: "sum(rate(http_server_request_duration_seconds_count{$$__tags,http_status_code=~\"5..\"}[5m]))"
      tracesToProfiles:
        datasourceUid: pyroscope
        profileTypeId: "process_cpu:cpu:nanoseconds:cpu:nanoseconds"
        customQuery: true
        query: '{service_name="$${__span.tags["service.name"]}"}'
      serviceMap:
        datasourceUid: mimir
      nodeGraph:
        enabled: true

The TraceQL Query Language

TraceQL is a purpose-built query language for searching and analyzing distributed traces. Unlike traditional trace search (filter by service name, operation, tags), TraceQL enables structural queries — finding traces based on relationships between spans, not just individual span attributes.

Span Selectors & Filters

The basic building block of TraceQL is the spanset selector — a set of conditions that match spans within traces:

# Basic span selector — find spans from a specific service
{ resource.service.name = "api-gateway" }

# Attribute filter — spans with specific HTTP method
{ span.http.method = "POST" }

# Status filter — find error spans
{ status = error }

# Duration filter — find slow spans (> 500ms)
{ duration > 500ms }

# Combine multiple conditions (AND within a spanset)
{ resource.service.name = "checkout-service" && span.http.status_code >= 500 && duration > 1s }

# String matching with regex
{ span.http.url =~ "/api/v[12]/users.*" }

# Not-equal operator
{ resource.service.name != "health-check" }

# Exists check — span has this attribute
{ span.db.system != nil }

# Numeric range
{ span.http.response_content_length > 1000000 }

                            
Attribute Scopes: TraceQL distinguishes between resource.* attributes (service-level, shared across all spans from a service), span.* attributes (individual span metadata), and unscoped attributes written with a leading dot (searched in both scopes). Always scope attributes explicitly for best performance: resource.service.name is faster than the unscoped .service.name.
                        

Structural Operators

TraceQL’s most powerful feature is structural operators — querying traces based on the relationships between spans. This is what sets TraceQL apart from simple tag-based search:

# Child (>) — find traces where api-gateway directly calls a slow DB query
{ resource.service.name = "api-gateway" } > { span.db.system = "postgresql" && duration > 200ms }

# Descendant (>>) — find traces where api-gateway eventually reaches a slow DB query
# (even through intermediate services)
{ resource.service.name = "api-gateway" } >> { span.db.system = "postgresql" && duration > 200ms }

# Spanset AND (&&) — find traces containing BOTH patterns (spans in same trace, not necessarily related)
{ resource.service.name = "payment-service" && status = error } && { resource.service.name = "inventory-service" && status = error }

# Not-descendant (!>>, experimental) — find database spans that are NOT reached from an api-gateway span
{ resource.service.name = "api-gateway" } !>> { span.db.system != nil }

# Combining structural and duration — find traces where a checkout service
# call descends into a slow external HTTP call
{ resource.service.name = "checkout" && span.name = "POST /order" } >>
{ span.http.url =~ "https://external-payment.*" && duration > 2s }

The structural operators enable queries that would be impossible with flat tag-based search:

> (child) — The right spanset must be a direct child of the left spanset
>> (descendant) — The right spanset must be a descendant (any depth) of the left spanset
~ (sibling) — The right spanset must share a parent with the left spanset
&& (and) — Both spansets must exist in the same trace
|| (or/union) — Either spanset matches
!>, !>>, !~ (experimental negations) — The right spanset must NOT be a child, descendant, or sibling of the left spanset
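
The three span relationships that structural queries test (direct child, descendant at any depth, and sibling) can be illustrated on a toy span tree. This is only a sketch of the relationships themselves; the `PARENTS` map and helper names are hypothetical, and Tempo evaluates these relationships with nested-set positions rather than parent walks.

```python
# A toy trace: span name -> parent span name (None marks the root).
PARENTS = {
    "gateway": None,
    "auth": "gateway",
    "checkout": "gateway",
    "db-query": "checkout",
}


def is_child(parent: str, span: str) -> bool:
    """True when span's direct parent is `parent`."""
    return PARENTS[span] == parent


def is_descendant(ancestor: str, span: str) -> bool:
    """True when `ancestor` appears anywhere on span's path to the root."""
    current = PARENTS[span]
    while current is not None:
        if current == ancestor:
            return True
        current = PARENTS[current]
    return False


def is_sibling(a: str, b: str) -> bool:
    """True when two distinct spans share the same (non-root) parent."""
    return a != b and PARENTS[a] is not None and PARENTS[a] == PARENTS[b]


print(is_child("gateway", "db-query"))       # direct child only
print(is_descendant("gateway", "db-query"))  # any depth
print(is_sibling("auth", "checkout"))
```

Here `db-query` is a descendant of `gateway` but not its direct child, which is exactly the distinction that matters when an intermediate service sits between the gateway and the database.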

Intrinsic Attributes

TraceQL provides intrinsic attributes: built-in fields that exist on every span without requiring explicit instrumentation:

# duration — span execution time
{ duration > 5s }

# status — span status (ok, error, unset)
{ status = error }

# kind — span kind (server, client, producer, consumer, internal)
{ kind = client }

# name — span operation name
{ name = "HTTP GET /api/users" }

# rootName — name of the root span in the trace
{ rootName = "POST /checkout" }

# rootServiceName — service name of the root span
{ rootServiceName = "api-gateway" }

# traceDuration — total duration of the entire trace
{ traceDuration > 10s }

# nestedSetLeft / nestedSetRight — positional attributes for advanced structural queries
{ nestedSetLeft > 0 }

# Combine intrinsic with span attributes
{ kind = server && duration > 1s && resource.service.name = "order-service" }

Aggregate Functions

TraceQL supports aggregate functions that compute values across all matching spans within each trace:

# count — number of matching spans per trace
{ resource.service.name = "api-gateway" } | count() > 10

# avg — average duration of matching spans
{ span.db.system = "redis" } | avg(duration) > 50ms

# max — maximum duration of matching spans
{ resource.service.name = "checkout" } | max(duration) > 3s

# min — minimum duration
{ status = error } | min(duration) < 1ms

# sum — total time spent in matching spans
{ span.db.system = "postgresql" } | sum(duration) > 5s

# Combine with selectors — find traces where the checkout service
# makes more than 20 database calls
{ resource.service.name = "checkout" && span.db.system != nil } | count() > 20
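
Conceptually, an aggregate query first selects matching spans, then groups them by trace, then filters traces on the aggregated value. The sketch below mirrors the shape of `{ span.db.system = "redis" } | avg(duration) > 50ms` over an in-memory list of spans. The dictionary layout and the `aggregate_per_trace` helper are assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_per_trace(spans, predicate):
    """Group matching span durations by trace and compute simple aggregates."""
    durations = defaultdict(list)
    for span in spans:
        if predicate(span):
            durations[span["trace_id"]].append(span["duration_ms"])
    return {
        trace_id: {
            "count": len(values),
            "sum": sum(values),
            "avg": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
        }
        for trace_id, values in durations.items()
    }


spans = [
    {"trace_id": "t1", "db": "redis", "duration_ms": 40},
    {"trace_id": "t1", "db": "redis", "duration_ms": 80},
    {"trace_id": "t2", "db": "redis", "duration_ms": 10},
    {"trace_id": "t2", "db": None, "duration_ms": 500},
]

stats = aggregate_per_trace(spans, lambda s: s["db"] == "redis")
slow_redis_traces = [t for t, s in stats.items() if s["avg"] > 50]
print(slow_redis_traces)
```

Note that the 500 ms span in `t2` does not affect the result: aggregates only see spans that passed the selector, so `t2` averages 10 ms and is filtered out.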

TraceQL Metrics Queries

TraceQL metrics queries compute time-series metrics from trace data, enabling dashboards and alerts powered by spans rather than application-level instrumentation:

# Rate of spans matching a pattern (spans per second)
{ resource.service.name = "api-gateway" && status = error } | rate()

# Histogram of durations for matching spans
{ resource.service.name = "checkout" && kind = server } | histogram_over_time(duration)

# Quantile calculation from trace data
{ resource.service.name = "payment" } | quantile_over_time(duration, 0.95)

# Count matching spans per time interval
{ span.http.status_code >= 500 } | count_over_time()

# Compare metrics between services by grouping on an attribute
# (assumes a v1-checkout service runs alongside v2-checkout)
{ resource.service.name =~ "v[12]-checkout" } | quantile_over_time(duration, 0.99) by (resource.service.name)
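
To make the semantics of a rate-style metrics query concrete, the sketch below buckets matching span start times into fixed steps and divides each count by the step length. The function name and the integer-second timestamps are assumptions of this sketch; Tempo performs the equivalent work by scanning blocks at query time.

```python
from collections import Counter


def rate(span_start_times, range_start, range_end, step):
    """Return spans-per-second for each step in [range_start, range_end)."""
    counts = Counter()
    for ts in span_start_times:
        if range_start <= ts < range_end:
            counts[(ts - range_start) // step] += 1
    num_steps = -(-(range_end - range_start) // step)  # ceiling division
    return [counts[i] / step for i in range(num_steps)]


# Start times (in seconds) of spans that matched the selector.
starts = [0, 1, 2, 12, 29]
series = rate(starts, range_start=0, range_end=30, step=10)
print(series)
```

Three spans fall in the first 10-second step and one in each of the next two. Because every evaluation repeats this scan over raw spans, the cost grows with both the time range and the span volume.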

                            
                            Performance Note: TraceQL metrics queries scan trace data on every evaluation. For high-frequency dashboards or alerts, prefer span metrics (pre-computed by the metrics-generator) over TraceQL metrics queries. Reserve TraceQL metrics for ad-hoc investigation and low-frequency alerting.
                        

Pivoting Between Data Types

The true power of Tempo within the Grafana ecosystem is its ability to serve as a correlation hub — connecting metrics, logs, and profiles through trace context. This section describes the concrete mechanisms for navigating between signals.

Cross-Signal Correlation Flow

flowchart LR
    M["Metrics (Mimir/Prometheus)"] -->|Exemplars| T["Traces (Tempo)"]
    L["Logs (Loki)"] -->|trace_id field| T
    T -->|trace-to-logs| L
    T -->|trace-to-metrics| M
    T -->|trace-to-profiles| P["Profiles (Pyroscope)"]
    P -->|span_id label| T

Metrics → Traces (via Exemplars)

Exemplars are sample trace IDs attached to metric data points. When a service records a histogram observation (e.g., request latency), it can attach the current trace_id as an exemplar. In Grafana, these appear as clickable dots on metric panels:

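// Go — recording a histogram with an exemplar
import (
    "context"

    "github.com/prometheus/client_golang/prometheus"
    "go.opentelemetry.io/otel/trace"
)

// httpRequestDuration must be registered with a Prometheus registry before use.
var httpRequestDuration = prometheus.NewHistogramVec(
    prometheus.HistogramOpts{
        Name: "http_request_duration_seconds",
        Help: "HTTP request latency in seconds.",
    },
    []string{"method", "path"},
)

func recordLatency(ctx context.Context, duration float64) {
    observer := httpRequestDuration.With(prometheus.Labels{
        "method": "GET",
        "path":   "/api/orders",
    })

    sc := trace.SpanContextFromContext(ctx)
    exemplarObserver, ok := observer.(prometheus.ExemplarObserver)
    if !ok || !sc.IsSampled() {
        // No sampled trace to link to: record a plain observation.
        observer.Observe(duration)
        return
    }
    // Attach trace_id and span_id as exemplar labels
    exemplarObserver.ObserveWithExemplar(duration, prometheus.Labels{
        "trace_id": sc.TraceID().String(),
        "span_id":  sc.SpanID().String(),
    })
}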

When investigating a latency spike in a Grafana dashboard, click the exemplar dot at the peak to jump directly to the trace that caused it — no manual trace searching required.

Logs → Traces (via trace_id)

When logs include a trace_id field (either as a structured field or extracted via LogQL), Grafana renders it as a clickable link that opens the corresponding trace in Tempo:

# Python — structured logging with trace context
import logging
import json_log_formatter
from opentelemetry import trace

formatter = json_log_formatter.JSONFormatter()
handler = logging.StreamHandler()
handler.setFormatter(formatter)
logger = logging.getLogger("order-service")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def process_order(order_id: str):
    span = trace.get_current_span()
    ctx = span.get_span_context()

    logger.info(
        "Processing order",
        extra={
            "order_id": order_id,
            "trace_id": format(ctx.trace_id, "032x"),
            "span_id": format(ctx.span_id, "016x"),
            "service": "order-service",
        }
    )

In the Loki data source settings in Grafana, configure derived fields to detect trace_id patterns in log lines and render them as links to Tempo.
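
A derived field is essentially a regular expression with one capture group applied to each log line. The sketch below shows the kind of pattern that would pull a 32-character hex trace ID out of the JSON logs produced above. The exact regex and the helper name are assumptions; adjust the pattern to your own log format.

```python
import json
import re

# Hypothetical matcher: a 32-character lowercase hex trace_id in a JSON log line.
TRACE_ID_PATTERN = re.compile(r'"trace_id":\s*"([0-9a-f]{32})"')


def extract_trace_id(log_line: str):
    """Return the captured trace ID, or None when the line has no trace_id field."""
    match = TRACE_ID_PATTERN.search(log_line)
    return match.group(1) if match else None


log_line = json.dumps({
    "message": "Processing order",
    "order_id": "ORD-12345",
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
})

print(extract_trace_id(log_line))
```

Testing the pattern against real log lines like this before pasting it into the data source settings avoids silent mismatches, for example when the logger emits uppercase hex or a different field name.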

Traces → Logs

From any span in a trace view, click “Logs for this span” to query Loki for all log lines emitted during that span’s execution window, filtered by trace_id. This is configured via the tracesToLogsV2 datasource setting shown earlier.

The query generated from the template typically looks like:

# LogQL query generated by trace-to-logs link
{service_name="checkout-service"} | json | trace_id="abc123def456" | line_format "{{.message}}"

Traces → Profiles

When a slow span is identified, the trace-to-profiles link opens Pyroscope filtered to the exact time window and service, showing CPU flame graphs or memory allocation profiles for the code executing during that span. This is particularly powerful for diagnosing:

Unexpectedly slow database queries (Is it the query itself or GC pressure?)
High-latency HTTP calls (Is the service CPU-bound or waiting on I/O?)
Memory-intensive operations (Which allocations dominate during this request?)

Tracing Protocols

Tempo accepts traces in multiple wire formats, making it compatible with any existing tracing instrumentation. Understanding these protocols helps you choose the right one for your environment and plan migrations.

OpenTelemetry (OTLP)

OTLP (OpenTelemetry Protocol) is the recommended protocol for all new instrumentation. It supports traces, metrics, and logs in a single protocol with both gRPC and HTTP/protobuf transports:

# Tempo configuration — OTLP receivers
distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: "0.0.0.0:4317"
        http:
          endpoint: "0.0.0.0:4318"

# Python — sending traces via OTLP
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource

# Configure the tracer provider
resource = Resource.create({
    "service.name": "order-service",
    "service.version": "2.1.0",
    "deployment.environment": "production",
})

provider = TracerProvider(resource=resource)
exporter = OTLPSpanExporter(endpoint="http://tempo:4317", insecure=True)
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

# Create spans
tracer = trace.get_tracer("order-service")

with tracer.start_as_current_span("process-order") as span:
    span.set_attribute("order.id", "ORD-12345")
    span.set_attribute("order.total", 99.99)
    span.set_attribute("customer.tier", "premium")
    # Child span
    with tracer.start_as_current_span("validate-inventory") as child:
        child.set_attribute("db.system", "postgresql")
        child.set_attribute("db.statement", "SELECT stock FROM products WHERE id = $1")

Jaeger & Zipkin

For teams migrating from existing Jaeger or Zipkin deployments, Tempo provides native receivers that accept their wire formats without any client-side changes:

# Tempo configuration — legacy protocol receivers
distributor:
  receivers:
    jaeger:
      protocols:
        thrift_http:
          endpoint: "0.0.0.0:14268"
        thrift_compact:
          endpoint: "0.0.0.0:6831"
        thrift_binary:
          endpoint: "0.0.0.0:6832"
        grpc:
          endpoint: "0.0.0.0:14250"
    zipkin:
      endpoint: "0.0.0.0:9411"

This means migration to Tempo is seamless: point your existing Jaeger agents or Zipkin reporters at Tempo’s receiver endpoints with zero client-side code changes. Over time, migrate instrumentation to OpenTelemetry SDKs for access to the latest features.

Propagation Formats

Propagation formats define how trace context (trace_id, span_id, sampling decision) is transmitted between services. The two dominant standards are:

Comparison: W3C Trace Context vs B3 Propagation

W3C Trace Context (Recommended)

Header: traceparent: 00-{trace_id}-{span_id}-{flags}
Optional: tracestate: vendor1=value1,vendor2=value2
Standard: W3C Recommendation (2020), universally supported
Trace ID: 32 hex chars (128-bit), Span ID: 16 hex chars (64-bit)

B3 Propagation (Legacy/Zipkin)

Headers: X-B3-TraceId, X-B3-SpanId, X-B3-ParentSpanId, X-B3-Sampled
Single-header: b3: {trace_id}-{span_id}-{sampling}-{parent_span_id}
Standard: OpenZipkin specification, widely deployed in legacy systems
Supports both 64-bit and 128-bit trace IDs

W3C for New Systems | B3 for Zipkin Compat | Multi-Propagator Support

# W3C Trace Context header example
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
#            ver-trace_id(32hex)------------------span_id(16hex)---flags

# B3 single-header example
b3: 4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-1

# B3 multi-header example
X-B3-TraceId: 4bf92f3577b34da6a3ce929d0e0e4736
X-B3-SpanId: 00f067aa0ba902b7
X-B3-Sampled: 1
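
Because both formats carry the same three pieces of information, converting between them is mechanical. The following sketch parses a version-00 `traceparent` header and renders the equivalent B3 single header. The function names are hypothetical, and a production propagator (such as the ones shipped with OpenTelemetry SDKs) also handles `tracestate`, future versions, and the B3 debug flag.

```python
import re

TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header: str) -> dict:
    """Parse a W3C traceparent header into trace_id, span_id, and sampled."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    version, trace_id, span_id, flags = match.groups()
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        raise ValueError(f"invalid traceparent: {header!r}")
    # Bit 0 of the flags byte is the sampled flag.
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def to_b3_single(ctx: dict) -> str:
    """Render the context as a B3 single header value (without a parent span ID)."""
    sampling = "1" if ctx["sampled"] else "0"
    return f"{ctx['trace_id']}-{ctx['span_id']}-{sampling}"


ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(to_b3_single(ctx))
```

The output matches the B3 single-header example shown above, which is why a composite propagator can emit both formats from one span context during a migration.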

Context Propagation

Context propagation is the mechanism that connects spans across service boundaries into a single distributed trace. Without proper propagation, each service creates isolated traces that cannot be correlated. Understanding propagation is essential for ensuring trace continuity across your entire architecture.

HTTP Propagation

For HTTP-based services, trace context flows through request headers. The OpenTelemetry SDK automatically injects and extracts context from HTTP headers when properly configured:

# Python — automatic HTTP context propagation with OpenTelemetry
from opentelemetry import trace
from opentelemetry.propagate import set_global_textmap
from opentelemetry.propagators.composite import CompositePropagator
from opentelemetry.propagators.b3 import B3MultiFormat
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator
import requests

# Support both W3C and B3 for mixed environments
set_global_textmap(CompositePropagator([
    TraceContextTextMapPropagator(),  # W3C traceparent
    B3MultiFormat(),                   # B3 for legacy services
]))

# Instrumented HTTP client — context auto-injected into headers
from opentelemetry.instrumentation.requests import RequestsInstrumentor
RequestsInstrumentor().instrument()

tracer = trace.get_tracer("order-service")

def call_payment_service(order_id: str, amount: float):
    with tracer.start_as_current_span("call-payment") as span:
        span.set_attribute("payment.order_id", order_id)
        span.set_attribute("payment.amount", amount)
        # Headers automatically include traceparent and the X-B3-* headers
        response = requests.post(
            "http://payment-service/api/charge",
            json={"order_id": order_id, "amount": amount}
        )
        span.set_attribute("http.status_code", response.status_code)
        return response

gRPC Propagation

For gRPC services, trace context propagates through gRPC metadata (the gRPC equivalent of HTTP headers). OpenTelemetry provides interceptors that handle this automatically:

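// Go — gRPC server and client with automatic trace context propagation
package main

import (
    "log"

    "go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
)

func main() {
    // Server — automatically extracts trace context from incoming metadata
    server := grpc.NewServer(
        grpc.StatsHandler(otelgrpc.NewServerHandler()),
    )
    defer server.Stop()
    // Register your services on server and call server.Serve(listener) here.

    // Client — automatically injects trace context into outgoing metadata
    conn, err := grpc.NewClient("inventory-service:50051",
        grpc.WithStatsHandler(otelgrpc.NewClientHandler()),
        // Plaintext transport for the example only; use TLS in production.
        grpc.WithTransportCredentials(insecure.NewCredentials()),
    )
    if err != nil {
        log.Fatalf("create gRPC client: %v", err)
    }
    defer conn.Close()
}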

Messaging Systems

Asynchronous messaging (Kafka, RabbitMQ, SQS) presents unique challenges for context propagation because the producer and consumer execute in different processes at different times. Trace context is embedded in message headers/attributes:

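# Python — Kafka context propagation
import json

from opentelemetry import trace, context
from opentelemetry.propagate import inject, extract
from confluent_kafka import Producer, Consumer

tracer = trace.get_tracer("order-service")
# Broker address is a placeholder for this example
producer = Producer({"bootstrap.servers": "kafka:9092"})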

def produce_order_event(order_id: str, payload: dict):
    with tracer.start_as_current_span("produce-order-event", kind=trace.SpanKind.PRODUCER) as span:
        span.set_attribute("messaging.system", "kafka")
        span.set_attribute("messaging.destination", "orders")
        span.set_attribute("order.id", order_id)

        # Inject trace context into Kafka headers
        headers = {}
        inject(headers)
        kafka_headers = [(k, v.encode()) for k, v in headers.items()]

        producer.produce(
            topic="orders",
            key=order_id.encode(),
            value=json.dumps(payload).encode(),
            headers=kafka_headers,
        )

def consume_order_event(message):
    # Extract trace context from Kafka headers
    headers = {k: v.decode() for k, v in message.headers() or []}
    ctx = extract(headers)

    # Create a CONSUMER span linked to the producer's context
    with tracer.start_as_current_span(
        "consume-order-event",
        context=ctx,
        kind=trace.SpanKind.CONSUMER,
    ) as span:
        span.set_attribute("messaging.system", "kafka")
        span.set_attribute("messaging.destination", "orders")
        process_order(message.value())

                            
                            Messaging Trace Links: For messaging systems where the consumer processes messages in batches or with significant delay, consider using span links instead of parent-child relationships. Links indicate a causal relationship without implying the producer span was still active when the consumer ran. This prevents artificially inflated trace durations.
                        

Tempo Architecture

Tempo follows a microservices-based architecture similar to Loki and Mimir, designed for horizontal scalability and operational simplicity. Each component has a specific role in the trace lifecycle from ingestion to query.

Grafana Tempo Architecture

flowchart TD
    subgraph Ingestion
        A[SDK / Collector] -->|OTLP/Jaeger/Zipkin| D[Distributor]
        D -->|Hash ring| I[Ingester 1]
        D -->|Hash ring| I2[Ingester 2]
        D -->|Hash ring| I3[Ingester N]
    end

    subgraph Storage
        I -->|Flush blocks| OS[("Object Storage (S3/GCS/Azure)")]
        I2 -->|Flush blocks| OS
        I3 -->|Flush blocks| OS
        C[Compactor] -->|Merge & deduplicate| OS
    end

    subgraph Query Path
        QF[Query Frontend] -->|Split & shard| Q[Querier 1]
        QF -->|Split & shard| Q2[Querier N]
        Q -->|Read blocks| OS
        Q -->|Read WAL| I
        Q2 -->|Read blocks| OS
    end

    subgraph Metrics
        D -->|Forward spans| MG[Metrics Generator]
        MG -->|Remote write| Mimir[(Mimir)]
    end

Distributors

Distributors are the entry point for all trace data. They receive spans from instrumented applications (via OTLP, Jaeger, or Zipkin protocols) and route them to the appropriate ingesters using a consistent hash ring based on trace_id:

Protocol translation — Converts all incoming formats to Tempo’s internal representation
Validation — Rejects malformed spans, enforces per-tenant limits (max spans/second, max trace size)
Hash routing — Ensures all spans for a given trace_id land on the same ingester for efficient batching
Replication — Optionally writes to multiple ingesters for durability (configurable replication factor)
Metrics forwarding — Sends a copy of spans to the metrics-generator for service graphs and span metrics
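
The hash-routing step can be sketched as a small consistent hash ring: each ingester owns several tokens on the ring, and a trace ID is routed to the first distinct ingesters found clockwise from its hash. The `HashRing` class, the SHA-256 hash, and the token count are assumptions for illustration and do not reflect the ring implementation Tempo actually uses.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent hash ring with virtual tokens per node."""

    def __init__(self, nodes, tokens_per_node: int = 64):
        self._ring = sorted(
            (self._hash(f"{node}-{i}"), node)
            for node in nodes
            for i in range(tokens_per_node)
        )
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")

    def owners(self, trace_id: str, replication_factor: int = 1) -> list:
        """Walk clockwise from the trace's hash, collecting distinct nodes."""
        start = bisect.bisect(self._tokens, self._hash(trace_id))
        owners = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == replication_factor:
                break
        return owners


ring = HashRing(["ingester-1", "ingester-2", "ingester-3"])
trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
replicas = ring.owners(trace_id, replication_factor=3)
print(replicas)
```

Every span carrying the same trace ID hashes to the same position, so all distributors agree on the owning ingesters without coordinating with each other.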

Ingesters

Ingesters batch incoming spans into traces and write them to local disk as a Write-Ahead Log (WAL) before flushing complete blocks to object storage:

Trace assembly — Collects spans belonging to the same trace_id into a single trace object
WAL persistence — Writes spans to local disk immediately for crash recovery
Block creation — Once a trace is complete (no new spans for trace_idle_period), it is appended to the current block; the block is cut when it reaches max_block_duration or max_block_bytes
Block flushing — Complete blocks are uploaded to object storage and the WAL is truncated
Live queries — Queriers can read recent/in-progress traces directly from ingester memory

# Ingester configuration
ingester:
  max_block_duration: 5m        # Maximum time to hold a block before flush
  max_block_bytes: 524288000    # 500MB max block size
  flush_check_period: 10s       # How often to check for flushable blocks
  trace_idle_period: 30s        # Mark trace complete after 30s of no new spans
  lifecycler:
    ring:
      replication_factor: 3     # Write to 3 ingesters for durability
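
The interaction between `trace_idle_period` and `max_block_duration` is easier to see in a small simulation. The sketch below is a simplified model under stated assumptions: time is passed in explicitly, the WAL and size limits are omitted, and `IngesterSketch` is a hypothetical class, not Tempo code.

```python
class IngesterSketch:
    """Toy model: idle traces move to the head block; old head blocks get flushed."""

    def __init__(self, trace_idle_period=30.0, max_block_duration=300.0):
        self.trace_idle_period = trace_idle_period
        self.max_block_duration = max_block_duration
        self.live_traces = {}        # trace_id -> {"last_seen": float, "spans": list}
        self.head_block = []         # completed traces waiting for the next flush
        self.head_block_opened = None
        self.flushed_blocks = []     # stand-in for object storage

    def push(self, trace_id, span, now):
        entry = self.live_traces.setdefault(trace_id, {"last_seen": now, "spans": []})
        entry["spans"].append(span)
        entry["last_seen"] = now

    def tick(self, now):
        # 1. Move traces that have been idle long enough into the head block.
        idle = [
            trace_id
            for trace_id, entry in self.live_traces.items()
            if now - entry["last_seen"] >= self.trace_idle_period
        ]
        for trace_id in idle:
            if self.head_block_opened is None:
                self.head_block_opened = now
            spans = self.live_traces.pop(trace_id)["spans"]
            self.head_block.append((trace_id, spans))
        # 2. Cut and flush the head block once it has been open long enough.
        if self.head_block and now - self.head_block_opened >= self.max_block_duration:
            self.flushed_blocks.append(self.head_block)
            self.head_block, self.head_block_opened = [], None


ingester = IngesterSketch()
ingester.push("t1", "span-a", now=0)
ingester.push("t1", "span-b", now=10)
ingester.tick(now=20)    # only 10s idle: trace stays live
ingester.tick(now=40)    # 30s idle: trace moves to the head block
ingester.tick(now=340)   # head block open for 300s: flushed
print(len(ingester.flushed_blocks))
```

Shortening `trace_idle_period` makes traces searchable in blocks sooner but risks splitting a slow trace across blocks, which the compactor later has to merge.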

Compactor

The compactor runs as a background process that optimizes stored blocks in object storage:

Block merging — Combines many small blocks into fewer large blocks for query efficiency
Deduplication — Removes duplicate spans created by replication
Retention enforcement — Deletes blocks older than the configured retention period
Bloom filter creation — Builds optional bloom filters for accelerated trace ID lookups
Dedicated columns — Writes the attributes you have configured as dedicated columns into their own Parquet columns when producing compacted blocks

# Compactor configuration
compactor:
  compaction:
    block_retention: 720h       # 30-day retention
    compacted_block_retention: 1h
    compaction_window: 4h       # Group blocks within 4-hour windows
    max_block_bytes: 107374182400  # 100GB max compacted block
    v2_out_buffer_bytes: 5242880
  ring:
    kvstore:
      store: memberlist
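
Block merging and deduplication can be pictured as grouping blocks by the time window their start falls into, then writing one output block per window with each span kept once. The block layout (a `start` timestamp and a list of span dictionaries) and the `compact` function are assumptions of this sketch, which ignores block size limits and compaction levels.

```python
from collections import defaultdict


def compact(blocks, window_seconds):
    """Merge blocks that start in the same window and drop duplicate spans."""
    groups = defaultdict(list)
    for block in blocks:
        groups[block["start"] // window_seconds].append(block)

    compacted = []
    for window, members in sorted(groups.items()):
        seen, spans = set(), []
        for block in members:
            for span in block["spans"]:
                key = (span["trace_id"], span["span_id"])
                if key not in seen:  # replicated spans share the same IDs
                    seen.add(key)
                    spans.append(span)
        compacted.append({"start": window * window_seconds, "spans": spans})
    return compacted


s1 = {"trace_id": "t1", "span_id": "a"}
s2 = {"trace_id": "t1", "span_id": "b"}
s3 = {"trace_id": "t2", "span_id": "c"}
s4 = {"trace_id": "t3", "span_id": "d"}

blocks = [
    {"start": 0, "spans": [s1, s2]},
    {"start": 3600, "spans": [s2, s3]},   # s2 was replicated to a second ingester
    {"start": 14400, "spans": [s4]},      # falls into the next 4-hour window
]
result = compact(blocks, window_seconds=4 * 3600)
print([len(block["spans"]) for block in result])
```

Three input blocks become two, and the replicated span is stored once. Fewer, larger blocks mean fewer object storage reads per query.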

Queriers & Query-Frontend

The query path uses a two-tier architecture for efficient trace retrieval and TraceQL evaluation:

Query-Frontend:

Receives TraceQL queries from Grafana
Splits time-range queries into smaller sub-queries (sharding by time window)
Distributes shards across multiple querier instances
Deduplicates and merges results from all queriers
Implements query caching and request queuing

Queriers:

Execute individual query shards against object storage blocks
Read Parquet column data selectively (only columns referenced by the query)
Also query ingesters for recent traces not yet flushed to object storage
Return matching spansets to the query-frontend for aggregation
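
The split-and-merge behaviour of the query-frontend can be sketched with two small functions: one cuts a time range into fixed-size shards, the other merges per-shard results while dropping duplicate trace IDs. Both function names and the shard size are hypothetical; the real frontend also shards by block and applies queuing and caching.

```python
def split_time_range(start: int, end: int, shard_seconds: int) -> list:
    """Cut [start, end) into consecutive shards of at most shard_seconds."""
    shards = []
    cursor = start
    while cursor < end:
        shards.append((cursor, min(cursor + shard_seconds, end)))
        cursor += shard_seconds
    return shards


def merge_results(per_shard_results, limit: int) -> list:
    """Merge trace IDs from all shards, keeping first occurrences up to limit."""
    seen, merged = set(), []
    for results in per_shard_results:
        for trace_id in results:
            if trace_id not in seen:
                seen.add(trace_id)
                merged.append(trace_id)
    return merged[:limit]


shards = split_time_range(0, 3600, 900)
merged = merge_results([["t1", "t2"], ["t2", "t3"], [], ["t4"]], limit=3)
print(shards)
print(merged)
```

A one-hour query becomes four 15-minute shards that queriers can execute in parallel. A trace that spans a shard boundary may be returned by two shards, which is why the merge step deduplicates.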

Metrics Generator

The metrics-generator is an optional component that derives time-series metrics from trace data in real-time:

Service graphs — Generates traces_service_graph_request_total, traces_service_graph_request_failed_total, traces_service_graph_request_server_seconds_*
Span metrics — Generates RED metrics from spans with configurable dimensions
Remote write — Pushes generated metrics to Mimir/Prometheus via remote write protocol
Exemplar attachment — Attaches trace_id exemplars to generated metrics for bidirectional correlation

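The metrics-generator enables a powerful pattern: instrument with traces only, and derive request-level (RED) metrics automatically. This removes most dual instrumentation (maintaining both metric and trace code for the same requests) while preserving metric-based dashboarding and alerting. Metrics that do not correspond to spans, such as runtime or business gauges, still need their own instrumentation.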
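
As a rough illustration of what "deriving RED metrics from spans" means, the following Python sketch aggregates spans into request, error, and duration totals. It assumes a simplified span shape (plain dicts with service, name, status, and duration_s keys) and a hypothetical red_metrics function; the real metrics-generator emits Prometheus counters and histograms rather than a dictionary.

```python
from collections import defaultdict


def red_metrics(spans):
    """Aggregate spans into per-(service, span name) RED totals."""
    totals = defaultdict(lambda: {"requests": 0, "errors": 0, "duration_s": 0.0})
    for span in spans:
        key = (span["service"], span["name"])
        totals[key]["requests"] += 1                   # Rate
        if span["status"] == "error":
            totals[key]["errors"] += 1                 # Errors
        totals[key]["duration_s"] += span["duration_s"]  # Duration (sum)
    return dict(totals)
```

The dimensions used as the grouping key here (service and span name) correspond to the configurable dimensions mentioned above; adding more attributes to the key raises the cardinality of the generated series.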

Best Practices

Production tracing requires careful attention to sampling strategies, attribute conventions, and trace quality. These practices ensure your traces remain useful, cost-effective, and performant at scale.

Sampling Strategies

While Tempo’s architecture supports storing 100% of traces, high-volume systems may still benefit from sampling to reduce network bandwidth and ingestion costs. The key is choosing the right sampling strategy:

Strategy Comparison: Choosing the Right Sampling Approach

Head-Based Sampling (Decision at Trace Start)

Probabilistic — Sample X% of traces randomly (e.g., 10%)
Rate-limiting — Sample N traces per second per service
Pros: Simple, low overhead, predictable costs
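Cons: Misses rare errors, because the decision is made before the outcome is known (at 1% sampling, roughly 99% of error traces are discarded along with everything else)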

Tail-Based Sampling (Decision After Trace Completes)

Error-based — Keep all traces containing error spans
Latency-based — Keep traces whose duration exceeds a threshold (for example, the service's P99)
Policy-based — Combine multiple rules (always keep errors + slow + specific endpoints)
Pros: Captures all interesting traces, no missed errors
Cons: Requires buffering all spans temporarily, complex to operate

Recommended: Hybrid Approach

100% sampling for error traces and traces > P95 latency
10-25% probabilistic sampling for normal traces
100% for specific critical paths (checkout, payment, authentication)

In short: tail-based for quality, head-based for cost, hybrid for production.

# OpenTelemetry Collector — tail-based sampling configuration
processors:
  tail_sampling:
    decision_wait: 10s           # Wait 10s for all spans to arrive
    num_traces: 100000           # Buffer up to 100k traces
    policies:
      # Always keep errors
      - name: errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      # Always keep slow traces
      - name: slow-traces
        type: latency
        latency:
          threshold_ms: 2000
      # Always keep specific critical paths
      - name: critical-paths
        type: string_attribute
        string_attribute:
          key: http.route
          values: ["/api/checkout", "/api/payment", "/api/auth"]
      # Sample 15% of traces; a trace is kept if any policy above or this one matches
      - name: probabilistic
        type: probabilistic
        probabilistic:
          sampling_percentage: 15
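
The decision this configuration encodes can be sketched in Python as follows. This is not the Collector's implementation: keep_trace, the span dict fields (start_ms, end_ms, status), and the CRC32 bucketing are assumptions made for illustration. The point is that a trace is kept as soon as any one policy matches.

```python
import zlib


def keep_trace(trace_id, spans, threshold_ms=2000,
               critical_routes=("/api/checkout", "/api/payment", "/api/auth"),
               sampling_percentage=15):
    """Return the name of the first matching policy, or None to drop the trace."""
    # Policy "errors": any span finished with an error status.
    if any(s.get("status") == "ERROR" for s in spans):
        return "errors"
    # Policy "slow-traces": wall-clock duration of the whole trace.
    start = min(s["start_ms"] for s in spans)
    end = max(s["end_ms"] for s in spans)
    if end - start > threshold_ms:
        return "slow-traces"
    # Policy "critical-paths": any span on a listed route.
    if any(s.get("http.route") in critical_routes for s in spans):
        return "critical-paths"
    # Policy "probabilistic": deterministic bucket derived from the trace ID.
    bucket = zlib.crc32(trace_id.encode()) % 100
    if bucket < sampling_percentage:
        return "probabilistic"
    return None
```

Hashing the trace ID instead of drawing a random number keeps the decision stable: every collector instance that sees the same trace ID reaches the same verdict.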

Span Attributes

Well-chosen span attributes make traces searchable and meaningful. Follow OpenTelemetry semantic conventions for consistency across services:

# Recommended span attributes by category

# HTTP spans
span.http.method: "POST"
span.http.url: "https://api.example.com/v2/orders"
span.http.route: "/api/v2/orders"        # Use route pattern, not URL with IDs
span.http.status_code: 201
span.http.request_content_length: 1234
span.http.response_content_length: 567

# Database spans
span.db.system: "postgresql"
span.db.name: "orders_db"
span.db.operation: "SELECT"
span.db.statement: "SELECT * FROM orders WHERE id = $1"  # Parameterized!

# Messaging spans
span.messaging.system: "kafka"
span.messaging.destination: "order-events"
span.messaging.operation: "publish"
span.messaging.message.payload_size_bytes: 2048

# Custom business attributes
span.order.id: "ORD-12345"
span.customer.tier: "premium"
span.payment.method: "credit_card"
span.feature_flag.variant: "experiment-b"

                            
                            Attribute Best Practices: (1) Always use http.route patterns, never URLs with path parameters (avoids cardinality explosion). (2) Parameterize database statements and never embed literal values. (3) Add business-relevant attributes (order.id, customer.tier) to support investigations. (4) Keep attribute names consistent across all services by following semantic conventions.
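
To see why route patterns matter for cardinality, consider this small Python sketch. Web frameworks normally expose the matched route directly, so this is only a fallback illustration; route_template and its rule (numeric and UUID segments become {id}) are hypothetical.

```python
import re

# A path segment that is entirely digits or a canonical UUID is treated as an ID.
_ID_SEGMENT = re.compile(
    r"^(\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$"
)


def route_template(path):
    """Replace ID-like path segments with an {id} placeholder."""
    segments = path.split("/")
    return "/".join("{id}" if _ID_SEGMENT.match(seg) else seg for seg in segments)
```

Three requests to /api/v2/orders/1, /api/v2/orders/2, and /api/v2/orders/3 would produce three distinct values if the raw path were recorded, but only one value, /api/v2/orders/{id}, once templated.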
                        

Service Naming

The service.name resource attribute is the most important attribute in your tracing deployment. It appears in service graphs, span metrics, and most TraceQL queries. Follow these conventions:

Use lowercase with hyphens — order-service, not OrderService or order_service
Be specific — payment-processor not backend
Include component — order-service-worker vs order-service-api if they’re separate processes
Don’t include environment — Use deployment.environment attribute instead
Be stable — Don’t change service names without coordinating dashboards, alerts, and team knowledge

# Good service naming
resource.service.name: "checkout-api"
resource.service.version: "3.2.1"
resource.deployment.environment: "production"
resource.service.namespace: "commerce"

# Bad service naming
resource.service.name: "prod-checkout-api-v3"    # Environment + version in name
resource.service.name: "Service1"                 # Meaningless
resource.service.name: "backend"                  # Too generic
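
These conventions are easy to check mechanically, for example in CI. The Python sketch below is one possible linter; the function name service_name_problems and the token lists for environments and generic names are assumptions chosen for illustration, so adjust them to your own vocabulary.

```python
import re

_NAME = re.compile(r"^[a-z][a-z0-9]*(-[a-z0-9]+)*$")
_ENV_TOKENS = {"prod", "production", "staging", "stage", "dev", "test", "qa"}
_GENERIC = {"backend", "frontend", "service", "app", "api"}
_VERSION = re.compile(r"^v\d+$")


def service_name_problems(name):
    """Return a list of convention violations for a service.name value."""
    problems = []
    if not _NAME.match(name):
        problems.append("not lowercase-with-hyphens")
    tokens = name.lower().replace("_", "-").split("-")
    if any(t in _ENV_TOKENS for t in tokens):
        problems.append("contains environment")
    if any(_VERSION.match(t) for t in tokens):
        problems.append("contains version")
    if name.lower() in _GENERIC:
        problems.append("too generic")
    return problems
```

An empty list means the name passes; "prod-checkout-api-v3" is flagged for both the environment and the version embedded in it.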

Trace Quality

High-quality traces accelerate incident response. Low-quality traces create noise and erode trust in the tracing system. Focus on these quality dimensions:

Completeness — Every service in the request path should create spans (no gaps in the trace)
Context continuity — Trace context must propagate across all boundaries (HTTP, gRPC, queues, batch jobs)
Meaningful names — Span names should describe the operation, not the implementation (process-payment not handleRequest)
Error recording — All exceptions and error conditions must be recorded on spans, with the span status set to error and the exception captured as an event carrying exception.* attributes
Appropriate granularity — Create spans for logical operations (DB queries, HTTP calls, queue operations), not for every function call

                            
                            Common Anti-Pattern: Creating spans for every method invocation produces traces with thousands of spans per request. This overwhelms visualizations, can multiply storage costs many times over, and makes traces harder to understand. Limit spans to I/O boundaries and significant logical operations (typically 10–50 spans per trace).
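
Two of these quality dimensions, completeness and granularity, can be checked automatically. The Python sketch below assumes spans are plain dicts with span_id and parent_id keys; trace_quality and the default limit of 50 spans are hypothetical and simply mirror the guidance above.

```python
def trace_quality(spans, max_spans=50):
    """Report basic quality signals for the spans of one trace."""
    ids = {s["span_id"] for s in spans}
    roots = [s["span_id"] for s in spans if s.get("parent_id") is None]
    # An orphan points at a parent that never arrived: a gap in the trace,
    # usually caused by missing instrumentation or broken context propagation.
    orphans = [
        s["span_id"] for s in spans
        if s.get("parent_id") is not None and s["parent_id"] not in ids
    ]
    return {
        "span_count": len(spans),
        "too_many_spans": len(spans) > max_spans,
        "root_count": len(roots),
        "orphans": orphans,
    }
```

A healthy trace has exactly one root, no orphans, and a span count within the expected range; anything else is worth investigating before an incident forces the question.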
                        

Summary & Next Steps

In this article, we’ve explored the full landscape of distributed tracing with Grafana Tempo:

Tempo’s no-index design — Object-storage-only architecture that stores 100% of traces at a fraction of traditional costs
Tempo features — Service graphs, span metrics, and bidirectional cross-signal correlation (traces ↔ metrics ↔ logs ↔ profiles)
TraceQL mastery — Span selectors, structural operators (>>, >, ~), spanset logical operators (&&, ||), intrinsic attributes, aggregates, and metrics queries
Cross-signal pivoting — Concrete mechanisms for navigating between metrics, traces, logs, and profiles using exemplars and trace IDs
Tracing protocols — OTLP (recommended), Jaeger, Zipkin receivers; W3C Trace Context and B3 propagation formats
Context propagation — How trace context flows across HTTP, gRPC, and messaging systems
Architecture — Distributors, ingesters, compactor, queriers, query-frontend, and metrics-generator components
Best practices — Sampling strategies (head vs tail vs hybrid), span attributes, service naming, and trace quality dimensions

The key insight is that modern distributed tracing is not just about visualizing request flows — it’s about building a correlation fabric that connects all observability signals. Tempo’s deep integration with Mimir (metrics), Loki (logs), and Pyroscope (profiles) transforms isolated data sources into a unified investigation experience.

Next in the Grafana Track

In Part 7: Interrogating Infrastructure, we’ll move from application-level observability to infrastructure monitoring — collecting and analyzing metrics from Kubernetes clusters, cloud providers, network devices, and bare-metal servers using Grafana Alloy, Prometheus exporters, and cloud integrations.
