Introducing Tempo & TraceQL
Distributed tracing is the observability signal that reveals how a request flows through your system — which services it touches, where latency accumulates, and where errors originate. While metrics tell you what is happening and logs tell you why, traces tell you the complete journey of every request across service boundaries.
Grafana Tempo is an open-source, high-volume distributed tracing backend that stores and queries traces with minimal operational complexity. Combined with TraceQL — Tempo’s purpose-built query language — it provides a powerful system for understanding request latency, error propagation, and service dependencies in microservices architectures.
What Is Grafana Tempo?
Tempo is a horizontally-scalable distributed tracing backend designed for high-throughput trace ingestion and low-cost long-term storage. Unlike traditional tracing systems (Jaeger with Elasticsearch, Zipkin with Cassandra), Tempo requires only object storage (S3, GCS, Azure Blob) as its backend — no additional databases, no index nodes, no complex cluster management.
Tempo accepts traces in multiple formats — OpenTelemetry (OTLP), Jaeger, Zipkin, and OpenCensus — making it a drop-in backend regardless of your existing instrumentation. It integrates deeply with Grafana for visualization, providing flame graphs, service graphs, and seamless cross-signal navigation.
The No-Index Design
Traditional tracing backends index every span attribute to enable search. This creates enormous operational overhead: index storage often exceeds trace data itself, write amplification degrades performance, and teams are forced to sample aggressively (keeping only 1–5% of traces) to control costs.
Tempo takes a fundamentally different approach:
- No per-span indexing: traces are stored as-is in columnar Parquet blocks in object storage
- Trace ID lookup: each block's bloom filter rules out blocks that cannot contain the ID, so only candidate blocks are read from object storage
- TraceQL search: the columnar format enables efficient scanning without traditional indexes
- Bloom filters: per-block probabilistic filters, written alongside each block, that accelerate trace ID lookups across blocks
- Dedicated attribute columns: frequently queried attributes can be promoted to dedicated columns for faster search
The Parquet columnar format is crucial: when a TraceQL query filters on span.http.status_code, Tempo reads only the column holding that attribute from each block, not every field of every span. This makes attribute search over object storage feasible without a separate index.
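The bloom-filter step in the lookup path can be sketched in a few lines of Python. This is a toy model, not Tempo's implementation: the bit-array size, the number of hash positions, and the SHA-256-based hashing are assumptions chosen for readability (Tempo sizes and shards its filters from configuration). It does show the property the lookup path relies on: a filter can return false positives but never false negatives, so a querier may safely skip any block whose filter answers "no".

```python
import hashlib
from typing import Dict, List


class BloomFilter:
    """Toy bloom filter: m bits, k bit positions per key (illustrative only)."""

    def __init__(self, m_bits: int = 8192, k_hashes: int = 4) -> None:
        if not 1 <= k_hashes <= 8:
            raise ValueError("this sketch derives at most 8 positions from one digest")
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: str) -> List[int]:
        digest = hashlib.sha256(key.encode()).digest()  # 32 bytes -> up to 8 positions
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def candidate_blocks(trace_id: str, blooms: Dict[str, BloomFilter]) -> List[str]:
    """Blocks that may hold trace_id; every other block can be skipped unread."""
    return [block_id for block_id, bloom in blooms.items() if bloom.might_contain(trace_id)]


# Two hypothetical blocks, each with the filter built when it was written.
blooms = {"block-a": BloomFilter(), "block-b": BloomFilter()}
blooms["block-a"].add("4bf92f3577b34da6a3ce929d0e0e4736")
blooms["block-b"].add("0af7651916cd43dd8448eb211c80319c")
print(candidate_blocks("4bf92f3577b34da6a3ce929d0e0e4736", blooms))
```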
Cost Benefits at Scale
Traditional Backend (Jaeger + Elasticsearch)
- Elasticsearch cluster: 3 data nodes × 500GB SSD = $2,400/month
- Index overhead: ~60% of data volume goes to indexing
- Forced sampling at 5% to control costs — losing 95% of traces
- Operational burden: index management, shard rebalancing, version upgrades
Tempo with Object Storage
- Object storage (S3): ~2.5 TB compressed at $0.023/GB = $58/month
- No index overhead — pure trace data
- Store 100% of traces — no sampling required
- Operational burden: configure bucket lifecycle policies
The cost difference becomes even more dramatic at higher volumes. Organizations ingesting 100,000+ spans/second can save hundreds of thousands of dollars annually by switching from indexed tracing backends to Tempo’s object-storage-only architecture.
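The object-storage line above is simple arithmetic, and the sketch below reproduces it so you can substitute your own volume. It assumes decimal units (1 TB = 1,000 GB) and a flat per-GB-month price, and it ignores request and egress charges. The spans-per-second rate and the ~100 compressed bytes per span in the second example are hypothetical inputs, not measurements.

```python
def stored_terabytes(spans_per_second: float, bytes_per_span: float, retention_days: int) -> float:
    """Steady-state stored volume in decimal TB (1 TB = 10**12 bytes)."""
    seconds = retention_days * 24 * 60 * 60
    return spans_per_second * bytes_per_span * seconds / 1e12


def monthly_storage_cost(terabytes: float, usd_per_gb_month: float) -> float:
    """Storage-only cost per month; request, retrieval, and egress charges are ignored."""
    return terabytes * 1000 * usd_per_gb_month


# The figure from the comparison above: 2.5 TB at $0.023/GB-month.
print(f"${monthly_storage_cost(2.5, 0.023):.2f}")  # $57.50, rounded to $58 in the text

# Hypothetical input: 10,000 spans/s, ~100 compressed bytes per span, 30-day retention.
tb = stored_terabytes(10_000, 100, 30)
print(f"{tb:.2f} TB -> ${monthly_storage_cost(tb, 0.023):.2f}/month")
```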
Exploring Tempo Features
Beyond basic trace storage and retrieval, Tempo provides several features that transform raw traces into actionable observability insights.
Trace Discovery
Finding interesting traces without indexes requires smart discovery mechanisms. Tempo provides multiple paths to discover traces:
- TraceQL search — Query spans by attributes, duration, status, and structural relationships
- Service graph — Visual topology map showing service-to-service call patterns and error rates
- Span metrics — Pre-computed RED metrics (Rate, Errors, Duration) generated from traces, queryable via PromQL
- Exemplars — Clickable trace links embedded in metric time-series panels
- Log correlation — Jump from a log line containing a trace_id directly to the full trace
The combination of these mechanisms means you rarely need to "search" for traces in the traditional sense. Instead, you navigate to traces from other signals — a latency spike in a metric panel leads to an exemplar, which opens the offending trace.
Service Graphs
Tempo’s metrics-generator component analyzes incoming spans to build a real-time topology of your service mesh. The service graph shows:
- Nodes — Each service, sized by request volume
- Edges — Service-to-service calls, colored by error rate
- Latency — P50/P95/P99 latency on each edge
- Request rate — Calls per second between services
Service graphs are stored as Prometheus metrics (using the traces_service_graph_* metric family), meaning you can alert on topology changes, build dashboards showing dependency health, and detect when new services appear or existing connections break.
# Tempo configuration: metrics-generator settings for service graphs
metrics_generator:
  processor:
    service_graphs:
      dimensions:
        - http.method
        - http.status_code
      peer_attributes:
        - db.system
        - messaging.system
      enable_client_server_prefix: true
      # How long to wait for the other half of a client/server span pair
      wait: 10s
      # Maximum number of unpaired edges held in memory
      max_items: 10000
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://mimir:9009/api/v1/push
# Processors are enabled per tenant through overrides
overrides:
  defaults:
    metrics_generator:
      processors: [service-graphs]
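Conceptually, the service-graph processor builds each edge by pairing a client span with the server span it directly parents. The sketch below models that pairing over a finished batch of spans. It is a simplification under stated assumptions: the `Span` shape is hypothetical, the real processor works on a stream and holds unpaired halves for up to `wait` (and at most `max_items` of them), it also pairs producer/consumer spans, and counting an edge as failed when either side errored is this sketch's rule.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Span:
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    service: str
    kind: str  # "server", "client", "internal", ...
    error: bool = False


def service_graph_edges(spans: Iterable[Span]) -> Tuple[Counter, Counter]:
    """Pair each server span with the client span that directly parents it.

    Returns (request_total, failed_total) counters keyed by (client, server).
    """
    spans = list(spans)
    clients = {(s.trace_id, s.span_id): s for s in spans if s.kind == "client"}
    requests, failed = Counter(), Counter()
    for server in (s for s in spans if s.kind == "server"):
        client = clients.get((server.trace_id, server.parent_span_id))
        if client is None:
            continue  # unpaired; the real processor waits up to `wait` for the other half
        edge = (client.service, server.service)
        requests[edge] += 1
        if client.error or server.error:
            failed[edge] += 1
    return requests, failed


spans = [
    Span("t1", "g1", None, "api-gateway", "server"),
    Span("t1", "g2", "g1", "api-gateway", "client"),
    Span("t1", "c1", "g2", "checkout", "server"),
    Span("t2", "g3", None, "api-gateway", "server"),
    Span("t2", "g4", "g3", "api-gateway", "client", error=True),
    Span("t2", "c2", "g4", "checkout", "server", error=True),
]
requests, failed = service_graph_edges(spans)
print(dict(requests))  # {('api-gateway', 'checkout'): 2}
print(dict(failed))    # {('api-gateway', 'checkout'): 1}
```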
Span Metrics
Span metrics transform trace data into time-series metrics, giving you the best of both worlds. For every span it receives, Tempo updates the following series, labeled by service, span name, span kind, status code, and any configured dimensions:
- traces_spanmetrics_latency_bucket: histogram of span durations
- traces_spanmetrics_calls_total: counter of span completions (its rate is throughput)
- traces_spanmetrics_size_total: counter of span sizes in bytes
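# Span metrics configuration
metrics_generator:
  processor:
    span_metrics:
      # service, span_name, span_kind and status_code labels are emitted by default;
      # dimensions add extra span/resource attributes as labels
      dimensions:
        - http.method
        - http.status_code
        - db.system
      # Also emit a target_info series carrying resource attributes
      enable_target_info: true
      # Histogram buckets for latency, in seconds
      histogram_buckets: [0.002, 0.004, 0.008, 0.016, 0.032, 0.064, 0.128, 0.256, 0.512, 1.024, 2.048, 4.096, 8.192, 16.384]
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://mimir:9009/api/v1/push
        # Forward exemplars so metric panels can link back to traces
        send_exemplars: true
overrides:
  defaults:
    metrics_generator:
      processors: [span-metrics]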
This means you can build PromQL dashboards and alerts from trace data — detecting latency regressions, error rate spikes, and throughput changes — without separate application-level metrics instrumentation.
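The series above can be reproduced by hand to see what the processor computes. This sketch groups spans by a label tuple, counts calls, and fills cumulative latency buckets using the same bounds as `histogram_buckets`. The span dictionaries and the label tuple are hypothetical simplifications; Tempo's real label names and its handling of intrinsic dimensions differ.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate

# Bucket upper bounds in seconds (the histogram_buckets list from the config above).
BUCKETS = [0.002, 0.004, 0.008, 0.016, 0.032, 0.064, 0.128, 0.256,
           0.512, 1.024, 2.048, 4.096, 8.192, 16.384]


def span_metrics(spans, dimensions):
    """Return (calls_total, cumulative latency buckets) keyed by a label tuple.

    Each span is a dict with 'service', 'name', 'duration_s' and optional
    attribute keys; the label tuple is (service, name, *dimension values).
    """
    calls = defaultdict(int)
    counts = defaultdict(lambda: [0] * (len(BUCKETS) + 1))  # last slot is +Inf
    for span in spans:
        labels = (span["service"], span["name"]) + tuple(span.get(d, "") for d in dimensions)
        calls[labels] += 1
        # First bound >= duration (Prometheus `le` semantics); past the end means +Inf.
        counts[labels][bisect_left(BUCKETS, span["duration_s"])] += 1
    # Prometheus exposes histogram buckets cumulatively.
    return dict(calls), {k: list(accumulate(v)) for k, v in counts.items()}


spans = [
    {"service": "checkout", "name": "POST /order", "duration_s": 0.003, "http.method": "POST"},
    {"service": "checkout", "name": "POST /order", "duration_s": 0.100, "http.method": "POST"},
    {"service": "checkout", "name": "POST /order", "duration_s": 20.0, "http.method": "POST"},
]
calls, buckets = span_metrics(spans, ["http.method"])
key = ("checkout", "POST /order", "POST")
print(calls[key])        # 3
print(buckets[key][-1])  # 3 (the +Inf bucket equals the total count)
```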
Cross-Signal Links
Tempo supports bidirectional navigation between all observability signals:
- Trace-to-Logs — From any span, jump to correlated logs in Loki filtered by trace_id and time range
- Trace-to-Metrics — From a span, see the corresponding service metrics at that exact timestamp
- Trace-to-Profiles — From a slow span, jump to a continuous profile in Pyroscope showing CPU/memory usage during that span’s execution
- Logs-to-Traces — From a log line containing a trace_id field, jump directly to the full trace
- Metrics-to-Traces — From a metric panel with exemplars enabled, click a data point to see the trace that produced it
# Grafana datasource provisioning for cross-signal correlation
apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    url: http://tempo:3200
    jsonData:
      tracesToLogsV2:
        datasourceUid: loki
        spanStartTimeShift: "-1h"
        spanEndTimeShift: "1h"
        # $${__tags} expands these mappings into Loki stream selectors
        tags:
          - key: service.name
            value: service_name
        filterByTraceID: true
        filterBySpanID: false
        customQuery: true
        query: '{$${__tags}} | trace_id="$${__trace.traceId}"'
      tracesToMetrics:
        datasourceUid: mimir
        spanStartTimeShift: "-5m"
        spanEndTimeShift: "5m"
        tags:
          - key: service.name
            value: service
        queries:
          - name: "Request Rate"
            query: "sum(rate(http_server_request_duration_seconds_count{$$__tags}[5m]))"
          - name: "Error Rate"
            query: "sum(rate(http_server_request_duration_seconds_count{$$__tags,http_status_code=~\"5..\"}[5m]))"
      tracesToProfiles:
        datasourceUid: pyroscope
        profileTypeId: "process_cpu:cpu:nanoseconds:cpu:nanoseconds"
        customQuery: true
        query: '{service_name="$${__span.tags["service.name"]}"}'
      serviceMap:
        datasourceUid: mimir
      nodeGraph:
        enabled: true
The TraceQL Query Language
TraceQL is a purpose-built query language for searching and analyzing distributed traces. Unlike traditional trace search (filter by service name, operation, tags), TraceQL enables structural queries — finding traces based on relationships between spans, not just individual span attributes.
Span Selectors & Filters
The basic building block of TraceQL is the spanset selector — a set of conditions that match spans within traces:
# Basic span selector — find spans from a specific service
{ resource.service.name = "api-gateway" }
# Attribute filter — spans with specific HTTP method
{ span.http.method = "POST" }
# Status filter — find error spans
{ status = error }
# Duration filter — find slow spans (> 500ms)
{ duration > 500ms }
# Combine multiple conditions (AND within a spanset)
{ resource.service.name = "checkout-service" && span.http.status_code >= 500 && duration > 1s }
# String matching with regex
{ span.http.url =~ "/api/v[12]/users.*" }
# Not-equal operator
{ resource.service.name != "health-check" }
# Exists check — span has this attribute
{ span.db.system != nil }
# Numeric range
{ span.http.response_content_length > 1000000 }
TraceQL attributes come in three scopes: resource.* attributes (service-level, shared across all spans from a service), span.* attributes (individual span metadata), and unscoped attributes written with a leading dot, such as .service.name (searched in both scopes). Always scope attributes explicitly for best performance: resource.service.name is faster than the unscoped .service.name.
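Scoped TraceQL queries can also be run outside Grafana through Tempo's HTTP search endpoint, `GET /api/search`, which takes the query in `q` and a time range in Unix epoch seconds. The sketch below uses only the Python standard library; the `http://tempo:3200` address is an assumption matching the datasource URL used earlier, and the response is read only as far as its `traces` list.

```python
import json
import urllib.parse
import urllib.request

TEMPO_URL = "http://tempo:3200"  # assumed address, matching the datasource URL above


def search_url(traceql: str, start: int, end: int, limit: int = 20, base_url: str = TEMPO_URL) -> str:
    """Build a GET /api/search URL; start and end are Unix epoch seconds."""
    params = urllib.parse.urlencode({"q": traceql, "start": start, "end": end, "limit": limit})
    return f"{base_url.rstrip('/')}/api/search?{params}"


def search(traceql: str, start: int, end: int, limit: int = 20) -> list:
    """Run a TraceQL search and return the trace summaries from the JSON response."""
    with urllib.request.urlopen(search_url(traceql, start, end, limit), timeout=30) as resp:
        return json.load(resp).get("traces", [])


query = '{ resource.service.name = "checkout-service" && status = error }'
print(search_url(query, start=1_700_000_000, end=1_700_003_600))
```

URL-encoding matters here because TraceQL is full of characters (`{`, `"`, `&`, `=`) that would otherwise break the query string.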
Structural Operators
TraceQL’s most powerful feature is structural operators — querying traces based on the relationships between spans. This is what sets TraceQL apart from simple tag-based search:
# Child (>): an api-gateway span is the direct parent of a slow PostgreSQL span
{ resource.service.name = "api-gateway" } > { span.db.system = "postgresql" && duration > 200ms }
# Descendant (>>): api-gateway eventually reaches a slow PostgreSQL span
# (even through intermediate services)
{ resource.service.name = "api-gateway" } >> { span.db.system = "postgresql" && duration > 200ms }
# Sibling (~): a slow PostgreSQL span shares its parent with a Redis span
{ span.db.system = "redis" } ~ { span.db.system = "postgresql" && duration > 200ms }
# And (&&): the trace contains BOTH patterns (spans in the same trace, not necessarily related)
{ resource.service.name = "payment-service" && status = error } && { resource.service.name = "inventory-service" && status = error }
# Or (||): the trace contains EITHER pattern
{ resource.service.name = "payment-service" && status = error } || { resource.service.name = "inventory-service" && status = error }
# Combining structural and duration: find traces where a checkout service
# call descends into a slow external HTTP call (name is the intrinsic span name)
{ resource.service.name = "checkout" && name = "POST /order" } >>
{ span.http.url =~ "https://external-payment.*" && duration > 2s }
The structural operators enable queries that would be impossible with flat tag-based search:
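- > (child): the right spanset must be a direct child of the left spanset
- >> (descendant): the right spanset must be a descendant (any depth) of the left spanset
- < and << (parent, ancestor): the mirror images of > and >>
- ~ (sibling): the right spanset must share a parent with the left spanset
- && (and): both spansets must exist in the same trace
- || (or/union): either spanset matches
Recent Tempo releases also offer experimental negated forms (such as !>>) for "this relationship must not exist"; check the documentation for your version before relying on them.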
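To make the difference between `>`, `>>`, and `~` concrete, the sketch below evaluates each operator over one in-memory trace. It is a simplified model for illustration only: the `Span` class and the predicates are hypothetical, and Tempo evaluates these relations against stored nested-set positions rather than by walking parent pointers. Each function returns the right-hand spans that satisfy the relation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    attrs: Dict[str, object] = field(default_factory=dict)


Pred = Callable[[Span], bool]


def _ancestors(span: Span, by_id: Dict[str, Span]) -> Iterator[Span]:
    parent = by_id.get(span.parent_id)
    while parent is not None:
        yield parent
        parent = by_id.get(parent.parent_id)


def child(trace: List[Span], left: Pred, right: Pred) -> List[Span]:
    """{left} > {right}: right-hand spans whose direct parent matches left."""
    by_id = {s.span_id: s for s in trace}
    return [s for s in trace if right(s) and s.parent_id in by_id and left(by_id[s.parent_id])]


def descendant(trace: List[Span], left: Pred, right: Pred) -> List[Span]:
    """{left} >> {right}: right-hand spans with any ancestor matching left."""
    by_id = {s.span_id: s for s in trace}
    return [s for s in trace if right(s) and any(left(a) for a in _ancestors(s, by_id))]


def sibling(trace: List[Span], left: Pred, right: Pred) -> List[Span]:
    """{left} ~ {right}: right-hand spans sharing a parent with another span matching left."""
    return [s for s in trace if right(s) and s.parent_id is not None and any(
        o is not s and o.parent_id == s.parent_id and left(o) for o in trace)]


trace = [
    Span("gw", None, {"service": "api-gateway"}),
    Span("co", "gw", {"service": "checkout"}),
    Span("pg", "co", {"service": "checkout", "db.system": "postgresql", "duration_ms": 300}),
    Span("rd", "co", {"service": "checkout", "db.system": "redis", "duration_ms": 2}),
]
is_gateway = lambda s: s.attrs.get("service") == "api-gateway"
slow_pg = lambda s: s.attrs.get("db.system") == "postgresql" and s.attrs.get("duration_ms", 0) > 200
is_redis = lambda s: s.attrs.get("db.system") == "redis"

print([s.span_id for s in child(trace, is_gateway, slow_pg)])       # []
print([s.span_id for s in descendant(trace, is_gateway, slow_pg)])  # ['pg']
print([s.span_id for s in sibling(trace, is_redis, slow_pg)])       # ['pg']
```

The gateway query only matches with `>>` because the database call sits one hop below, under the checkout span.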
Intrinsic Attributes
TraceQL provides built-in intrinsic attributes that exist on every span without requiring explicit instrumentation:
# duration — span execution time
{ duration > 5s }
# status — span status (ok, error, unset)
{ status = error }
# kind — span kind (server, client, producer, consumer, internal)
{ kind = client }
# name — span operation name
{ name = "HTTP GET /api/users" }
# rootName — name of the root span in the trace
{ rootName = "POST /checkout" }
# rootServiceName — service name of the root span
{ rootServiceName = "api-gateway" }
# traceDuration — total duration of the entire trace
{ traceDuration > 10s }
# nestedSetLeft / nestedSetRight — positional attributes for advanced structural queries
{ nestedSetLeft > 0 }
# Combine intrinsic with span attributes
{ kind = server && duration > 1s && resource.service.name = "order-service" }
Aggregate Functions
TraceQL supports aggregate functions that compute values across all matching spans within each trace:
# count — number of matching spans per trace
{ resource.service.name = "api-gateway" } | count() > 10
# avg — average duration of matching spans
{ span.db.system = "redis" } | avg(duration) > 50ms
# max — maximum duration of matching spans
{ resource.service.name = "checkout" } | max(duration) > 3s
# min — minimum duration
{ status = error } | min(duration) < 1ms
# sum — total time spent in matching spans
{ span.db.system = "postgresql" } | sum(duration) > 5s
# Combine with selectors — find traces where the checkout service
# makes more than 20 database calls
{ resource.service.name = "checkout" && span.db.system != nil } | count() > 20
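Aggregates are evaluated per trace, over the spans that the selector matched in that trace. The sketch below mirrors that evaluation order with plain Python callables; the span dictionaries and the `traces_matching` helper are hypothetical, and in this sketch a trace with no matching spans is never returned.

```python
from statistics import mean


def traces_matching(traces, selector, aggregate, predicate):
    """Evaluate `{ selector } | aggregate() <predicate>` once per trace.

    traces maps trace_id -> list of span dicts; returns the IDs of kept traces.
    """
    kept = []
    for trace_id, spans in traces.items():
        spanset = [s for s in spans if selector(s)]
        if spanset and predicate(aggregate(spanset)):
            kept.append(trace_id)
    return kept


checkout_db = {"service": "checkout", "db.system": "postgresql", "duration_ms": 4}
cache_call = {"service": "checkout", "db.system": "redis", "duration_ms": 80}
traces = {
    "t1": [checkout_db] * 25,               # N+1 query pattern
    "t2": [checkout_db] * 3 + [cache_call],
}

# { resource.service.name = "checkout" && span.db.system != nil } | count() > 20
print(traces_matching(traces,
                      lambda s: s["service"] == "checkout" and s.get("db.system") is not None,
                      len, lambda n: n > 20))  # ['t1']

# { span.db.system = "redis" } | avg(duration) > 50ms
print(traces_matching(traces, lambda s: s.get("db.system") == "redis",
                      lambda ss: mean(s["duration_ms"] for s in ss), lambda v: v > 50))  # ['t2']
```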
TraceQL Metrics Queries
TraceQL metrics queries compute time-series metrics from trace data, enabling dashboards and alerts powered by spans rather than application-level instrumentation:
# Rate of matching spans (spans per second)
{ resource.service.name = "api-gateway" && status = error } | rate()
# Histogram of durations for matching spans
{ resource.service.name = "checkout" && kind = server } | histogram_over_time(duration)
# Quantile calculation from trace data
{ resource.service.name = "payment" } | quantile_over_time(duration, 0.95)
# Number of matching spans per time interval
{ span.http.status_code >= 500 } | count_over_time()
# Compare p99 latency between services, one series per service
{ resource.service.name =~ "v[12]-checkout" } | quantile_over_time(duration, 0.99) by (resource.service.name)
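These functions count spans, not traces: two matching spans from one trace contribute two to `count_over_time()`. The sketch below shows that bucketing. Assigning each span to a step by its start time and using integer-second timestamps are simplifying assumptions of the sketch, not a description of Tempo's internals.

```python
from collections import Counter
from typing import List


def count_over_time(span_starts: List[int], start: int, end: int, step: int) -> List[int]:
    """Matching spans per step-sized bucket of [start, end), keyed by span start time."""
    buckets = Counter((t - start) // step for t in span_starts if start <= t < end)
    n_buckets = -(-(end - start) // step)  # ceiling division
    return [buckets.get(i, 0) for i in range(n_buckets)]


def rate(span_starts: List[int], start: int, end: int, step: int) -> List[float]:
    """Matching spans per second in each bucket."""
    return [c / step for c in count_over_time(span_starts, start, end, step)]


# Start times (seconds) of error spans; 61 and 62 could belong to the same trace,
# and they still count as two spans.
starts = [0, 5, 30, 61, 62, 119]
print(count_over_time(starts, 0, 120, 60))  # [3, 3]
print(rate(starts, 0, 120, 60))             # [0.05, 0.05]
```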
Pivoting Between Data Types
The true power of Tempo within the Grafana ecosystem is its ability to serve as a correlation hub — connecting metrics, logs, and profiles through trace context. This section describes the concrete mechanisms for navigating between signals.
flowchart LR
M["Metrics: Mimir/Prometheus"] -->|Exemplars| T["Traces: Tempo"]
L["Logs: Loki"] -->|trace_id field| T
T -->|trace-to-logs| L
T -->|trace-to-metrics| M
T -->|trace-to-profiles| P["Profiles: Pyroscope"]
P -->|span_id label| T
Metrics → Traces (via Exemplars)
Exemplars are sample trace IDs attached to metric data points. When a service records a histogram observation (e.g., request latency), it can attach the current trace_id as an exemplar. In Grafana, these appear as clickable dots on metric panels:
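// Go: recording a histogram observation with an exemplar
package metrics // hypothetical package name

import (
	"context"

	"github.com/prometheus/client_golang/prometheus"
	"go.opentelemetry.io/otel/trace"
)

var httpRequestDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "http_request_duration_seconds",
		Help:    "HTTP request latency in seconds.",
		Buckets: prometheus.DefBuckets,
	},
	[]string{"method", "path"},
)

func init() { prometheus.MustRegister(httpRequestDuration) }

func recordLatency(ctx context.Context, duration float64) {
	observer := httpRequestDuration.With(prometheus.Labels{"method": "GET", "path": "/api/orders"})
	sc := trace.SpanFromContext(ctx).SpanContext()
	if !sc.IsValid() {
		observer.Observe(duration) // no active span, so no exemplar to attach
		return
	}
	// Attach trace_id and span_id as exemplar labels
	// (exemplars are only exposed when metrics are served in the OpenMetrics format)
	observer.(prometheus.ExemplarObserver).ObserveWithExemplar(duration, prometheus.Labels{
		"trace_id": sc.TraceID().String(),
		"span_id":  sc.SpanID().String(),
	})
}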
When investigating a latency spike in a Grafana dashboard, click the exemplar dot at the peak to jump directly to the trace that caused it — no manual trace searching required.
Logs → Traces (via trace_id)
When logs include a trace_id field (either as a structured field or extracted via LogQL), Grafana renders it as a clickable link that opens the corresponding trace in Tempo:
# Python: structured logging with trace context
import logging

import json_log_formatter
from opentelemetry import trace

formatter = json_log_formatter.JSONFormatter()
handler = logging.StreamHandler()
handler.setFormatter(formatter)
logger = logging.getLogger("order-service")
logger.addHandler(handler)
logger.setLevel(logging.INFO)


def process_order(order_id: str):
    span = trace.get_current_span()
    ctx = span.get_span_context()
    logger.info(
        "Processing order",
        extra={
            "order_id": order_id,
            "trace_id": format(ctx.trace_id, "032x"),
            "span_id": format(ctx.span_id, "016x"),
            "service": "order-service",
        },
    )
In Grafana’s Explore view, configure derived fields on the Loki datasource to automatically detect trace_id patterns and render them as links to Tempo.
Traces → Logs
From any span in a trace view, click “Logs for this span” to query Loki for all log lines emitted during that span’s execution window, filtered by trace_id. This is configured via the tracesToLogsV2 datasource setting shown earlier.
The query template typically looks like:
# LogQL query generated by trace-to-logs link
{service_name="checkout-service"} | json | trace_id="abc123def456" | line_format "{{.message}}"
Traces → Profiles
When a slow span is identified, the trace-to-profiles link opens Pyroscope filtered to the exact time window and service, showing CPU flame graphs or memory allocation profiles for the code executing during that span. This is particularly powerful for diagnosing:
- Unexpectedly slow database queries (Is it the query itself or GC pressure?)
- High-latency HTTP calls (Is the service CPU-bound or waiting on I/O?)
- Memory-intensive operations (Which allocations dominate during this request?)
Tracing Protocols
Tempo accepts traces in multiple wire formats, making it compatible with any existing tracing instrumentation. Understanding these protocols helps you choose the right one for your environment and plan migrations.
OpenTelemetry (OTLP)
OTLP (OpenTelemetry Protocol) is the recommended protocol for all new instrumentation. It supports traces, metrics, and logs in a single protocol with both gRPC and HTTP/protobuf transports:
# Tempo configuration: OTLP receivers
distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: "0.0.0.0:4317"
        http:
          endpoint: "0.0.0.0:4318"
# Python: sending traces via OTLP
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Configure the tracer provider
resource = Resource.create({
    "service.name": "order-service",
    "service.version": "2.1.0",
    "deployment.environment": "production",
})
provider = TracerProvider(resource=resource)
exporter = OTLPSpanExporter(endpoint="http://tempo:4317", insecure=True)
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

# Create spans
tracer = trace.get_tracer("order-service")
with tracer.start_as_current_span("process-order") as span:
    span.set_attribute("order.id", "ORD-12345")
    span.set_attribute("order.total", 99.99)
    span.set_attribute("customer.tier", "premium")
    # Child span
    with tracer.start_as_current_span("validate-inventory") as child:
        child.set_attribute("db.system", "postgresql")
        child.set_attribute("db.statement", "SELECT stock FROM products WHERE id = $1")
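When you want to smoke-test the OTLP/HTTP receiver without an SDK, you can post a hand-built OTLP/JSON request to `/v1/traces` on port 4318. In OTLP/JSON, trace and span IDs are hex strings and 64-bit timestamps may be sent as decimal strings. The sketch below builds a one-span request with the standard library; the `tempo` host name, the scope name, and the 25 ms duration are assumptions, and attribute values are sent as strings for simplicity.

```python
import json
import os
import time
import urllib.request


def build_otlp_payload(service_name: str, span_name: str, attributes: dict) -> dict:
    """A one-span OTLP/JSON trace export request body."""
    end = time.time_ns()
    start = end - 25_000_000  # pretend the span took 25 ms
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}}]},
            "scopeSpans": [{
                "scope": {"name": "otlp-json-sketch"},
                "spans": [{
                    "traceId": os.urandom(16).hex(),  # 32 hex chars
                    "spanId": os.urandom(8).hex(),    # 16 hex chars
                    "name": span_name,
                    "kind": 2,  # SPAN_KIND_SERVER
                    "startTimeUnixNano": str(start),
                    "endTimeUnixNano": str(end),
                    "attributes": [{"key": k, "value": {"stringValue": str(v)}}
                                   for k, v in attributes.items()],
                }],
            }],
        }],
    }


def send(payload: dict, endpoint: str = "http://tempo:4318/v1/traces") -> int:
    """POST the payload and return the HTTP status code."""
    req = urllib.request.Request(endpoint, data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


payload = build_otlp_payload("order-service", "process-order", {"order.id": "ORD-12345"})
# send(payload)  # run against a reachable Tempo; search for the printed trace ID afterwards
print(payload["resourceSpans"][0]["scopeSpans"][0]["spans"][0]["traceId"])
```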
Jaeger & Zipkin
For teams migrating from existing Jaeger or Zipkin deployments, Tempo provides native receivers that accept their wire formats without any client-side changes:
# Tempo configuration: legacy protocol receivers
distributor:
  receivers:
    jaeger:
      protocols:
        thrift_http:
          endpoint: "0.0.0.0:14268"
        thrift_compact:
          endpoint: "0.0.0.0:6831"
        thrift_binary:
          endpoint: "0.0.0.0:6832"
        grpc:
          endpoint: "0.0.0.0:14250"
    zipkin:
      endpoint: "0.0.0.0:9411"
This means migration to Tempo is seamless: point your existing Jaeger agents or Zipkin reporters at Tempo’s receiver endpoints with zero client-side code changes. Over time, migrate instrumentation to OpenTelemetry SDKs for access to the latest features.
Propagation Formats
Propagation formats define how trace context (trace_id, span_id, sampling decision) is transmitted between services. The two dominant standards are:
W3C Trace Context (Recommended)
- Header: traceparent: 00-{trace_id}-{span_id}-{flags}
- Optional: tracestate: vendor1=value1,vendor2=value2
- Standard: W3C Recommendation (2020), and the default propagator in OpenTelemetry SDKs
- Trace ID: 32 hex chars (128-bit), Span ID: 16 hex chars (64-bit)
B3 Propagation (Legacy/Zipkin)
- Headers: X-B3-TraceId, X-B3-SpanId, X-B3-ParentSpanId, X-B3-Sampled
- Single-header: b3: {trace_id}-{span_id}-{sampling}-{parent_span_id}
- Standard: OpenZipkin specification, widely deployed in legacy systems
- Supports both 64-bit and 128-bit trace IDs
# W3C Trace Context header example
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
# ver-trace_id(32hex)------------------span_id(16hex)---flags
# B3 single-header example
b3: 4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-1
# B3 multi-header example
X-B3-TraceId: 4bf92f3577b34da6a3ce929d0e0e4736
X-B3-SpanId: 00f067aa0ba902b7
X-B3-Sampled: 1
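Because both formats carry the same 128-bit trace ID and 64-bit span ID, translating between them is mechanical. The sketch below validates a version-00 `traceparent` header and renders the same context as a B3 single header. It deliberately rejects future versions and the all-zero IDs that the W3C specification treats as invalid; the function names are hypothetical.

```python
import re
from typing import NamedTuple

# Version 00 layout: version-traceid-parentid-flags, lowercase hex only.
_TRACEPARENT_V00 = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool


def parse_traceparent(header: str) -> TraceContext:
    """Parse a version-00 traceparent header; raise ValueError if it is invalid."""
    match = _TRACEPARENT_V00.fullmatch(header.strip())
    if match is None:
        raise ValueError(f"unsupported or malformed traceparent: {header!r}")
    trace_id, span_id, flags = match.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        raise ValueError("all-zero trace ID or span ID is invalid")
    return TraceContext(trace_id, span_id, sampled=bool(int(flags, 16) & 0x01))


def to_b3_single(ctx: TraceContext) -> str:
    """Render the context as a B3 single-header value: traceid-spanid-sampled."""
    return f"{ctx.trace_id}-{ctx.span_id}-{'1' if ctx.sampled else '0'}"


ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(to_b3_single(ctx))  # 4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-1
```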
Context Propagation
Context propagation is the mechanism that connects spans across service boundaries into a single distributed trace. Without proper propagation, each service creates isolated traces that cannot be correlated. Understanding propagation is essential for ensuring trace continuity across your entire architecture.
HTTP Propagation
For HTTP-based services, trace context flows through request headers. Once a global propagator is configured, OpenTelemetry's HTTP instrumentation libraries inject context into outgoing requests and extract it from incoming ones automatically:
# Python: automatic HTTP context propagation with OpenTelemetry
import requests
from opentelemetry import trace
from opentelemetry.instrumentation.requests import RequestsInstrumentor
from opentelemetry.propagate import set_global_textmap
from opentelemetry.propagators.b3 import B3MultiFormat
from opentelemetry.propagators.composite import CompositePropagator
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

# Support both W3C and B3 for mixed environments
set_global_textmap(CompositePropagator([
    TraceContextTextMapPropagator(),  # W3C traceparent
    B3MultiFormat(),                  # X-B3-* headers for legacy services
]))

# Instrumented HTTP client: context is injected into outgoing headers
RequestsInstrumentor().instrument()

tracer = trace.get_tracer("order-service")


def call_payment_service(order_id: str, amount: float):
    with tracer.start_as_current_span("call-payment") as span:
        span.set_attribute("payment.order_id", order_id)
        span.set_attribute("payment.amount", amount)
        # Outgoing headers include traceparent plus X-B3-TraceId, X-B3-SpanId, X-B3-Sampled
        response = requests.post(
            "http://payment-service/api/charge",
            json={"order_id": order_id, "amount": amount},
        )
        span.set_attribute("http.status_code", response.status_code)
        return response
gRPC Propagation
For gRPC services, trace context propagates through gRPC metadata (the gRPC equivalent of HTTP headers). OpenTelemetry provides interceptors that handle this automatically:
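// Go: gRPC client and server with automatic trace context propagation
package main

import (
	"log"
	"net"

	"go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)
func main() {
	// otelgrpc uses the global propagator, which is a no-op until one is set
	otel.SetTextMapPropagator(propagation.TraceContext{})
	// Client: injects trace context into outgoing metadata
	conn, err := grpc.NewClient("inventory-service:50051",
		grpc.WithStatsHandler(otelgrpc.NewClientHandler()),
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		log.Fatalf("create inventory-service client: %v", err)
	}
	defer conn.Close()
	// Server: extracts trace context from incoming metadata (register services before Serve)
	server := grpc.NewServer(grpc.StatsHandler(otelgrpc.NewServerHandler()))
	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatalf("listen: %v", err)
	}
	log.Fatal(server.Serve(lis))
}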
Messaging Systems
Asynchronous messaging (Kafka, RabbitMQ, SQS) presents unique challenges for context propagation because the producer and consumer execute in different processes at different times. Trace context is embedded in message headers/attributes:
# Python: Kafka context propagation
import json

from confluent_kafka import Producer
from opentelemetry import trace
from opentelemetry.propagate import extract, inject

tracer = trace.get_tracer("order-service")
producer = Producer({"bootstrap.servers": "kafka:9092"})  # example broker address


def produce_order_event(order_id: str, payload: dict):
    with tracer.start_as_current_span("produce-order-event", kind=trace.SpanKind.PRODUCER) as span:
        span.set_attribute("messaging.system", "kafka")
        span.set_attribute("messaging.destination", "orders")
        span.set_attribute("order.id", order_id)
        # Inject the current trace context into a dict, then into Kafka headers
        headers = {}
        inject(headers)
        producer.produce(
            topic="orders",
            key=order_id.encode(),
            value=json.dumps(payload).encode(),
            headers=[(k, v.encode()) for k, v in headers.items()],
        )
        producer.poll(0)  # serve delivery callbacks; call producer.flush() on shutdown


def consume_order_event(message):
    # Extract trace context from Kafka headers (values arrive as bytes)
    headers = {k: v.decode() for k, v in (message.headers() or []) if v is not None}
    ctx = extract(headers)
    # The CONSUMER span becomes a child of the producer span via the extracted context
    with tracer.start_as_current_span(
        "consume-order-event",
        context=ctx,
        kind=trace.SpanKind.CONSUMER,
    ) as span:
        span.set_attribute("messaging.system", "kafka")
        span.set_attribute("messaging.destination", "orders")
        process_order(message.value())  # your application handler (not shown)
Tempo Architecture
Tempo follows a microservices-based architecture similar to Loki and Mimir, designed for horizontal scalability and operational simplicity. Each component has a specific role in the trace lifecycle from ingestion to query.
flowchart TD
subgraph Ingestion
A[SDK / Collector] -->|OTLP/Jaeger/Zipkin| D[Distributor]
D -->|Hash ring| I[Ingester 1]
D -->|Hash ring| I2[Ingester 2]
D -->|Hash ring| I3[Ingester N]
end
subgraph Storage
I -->|Flush blocks| OS[("Object Storage: S3/GCS/Azure")]
I2 -->|Flush blocks| OS
I3 -->|Flush blocks| OS
C[Compactor] -->|Merge & deduplicate| OS
end
subgraph Query Path
QF[Query Frontend] -->|Split & shard| Q[Querier 1]
QF -->|Split & shard| Q2[Querier N]
Q -->|Read blocks| OS
Q -->|Read WAL| I
Q2 -->|Read blocks| OS
end
subgraph Metrics
D -->|Forward spans| MG[Metrics Generator]
MG -->|Remote write| Mimir[(Mimir)]
end
Distributors
Distributors are the entry point for all trace data. They receive spans from instrumented applications (via OTLP, Jaeger, or Zipkin protocols) and route them to the appropriate ingesters using a consistent hash ring based on trace_id:
- Protocol translation — Converts all incoming formats to Tempo’s internal representation
- Validation — Rejects malformed spans, enforces per-tenant limits (max spans/second, max trace size)
- Hash routing — Ensures all spans for a given trace_id land on the same ingester for efficient batching
- Replication — Optionally writes to multiple ingesters for durability (configurable replication factor)
- Metrics forwarding — Sends a copy of spans to the metrics-generator for service graphs and span metrics
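Trace-ID routing can be illustrated with a toy consistent-hash ring: each ingester owns many tokens on a circle, a trace ID hashes to a point, and the distributor walks clockwise until it has `replication_factor` distinct ingesters. The sketch is an assumption-laden model (Tempo's ring comes from Grafana's shared ring library, dskit, with its own hash and token allocation), but it shows why every span of a trace lands on the same replica set.

```python
import bisect
import hashlib
from typing import List


def _token(key: str) -> int:
    # Illustrative 32-bit token; Tempo's own ring hashing differs.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class HashRing:
    """Toy consistent-hash ring: each ingester owns many tokens on a 32-bit circle."""

    def __init__(self, ingesters: List[str], tokens_per_ingester: int = 128):
        self._ring = sorted((_token(f"{name}/{i}"), name)
                            for name in ingesters for i in range(tokens_per_ingester))
        self._tokens = [tok for tok, _ in self._ring]
        self._size = len(set(ingesters))

    def replicas(self, trace_id: str, replication_factor: int = 3) -> List[str]:
        """Walk clockwise from the trace's token and collect distinct ingesters."""
        if not 1 <= replication_factor <= self._size:
            raise ValueError("replication factor must be between 1 and the ingester count")
        start = bisect.bisect_left(self._tokens, _token(trace_id))
        owners: List[str] = []
        for offset in range(len(self._ring)):
            _, name = self._ring[(start + offset) % len(self._ring)]
            if name not in owners:
                owners.append(name)
                if len(owners) == replication_factor:
                    break
        return owners


ring = HashRing([f"ingester-{i}" for i in range(1, 6)])
print(ring.replicas("4bf92f3577b34da6a3ce929d0e0e4736"))
```

Because the owners depend only on the trace ID, adding or removing one ingester moves only the traces whose tokens fall next to that ingester's tokens.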
Ingesters
Ingesters batch incoming spans into traces and write them to local disk as a Write-Ahead Log (WAL) before flushing complete blocks to object storage:
- Trace assembly: collects spans belonging to the same trace_id into a single trace object
- WAL persistence: writes spans to local disk immediately for crash recovery
- Block creation: once a trace receives no new spans for trace_idle_period, it is considered complete and appended to the ingester's current (head) block
- Block flushing: when the head block reaches max_block_bytes or max_block_duration, it is cut, written as a Parquet block, uploaded to object storage, and its WAL is removed
- Live queries: queriers can read recent/in-progress traces directly from ingester memory
# Ingester configuration
ingester:
  max_block_duration: 5m        # Maximum time to hold a block before flush
  max_block_bytes: 524288000    # 500 MiB max block size
  flush_check_period: 10s       # How often to check for flushable blocks
  trace_idle_period: 30s        # Mark trace complete after 30s of no new spans
  lifecycler:
    ring:
      replication_factor: 3     # Write to 3 ingesters for durability
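The interaction between trace assembly and `trace_idle_period` can be sketched as a small buffer keyed by trace ID. This is a simplified model: the class is hypothetical, it keeps everything in memory, and it leaves out the WAL and the cutting of blocks by `max_block_bytes` and `max_block_duration`. Treating a trace as idle once exactly `idle_period` seconds have passed is this sketch's boundary choice.

```python
from collections import defaultdict
from typing import Dict, List


class TraceAssembler:
    """Group spans by trace ID and release traces idle for idle_period seconds."""

    def __init__(self, idle_period: float = 30.0):
        self.idle_period = idle_period
        self._spans: Dict[str, List[dict]] = defaultdict(list)
        self._last_seen: Dict[str, float] = {}

    def push(self, trace_id: str, span: dict, now: float) -> None:
        self._spans[trace_id].append(span)
        self._last_seen[trace_id] = now

    def cut_idle(self, now: float) -> Dict[str, List[dict]]:
        """Remove and return every trace with no new spans for idle_period seconds."""
        idle = [t for t, seen in self._last_seen.items() if now - seen >= self.idle_period]
        for trace_id in idle:
            del self._last_seen[trace_id]
        return {trace_id: self._spans.pop(trace_id) for trace_id in idle}


assembler = TraceAssembler(idle_period=30)
assembler.push("t1", {"span_id": "a"}, now=0)
assembler.push("t2", {"span_id": "b"}, now=20)
assembler.push("t1", {"span_id": "c"}, now=5)
print(sorted(assembler.cut_idle(now=35)))  # ['t1'] (idle since t=5)
print(sorted(assembler.cut_idle(now=50)))  # ['t2']
```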
Compactor
The compactor runs as a background process that optimizes stored blocks in object storage:
- Block merging: combines many small blocks into fewer large blocks for query efficiency
- Deduplication: removes duplicate spans created by replication
- Retention enforcement: deletes blocks older than the configured retention period
- Bloom filter creation: writes fresh bloom filters for each compacted block, keeping trace ID lookups fast
- Dedicated columns: compacted blocks are written with the dedicated Parquet columns configured for frequently queried attributes
# Compactor configuration
compactor:
  compaction:
    block_retention: 720h              # 30-day retention
    compacted_block_retention: 1h
    compaction_window: 4h              # Group blocks within 4-hour windows
    max_block_bytes: 107374182400      # 100 GiB max compacted block
    v2_out_buffer_bytes: 5242880
  ring:
    kvstore:
      store: memberlist
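The retention and deduplication steps can be modelled in a few lines. In this sketch (hypothetical block dictionaries, not Tempo's block format), expired blocks are dropped first and the survivors are merged, keeping one copy of each `(trace_id, span_id)` pair, which is what collapses the copies written with a replication factor above 1.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def compact(blocks: List[Dict], retention: timedelta, now: datetime) -> Dict:
    """Drop expired blocks, then merge the rest into one block with unique spans.

    Each block is {"end": datetime, "spans": [{"trace_id": ..., "span_id": ...}, ...]}.
    """
    live = [b for b in blocks if now - b["end"] <= retention]
    merged, seen = [], set()
    for block in live:
        for span in block["spans"]:
            key = (span["trace_id"], span["span_id"])
            if key not in seen:  # replicas written with replication_factor > 1 collapse here
                seen.add(key)
                merged.append(span)
    # Keep spans ordered by trace ID so each trace's spans sit together.
    merged.sort(key=lambda s: (s["trace_id"], s["span_id"]))
    return {"end": max((b["end"] for b in live), default=now), "spans": merged}


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
spans = [{"trace_id": "t1", "span_id": "a"}, {"trace_id": "t1", "span_id": "b"}]
blocks = [
    {"end": now - timedelta(hours=1), "spans": spans},        # ingester-1 copy
    {"end": now - timedelta(hours=2), "spans": list(spans)},  # ingester-2 replica
    {"end": now - timedelta(days=40), "spans": [{"trace_id": "t0", "span_id": "z"}]},  # expired
]
result = compact(blocks, retention=timedelta(hours=720), now=now)
print(len(result["spans"]))  # 2
```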
Queriers & Query-Frontend
The query path uses a two-tier architecture for efficient trace retrieval and TraceQL evaluation:
Query-Frontend:
- Receives TraceQL queries from Grafana
- Splits time-range queries into smaller sub-queries (sharding by time window)
- Distributes shards across multiple querier instances
- Deduplicates and merges results from all queriers
- Implements query caching and request queuing
Queriers:
- Execute individual query shards against object storage blocks
- Read Parquet column data selectively (only columns referenced by the query)
- Also query ingesters for recent traces not yet flushed to object storage
- Return matching spansets to the query-frontend for aggregation
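The split-and-merge pattern of the query-frontend can be sketched as two small functions. Only time-based splitting is modelled here; the real frontend also shards by block and routes recent data to ingesters. The merge step assumes each shard returns trace summaries with `traceID` and `startTimeUnixNano` fields, as in Tempo's search response, and in this sketch it keeps the most recent traces up to the limit.

```python
from typing import Dict, Iterable, List, Tuple


def split_time_range(start: int, end: int, shard_seconds: int) -> List[Tuple[int, int]]:
    """Split [start, end) into consecutive sub-ranges no longer than shard_seconds."""
    if end <= start or shard_seconds <= 0:
        raise ValueError("need start < end and shard_seconds > 0")
    return [(s, min(s + shard_seconds, end)) for s in range(start, end, shard_seconds)]


def merge_results(shard_results: Iterable[List[Dict]], limit: int) -> List[Dict]:
    """Deduplicate trace summaries by traceID and keep the `limit` most recent."""
    by_id: Dict[str, Dict] = {}
    for results in shard_results:
        for summary in results:
            by_id.setdefault(summary["traceID"], summary)
    newest_first = sorted(by_id.values(), key=lambda t: int(t["startTimeUnixNano"]), reverse=True)
    return newest_first[:limit]


# One hour split into 15-minute shards, each of which a querier could run in parallel.
print(split_time_range(1_700_000_000, 1_700_003_600, 900))
```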
Metrics Generator
The metrics-generator is an optional component that derives time-series metrics from trace data in real-time:
- Service graphs: generates traces_service_graph_request_total, traces_service_graph_request_failed_total, and traces_service_graph_request_server_seconds_*
- Span metrics: generates RED metrics from spans with configurable dimensions
- Remote write: pushes generated metrics to Mimir/Prometheus via the remote write protocol
- Exemplar attachment: attaches trace_id exemplars to generated metrics for bidirectional correlation
The metrics-generator enables a powerful pattern: instrument with traces only, and derive request-level RED metrics from them automatically. This removes the need to maintain separate metric code for those signals (dual instrumentation) while preserving full metric-based dashboarding and alerting capabilities.
Best Practices
Production tracing requires careful attention to sampling strategies, attribute conventions, and trace quality. These practices ensure your traces remain useful, cost-effective, and performant at scale.
Sampling Strategies
While Tempo’s architecture supports storing 100% of traces, high-volume systems may still benefit from sampling to reduce network bandwidth and ingestion costs. The key is choosing the right sampling strategy:
Head-Based Sampling (Decision at Trace Start)
- Probabilistic — Sample X% of traces randomly (e.g., 10%)
- Rate-limiting — Sample N traces per second per service
- Pros: Simple, low overhead, predictable costs
- Cons: Misses rare errors (99% of errors discarded at 1% sampling)
Tail-Based Sampling (Decision After Trace Completes)
- Error-based — Keep all traces containing error spans
- Latency-based — Keep traces exceeding P99 duration
- Policy-based — Combine multiple rules (always keep errors + slow + specific endpoints)
- Pros: Captures all interesting traces, no missed errors
- Cons: Requires buffering all spans temporarily, complex to operate
Recommended: Hybrid Approach
- 100% sampling for error traces and traces > P95 latency
- 10-25% probabilistic sampling for normal traces
- 100% for specific critical paths (checkout, payment, authentication)
# OpenTelemetry Collector: tail-based sampling configuration
processors:
  tail_sampling:
    decision_wait: 10s     # Wait 10s for all spans to arrive
    num_traces: 100000     # Buffer up to 100k traces
    policies:
      # Always keep errors
      - name: errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      # Always keep slow traces
      - name: slow-traces
        type: latency
        latency:
          threshold_ms: 2000
      # Always keep specific critical paths
      - name: critical-paths
        type: string_attribute
        string_attribute:
          key: http.route
          values: ["/api/checkout", "/api/payment", "/api/auth"]
      # Plus a 15% sample of everything else (a trace is kept if ANY policy matches)
      - name: probabilistic
        type: probabilistic
        probabilistic:
          sampling_percentage: 15
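The policy list above can be read as a single decision function: a trace is kept if any policy votes to keep it. The Python sketch below mirrors that logic for one buffered trace. The span dictionary shape is hypothetical, trace duration is taken as earliest start to latest end, and the probabilistic step hashes the trace ID with SHA-256, which is not the collector's exact hash; the point is that hashing the trace ID makes the decision deterministic, so every collector replica agrees.

```python
import hashlib
from typing import Dict, List

CRITICAL_ROUTES = {"/api/checkout", "/api/payment", "/api/auth"}


def keep_trace(trace_id: str, spans: List[Dict], threshold_ms: float = 2000,
               sampling_percentage: float = 15.0) -> bool:
    """Return True if any policy (errors, latency, route, probabilistic) keeps the trace."""
    # errors: any span with an ERROR status
    if any(s.get("status") == "ERROR" for s in spans):
        return True
    # slow-traces: earliest start to latest end
    duration_ms = max(s["end_ms"] for s in spans) - min(s["start_ms"] for s in spans)
    if duration_ms >= threshold_ms:
        return True
    # critical-paths: any span on a listed route
    if any(s.get("http.route") in CRITICAL_ROUTES for s in spans):
        return True
    # probabilistic: hash the trace ID so the same trace always gets the same answer
    bucket = int.from_bytes(hashlib.sha256(trace_id.encode()).digest()[:8], "big") % 10_000
    return bucket < sampling_percentage * 100


fast = [{"start_ms": 0, "end_ms": 10}]
kept = sum(keep_trace(f"{i:032x}", fast) for i in range(20_000))
print(f"kept {kept / 20_000:.1%} of fast, error-free traces")
```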
Span Attributes
Well-chosen span attributes make traces searchable and meaningful. Follow OpenTelemetry semantic conventions for consistency across services:
# Recommended span attributes by category
# HTTP spans
span.http.method: "POST"
span.http.url: "https://api.example.com/v2/orders"
span.http.route: "/api/v2/orders" # Use route pattern, not URL with IDs
span.http.status_code: 201
span.http.request_content_length: 1234
span.http.response_content_length: 567
# Database spans
span.db.system: "postgresql"
span.db.name: "orders_db"
span.db.operation: "SELECT"
span.db.statement: "SELECT * FROM orders WHERE id = $1" # Parameterized!
# Messaging spans
span.messaging.system: "kafka"
span.messaging.destination: "order-events"
span.messaging.operation: "publish"
span.messaging.message.payload_size_bytes: 2048
# Custom business attributes
span.order.id: "ORD-12345"
span.customer.tier: "premium"
span.payment.method: "credit_card"
span.feature_flag.variant: "experiment-b"
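Attribute guidelines: (1) Use http.route patterns, never raw URLs with path parameters. A path like /orders/12345 produces one distinct value per order, which inflates the cardinality of anything grouped by it, such as span metrics. (2) Parameterize database statements; never embed literal values. (3) Add business-relevant attributes (order.id, customer.tier) to support investigation. (4) Keep attribute names consistent across all services by following the semantic conventions.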
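Many web frameworks expose the matched route template directly, and that should be the first choice for http.route. When only a raw path is available, a fallback normalizer like the hypothetical one below can collapse ID-like segments into a placeholder. The accompanying check is a crude heuristic for spotting literal values in SQL text before it is attached as db.statement. Both regular expressions are illustrative assumptions, not an exhaustive solution.

```python
import re

# Hypothetical fallback for when the framework does not expose a route template.
# Treats numeric IDs, UUIDs, and keys like ORD-12345 as ID-like segments.
_ID_SEGMENT = re.compile(
    r"\d+"
    r"|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
    r"|[A-Z]{2,5}-\d+"
)


def to_route_pattern(path: str) -> str:
    """Replace ID-like path segments with {id}."""
    return "/".join(
        "{id}" if _ID_SEGMENT.fullmatch(seg) else seg for seg in path.split("/")
    )


# Crude heuristic: an equality comparison against a quoted string or bare number.
_SQL_LITERAL = re.compile(r"=\s*('[^']*'|\d+)")


def looks_unparameterized(statement: str) -> bool:
    """True if the SQL text appears to embed a literal value."""
    return _SQL_LITERAL.search(statement) is not None
```

For example, to_route_pattern("/api/v2/orders/12345") yields "/api/v2/orders/{id}". The parameterized statement from the listing above, "SELECT * FROM orders WHERE id = $1", passes the literal check.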
Service Naming
The service.name resource attribute is the most important attribute in your tracing deployment. It names every node in the service graph, is a default dimension of span metrics, and is the first filter in most TraceQL queries. Follow these conventions:
- Use lowercase with hyphens — order-service, not OrderService or order_service
- Be specific — payment-processor, not backend
- Include the component — order-service-worker vs order-service-api if they're separate processes
- Don't include the environment — Use the deployment.environment attribute instead
- Be stable — Don't change service names without coordinating dashboards, alerts, and team knowledge
# Good service naming
resource.service.name: "checkout-api"
resource.service.version: "3.2.1"
resource.deployment.environment: "production"
resource.service.namespace: "commerce"
# Bad service naming
resource.service.name: "prod-checkout-api-v3" # Environment + version in name
resource.service.name: "Service1" # Meaningless
resource.service.name: "backend" # Too generic
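These naming conventions can be enforced with a small lint step, for example in CI or in a shared instrumentation library. The function below is a hypothetical heuristic. Its token lists for environments and generic names are assumptions that you would adapt to your organization.

```python
import re
from typing import List

_LOWER_HYPHEN = re.compile(r"[a-z][a-z0-9]*(?:-[a-z0-9]+)*")
# Assumed token lists; adapt them to your organization.
_ENV_TOKENS = {"prod", "production", "staging", "stage", "dev", "qa"}
_GENERIC_NAMES = {"backend", "frontend", "service", "app", "api", "server"}
_VERSION_TOKEN = re.compile(r"v\d+")


def service_name_problems(name: str) -> List[str]:
    """Return human-readable problems with a service.name value (empty if fine)."""
    problems = []
    if not _LOWER_HYPHEN.fullmatch(name):
        problems.append("use lowercase words separated by hyphens")
    tokens = re.split(r"[-_]", name.lower())
    if any(t in _ENV_TOKENS for t in tokens):
        problems.append("environment belongs in deployment.environment")
    if any(_VERSION_TOKEN.fullmatch(t) for t in tokens):
        problems.append("version belongs in service.version")
    if name.lower() in _GENERIC_NAMES or re.fullmatch(r"service\d*", name.lower()):
        problems.append("name is too generic")
    return problems
```

Applied to the examples above, the check passes checkout-api and flags all three bad names. prod-checkout-api-v3 fails on environment and version, Service1 fails on case and genericity, and backend is flagged as too generic.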
Trace Quality
High-quality traces accelerate incident response. Low-quality traces create noise and erode trust in the tracing system. Focus on these quality dimensions:
- Completeness — Every service in the request path should create spans (no gaps in the trace)
- Context continuity — Trace context must propagate across all boundaries (HTTP, gRPC, queues, batch jobs)
- Meaningful names — Span names should describe the operation, not the implementation (process-payment, not handleRequest)
- Error recording — All exceptions and error conditions must be recorded on spans with status = error and exception.* attributes
- Appropriate granularity — Create spans for logical operations (DB queries, HTTP calls, queue operations), not for every function call
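Several of these quality dimensions can be checked mechanically on a fetched trace. The sketch below assumes a simplified, hypothetical span shape rather than Tempo's actual API response format. It flags four problems:

- Expected services that emitted no spans (completeness).
- Parent IDs that never arrived, meaning a span was dropped or never exported.
- Error spans without exception.* attributes.
- Span names drawn from a sample list of implementation-style names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Sample implementation-style names; extend for your codebase (assumption).
GENERIC_SPAN_NAMES = {"handleRequest", "process", "execute", "run", "doWork"}


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    service: str
    name: str
    status: str = "unset"                      # "ok", "error", or "unset"
    attributes: Dict[str, str] = field(default_factory=dict)


def quality_report(spans: List[Span], expected_services: Set[str]) -> Dict[str, List[str]]:
    ids = {s.span_id for s in spans}
    return {
        # Completeness: services on the request path that emitted no spans.
        "missing_services": sorted(expected_services - {s.service for s in spans}),
        # Gaps: a referenced parent span that was never received.
        "missing_parents": [s.span_id for s in spans
                            if s.parent_id is not None and s.parent_id not in ids],
        # Error recording: error spans should carry exception.* details.
        "errors_without_exception": [
            s.span_id for s in spans
            if s.status == "error"
            and not any(k.startswith("exception.") for k in s.attributes)],
        # Naming: names that describe the code path rather than the operation.
        "generic_names": [s.span_id for s in spans if s.name in GENERIC_SPAN_NAMES],
    }
```

Context that is lost entirely at a boundary usually shows up differently. The downstream work starts a new trace, so it is absent from this one rather than attached to it. That is why the completeness check compares against an expected service list.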
Summary & Next Steps
In this article, we’ve explored the full landscape of distributed tracing with Grafana Tempo:
- Tempo’s no-index design — Object-storage-only architecture that stores 100% of traces at a fraction of traditional costs
- Tempo features — Service graphs, span metrics, and bidirectional cross-signal correlation (traces ↔ metrics ↔ logs ↔ profiles)
- TraceQL mastery — Span selectors, structural operators (>>, ~), the && spanset operator, intrinsic attributes, aggregates, and metrics queries
- Cross-signal pivoting — Concrete mechanisms for navigating between metrics, traces, logs, and profiles using exemplars and trace IDs
- Tracing protocols — OTLP (recommended), Jaeger, Zipkin receivers; W3C Trace Context and B3 propagation formats
- Context propagation — How trace context flows across HTTP, gRPC, and messaging systems
- Architecture — Distributors, ingesters, compactor, queriers, query-frontend, and metrics-generator components
- Best practices — Sampling strategies (head vs tail vs hybrid), span attributes, service naming, and trace quality dimensions
The key insight is that modern distributed tracing is not just about visualizing request flows — it’s about building a correlation fabric that connects all observability signals. Tempo’s deep integration with Mimir (metrics), Loki (logs), and Pyroscope (profiles) transforms isolated data sources into a unified investigation experience.
Next in the Grafana Track
In Part 7: Interrogating Infrastructure, we’ll move from application-level observability to infrastructure monitoring — collecting and analyzing metrics from Kubernetes clusters, cloud providers, network devices, and bare-metal servers using Grafana Alloy, Prometheus exporters, and cloud integrations.