Why OpenTelemetry?
The Fragmentation Problem
Before OpenTelemetry, if you wanted to instrument your application you had to choose between competing, incompatible solutions:
- Prometheus client libraries for metrics
- Jaeger client libraries for tracing (based on OpenTracing)
- Vendor-specific SDKs (Datadog agent, New Relic agent, Dynatrace OneAgent)
- OpenCensus (Google's instrumentation library)
- Zipkin libraries for Zipkin-native tracing
Each had its own API, its own data format, and its own export targets. Switching from Jaeger to Datadog meant ripping out one SDK and replacing it with another, across every service. Worse, most of these libraries covered only one or two signal types (tracing but not metrics, or metrics but not logs).
The OpenTelemetry Promise
OpenTelemetry (OTel) merged OpenTracing and OpenCensus into a single, vendor-neutral standard. Its promise: instrument once, export anywhere.
OTel is a CNCF incubating project (the second most active CNCF project after Kubernetes) with SDKs for 11+ languages and broad vendor support.
OTel Architecture
Three Signals — Unified
OTel provides a unified framework for all three telemetry signals:
| Signal | OTel API | Data Model | Spec Maturity |
|---|---|---|---|
| Traces | TracerProvider, Tracer, Span | Spans with attributes, events, links | Stable |
| Metrics | MeterProvider, Meter, Counter, Histogram | Counters, gauges, histograms | Stable |
| Logs | LoggerProvider, Logger | Log records with trace context | Stable |
The key innovation: all three signals share the same context propagation system. The trace ID of the active span is available to anything emitted while that span is current, so once logging is bridged into OTel, log entries carry trace context without extra code at each logging call.
Core Components
```mermaid
flowchart TD
    subgraph Application
        A[OTel API<br/>Vendor-neutral interfaces] --> B[OTel SDK<br/>Configuration + Processing]
        C[Auto-Instrumentation<br/>Library hooks] --> A
    end
    B -->|OTLP| D[OTel Collector<br/>Receive → Process → Export]
    D -->|Prometheus remote_write| E[Prometheus / Mimir]
    D -->|OTLP| F[Tempo / Jaeger]
    D -->|OTLP| G[Loki]
    D -->|Vendor API| H[Datadog / New Relic / Splunk]
```
| Component | Role | Where It Runs |
|---|---|---|
| OTel API | Vendor-neutral interfaces for creating spans, metrics, logs | Application code |
| OTel SDK | Implementation of the API; configures exporters, processors, samplers | Application runtime |
| Auto-Instrumentation | Automatically instruments common libraries (HTTP, DB, gRPC) without code changes | Application runtime |
| OTLP | OpenTelemetry Protocol — the wire format for transmitting telemetry data | Network (gRPC or HTTP) |
| OTel Collector | Receives, processes (filter, enrich, sample), and exports telemetry to backends | Sidecar, DaemonSet (agent), or Deployment (gateway) |
OTLP — The Universal Wire Protocol
OTLP (OpenTelemetry Protocol) is a general-purpose telemetry data delivery protocol. It supports gRPC and HTTP/protobuf transports. OTLP is now the recommended protocol for transmitting telemetry from applications to backends.
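```bash
# OTLP endpoints: pick the one matching the exporter protocol
# (selected with OTEL_EXPORTER_OTLP_PROTOCOL=grpc or http/protobuf)
# gRPC (default port 4317):
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
# HTTP/protobuf (default port 4318):
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
# Signal-specific endpoints take precedence over the generic one (gRPC shown here;
# over HTTP they are used verbatim, so include the path, e.g. http://otel-collector:4318/v1/traces):
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://otel-collector:4317
OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://otel-collector:4317
OTEL_EXPORTER_OTLP_LOGS_ENDPOINT=http://otel-collector:4317
```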
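The OTLP/HTTP endpoint rules are easy to get wrong. Per the OTLP exporter specification, the generic variable is a base URL to which the SDK appends `v1/traces`, `v1/metrics`, or `v1/logs`, while a signal-specific variable is used exactly as given. The function below re-implements only that rule in plain Python for illustration. `resolve_otlp_http_endpoint` is a hypothetical helper, not the SDK's own code, and it assumes the spec default of `http://localhost:4318` when nothing is set. The Mimir host in the usage example is also illustrative.

```python
from typing import Mapping

_SIGNAL_PATHS = {"traces": "v1/traces", "metrics": "v1/metrics", "logs": "v1/logs"}


def resolve_otlp_http_endpoint(signal: str, env: Mapping[str, str]) -> str:
    """Hypothetical helper mirroring the spec's OTLP/HTTP endpoint rules."""
    specific = env.get(f"OTEL_EXPORTER_OTLP_{signal.upper()}_ENDPOINT")
    if specific:
        return specific  # signal-specific URL is used verbatim
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318")
    # Generic URL is a base: append the per-signal path.
    return base.rstrip("/") + "/" + _SIGNAL_PATHS[signal]


env = {
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://otel-collector:4318",
    "OTEL_EXPORTER_OTLP_METRICS_ENDPOINT": "http://mimir:9009/otlp/v1/metrics",
}
for signal in ("traces", "metrics", "logs"):
    print(signal, resolve_otlp_http_endpoint(signal, env))
```

Traces and logs go to the Collector with the path appended, while metrics go straight to the explicitly configured URL.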
SDK & Manual Instrumentation
Python Setup — Complete Working Example
```python
# Install OTel packages:
# pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp
from opentelemetry import trace, metrics
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.resources import Resource

# 1. Define service identity
resource = Resource.create({
    "service.name": "order-service",
    "service.version": "2.4.1",
    "deployment.environment": "production",
})

# 2. Configure tracing
tracer_provider = TracerProvider(resource=resource)
tracer_provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-collector:4317"))
)
trace.set_tracer_provider(tracer_provider)

# 3. Configure metrics
metric_reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint="http://otel-collector:4317"),
    export_interval_millis=10000,  # Export every 10 seconds
)
meter_provider = MeterProvider(resource=resource, metric_readers=[metric_reader])
metrics.set_meter_provider(meter_provider)

# 4. Get tracer and meter
tracer = trace.get_tracer("order-service")
meter = metrics.get_meter("order-service")
print("OpenTelemetry configured successfully")
```
Creating Custom Spans
```python
from opentelemetry import trace

tracer = trace.get_tracer("order-service")


def process_order(order_id, items):
    # Create a span for the entire order processing
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)
        span.set_attribute("order.item_count", len(items))

        # Nested span for validation
        with tracer.start_as_current_span("validate_order") as validate_span:
            validate_span.set_attribute("validation.rules_checked", 5)
            is_valid = validate_items(items)
            validate_span.set_attribute("validation.passed", is_valid)

        # Nested span for payment. record_exception=False because the except
        # block records the exception itself; the default (True) would add a
        # second "exception" event when the error propagates out of the span.
        with tracer.start_as_current_span(
            "charge_payment", record_exception=False
        ) as payment_span:
            payment_span.set_attribute("payment.method", "credit_card")
            total = sum(item["price"] for item in items)
            payment_span.set_attribute("payment.amount_usd", total)
            try:
                charge_result = process_payment(total)
                payment_span.set_attribute("payment.status", "success")
            except Exception as e:
                payment_span.set_status(
                    trace.Status(trace.StatusCode.ERROR, str(e))
                )
                payment_span.record_exception(e)
                raise

        span.add_event("order_completed", {
            "order.id": order_id,
            "order.total": total,
        })
        return {"status": "completed", "order_id": order_id}


# Placeholder functions for the example
def validate_items(items):
    return True


def process_payment(amount):
    return {"charged": amount}


# Example usage
result = process_order("ORD-123", [{"name": "Widget", "price": 29.99}])
print(result)
```
Creating Custom Metrics
```python
import time

from opentelemetry import metrics

meter = metrics.get_meter("order-service")

# Counter: tracks cumulative totals
orders_counter = meter.create_counter(
    name="orders_total",
    description="Total number of orders processed",
    unit="1",
)
# Histogram: tracks distributions (latency, sizes)
order_duration = meter.create_histogram(
    name="order_processing_duration_ms",
    description="Time to process an order in milliseconds",
    unit="ms",
)
# Up-down counter: tracks values that go up and down
active_orders = meter.create_up_down_counter(
    name="active_orders",
    description="Number of orders currently being processed",
    unit="1",
)


# Usage in application code
def process_order_with_metrics(order_id, items):
    active_orders.add(1, {"order.type": "standard"})
    start = time.perf_counter()  # monotonic clock, safe for measuring durations
    try:
        # ... process order ...
        orders_counter.add(1, {
            "order.status": "success",
            "order.type": "standard",
        })
        result = {"status": "completed", "order_id": order_id}
        return result
    except Exception:
        orders_counter.add(1, {
            "order.status": "failed",
            "order.type": "standard",
        })
        raise
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        order_duration.record(duration_ms, {"order.type": "standard"})
        active_orders.add(-1, {"order.type": "standard"})


result = process_order_with_metrics("ORD-456", [{"name": "Gadget", "price": 49.99}])
print(result)
```
Auto-Instrumentation — Zero Code Changes
OTel auto-instrumentation automatically hooks into popular libraries (HTTP clients, database drivers, web frameworks) and generates traces and metrics without you writing any instrumentation code.
```bash
# Python: Install auto-instrumentation packages
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install

# Run your app with auto-instrumentation:
opentelemetry-instrument \
    --service_name order-service \
    --exporter_otlp_endpoint http://otel-collector:4317 \
    python app.py

# Instrumentation packages exist for Flask, Django, FastAPI, requests,
# urllib3, psycopg2, pymongo, redis, grpcio, and 40+ more libraries;
# opentelemetry-bootstrap installs the ones matching packages in your environment
```
What Gets Instrumented Automatically
With auto-instrumentation enabled, OTel automatically creates spans for:
- Inbound HTTP requests: Every request to your Flask/Django/FastAPI app → server span with HTTP method, status, route
- Outbound HTTP requests: Every call via requests/urllib3/httpx → client span with target URL, status
- Database queries: Every query via psycopg2/pymongo/mysql-connector → span with SQL statement, DB name
- Redis operations: Every GET/SET/DEL → span with Redis command
- gRPC calls: Every inbound/outbound gRPC call → span with service/method
- Message queue operations: Kafka produce/consume, RabbitMQ publish/consume
All of this happens without writing a single line of instrumentation code. You just add the auto-instrumentation agent and configure the exporter.
The OTel Collector
Collector Architecture
The OTel Collector is a vendor-agnostic telemetry pipeline that receives, processes, and exports telemetry data. It sits between your applications and your backends, providing a centralised point for transformation, filtering, and routing.
```mermaid
flowchart LR
    subgraph Receivers
        A[OTLP<br/>Port 4317/4318]
        B[Prometheus<br/>Scrape targets]
        C[Jaeger<br/>Port 14250]
    end
    subgraph Processors
        E[Filter<br/>Drop unwanted data]
        F[Attributes<br/>Enrich metadata]
        G[Tail Sampling<br/>Keep interesting traces]
        D[Batch<br/>Group for efficiency]
    end
    subgraph Exporters
        H[OTLP → Tempo]
        I[Prometheus<br/>remote_write → Mimir]
        J[Loki → Logs]
    end
    A & B & C --> E --> F --> G --> D --> H & I & J
```
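Tail sampling differs from head sampling because the keep/drop decision is made after all spans of a trace have arrived, so it can take outcomes into account. Below is a minimal sketch of such a decision, assuming a hypothetical `FinishedSpan` record. The three rules loosely mirror the `status_code`, `latency`, and `probabilistic` policy types of the contrib `tail_sampling` processor, but this is not that processor's implementation. Trace duration is measured from the earliest span start to the latest span end.

```python
import random
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class FinishedSpan:
    """Hypothetical minimal span record (not an OTel SDK type)."""
    trace_id: str
    name: str
    start_ms: float
    end_ms: float
    is_error: bool = False


def keep_trace(
    spans: Sequence[FinishedSpan],
    latency_threshold_ms: float = 500.0,
    baseline_rate: float = 0.1,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide once the whole trace has been buffered (the 'tail')."""
    # Rule 1: keep any trace containing an error.
    if any(s.is_error for s in spans):
        return True
    # Rule 2: keep slow traces (earliest start to latest end).
    duration = max(s.end_ms for s in spans) - min(s.start_ms for s in spans)
    if duration >= latency_threshold_ms:
        return True
    # Rule 3: keep a random baseline of ordinary traces.
    return rng() < baseline_rate


ok_fast = [FinishedSpan("t1", "GET /orders", 0, 40), FinishedSpan("t1", "db.query", 5, 30)]
errored = [FinishedSpan("t2", "POST /checkout", 0, 80),
           FinishedSpan("t2", "charge_payment", 10, 70, is_error=True)]
slow = [FinishedSpan("t3", "GET /report", 0, 1200)]

decisions = {
    "errored": keep_trace(errored, rng=lambda: 0.99),
    "slow": keep_trace(slow, rng=lambda: 0.99),
    "fast_unlucky": keep_trace(ok_fast, rng=lambda: 0.99),
    "fast_lucky": keep_trace(ok_fast, rng=lambda: 0.05),
}
print(decisions)
```

Errors and slow requests are always kept, while healthy fast traces survive only at the baseline rate. Because the decision needs the whole trace, the Collector has to buffer spans for a while before deciding.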
Collector Configuration
```yaml
# otel-collector-config.yaml: production-ready configuration
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  # Scrape Prometheus metrics; here, the Collector's own internal metrics
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 15s
          static_configs:
            - targets: ['localhost:8888']  # Collector's own metrics (see service.telemetry)

processors:
  # Batch telemetry for efficient export
  batch:
    send_batch_size: 1024
    send_batch_max_size: 2048
    timeout: 5s
  # Add resource attributes to all telemetry
  resource:
    attributes:
      - key: deployment.environment
        value: production
        action: upsert
      - key: k8s.cluster.name
        value: prod-us-east-1
        action: upsert
  # Filter out noisy spans (e.g., health checks); a span matching any condition is dropped
  filter/traces:
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.route"] == "/health"'
        - 'attributes["http.route"] == "/readyz"'
  # Memory limiter to prevent OOM: refuses data above the soft limit
  # (limit_mib - spike_limit_mib = 768 MiB) and also forces GC above limit_mib
  memory_limiter:
    check_interval: 1s
    limit_mib: 1024
    spike_limit_mib: 256

exporters:
  # Export traces to Tempo
  otlp/tempo:
    endpoint: tempo.monitoring.svc.cluster.local:4317
    tls:
      insecure: true
  # Export metrics to Prometheus/Mimir
  prometheusremotewrite:
    endpoint: http://mimir.monitoring.svc.cluster.local:9009/api/v1/push
  # Export logs to Loki via its native OTLP endpoint (Loki 3.x)
  otlphttp/loki:
    endpoint: http://loki.monitoring.svc.cluster.local:3100/otlp
  # Debug exporter for troubleshooting (add it to a pipeline's exporters to use it)
  debug:
    verbosity: basic

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, filter/traces, resource, batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, resource, batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [otlphttp/loki]
  telemetry:
    logs:
      level: info
    metrics:
      address: 0.0.0.0:8888
```
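The `batch` settings above interact in a way that is easier to see in code. The class below is a simplified model, not the Collector's Go implementation, and `BatchSketch` is a hypothetical name. In this model, an export fires as soon as `send_batch_size` items are waiting, no single export exceeds `send_batch_max_size`, and anything left over is flushed once `timeout` has elapsed since the last send. A fake clock keeps the timeout path deterministic.

```python
import time


class BatchSketch:
    """Simplified model of the batch processor's size and timeout triggers."""

    def __init__(self, send_batch_size, send_batch_max_size, timeout_s, export, clock=time.monotonic):
        self.send_batch_size = send_batch_size
        self.send_batch_max_size = send_batch_max_size
        self.timeout_s = timeout_s
        self.export = export
        self.clock = clock
        self.buffer = []
        self.last_send = clock()

    def consume(self, items):
        """Receive a chunk of telemetry (e.g. all spans from one OTLP request)."""
        self.buffer.extend(items)
        # Size trigger: send once enough items are waiting, capped per export.
        while len(self.buffer) >= self.send_batch_size:
            self._send()

    def tick(self):
        """Timer trigger: flush whatever is waiting once the timeout has elapsed."""
        if self.buffer and self.clock() - self.last_send >= self.timeout_s:
            while self.buffer:
                self._send()

    def _send(self):
        chunk = self.buffer[: self.send_batch_max_size]
        del self.buffer[: self.send_batch_max_size]
        self.export(chunk)
        self.last_send = self.clock()


now = [0.0]  # fake clock value, advanced by hand below
exported_sizes = []
batcher = BatchSketch(
    send_batch_size=1024,
    send_batch_max_size=2048,
    timeout_s=5.0,
    export=lambda chunk: exported_sizes.append(len(chunk)),
    clock=lambda: now[0],
)
batcher.consume(range(3000))  # one large incoming request
now[0] = 5.0
batcher.tick()
print(exported_sizes)  # prints: [2048, 952]
```

With the production values in this model, a single 3,000-span request is exported as 2,048 spans immediately. The remaining 952 wait until either more data arrives or the 5-second timeout fires.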
Production Deployment Patterns
Pattern 1: Agent + Gateway
Run a lightweight OTel Collector as a DaemonSet (one per node) that forwards to a central Gateway Collector for processing and export. This reduces per-application configuration and provides a single choke point for sampling decisions.
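A gateway tier only works as a choke point for tail sampling if every span of a given trace lands on the same gateway replica. In the Collector, the contrib `loadbalancing` exporter in the agents routes by trace ID using consistent hashing. The sketch below uses rendezvous hashing as a simpler stand-in with the same key property: when a replica is removed, only the traces that were assigned to it move. `pick_gateway` and the replica names are hypothetical.

```python
import hashlib
from typing import Sequence


def pick_gateway(trace_id: str, gateways: Sequence[str]) -> str:
    """Rendezvous hashing: each trace ID goes to the gateway with the highest score."""

    def score(gateway: str) -> int:
        digest = hashlib.sha256(f"{gateway}|{trace_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(gateways, key=score)


gateways = ["gateway-0", "gateway-1", "gateway-2"]  # hypothetical replica names
trace_ids = [f"{i:032x}" for i in range(1000)]

# All spans of a trace share its trace ID, so they all reach the same gateway.
before = {t: pick_gateway(t, gateways) for t in trace_ids}
# Scale down to two replicas: only traces that lived on gateway-2 move.
after = {t: pick_gateway(t, gateways[:2]) for t in trace_ids}
moved = [t for t in trace_ids if before[t] != after[t]]
print(f"{len(moved)} of {len(trace_ids)} traces moved")
```

Plain `hash(trace_id) % n` routing would also keep a trace together. However, changing `n` would reshuffle most traces mid-flight, which is why hash-ring style schemes are preferred when gateways autoscale.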
Pattern 2: Sidecar
Run an OTel Collector as a sidecar container in each pod. This provides isolation between tenants in multi-tenant systems and allows per-service export configuration. It costs more resources than the DaemonSet pattern, because every pod runs its own Collector.
Pattern 3: Direct Export
Applications export directly to backends (no Collector). Simpler architecture but loses the benefits of centralised processing, filtering, and sampling. Only suitable for small deployments or development environments.
Conclusion & Next Steps
OpenTelemetry is the future of observability instrumentation. Key takeaways from Part 6:
- Instrument once, export anywhere: OTel decouples instrumentation from backends — switch observability vendors by changing config, not code
- Three unified signals: Traces, metrics, and logs share context propagation, so once logging is bridged into OTel, log entries automatically include trace IDs
- Auto-instrumentation covers dozens of popular libraries (40+ in Python) with zero code changes; combine it with manual instrumentation for business logic
- OTLP is the universal wire protocol — every major backend now supports it
- The OTel Collector is a vendor-agnostic pipeline for receiving, processing, and exporting telemetry
- Agent + Gateway is the recommended production deployment pattern