Service Discovery Overview
Static vs Dynamic
Prometheus needs to know which targets to scrape. In static environments, you list targets manually. In dynamic environments (Kubernetes, cloud, service mesh), targets appear and disappear constantly, so Prometheus has to discover them automatically:
# Static targets - suitable for fixed infrastructure
scrape_configs:
  - job_name: 'database-servers'
    static_configs:
      - targets:
          - 'db-primary.internal:9104'
          - 'db-replica-1.internal:9104'
          - 'db-replica-2.internal:9104'
        labels:
          env: production
          team: data
  # Dynamic targets - Prometheus discovers pods through the Kubernetes API
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    # Relabeling filters and transforms discovered targets
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
Target Lifecycle
flowchart TD
    SD["Service Discovery<br/>(watches K8s API, Consul, etc.)"]
    TG["Target Groups<br/>(raw targets + __meta labels)"]
    RL["relabel_configs<br/>(filter, transform, drop)"]
    AT["Active Targets<br/>(final scrape list)"]
    SC["Scrape Loop<br/>(GET /metrics every interval)"]
    MR["metric_relabel_configs<br/>(filter/modify scraped metrics)"]
    TSDB["TSDB Storage"]
    SD -->|"discover"| TG
    TG -->|"apply"| RL
    RL -->|"keep/drop"| AT
    AT -->|"scrape"| SC
    SC -->|"post-process"| MR
    MR -->|"ingest"| TSDB
    RL -->|"action: drop"| X1["Discarded<br/>(never scraped)"]
    MR -->|"action: drop"| X2["Discarded<br/>(scraped but not stored)"]
Kubernetes Service Discovery
SD Roles
| Role | Discovers | Target Address | Use Case |
|---|---|---|---|
| node | Cluster nodes | NodeIP:KubeletPort | Node exporter, kubelet metrics |
| pod | All pods (one target per declared container port) | PodIP:ContainerPort | Application metrics (annotation-based) |
| service | Services (one target per service port) | ServiceDNSName:ServicePort | Blackbox probing service endpoints |
| endpoints | Endpoints (pods behind services) | PodIP:TargetPort | ServiceMonitor pattern (most common) |
| endpointslice | EndpointSlices (modern) | PodIP:TargetPort | Same as endpoints, better scaling |
| ingress | Ingress resources (one target per path) | Host from the Ingress spec | Blackbox probing external URLs |
__meta_kubernetes Labels
Kubernetes SD injects rich metadata as __meta_kubernetes_* labels. These are available during relabeling but are NOT stored in TSDB (double-underscore labels are dropped after relabeling):
# Key __meta labels for pod role:
__meta_kubernetes_pod_name # Pod name
__meta_kubernetes_pod_ip # Pod IP address
__meta_kubernetes_pod_node_name # Node the pod runs on
__meta_kubernetes_namespace # Pod namespace
__meta_kubernetes_pod_label_<name> # Pod labels (unsupported characters such as . / - become _)
__meta_kubernetes_pod_annotation_<name> # Pod annotations (same character conversion)
__meta_kubernetes_pod_container_name # Container name
__meta_kubernetes_pod_container_port_name # Named port
__meta_kubernetes_pod_container_port_number # Port number
__meta_kubernetes_pod_ready # "true" if pod is ready
__meta_kubernetes_pod_phase # Running, Pending, etc.
# Key __meta labels for node role:
__meta_kubernetes_node_name # Node name
__meta_kubernetes_node_label_<name> # Node labels
__meta_kubernetes_node_address_InternalIP # Internal IP
__meta_kubernetes_node_address_Hostname # Hostname
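To predict which `__meta` label a given Kubernetes label or annotation key turns into, a small sketch can help. It assumes that every character outside `[a-zA-Z0-9_]` is replaced with an underscore; `metaLabelName` is a hypothetical helper written for this article, not a Prometheus function.

```javascript
// Sketch: map a Kubernetes label/annotation key to its __meta label name.
// Assumption: every character outside [a-zA-Z0-9_] becomes an underscore.
// metaLabelName is a hypothetical helper, not part of Prometheus.
function metaLabelName(prefix, key) {
  return prefix + key.replace(/[^a-zA-Z0-9_]/g, '_');
}

console.log(metaLabelName('__meta_kubernetes_pod_annotation_', 'prometheus.io/scrape'));
// __meta_kubernetes_pod_annotation_prometheus_io_scrape

console.log(metaLabelName('__meta_kubernetes_pod_label_', 'app.kubernetes.io/name'));
// __meta_kubernetes_pod_label_app_kubernetes_io_name
```

This is why the annotation `prometheus.io/scrape` shows up in relabel rules as `__meta_kubernetes_pod_annotation_prometheus_io_scrape`.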
Practical Examples
# Pattern 1: Annotation-based pod discovery (classic Prometheus pattern)
# Pods opt in by adding annotations:
#   prometheus.io/scrape: "true"
#   prometheus.io/port: "8080"
#   prometheus.io/path: "/metrics"
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Only scrape pods with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Use custom port if specified: keep the discovered host, swap in the annotated port.
      # Source values are joined with ';', so nothing changes when the annotation is absent.
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: '([^:]+)(?::\d+)?;(\d+)'
        replacement: '${1}:${2}'
      # Use custom path if specified
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Add namespace and pod labels
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_pod_label_app]
        target_label: app
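To see how a `replace` rule arrives at its result, it can help to simulate one by hand. The sketch below is a simplification, not Prometheus code: `applyReplace` is a hypothetical helper, and it models only three behaviors (source values joined with `;`, a regex anchored at both ends, and `${n}` expansion in the replacement). The example rule combines `__address__` with the port annotation to rewrite the scrape port.

```javascript
// Simplified model of the `replace` relabel action.
// applyReplace is a hypothetical helper, not a Prometheus API.
function applyReplace(labels, rule) {
  const separator = rule.separator ?? ';';
  // Join the source label values; missing labels count as empty strings.
  const value = rule.sourceLabels.map((name) => labels[name] ?? '').join(separator);
  // Relabel regexes are anchored: the pattern must match the whole value.
  const match = new RegExp(`^(?:${rule.regex})$`).exec(value);
  if (!match) return labels; // no match: labels stay untouched
  // Expand $1 / ${1} style references to capture groups.
  const replaced = rule.replacement.replace(/\$\{?(\d+)\}?/g, (_, i) => match[Number(i)] ?? '');
  return { ...labels, [rule.targetLabel]: replaced };
}

const portAnnotation = '__meta_kubernetes_pod_annotation_prometheus_io_port';
const portRule = {
  sourceLabels: ['__address__', portAnnotation],
  regex: String.raw`([^:]+)(?::\d+)?;(\d+)`,
  replacement: '${1}:${2}',
  targetLabel: '__address__',
};

const withPort = applyReplace({ __address__: '10.0.0.5:8080', [portAnnotation]: '9102' }, portRule);
const withoutPort = applyReplace({ __address__: '10.0.0.5:8080' }, portRule);

console.log(withPort.__address__);    // 10.0.0.5:9102
console.log(withoutPort.__address__); // 10.0.0.5:8080
```

The second call shows why the rule is safe for pods without a port annotation: the joined value ends in a bare `;`, the regex does not match, and the discovered address is left alone.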
Relabeling Deep Dive
relabel_configs vs metric_relabel_configs
- relabel_configs: runs BEFORE scraping. Decides which targets to scrape and sets target labels. Operates on __meta_* and __address__ labels. Dropping here = target never scraped = saves CPU
- metric_relabel_configs: runs AFTER scraping. Filters or modifies individual metrics from the scrape response. Dropping here = metric scraped but not stored = saves disk/memory but still costs scrape CPU
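The difference between the two stages can be sketched in a few lines of JavaScript. This is an illustration rather than Prometheus code: `keep` and `drop` are hypothetical helpers, the target and sample objects are made up, and the only behavior borrowed from Prometheus is that relabel regexes are anchored at both ends.

```javascript
// Hypothetical helpers that mimic the keep/drop actions on plain label objects.
// Relabel regexes are anchored, so the pattern must match the whole value.
const anchored = (regex) => new RegExp(`^(?:${regex})$`);
const keep = (items, label, regex) =>
  items.filter((labels) => anchored(regex).test(labels[label] ?? ''));
const drop = (items, label, regex) =>
  items.filter((labels) => !anchored(regex).test(labels[label] ?? ''));

// Stage 1 (relabel_configs): works on discovered targets, before any scrape.
const discovered = [
  { __address__: '10.0.0.5:8080', __meta_kubernetes_namespace: 'production' },
  { __address__: '10.0.0.6:8080', __meta_kubernetes_namespace: 'dev' },
];
const activeTargets = keep(discovered, '__meta_kubernetes_namespace', 'production|staging');

// Stage 2 (metric_relabel_configs): works on scraped samples; the metric
// name is available as the __name__ label.
const scraped = [
  { __name__: 'http_requests_total', code: '200' },
  { __name__: 'go_gc_duration_seconds', quantile: '0.5' },
];
const stored = drop(scraped, '__name__', 'go_gc_.*');

console.log(activeTargets.length, stored.length); // 1 1
```

The `dev` target is never contacted, while `go_gc_duration_seconds` is still fetched over the network and only then thrown away.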
Actions: keep, drop, replace, labelmap, hashmod
# Action: keep - only keep targets where regex matches
relabel_configs:
  - source_labels: [__meta_kubernetes_namespace]
    action: keep
    regex: (production|staging) # Only monitor prod and staging
  # Action: drop - discard targets where regex matches
  - source_labels: [__meta_kubernetes_pod_label_app]
    action: drop
    regex: (test-.*|debug-.*) # Skip test/debug pods
  # Action: replace - regex extract and set target label
  - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
    action: replace
    target_label: app
    regex: (.+)
    replacement: ${1}
  # Action: labelmap - copy __meta labels to target labels by regex
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
    replacement: ${1} # All pod labels become target labels
  # Action: hashmod - consistent hashing for sharding
  - source_labels: [__address__]
    modulus: 3 # 3 Prometheus shards
    target_label: __tmp_hash
    action: hashmod
  - source_labels: [__tmp_hash]
    regex: 0 # This shard handles bucket 0
    action: keep
# Action: labeldrop / labelkeep - remove/keep labels by regex
metric_relabel_configs:
  - action: labeldrop
    regex: (pod_template_hash|controller_revision_hash) # Remove noisy labels
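For capacity planning it is sometimes useful to preview which shard a target would land on. The sketch below assumes `hashmod` works the way current Prometheus releases implement it, as far as I know: an MD5 hash of the joined source label values, with the last 8 bytes read as a big-endian unsigned integer and reduced modulo `modulus`. Treat that as an assumption to verify against your Prometheus version; `assignShards` is a hypothetical helper.

```javascript
const crypto = require('crypto');

// Assumption: mirrors Prometheus' hashmod, i.e. MD5 of the source label value,
// last 8 bytes read as a big-endian unsigned integer, modulo `modulus`.
function hashmod(value, modulus) {
  const digest = crypto.createHash('md5').update(value).digest();
  return Number(digest.readBigUInt64BE(8) % BigInt(modulus));
}

// Hypothetical helper: group addresses by the shard that would keep them.
function assignShards(addresses, modulus) {
  const shards = Array.from({ length: modulus }, () => []);
  for (const address of addresses) {
    shards[hashmod(address, modulus)].push(address);
  }
  return shards;
}

const addresses = ['10.0.0.5:8080', '10.0.0.6:8080', '10.0.0.7:8080', '10.0.0.8:8080'];
console.log(assignShards(addresses, 3));
```

Every address lands in exactly one bucket, so each Prometheus shard keeps only the targets whose bucket matches its own `regex` value.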
Common Relabeling Patterns
# Pattern: Replace __address__ port (scrape different port than discovered)
relabel_configs:
  - source_labels: [__address__]
    action: replace
    regex: '([^:]+)(?::\d+)?'
    replacement: '${1}:9090'
    target_label: __address__
  # Pattern: Set metrics_path from annotation
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  # Pattern: Add static label to all metrics from a job
  - target_label: environment
    replacement: production
  # Pattern: Extract version from pod name (api-v2-abc123 → v2)
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    regex: '.*-(v\d+)-.*'
    replacement: '${1}'
    target_label: version
Other SD Mechanisms
Consul SD
# Consul service discovery - for non-Kubernetes environments
scrape_configs:
  - job_name: 'consul-services'
    consul_sd_configs:
      - server: 'consul.internal:8500'
        services: [] # Empty = discover ALL services
        tags: ['prometheus'] # Only services with this tag
        refresh_interval: 30s
    relabel_configs:
      - source_labels: [__meta_consul_service]
        target_label: service
      - source_labels: [__meta_consul_dc]
        target_label: datacenter
      - source_labels: [__meta_consul_tags]
        regex: '.*,production,.*'
        action: keep
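The commas on both sides of `production` in that regex are deliberate. As a sketch, assuming the default tag separator `,` and that Prometheus wraps the joined tag list in the separator on both ends, `__meta_consul_tags` can be modeled like this (`consulTagsLabel` is a hypothetical helper):

```javascript
// Sketch: how the __meta_consul_tags value is assumed to be built.
// Assumption: tags are joined with the separator AND surrounded by it.
// consulTagsLabel is a hypothetical helper, not part of Prometheus.
function consulTagsLabel(tags, separator = ',') {
  return separator + tags.join(separator) + separator;
}

// Relabel regexes are anchored, hence the explicit ^(?:...)$ here.
const keepProduction = /^(?:.*,production,.*)$/;

const exact = consulTagsLabel(['prometheus', 'production']);
const lookalike = consulTagsLabel(['prometheus', 'production-eu']);

console.log(exact, keepProduction.test(exact));         // ,prometheus,production, true
console.log(lookalike, keepProduction.test(lookalike)); // ,prometheus,production-eu, false
```

Because every tag is delimited on both sides, the rule matches the exact tag regardless of its position and does not match lookalikes such as `production-eu`.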
DNS SD
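# DNS service discovery - SRV or A/AAAA records
# Each dns_sd_configs entry has a single record type, so SRV and A names go in separate entries
scrape_configs:
  - job_name: 'dns-discovery'
    dns_sd_configs:
      # SRV records carry their own port
      - names:
          - '_prometheus._tcp.api.internal'
        type: SRV
        refresh_interval: 30s
      # A/AAAA records have no port, so it must be set explicitly
      - names:
          - 'metrics.api.internal'
        type: A
        port: 9090
        refresh_interval: 30s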
File-Based SD
# File-based SD - read targets from JSON/YAML files
# Prometheus watches files for changes (inotify)
scrape_configs:
  - job_name: 'file-discovery'
    file_sd_configs:
      - files:
          - '/etc/prometheus/targets/*.json'
          - '/etc/prometheus/targets/*.yml'
        refresh_interval: 5m
[
  {
    "targets": ["10.0.1.5:9090", "10.0.1.6:9090"],
    "labels": {
      "env": "production",
      "team": "platform",
      "service": "api-gateway"
    }
  },
  {
    "targets": ["10.0.2.10:9090"],
    "labels": {
      "env": "staging",
      "team": "platform",
      "service": "api-gateway"
    }
  }
]
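Since Prometheus picks up file changes as they happen, a generator script should avoid leaving a half-written file behind. One common approach, sketched below, is to write to a temporary file and rename it into place. `writeTargetsFile` is a hypothetical helper, the demo writes into a throwaway temp directory instead of `/etc/prometheus/targets`, and the atomicity of `rename` is assumed to hold only when both paths are on the same filesystem.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: write target groups, then rename into place so that
// a reader never sees a partially written file.
function writeTargetsFile(filePath, targetGroups) {
  // The ".tmp" suffix keeps the temp file out of the '*.json' glob.
  const tmpPath = `${filePath}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify(targetGroups, null, 2));
  fs.renameSync(tmpPath, filePath);
}

// Demo in a throwaway directory (stand-in for /etc/prometheus/targets).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'file-sd-'));
const file = path.join(dir, 'api-gateway.json');
const groups = [
  { targets: ['10.0.1.5:9090', '10.0.1.6:9090'], labels: { env: 'production', service: 'api-gateway' } },
];

writeTargetsFile(file, groups);
console.log(fs.readFileSync(file, 'utf8'));
```

A cron job or deployment hook could call a function like this whenever the inventory changes.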
HTTP SD (Custom Providers)
HTTP SD (Prometheus 2.28+) lets you build a custom service discovery provider as any HTTP server that returns target JSON. This is the most flexible approach for non-standard infrastructure:
# HTTP service discovery configuration
scrape_configs:
  - job_name: 'custom-http-sd'
    http_sd_configs:
      - url: 'http://sd-provider.internal:8080/targets'
        refresh_interval: 30s
        # Optional authentication
        authorization:
          type: Bearer
          credentials: 'my-secret-token'
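// Simple HTTP SD provider (Node.js example)
// Returns a JSON array of target groups (same format as file_sd)
const express = require('express');

const app = express();

// Placeholder: replace with a query against your CMDB, cloud API, or custom registry
async function getInstancesFromCMDB() {
  return [];
}

app.get('/targets', async (req, res) => {
  try {
    const instances = await getInstancesFromCMDB();
    const targetGroups = instances.map((inst) => ({
      targets: [`${inst.ip}:${inst.metricsPort}`],
      labels: {
        __metrics_path__: inst.metricsPath || '/metrics',
        service: inst.serviceName,
        env: inst.environment,
        region: inst.region,
        owner: inst.team,
      },
    }));
    // res.json sends HTTP 200 with Content-Type: application/json, as HTTP SD expects
    res.json(targetGroups);
  } catch (err) {
    // On a failed refresh, Prometheus keeps using its current target list
    res.status(500).json({ error: err.message });
  }
});

app.listen(8080);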
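A provider like this is easier to operate if it checks its own output before responding. The sketch below assumes the HTTP SD response must be a JSON array of objects whose `targets` are strings and whose label values are strings; a numeric label value is an easy mistake when the data comes from a CMDB. `validateTargetGroups` is a hypothetical helper, not part of Prometheus or Express.

```javascript
// Hypothetical validator for an HTTP SD response body.
// Assumption: targets must be strings and label values must be strings.
function validateTargetGroups(body) {
  if (!Array.isArray(body)) return ['response body must be a JSON array'];
  const errors = [];
  body.forEach((group, i) => {
    if (group === null || typeof group !== 'object') {
      errors.push(`group ${i}: must be an object`);
      return;
    }
    if (!Array.isArray(group.targets) || group.targets.some((t) => typeof t !== 'string')) {
      errors.push(`group ${i}: "targets" must be an array of strings`);
    }
    for (const [name, value] of Object.entries(group.labels ?? {})) {
      if (typeof value !== 'string') {
        errors.push(`group ${i}: label "${name}" must be a string`);
      }
    }
  });
  return errors;
}

const good = [{ targets: ['10.0.1.5:9090'], labels: { env: 'production' } }];
const bad = [{ targets: ['10.0.1.5:9090'], labels: { port: 9090 } }];

console.log(validateTargetGroups(good)); // []
console.log(validateTargetGroups(bad));  // [ 'group 0: label "port" must be a string' ]
```

Returning a list of messages rather than throwing makes it simple to log every problem at once and answer with an error status.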
flowchart TD
    Q1{"Where do targets live?"}
    Q1 -->|Kubernetes| K8S["kubernetes_sd_configs<br/>(native, recommended)"]
    Q1 -->|Consul/Nomad| CON["consul_sd_configs"]
    Q1 -->|AWS/GCP/Azure| CLOUD["ec2/gce/azure SD"]
    Q1 -->|DNS SRV records| DNS["dns_sd_configs"]
    Q1 -->|Fixed list| STATIC["static_configs"]
    Q1 -->|Custom system| CUSTOM{"Can you write<br/>an HTTP endpoint?"}
    CUSTOM -->|Yes| HTTP["http_sd_configs"]
    CUSTOM -->|No| FILE["file_sd_configs<br/>(cron job writes JSON)"]
Conclusion & What’s Next
Service discovery is the bridge between Prometheus and your dynamic infrastructure. Key takeaways:
- Kubernetes SD with endpoints role is the most common production pattern
- relabel_configs runs before scraping (saves CPU); metric_relabel_configs runs after (saves storage)
- The keep/drop actions are your primary filtering tools
- hashmod enables sharding targets across multiple Prometheus instances
- HTTP SD is the universal escape hatch for any custom infrastructure
Next in the Series
In Part 6: Effective Alerting & Alertmanager, we’ll turn PromQL queries into actionable alerts with proper routing, grouping, inhibition rules, and multi-channel notification.