Vectors & Selectors
Instant Vectors
An instant vector returns the most recent sample for each matching time series at a single point in time. This is what you get when you type a metric name into the Prometheus expression browser:
# Instant vector - returns one sample per series at query evaluation time
http_requests_total
# Returns:
# http_requests_total{method="GET", handler="/api/users", status="200"} 48923
# http_requests_total{method="POST", handler="/api/users", status="201"} 12847
# http_requests_total{method="GET", handler="/api/orders", status="200"} 95123
# With label selectors (filtering)
http_requests_total{status="500"}
# Returns only series where status="500"
# Instant vector with time offset (look 1 hour ago)
http_requests_total offset 1h
# Returns the most recent sample from 1 hour before query evaluation time
# Instant vector with @ modifier (specific timestamp)
http_requests_total @ 1718467200
# Returns the most recent sample at or before that Unix timestamp (evaluation time is pinned to it)
Range Vectors
A range vector returns a set of samples over a time window for each series. Range vectors cannot be graphed directly — they must be passed to a function like rate() or avg_over_time():
# Range vector - returns all samples within the last 5 minutes
http_requests_total{status="500"}[5m]
# Returns:
# http_requests_total{status="500", method="GET"} [
# 1718467000 123
# 1718467015 125
# 1718467030 125
# 1718467045 127
# ...
# ]
# Range vector MUST be passed to a function:
rate(http_requests_total{status="500"}[5m]) # ✅ Per-second rate over 5m window
avg_over_time(node_load1[1h]) # ✅ Average 1-minute load over 1 hour (use *_over_time on gauges, not counters)
max_over_time(process_resident_memory_bytes[6h]) # ✅ Peak memory in last 6 hours
count_over_time(up[1d]) # ✅ Number of samples in 1 day
# CANNOT graph a range vector directly:
http_requests_total[5m] # ❌ Rejected in the graph view: a range query must return a scalar or instant vector
flowchart LR
subgraph IV["Instant Vector"]
direction TB
T1["Query Time: now"]
S1["Series A → 42.0
Series B → 18.5
Series C → 7.3"]
T1 --> S1
end
subgraph RV["Range Vector [5m]"]
direction TB
T2["Window: now-5m → now"]
S2["Series A → [40, 41, 41, 42, 42]
Series B → [17, 18, 18, 18, 18]
Series C → [6, 6, 7, 7, 7]"]
T2 --> S2
end
subgraph Fn["After Function"]
direction TB
T3["rate() applied"]
S3["Series A → ≈0.0067/s
Series B → ≈0.0033/s
Series C → ≈0.0033/s"]
T3 --> S3
end
RV -->|"rate()"| Fn
Label Matchers
# Four types of label matchers:
# Equality matcher (=) - exact match
http_requests_total{method="GET"}
# Negative equality matcher (!=) - exclude
http_requests_total{status!="200"}
# Regex matcher (=~) - RE2 regex
http_requests_total{handler=~"/api/.*"}
http_requests_total{status=~"5.."} # All 5xx status codes
http_requests_total{method=~"GET|POST"} # GET or POST
# Negative regex matcher (!~) - exclude matching regex
http_requests_total{handler!~"/health|/ready"}
# Combining matchers (AND logic)
http_requests_total{method="GET", status=~"5..", handler=~"/api/.*"}
# The __name__ matcher - select metric name by regex
{__name__=~"http_request.*"} # All metrics starting with http_request
{__name__=~".*_total", job="api-server"} # All counters from api-server
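One detail the matchers above rely on is that regex matchers are fully anchored: status=~"5.." matches "500" but not "5000". The following is a minimal Python sketch of that selection logic, not Prometheus code. The SERIES data and the matches/select helpers are made up for illustration, and Python's re module stands in for RE2, which differs in some advanced syntax.

```python
import re

# Hypothetical in-memory series: each one is just a dict of labels.
SERIES = [
    {"__name__": "http_requests_total", "method": "GET", "handler": "/api/users", "status": "200"},
    {"__name__": "http_requests_total", "method": "POST", "handler": "/api/users", "status": "500"},
    {"__name__": "http_requests_total", "method": "GET", "handler": "/health", "status": "200"},
    {"__name__": "http_requests_total", "method": "GET", "handler": "/api/orders", "status": "5000"},
]


def matches(labels, name, op, value):
    """Apply one matcher; a missing label behaves like an empty string."""
    actual = labels.get(name, "")
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    # Regex matchers are fully anchored, hence fullmatch rather than search.
    hit = re.fullmatch(value, actual) is not None
    return hit if op == "=~" else not hit


def select(series, matchers):
    """Keep the series that satisfy every matcher (AND logic)."""
    return [s for s in series if all(matches(s, *m) for m in matchers)]


five_xx = select(SERIES, [("status", "=~", "5..")])
api_gets = select(SERIES, [("method", "=", "GET"), ("handler", "=~", "/api/.*")])
print(len(five_xx), len(api_gets))
```

The deliberately odd status "5000" is there to show the anchoring: an unanchored search would have selected it.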
Aggregation Operators
sum, avg, count, min, max
# sum - total across all matching series
sum(rate(http_requests_total[5m]))
# → single value: total request rate across all instances/methods/handlers
# sum by (label) - total grouped by specific label(s)
sum by (method) (rate(http_requests_total[5m]))
# → {method="GET"} 245.3
# → {method="POST"} 89.7
# avg - arithmetic mean across series
avg(rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m]))
# → average latency across all instances
# count - number of series matching
count(up{job="api-server"} == 1)
# → number of healthy api-server instances
# min / max - extreme values
max by (instance) (process_resident_memory_bytes{job="api-server"})
# → highest current memory value among each instance's series (use max_over_time() for a peak over time)
topk, bottomk, quantile
# topk - top N series by value
topk(5, rate(http_requests_total[5m]))
# → 5 busiest series (endpoint/method/status combinations)
# bottomk - bottom N series
bottomk(3, rate(http_requests_total[5m]))
# → 3 least active series
# quantile - statistical quantile across series (NOT over time)
quantile(0.95, rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m]))
# → the value at the 95th percentile across all series
# NOTE: This is across series, NOT the P95 latency! Use histogram_quantile for P95 latency
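# count_values - count series per distinct sample VALUE (the value is written into the named label)
count_values("value", up{job="api-server"})
# → {value="1"} 5
# → {value="0"} 2
# build_info always has the value 1 and carries the version as a label, so count by that label instead:
count by (version) (build_info{job="api-server"})
# → {version="v2.3.1"} 5
# → {version="v2.3.0"} 2 (still on old version!)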
# group - returns 1 for each unique label set (useful for joins)
group by (namespace, pod) (kube_pod_info)
by vs without Clauses
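Use by when you want to keep specific dimensions. Use without when you want to remove specific dimensions (keeping everything else). without preserves every label you did not list, so a label added later survives the aggregation instead of being silently dropped; the trade-off is that the new label also splits the output into more series.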
# by (keep only these labels) - explicit inclusion
sum by (namespace, service) (rate(http_requests_total[5m]))
# without (remove these labels) - explicit exclusion
sum without (instance, pod) (rate(http_requests_total[5m]))
# These produce the SAME result if all labels are:
# {namespace, service, instance, pod, method, status}
# "by (namespace, service)" = "without (instance, pod, method, status)"
# Prefer "without" for alerting rules - more resilient to label changes:
# If someone adds a "version" label tomorrow:
sum by (namespace, service) (rate(http_requests_total[5m]))
# ↑ Still works, but the version label is aggregated away (a regression in one version can hide in the total!)
sum without (instance, pod) (rate(http_requests_total[5m]))
# ↑ Still works AND automatically includes the new version dimension
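To see the difference concretely, here is a small Python sketch of the two grouping modes. It is an illustration only: the SAMPLES data is invented, "version" plays the role of the newly added label, and sum_by/sum_without are hypothetical helpers, not Prometheus APIs. The point is that by builds the group key from the listed labels, while without builds it from everything else.

```python
from collections import defaultdict

# Hypothetical per-series request rates; "version" plays the newly added label.
SAMPLES = [
    ({"namespace": "prod", "service": "api", "instance": "a", "version": "v1"}, 10.0),
    ({"namespace": "prod", "service": "api", "instance": "b", "version": "v1"}, 20.0),
    ({"namespace": "prod", "service": "api", "instance": "c", "version": "v2"}, 5.0),
]


def sum_by(samples, keep):
    """sum by (...): the group key uses only the listed labels."""
    groups = defaultdict(float)
    for labels, value in samples:
        key = tuple(sorted((k, v) for k, v in labels.items() if k in keep))
        groups[key] += value
    return dict(groups)


def sum_without(samples, drop):
    """sum without (...): the group key uses every label except the listed ones."""
    groups = defaultdict(float)
    for labels, value in samples:
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in drop))
        groups[key] += value
    return dict(groups)


by_result = sum_by(SAMPLES, {"namespace", "service"})
without_result = sum_without(SAMPLES, {"instance"})
print(by_result)
print(without_result)
```

With this data, sum_by collapses everything into one series, while sum_without keeps one series per version.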
Binary Operators & Vector Matching
Arithmetic Operators
# Division - calculate percentages
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100
# → Memory usage percentage per node
# Multiplication with scalar
rate(http_requests_total[5m]) * 60
# → Requests per minute (rate gives per-second, multiply by 60)
# Subtraction - calculate free resources
node_filesystem_size_bytes - node_filesystem_free_bytes
# → Used disk space per filesystem
Comparison & Filtering
# Comparison operators filter results (drop non-matching series)
process_resident_memory_bytes > 1e9
# → Only series where memory exceeds 1 GB
# With bool modifier - returns 0/1 instead of filtering
process_resident_memory_bytes > bool 1e9
# → 1 if over threshold, 0 if under (useful for alerting math)
# Common alerting patterns:
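# (aggregate both sides first; dividing the raw series pairs each 5xx series with itself and always yields 1)
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05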
# → Error rate exceeds 5%
# unless - set difference (anti-join)
up == 1 unless on(instance) (node_load1 > 10)
# → Instances that are up AND don't have high load
on/ignoring & group_left/group_right
Vector matching is required when performing binary operations between vectors with different label sets. This is one of PromQL’s most powerful — and most confusing — features:
# One-to-one matching (default): a pair is formed only when all labels are identical
# Series without a partner on the other side are silently dropped from the result
rate(http_requests_total[5m]) / rate(http_requests_total[5m] offset 1h)
# ↑ Works because both sides have identical label sets
# "on" - match only on specified labels (ignore all others)
sum by (namespace) (rate(http_requests_total{status=~"5.."}[5m]))
/ on(namespace)
sum by (namespace) (rate(http_requests_total[5m]))
# → Error rate per namespace (both sides are aggregated to namespace, so on(namespace) only makes the match explicit)
# "ignoring" - match on all labels EXCEPT specified ones
# Example: share of each 5xx status in total traffic, ignoring the status label difference
# group_left is needed because several 5xx series (500, 503, ...) can match the same total
rate(http_requests_total{status=~"5.."}[5m])
/ ignoring(status) group_left
sum without(status) (rate(http_requests_total[5m]))
# "group_left" / "group_right" - many-to-one matching
# Use when one side has MORE series than the other (enrichment/joins)
# Example: Add pod owner (deployment/statefulset) info to memory metrics
container_memory_usage_bytes{container!=""}
* on(namespace, pod) group_left(owner_name)
kube_pod_owner{owner_kind="ReplicaSet"}
# → Memory usage with owner_name label added from kube_pod_owner
# Example: Add human-readable service name to metrics
rate(http_requests_total[5m])
* on(instance) group_left(service_name)
label_replace(service_info, "instance", "$1", "address", "(.*)")
flowchart TD
subgraph Left["Left (many): container_memory_usage_bytes"]
L1["{namespace=prod, pod=api-abc} 256MB"]
L2["{namespace=prod, pod=api-def} 312MB"]
L3["{namespace=prod, pod=worker-1} 128MB"]
end
subgraph Right["Right (one): kube_pod_owner"]
R1["{namespace=prod, pod=api-abc, owner=api-deploy}"]
R2["{namespace=prod, pod=api-def, owner=api-deploy}"]
R3["{namespace=prod, pod=worker-1, owner=worker-ss}"]
end
subgraph Result["Result (enriched)"]
O1["{namespace=prod, pod=api-abc, owner=api-deploy} 256MB"]
O2["{namespace=prod, pod=api-def, owner=api-deploy} 312MB"]
O3["{namespace=prod, pod=worker-1, owner=worker-ss} 128MB"]
end
L1 & R1 -->|"match on(namespace,pod)
group_left(owner)"| O1
L2 & R2 --> O2
L3 & R3 --> O3
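The enrichment join in the diagram can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not how Prometheus implements it: memory, owners, and multiply_group_left are hypothetical names, values are in MB for readability, and the sketch ignores details such as metric-name handling. It shows three behaviours worth remembering: the "one" side must be unique per match key, unmatched left-hand series are dropped, and only the labels named in group_left(...) are copied over.

```python
# Hypothetical samples mirroring the diagram; values are MB for readability.
memory = [
    ({"namespace": "prod", "pod": "api-abc", "container": "app"}, 200.0),
    ({"namespace": "prod", "pod": "api-abc", "container": "sidecar"}, 56.0),
    ({"namespace": "prod", "pod": "worker-1", "container": "app"}, 128.0),
]
owners = [
    ({"namespace": "prod", "pod": "api-abc", "owner_name": "api-deploy"}, 1.0),
    ({"namespace": "prod", "pod": "worker-1", "owner_name": "worker-ss"}, 1.0),
]


def multiply_group_left(left, right, on, copy_labels):
    """left * on(on...) group_left(copy_labels...) right, heavily simplified."""
    index = {}
    for labels, value in right:
        key = tuple(labels.get(name, "") for name in on)
        if key in index:
            raise ValueError("duplicate series on the 'one' side for %r" % (key,))
        index[key] = (labels, value)

    result = []
    for labels, value in left:
        match = index.get(tuple(labels.get(name, "") for name in on))
        if match is None:
            continue  # no partner on the right: the series is dropped
        right_labels, right_value = match
        enriched = dict(labels)
        for name in copy_labels:
            if name in right_labels:
                enriched[name] = right_labels[name]
        result.append((enriched, value * right_value))
    return result


joined = multiply_group_left(memory, owners, on=("namespace", "pod"), copy_labels=("owner_name",))
for labels, value in joined:
    print(labels["pod"], labels["container"], labels["owner_name"], value)
```

Because the info-style series on the right has the value 1, multiplying leaves the memory values unchanged and only adds the label.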
Essential Functions
rate() vs irate() vs increase()
# rate() - average per-second increase over the range window
# Best for: alerting, recording rules, general monitoring
# Smooths out spikes, handles counter resets
rate(http_requests_total[5m])
# → average requests per second over last 5 minutes
# irate() - per-second rate using ONLY the last two samples
# Best for: real-time dashboards showing instantaneous spikes
# Very spiky, NOT suitable for alerting (misses brief periods)
irate(http_requests_total[5m])
# → instantaneous rate between last two scrapes (e.g., 15s apart)
# increase() - total increase over the range window
# Syntactic sugar: increase(x[5m]) ≈ rate(x[5m]) * 300
increase(http_requests_total[1h])
# → total requests in the last hour (more intuitive for humans)
# CRITICAL: rate() needs at least 2 samples in the window to return anything
# With 15s scrape interval:
rate(x[30s]) # ❌ At most 2 samples - one missed scrape yields no result
rate(x[1m]) # ⚠️ About 4 samples - practical minimum, tolerates a missed scrape
rate(x[5m]) # ✅ About 20 samples - good default
rate(x[15m]) # ✅ Smoothest, but slower to detect changes
Rule of thumb: use a range of at least 4 × scrape_interval. With a 15s scrape interval, the minimum useful range is [1m]. With a 30s interval, use at least [2m]. Smaller windows may produce no results if scrapes are missed or delayed.
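To make the sample-count requirement and the counter-reset handling tangible, here is a rough Python sketch of what rate() computes. It is a simplification, not the Prometheus implementation: simple_rate is a hypothetical helper, it divides by the time between the first and last sample, and it skips the extrapolation to the window boundaries that Prometheus performs.

```python
def simple_rate(samples):
    """Per-second increase of a counter from (timestamp, value) pairs.

    Simplification: divides by the time between the first and last sample and
    skips the boundary extrapolation that Prometheus performs.
    """
    if len(samples) < 2:
        return None  # fewer than two samples: no result
    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value < previous:
            # Counter reset: the counter restarted from zero, so the whole
            # new value counts as increase.
            increase += value
        else:
            increase += value - previous
        previous = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# 15s scrapes; the counter resets between t=30 and t=45.
window = [(0, 100.0), (15, 130.0), (30, 160.0), (45, 30.0), (60, 60.0)]
print(simple_rate(window))
print(simple_rate(window[:1]))
```

The drop from 160 to 30 is treated as a restart rather than a negative increase, which is why rate() and increase() stay meaningful across process restarts.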
histogram_quantile()
# histogram_quantile() - calculate percentiles from histogram buckets
# P99 latency across all instances:
histogram_quantile(0.99,
sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
)
# P50 (median) latency per service:
histogram_quantile(0.50,
sum by (le, service) (rate(http_request_duration_seconds_bucket[5m]))
)
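# IMPORTANT: The "le" label MUST be preserved in the aggregation!
# This is WRONG (le is aggregated away, so no buckets are left to interpolate):
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])))
# Without any aggregation you get one P99 per original label set (instance, handler, ...):
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))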
# Multiple percentiles for dashboard:
histogram_quantile(0.50, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) # P50
histogram_quantile(0.90, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) # P90
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) # P99
# Apdex score from histogram:
(
sum(rate(http_request_duration_seconds_bucket{le="0.3"}[5m])) # Satisfied (< 300ms)
+
sum(rate(http_request_duration_seconds_bucket{le="1.2"}[5m])) # Tolerating (< 1.2s)
) / 2 / sum(rate(http_request_duration_seconds_count[5m]))
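It helps to know what histogram_quantile() does with those buckets: it finds the bucket containing the target rank and interpolates linearly inside it. The Python sketch below illustrates that idea for classic histograms. It is an approximation under assumptions: bucket_quantile is a hypothetical helper, it only handles 0 < q <= 1 and positive bucket bounds, and the bucket counts are invented.

```python
import math


def bucket_quantile(q, buckets):
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Assumes a classic histogram whose last bucket is le="+Inf" and
    interpolates linearly inside the bucket that contains the target rank.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[idx]
    lower, below = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    return lower + (upper - lower) * (rank - below) / (count - below)


# Invented cumulative counts for le="0.1", "0.3", "1.2", "+Inf".
observed = [(0.1, 50), (0.3, 80), (1.2, 95), (math.inf, 100)]
for q in (0.5, 0.9, 0.99):
    print(q, bucket_quantile(q, observed))
```

Two consequences follow from this: the result can never exceed the highest finite bucket bound, and accuracy depends on how closely the bucket boundaries bracket the latencies you care about.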
predict_linear() & deriv()
# predict_linear() - linear regression to predict future values
# "Will disk fill up in the next 4 hours?"
predict_linear(node_filesystem_avail_bytes{mountpoint="/"}[6h], 4*3600) < 0
# Uses last 6h of data to predict value in 4h; alert if negative (full)
# "When will disk fill up?" (time until zero)
node_filesystem_avail_bytes / (- deriv(node_filesystem_avail_bytes[1h]) + 0.0001)
# → seconds until disk is full (based on last hour's trend)
# deriv() - per-second derivative of a gauge
deriv(process_resident_memory_bytes[1h])
# → rate of memory growth (bytes/second) - detect memory leaks!
# Positive = growing, Negative = shrinking, ~0 = stable
# delta() - difference between first and last sample in range
delta(process_resident_memory_bytes[1h])
# → total memory change over last hour
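Both deriv() and predict_linear() are based on simple linear regression over the samples in the window. A minimal Python sketch of that idea follows; fit_line and predict_linear_sketch are hypothetical helper names, and the disk samples are invented so that free space shrinks by 10 bytes per second.

```python
def fit_line(samples):
    """Least-squares slope and intercept for (timestamp, value) pairs."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var  # units per second, comparable to what deriv() reports
    return slope, mean_v - slope * mean_t


def predict_linear_sketch(samples, seconds_ahead, now):
    """Value of the fitted line at now + seconds_ahead, or None with < 2 samples."""
    if len(samples) < 2:
        return None
    slope, intercept = fit_line(samples)
    return slope * (now + seconds_ahead) + intercept


# Invented node_filesystem_avail_bytes samples, one per minute.
disk = [(0, 10000.0), (60, 9400.0), (120, 8800.0), (180, 8200.0)]
slope, _ = fit_line(disk)
forecast = predict_linear_sketch(disk, seconds_ahead=4 * 3600, now=180)
print(slope, forecast, forecast < 0)
```

A negative forecast corresponds to the alert condition predict_linear(...) < 0: if the current trend holds, the disk runs out of space before the horizon.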
absent() & absent_over_time()
# absent() - returns 1 if NO series exist for the selector
# Critical for detecting missing targets or broken instrumentation
absent(up{job="payment-service"})
# → 1 if payment-service has no "up" metric (service not scraped!)
# absent_over_time() - returns 1 if no samples in range window
absent_over_time(http_requests_total{job="critical-api"}[5m])
# → 1 if critical-api had zero scrapes in last 5 minutes
# Common alerting patterns:
# Alert if a critical job disappears:
absent(up{job="payment-service"} == 1)
# → fires if payment-service is either down OR missing entirely
Subqueries
# Subqueries evaluate an instant query over a range, creating a range vector
# Syntax: <instant_query>[range:resolution]
# Maximum request rate over the last hour, sampled every minute:
max_over_time(rate(http_requests_total[5m])[1h:1m])
# Inner: rate(http_requests_total[5m]) evaluated at each 1m step over last 1h
# Outer: max across all those 60 evaluations
# Minimum number of healthy pods over last 24 hours:
min_over_time(count(up{job="api-server"} == 1)[24h:5m])
# → smallest fleet size in the last day (capacity planning)
# Detect sustained high latency (P99 > 500ms for more than 50% of last hour):
count_over_time((histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 0.5)[1h:1m]) > 30
# → Alert only if P99 exceeded 500ms for >30 out of 60 minutes
Recording Rules
Rule Syntax & Best Practices
Recording rules pre-compute expensive queries and store results as new time series. They are essential for dashboard performance and creating aggregated metrics for alerting:
# recording-rules.yaml
# PrometheusRule CRD for Kubernetes
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: http-recording-rules
  namespace: monitoring
  labels:
    release: prometheus-stack
spec:
  groups:
    - name: http.rules
      interval: 30s # Evaluation interval (default: global eval interval)
      rules:
        # Naming convention: level:metric:operations
        # level = aggregation level (job, namespace, cluster)
        # metric = original metric name
        # operations = what was done (rate5m, sum, etc.)
        - record: job:http_requests_total:rate5m
          expr: sum by (job) (rate(http_requests_total[5m]))
          labels:
            aggregation: "job_level"
        - record: job:http_request_duration_seconds:p99_5m
          expr: |
            histogram_quantile(0.99,
              sum by (job, le) (rate(http_request_duration_seconds_bucket[5m]))
            )
        - record: namespace:http_errors_total:rate5m
          expr: |
            sum by (namespace) (
              rate(http_requests_total{status=~"5.."}[5m])
            )
        - record: namespace:http_error_ratio:rate5m
          expr: |
            namespace:http_errors_total:rate5m
            /
            sum by (namespace) (rate(http_requests_total[5m]))
Production Rule Examples
# More production recording rules
spec:
  groups:
    - name: kubernetes.resource.rules
      rules:
        # CPU usage (cores) per pod
        - record: pod:container_cpu_usage_seconds_total:rate5m
          expr: |
            sum by (namespace, pod) (
              rate(container_cpu_usage_seconds_total{container!="", image!=""}[5m])
            )
        # Memory usage per pod (excludes pause containers)
        - record: pod:container_memory_working_set_bytes:sum
          expr: |
            sum by (namespace, pod) (
              container_memory_working_set_bytes{container!="", image!=""}
            )
        # CPU request utilization (actual / requested)
        - record: pod:cpu_utilization_ratio:rate5m
          expr: |
            pod:container_cpu_usage_seconds_total:rate5m
            / on(namespace, pod) group_left()
            sum by (namespace, pod) (kube_pod_container_resource_requests{resource="cpu"})
Production Query Patterns
RED Method Queries
Rate, Errors, Duration — For Request-Driven Services
| Signal | PromQL Query | Dashboard Panel |
|---|---|---|
| Rate | sum by (service) (rate(http_requests_total[5m])) | Requests/sec graph |
| Errors | sum by (service) (rate(http_requests_total{status=~"5.."}[5m])) / sum by (service) (rate(http_requests_total[5m])) | Error % graph |
| Duration (P50) | histogram_quantile(0.50, sum by (le, service) (rate(http_request_duration_seconds_bucket[5m]))) | Latency graph |
| Duration (P99) | histogram_quantile(0.99, sum by (le, service) (rate(http_request_duration_seconds_bucket[5m]))) | Latency graph |
USE Method Queries
# USE Method: Utilization, Saturation, Errors — For Resources
# CPU Utilization (per node)
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
# CPU Saturation (runnable processes waiting for CPU)
node_load1 / count by (instance) (node_cpu_seconds_total{mode="idle"})
# Memory Utilization
1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)
# Memory Saturation (major page faults / OOM events)
rate(node_vmstat_pgmajfault[5m])
# Disk Utilization
1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})
# Disk Saturation (I/O wait time)
rate(node_disk_io_time_weighted_seconds_total[5m])
# Network Utilization (as % of link speed - requires knowing link_speed)
rate(node_network_transmit_bytes_total{device="eth0"}[5m]) * 8 / 1e9 # Gbps
# Network Errors
rate(node_network_receive_errs_total[5m]) + rate(node_network_transmit_errs_total[5m])
PromQL Gotchas & Anti-Patterns
- rate() on a gauge: rate(temperature_celsius[5m]). rate() is for counters only! Use deriv() for gauges.
- sum without le: histogram_quantile(0.99, sum(rate(x_bucket[5m]))). You MUST keep le in the aggregation.
- rate with too-short window: rate(x[15s]) with a 15s scrape interval. The window holds at most 1 sample, so rate() returns nothing.
- Aggregating summaries: avg(go_gc_duration_seconds{quantile="0.99"}). Averaging quantiles is statistically meaningless.
- High-cardinality regex: {__name__=~".+"}. It matches ALL metrics and is extremely expensive.
- Unnecessary subqueries: max_over_time(x[1h:1m]) when max_over_time(x[1h]) works. Use a subquery only when the inner expression is more than a plain selector, such as a function call or an aggregation.
Conclusion & What’s Next
PromQL is a purpose-built language for time series analysis. Its key concepts:
- Instant vectors for current state, range vectors for windowed analysis
- rate() for counters, direct values for gauges, histogram_quantile() for percentiles
- Aggregation with by/without controls dimensional grouping
- group_left/group_right enables many-to-one enrichment joins
- Recording rules pre-compute expensive queries for dashboard performance
- RED method for services, USE method for resources
Next in the Series
In Part 5: Service Discovery, we’ll explore how Prometheus dynamically finds scrape targets — from Kubernetes SD roles and relabeling to building custom HTTP-based service discovery providers.