Prometheus Deep Dive Part 4: Mastering PromQL

Vectors & Selectors

Instant Vectors

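An instant vector holds one sample per matching time series, all at a single point in time: the most recent sample at or before the evaluation time, as long as it is no older than the lookback window (5 minutes by default). This is what you get when you type a metric name into the Prometheus expression browser: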

# Instant vector - returns one sample per series at query evaluation time
http_requests_total
# Returns:
# http_requests_total{method="GET", handler="/api/users", status="200"}  48923
# http_requests_total{method="POST", handler="/api/users", status="201"} 12847
# http_requests_total{method="GET", handler="/api/orders", status="200"} 95123

# With label selectors (filtering)
http_requests_total{status="500"}
# Returns only series where status="500"

# Instant vector with time offset (look 1 hour ago)
http_requests_total offset 1h
# Returns the most recent sample from 1 hour before query evaluation time

# Instant vector with @ modifier (specific timestamp)
http_requests_total @ 1718467200
# Evaluates the selector at that Unix timestamp instead of query time (most recent sample at or before it)
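To make the lookback, offset and @ rules concrete, here is a minimal Python sketch of how one series contributes to an instant vector. It is a simplified model, not Prometheus's code: it assumes the default 5-minute lookback, ignores staleness markers, and the helper name instant_value is purely illustrative.

```python
# Simplified model of instant-vector selection; not Prometheus's actual code.
LOOKBACK_SECONDS = 300  # assumed default lookback delta of 5 minutes


def instant_value(samples, eval_time, offset=0, at=None):
    """Return the value an instant selector would pick from one series.

    samples:   (unix_seconds, value) pairs sorted by time.
    eval_time: query evaluation time in Unix seconds.
    offset:    seconds to shift back, like `offset 1h` (3600).
    at:        optional fixed timestamp, like `@ 1718467200`; the offset is
               applied relative to it, mirroring PromQL.
    Returns None when no sample falls inside the lookback window, in which
    case the series is simply missing from the result.
    """
    target = (at if at is not None else eval_time) - offset
    in_window = [v for t, v in samples if target - LOOKBACK_SECONDS < t <= target]
    return in_window[-1] if in_window else None


series = [(1000, 1.0), (1015, 2.0), (1030, 3.0)]
print(instant_value(series, eval_time=1040))             # 3.0 (newest sample)
print(instant_value(series, eval_time=1040, offset=20))  # 2.0 (as of t=1020)
print(instant_value(series, eval_time=9999, at=1016))    # 2.0 (pinned to t=1016)
print(instant_value(series, eval_time=2000))             # None (stale, series dropped)
```

The last call shows why a target that stops reporting vanishes from instant queries after about five minutes instead of repeating its last value forever.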

Range Vectors

A range vector returns a set of samples over a time window for each series. Range vectors cannot be graphed directly; to plot one, pass it to a function like rate() or avg_over_time():

# Range vector - returns all samples within the last 5 minutes
http_requests_total{status="500"}[5m]
# Returns:
# http_requests_total{status="500", method="GET"} [
#   1718467000 123
#   1718467015 125
#   1718467030 125
#   1718467045 127
#   ...
# ]

# To graph a range vector, pass it to a function:
rate(http_requests_total{status="500"}[5m])    # ✅ Per-second rate over 5m window
avg_over_time(node_load1[1h])                   # ✅ Mean of the 1-minute load gauge over 1 hour
max_over_time(process_resident_memory_bytes[6h]) # ✅ Peak memory in last 6 hours
count_over_time(up[1d])                         # ✅ Number of samples in 1 day

# CANNOT graph a range vector directly:
http_requests_total[5m]                         # ❌ Rejected by range (graph) queries; an instant query (Table view) returns the raw samples
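A range selector is just a time filter over each series' samples. The Python sketch below models it under one stated assumption: the left-open, right-closed window (end - range, end] that Prometheus 3.0 adopted (2.x also kept a sample sitting exactly on the left edge). The function name is illustrative only.

```python
def range_samples(samples, eval_time, window, offset=0):
    """Return the samples a range selector such as metric[5m] would see.

    Assumes the left-open, right-closed window (end - window, end] used since
    Prometheus 3.0; Prometheus 2.x also included a sample exactly on the
    left boundary.
    """
    end = eval_time - offset
    start = end - window
    return [(t, v) for t, v in samples if start < t <= end]


# A counter scraped every 15 s from t=0 to t=600.
scrapes = [(t, t / 15) for t in range(0, 601, 15)]
window = range_samples(scrapes, eval_time=600, window=300)
print(len(window), window[0][0], window[-1][0])        # 20 315 600
print(len(range_samples(scrapes, eval_time=600, window=30)))  # 2
```

The second call is the situation the rate() guidance later in this part warns about: a 30s window over a 15s scrape interval holds only two samples.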

Instant Vector vs Range Vector

flowchart LR
    subgraph IV["Instant Vector"]
        direction TB
        T1["Query Time: now"]
        S1["Series A → 42.0
Series B → 18.5
Series C → 7.3"]
        T1 --> S1
    end

    subgraph RV["Range Vector [5m]"]
        direction TB
        T2["Window: now-5m → now"]
        S2["Series A → [40, 41, 41, 42, 42]
Series B → [17, 18, 18, 18, 18]
Series C → [6, 6, 7, 7, 7]"]
        T2 --> S2
    end

    subgraph Fn["After Function"]
        direction TB
        T3["rate() applied"]
        S3["Series A → ≈0.0067/s
Series B → ≈0.0033/s
Series C → ≈0.0033/s"]
        T3 --> S3
    end

    RV -->|"rate()"| Fn

Label Matchers

# Four types of label matchers:

# Equality matcher (=) - exact match
http_requests_total{method="GET"}

# Negative equality matcher (!=) - exclude
http_requests_total{status!="200"}

# Regex matcher (=~) - RE2 regex, fully anchored (must match the entire label value)
http_requests_total{handler=~"/api/.*"}
http_requests_total{status=~"5.."}          # All 5xx status codes
http_requests_total{method=~"GET|POST"}     # GET or POST

# Negative regex matcher (!~) - exclude matching regex
http_requests_total{handler!~"/health|/ready"}

# Combining matchers (AND logic)
http_requests_total{method="GET", status=~"5..", handler=~"/api/.*"}

# The __name__ matcher - select metric name by regex
{__name__=~"http_request.*"}               # All metrics starting with http_request
{__name__=~".*_total", job="api-server"}   # All metrics ending in _total (by convention, counters) from api-server
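Because regex matchers are anchored, a pattern has to match the whole label value, which trips people up with patterns like `5..` or `/health`. This Python sketch models the four matchers; it uses Python's `re` module as a stand-in for RE2 (they agree on simple patterns like these, but not on every feature), and the helper names are hypothetical.

```python
import re


def label_matches(value, op, pattern):
    """Evaluate one PromQL label matcher against a label value.

    Regex matchers are fully anchored, so re.fullmatch is used. Python's re
    stands in for RE2 here; RE2 lacks some features such as backreferences.
    """
    if op == "=":
        return value == pattern
    if op == "!=":
        return value != pattern
    if op == "=~":
        return re.fullmatch(pattern, value) is not None
    if op == "!~":
        return re.fullmatch(pattern, value) is None
    raise ValueError(f"unknown matcher operator {op!r}")


def series_matches(labels, matchers):
    """All matchers must hold (AND logic). A missing label acts like ""."""
    return all(label_matches(labels.get(name, ""), op, pattern)
               for name, op, pattern in matchers)


print(label_matches("503", "=~", "5.."))                  # True
print(label_matches("1503", "=~", "5.."))                 # False: anchored
print(label_matches("/v1/api/users", "=~", "/api/.*"))    # False: must start with /api/
print(label_matches("/healthz", "!~", "/health|/ready"))  # True: not a full match
```

The last line is the practical gotcha: `handler!~"/health|/ready"` still lets `/healthz` through; `/health.*` would exclude it.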

Aggregation Operators

sum, avg, count, min, max

# sum - total across all matching series
sum(rate(http_requests_total[5m]))
# → single value: total request rate across all instances/methods/handlers

# sum by (label) - total grouped by specific label(s)
sum by (method) (rate(http_requests_total[5m]))
# → {method="GET"} 245.3
# → {method="POST"} 89.7

# avg - arithmetic mean across series
avg(rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m]))
# → unweighted mean of each series' average latency (not weighted by request volume)

# count - number of series matching
count(up{job="api-server"} == 1)
# → number of healthy api-server instances

# min / max - extreme values
max by (instance) (process_resident_memory_bytes{job="api-server"})
# → largest current value per instance (use max_over_time() for a peak over time)

topk, bottomk, quantile

# topk - top N series by value
topk(5, rate(http_requests_total[5m]))
# → 5 busiest series (endpoint/method/status combinations)

# bottomk - bottom N series
bottomk(3, rate(http_requests_total[5m]))
# → 3 least active series

# quantile - statistical quantile across series (NOT over time)
quantile(0.95, rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m]))
# → the 95th percentile of the per-series average latencies
# NOTE: This is across series, NOT the P95 request latency! Use histogram_quantile for P95 latency

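# count_values - count series per distinct sample VALUE (the value becomes a label)
count_values("replicas", kube_deployment_spec_replicas)
# → one series per distinct replica count, valued with how many deployments use it
# Info-style metrics like build_info always have the value 1, so count by the label instead:
count by (version) (build_info{job="api-server"})
# → {version="v2.3.1"} 5
# → {version="v2.3.0"} 2  (still on old version!)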

# group - returns 1 for each unique label set (useful for joins)
group by (namespace, pod) (kube_pod_info)
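Conceptually, `sum by` buckets series by the values of the listed labels and adds them up, while `topk` sorts and keeps whole series. The Python sketch below models both on a hand-made instant vector of (labels, value) pairs; it is an illustration of the semantics, not Prometheus's implementation, and the data is made up.

```python
from collections import defaultdict


def sum_by(vector, by_labels):
    """Model `sum by (by_labels) (vector)`.

    vector: list of (labels_dict, value) pairs, i.e. one instant-vector result.
    Returns {output_labels_tuple: summed_value}. A label a series lacks is
    treated as empty, which PromQL also treats as "label absent".
    """
    groups = defaultdict(float)
    for labels, value in vector:
        key = tuple((name, labels.get(name, "")) for name in by_labels)
        groups[key] += value
    return dict(groups)


def topk(k, vector):
    """Model `topk(k, vector)`: the k largest series, with labels untouched."""
    return sorted(vector, key=lambda series: series[1], reverse=True)[:k]


rates = [
    ({"method": "GET", "handler": "/api/users"}, 2.0),
    ({"method": "GET", "handler": "/api/orders"}, 3.0),
    ({"method": "POST", "handler": "/api/users"}, 1.5),
]
print(sum_by(rates, ["method"]))
# {(('method', 'GET'),): 5.0, (('method', 'POST'),): 1.5}
print(topk(1, rates))
# [({'method': 'GET', 'handler': '/api/orders'}, 3.0)]
```

Note the difference in output shape: aggregation replaces labels with the grouping key, whereas topk and bottomk return the original series unchanged.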

by vs without Clauses

                            
Rule of Thumb: Use by when you want to keep only specific dimensions. Use without when you want to remove specific dimensions and keep everything else. The two react differently when a new label appears: by still outputs exactly the listed labels (the new label is summed away), while without carries the new label into the result, splitting it into more series.
                        

# by (keep only these labels) - explicit inclusion
sum by (namespace, service) (rate(http_requests_total[5m]))

# without (remove these labels) - explicit exclusion
sum without (instance, pod) (rate(http_requests_total[5m]))

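# These two are NOT equivalent here: the "without" form above also keeps method and status.
# by and without only agree when they split the label set between them. With labels
# {namespace, service, instance, pod, method, status}:
# "by (namespace, service)" = "without (instance, pod, method, status)"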

# Prometheus's recording-rule guidance prefers "without": it keeps every label you did not
# explicitly aggregate away (job, etc.), so results stay unambiguous and alerts keep useful
# routing labels. The tradeoff shows when someone adds a "version" label tomorrow:
sum by (namespace, service) (rate(http_requests_total[5m]))
# ↑ Output unchanged: "version" is summed away, still one series per namespace/service

sum without (instance, pod) (rate(http_requests_total[5m]))
# ↑ Output gains the "version" label: one series per version, so thresholds now apply per version
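The difference is easiest to see by computing the output label sets directly. In this Python sketch (an illustration with made-up series, including a hypothetical new "version" label), `by` collapses both replicas into one group while `without` keeps the versions apart.

```python
def by_key(labels, keep):
    """Output labels of an aggregation `by (keep)`: only the listed labels."""
    return tuple(sorted((k, v) for k, v in labels.items() if k in keep))


def without_key(labels, drop):
    """Output labels of `without (drop)`: everything else (aggregation always
    drops the metric name)."""
    return tuple(sorted((k, v) for k, v in labels.items()
                        if k not in drop and k != "__name__"))


# Two replicas of one service after someone adds a hypothetical "version" label.
series = [
    {"namespace": "prod", "service": "api", "instance": "10.0.0.1", "pod": "api-1", "version": "v1"},
    {"namespace": "prod", "service": "api", "instance": "10.0.0.2", "pod": "api-2", "version": "v2"},
]
by_groups = {by_key(s, {"namespace", "service"}) for s in series}
without_groups = {without_key(s, {"instance", "pod"}) for s in series}
print(len(by_groups), len(without_groups))  # 1 2
```

Whether the extra split is a feature (per-version error rates) or a bug (a threshold tuned for the whole service) depends on the rule, which is why the choice deserves a moment's thought in alerting expressions.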

Binary Operators & Vector Matching

Arithmetic Operators

# Division - calculate percentages
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100
# → Memory usage percentage per node

# Multiplication with scalar
rate(http_requests_total[5m]) * 60
# → Requests per minute (rate gives per-second, multiply by 60)

# Subtraction - calculate free resources
node_filesystem_size_bytes - node_filesystem_free_bytes
# → Used disk space per filesystem

Comparison & Filtering

# Comparison operators filter results (drop non-matching series)
process_resident_memory_bytes > 1e9
# → Only series where memory exceeds 1 GB

# With bool modifier - returns 0/1 instead of filtering
process_resident_memory_bytes > bool 1e9
# → 1 if over threshold, 0 if under (useful for alerting math)

# Common alerting patterns:
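sum without (status) (rate(http_requests_total{status=~"5.."}[5m])) / sum without (status) (rate(http_requests_total[5m])) > 0.05
# → Error rate exceeds 5% (status must be aggregated away first; matching the raw series
#   would divide each 5xx series by itself and always yield 1)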

# unless - set difference (anti-join)
up == 1 unless on(instance) (node_load1 > 10)
# → Instances that are up AND don't have high load
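The three behaviours above (filtering, `bool`, and `unless`) differ mainly in which series survive. This Python sketch models them on hand-made vectors; the helper names are hypothetical, and it ignores details such as metric-name handling.

```python
def compare_filter(vector, threshold):
    """Model `vector > threshold`: failing series are dropped, the rest keep their values."""
    return [(labels, value) for labels, value in vector if value > threshold]


def compare_bool(vector, threshold):
    """Model `vector > bool threshold`: every series is kept, valued 1.0 or 0.0."""
    return [(labels, 1.0 if value > threshold else 0.0) for labels, value in vector]


def unless_on(left, right, on):
    """Model `left unless on(on) right`: keep left series that have no right
    series with the same values for the `on` labels."""
    blocked = {tuple(labels.get(n, "") for n in on) for labels, _ in right}
    return [(labels, value) for labels, value in left
            if tuple(labels.get(n, "") for n in on) not in blocked]


memory = [({"instance": "a"}, 2e9), ({"instance": "b"}, 5e8)]
print(compare_filter(memory, 1e9))  # [({'instance': 'a'}, 2000000000.0)]
print(compare_bool(memory, 1e9))    # [({'instance': 'a'}, 1.0), ({'instance': 'b'}, 0.0)]

up = [({"instance": "a", "job": "node"}, 1.0), ({"instance": "b", "job": "node"}, 1.0)]
high_load = [({"instance": "a"}, 12.0)]  # what node_load1 > 10 would leave
print(unless_on(up, high_load, on=["instance"]))  # [({'instance': 'b', 'job': 'node'}, 1.0)]
```

The filtering form suits alert expressions (no series means no alert), while `bool` keeps every series, which is what you want when the 0/1 result feeds further arithmetic.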

on/ignoring & group_left/group_right

Vector matching is required when performing binary operations between vectors with different label sets. This is one of PromQL’s most powerful — and most confusing — features:

# One-to-one matching (default): label sets must match exactly (the metric name is ignored)
# Series with no exact match on the other side are silently dropped, not reported as errors:
rate(http_requests_total[5m]) / rate(http_requests_total[5m] offset 1h)
# ↑ Works because both sides have identical label sets

# "on" - match only on the listed labels (ignore all others)
sum by (namespace) (rate(http_requests_total{status=~"5.."}[5m]))
/ on(namespace)
sum by (namespace) (rate(http_requests_total[5m]))
# → Error rate per namespace. Both sides are already reduced to {namespace}, so on() only
#   makes the match explicit; it becomes essential when one side carries extra labels

# "ignoring" - match on all labels EXCEPT the listed ones
# Example: share of requests that returned 500; only the left side has a status label
rate(http_requests_total{status="500"}[5m])
/ ignoring(status)
sum without(status) (rate(http_requests_total[5m]))
# (status=~"5.." would give several left series per right series, which requires group_left)

# "group_left" / "group_right" - many-to-one matching
# Use when one side has MORE series than the other (enrichment/joins)

# Example: Add the pod's owner (here its ReplicaSet, which a Deployment manages) to memory metrics
container_memory_usage_bytes{container!=""}
* on(namespace, pod) group_left(owner_name)
kube_pod_owner{owner_kind="ReplicaSet"}
# → Memory usage with owner_name label added from kube_pod_owner

# Example: Add human-readable service name to metrics
rate(http_requests_total[5m])
* on(instance) group_left(service_name)
label_replace(service_info, "instance", "$1", "address", "(.*)")
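Vector matching boils down to computing a match key for each series and pairing keys across the two sides. The Python sketch below models one-to-one matching with on/ignoring and many-to-one matching with group_left, including the errors PromQL raises for ambiguous matches. It is a simplified illustration with hypothetical helper names and made-up data, not the engine's code.

```python
import operator


def match_key(labels, on=None, ignoring=()):
    """Labels used to pair series. __name__ never takes part in matching."""
    if on is not None:
        return tuple((name, labels.get(name, "")) for name in sorted(on))
    return tuple(sorted((name, value) for name, value in labels.items()
                        if name not in ignoring and name != "__name__"))


def index_one_side(vector, on=None, ignoring=()):
    """Index the 'one' side; duplicate match keys are an error in PromQL too."""
    index = {}
    for labels, value in vector:
        key = match_key(labels, on, ignoring)
        if key in index:
            raise ValueError(f"duplicate series for match group {key}")
        index[key] = (labels, value)
    return index


def one_to_one(left, right, op, on=None, ignoring=()):
    """Model `left <op> on(...)|ignoring(...) right`; unmatched series are dropped."""
    index = index_one_side(right, on, ignoring)
    result, used = [], set()
    for labels, value in left:
        key = match_key(labels, on, ignoring)
        if key not in index:
            continue
        if key in used:
            raise ValueError("multiple matches on the left: use group_left")
        used.add(key)
        # One-to-one results keep only the matching labels (empty values = absent).
        result.append(({n: v for n, v in key if v}, op(value, index[key][1])))
    return result


def group_left(left, right, op, on, extra=()):
    """Model `left <op> on(on) group_left(extra) right` (many-to-one)."""
    index = index_one_side(right, on)
    result = []
    for labels, value in left:
        key = match_key(labels, on)
        if key not in index:
            continue
        right_labels, right_value = index[key]
        out = {n: v for n, v in labels.items() if n != "__name__"}
        out.update({n: right_labels[n] for n in extra if right_labels.get(n)})
        result.append((out, op(value, right_value)))
    return result


errors = [({"method": "GET", "status": "500"}, 2.0)]
totals = [({"method": "GET"}, 100.0)]
print(one_to_one(errors, totals, operator.truediv, ignoring=("status",)))
# [({'method': 'GET'}, 0.02)]

memory = [({"namespace": "prod", "pod": "api-abc", "container": "app"}, 256.0),
          ({"namespace": "prod", "pod": "api-abc", "container": "sidecar"}, 32.0)]
owners = [({"namespace": "prod", "pod": "api-abc", "owner_name": "api-rs"}, 1.0)]
for labels, value in group_left(memory, owners, operator.mul,
                                on=("namespace", "pod"), extra=("owner_name",)):
    print(labels["container"], labels["owner_name"], value)
# app api-rs 256.0
# sidecar api-rs 32.0
```

Multiplying by an info-style series whose value is 1 is what makes the join an "enrichment": the numbers are unchanged and only the labels grow.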

Vector Matching: group_left Example

flowchart TD
    subgraph Left["Left (many): container_memory_usage_bytes"]
        L1["{namespace=prod, pod=api-abc}  256MB"]
        L2["{namespace=prod, pod=api-def}  312MB"]
        L3["{namespace=prod, pod=worker-1} 128MB"]
    end

    subgraph Right["Right (one): kube_pod_owner"]
        R1["{namespace=prod, pod=api-abc, owner=api-deploy}"]
        R2["{namespace=prod, pod=api-def, owner=api-deploy}"]
        R3["{namespace=prod, pod=worker-1, owner=worker-ss}"]
    end

    subgraph Result["Result (enriched)"]
        O1["{namespace=prod, pod=api-abc, owner=api-deploy}  256MB"]
        O2["{namespace=prod, pod=api-def, owner=api-deploy}  312MB"]
        O3["{namespace=prod, pod=worker-1, owner=worker-ss}  128MB"]
    end

    L1 & R1 -->|"match on(namespace,pod)
group_left(owner)"| O1
    L2 & R2 --> O2
    L3 & R3 --> O3

Essential Functions

rate() vs irate() vs increase()

# rate() - average per-second increase over the range window
# Best for: alerting, recording rules, general monitoring
# Smooths out spikes, handles counter resets
rate(http_requests_total[5m])
# → average requests per second over last 5 minutes

# irate() - per-second rate using ONLY the last two samples
# Best for: real-time dashboards showing instantaneous spikes
# Very spiky; avoid for alerting (brief changes can reset an alert's "for" clause)
irate(http_requests_total[5m])
# → instantaneous rate between last two scrapes (e.g., 15s apart)

# increase() - total increase over the range window
# Syntactic sugar: increase(x[5m]) = rate(x[5m]) * 300, so it is extrapolated and can be non-integer
increase(http_requests_total[1h])
# → total requests in the last hour (more intuitive for humans)

# CRITICAL: rate() needs at least 2 samples in the window; aim for 4+ so a missed scrape doesn't leave it empty
# With 15s scrape interval:
rate(x[30s])   # ❌ Only 2 samples - unreliable
rate(x[1m])    # ⚠️  4 samples - minimum, sensitive to missed scrapes
rate(x[5m])    # ✅  20 samples - good default
rate(x[15m])   # ✅  Smoothest, but slower to detect changes

                            
The Rate Window Rule: Always use a range window of at least 4 × scrape_interval. With a 15s scrape interval, the minimum useful range is [1m]. With a 30s interval, use at least [2m]. Smaller windows may produce no results if scrapes are missed or delayed.
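To see why rate() handles resets and why it can report slightly more than the raw sample difference, here is a minimal Python sketch modeled on the extrapolation logic Prometheus uses for rate() and increase(). It is an approximation: details such as the zero-point cap have varied between Prometheus versions, and the function names are illustrative rather than any real API.

```python
def counter_increase(values):
    """Raw increase of a counter across its samples, correcting for resets.

    A drop in value is treated as a restart from 0, so the value seen just
    before the drop is added back to the total.
    """
    total = values[-1] - values[0]
    for prev, cur in zip(values, values[1:]):
        if cur < prev:
            total += prev
    return total


def extrapolated_rate(samples, window_start, window_end, is_rate=True):
    """Simplified model of rate() (is_rate=True) and increase() (is_rate=False).

    samples: (unix_seconds, value) pairs inside the window, oldest first.
    Returns None with fewer than two samples, where Prometheus returns nothing.
    """
    if len(samples) < 2:
        return None
    times = [t for t, _ in samples]
    values = [v for _, v in samples]
    result = counter_increase(values)
    sampled = times[-1] - times[0]
    avg_gap = sampled / (len(samples) - 1)
    to_start = times[0] - window_start
    to_end = window_end - times[-1]
    # A counter cannot extrapolate below zero: cap the backward extension at
    # the point where the fitted line would reach 0.
    if result > 0 and values[0] >= 0:
        to_start = min(to_start, sampled * values[0] / result)
    # Extend to a window edge if it is close (within 1.1 average gaps),
    # otherwise only by half an average gap (the series may have started/ended).
    threshold = avg_gap * 1.1
    extended = sampled
    extended += to_start if to_start < threshold else avg_gap / 2
    extended += to_end if to_end < threshold else avg_gap / 2
    result *= extended / sampled
    return result / (window_end - window_start) if is_rate else result


def irate(samples):
    """Model irate(): per-second rate from the last two samples only."""
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    delta = v2 - v1 if v2 >= v1 else v2  # after a reset, count up from 0
    return delta / (t2 - t1)


scrapes = [(15 * k, 3.0 * k) for k in range(1, 21)]  # t=15..300 s, +3 per scrape
print(round(extrapolated_rate(scrapes, 0, 300), 6))                 # 0.2
print(round(extrapolated_rate(scrapes, 0, 300, is_rate=False), 6))  # 60.0
print(counter_increase([10, 20, 5, 15]))  # 25 (+10, reset then +5, then +10)
print(extrapolated_rate(scrapes[-1:], 285, 300))  # None: fewer than 2 samples
```

The raw samples only differ by 57 across the window, yet increase() reports 60: the 15 seconds before the first sample are filled in by extrapolation, which is also why increase() on integer counters often returns fractional values.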
                        

histogram_quantile()

# histogram_quantile() - calculate percentiles from histogram buckets
# P99 latency across all instances:
histogram_quantile(0.99,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
)

# P50 (median) latency per service:
histogram_quantile(0.50,
  sum by (le, service) (rate(http_request_duration_seconds_bucket[5m]))
)

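# IMPORTANT: When aggregating, the "le" label MUST be preserved!
# This is WRONG (sum drops le, leaving no buckets to interpolate):
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])))
# Skipping aggregation is valid, but gives one P99 per series (per instance, handler, method, ...):
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))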

# Multiple percentiles for dashboard:
histogram_quantile(0.50, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))  # P50
histogram_quantile(0.90, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))  # P90
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))  # P99

# Apdex score from histogram (assumes 0.3 and 1.2 are actual bucket boundaries):
(
  sum(rate(http_request_duration_seconds_bucket{le="0.3"}[5m]))  # Satisfied (≤ 300ms)
+
  sum(rate(http_request_duration_seconds_bucket{le="1.2"}[5m]))  # Satisfied + tolerating (≤ 1.2s; buckets are cumulative)
) / 2 / sum(rate(http_request_duration_seconds_count[5m]))
# = (satisfied + tolerating / 2) / total
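histogram_quantile() on classic histograms finds the bucket containing the target rank and interpolates linearly inside it, which is why results depend heavily on bucket boundaries. The Python sketch below models that documented behaviour for cumulative (le, count) buckets, plus the Apdex arithmetic above; native histograms are not covered, and the bucket counts are made up for illustration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q (0 < q <= 1) from cumulative classic-histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs; one bound must be math.inf.
    Linear interpolation inside the bucket holding the target rank, a lower
    bound of 0 for the first bucket, and the highest finite bound if the rank
    lands in the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf or buckets[-1][1] == 0:
        return math.nan
    rank = q * buckets[-1][1]
    i = next(idx for idx, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[i]
    if upper == math.inf:
        return buckets[-2][0]
    if i == 0:
        if upper <= 0:
            return upper
        lower, below = 0.0, 0.0
    else:
        lower, below = buckets[i - 1]
    return lower + (upper - lower) * (rank - below) / (count - below)


def apdex(satisfied_cum, tolerating_cum, total):
    """Apdex from cumulative counts: le=T is satisfied, le=4T is satisfied +
    tolerating, so (le_T + le_4T) / 2 / total = (S + T/2) / total."""
    return (satisfied_cum + tolerating_cum) / 2 / total


latency = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 99), (math.inf, 100)]
print(histogram_quantile(0.5, latency))            # 0.1
print(round(histogram_quantile(0.9, latency), 4))  # 0.4167
print(histogram_quantile(0.995, latency))          # 1.0 (rank falls in +Inf bucket)
print(apdex(70, 90, 100))                          # 0.8
```

The P90 result of about 0.417s is an interpolation: all the histogram actually knows is that the 90th request took between 0.25s and 0.5s.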

predict_linear() & deriv()

# predict_linear() - linear regression to predict future values
# "Will disk fill up in the next 4 hours?"
predict_linear(node_filesystem_avail_bytes{mountpoint="/"}[6h], 4*3600) < 0
# Uses last 6h of data to predict value in 4h; alert if negative (full)

# "When will disk fill up?" (time until zero)
node_filesystem_avail_bytes / (- deriv(node_filesystem_avail_bytes[1h]) + 0.0001)
# → seconds until disk is full at the last hour's trend (negative or huge values mean it is not filling; + 0.0001 avoids dividing by zero)

# deriv() - per-second derivative of a gauge
deriv(process_resident_memory_bytes[1h])
# → rate of memory growth (bytes/second) - detect memory leaks!
# Positive = growing, Negative = shrinking, ~0 = stable

# delta() - difference between first and last sample in range, extrapolated to the window edges (use on gauges)
delta(process_resident_memory_bytes[1h])
# → approximate memory change over last hour
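deriv() and predict_linear() both rest on a least-squares line through the samples; predict_linear() anchors that line at the evaluation time and walks it forward. This Python sketch models the idea (Prometheus's own code uses compensated summation and differs in numeric detail), using a hypothetical free-space gauge that shrinks by 2 bytes per second.

```python
def linear_regression(samples, intercept_time):
    """Least-squares fit of value against time, in the style used for deriv()
    and predict_linear(). Returns (slope per second, fitted value at
    intercept_time)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = [t - intercept_time for t, _ in samples]
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def deriv(samples):
    """Model deriv(): the regression slope in units per second."""
    return linear_regression(samples, samples[0][0])[0]


def predict_linear(samples, eval_time, horizon_seconds):
    """Model predict_linear(): fitted value at eval_time plus slope * horizon."""
    slope, value_now = linear_regression(samples, eval_time)
    return value_now + slope * horizon_seconds


# Hypothetical free-space gauge losing 2 bytes/s, sampled every 60 s for an hour.
avail = [(t, 100_000.0 - 2.0 * t) for t in range(0, 3601, 60)]
print(round(deriv(avail), 6))                           # -2.0
print(round(predict_linear(avail, 3600, 4 * 3600), 3))  # 64000.0
print(round(avail[-1][1] / -deriv(avail)))              # 46400 (seconds until empty)
```

Because the fit uses every sample in the window, a single outlier nudges the slope rather than dominating it, which is what makes these functions usable on noisy gauges like free disk space.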

absent() & absent_over_time()

# absent() - returns 1 if NO series exist for the selector
# Critical for detecting missing targets or broken instrumentation
absent(up{job="payment-service"})
# → 1 if payment-service has no "up" metric (service not scraped!)

# absent_over_time() - returns 1 if no samples in range window
absent_over_time(http_requests_total{job="critical-api"}[5m])
# → 1 if no critical-api samples exist for this selector in the last 5 minutes

# Common alerting patterns:
# Alert if a critical job disappears:
absent(up{job="payment-service"} == 1)
# → fires if no payment-service instance is up (all down, or none scraped at all); one instance down among several does not fire
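absent() inverts the question "did anything come back?", and its output labels come only from equality matchers on a plain selector. The Python sketch below models that behaviour on hand-made data; it is illustrative, not an API.

```python
def absent(result, equality_matchers=None):
    """Model absent(): returns [] when the inner result has any series,
    otherwise one series with value 1.0.

    equality_matchers: {label: value} from the `label="value"` matchers of a
    plain selector, which Prometheus copies onto the output. For other
    expressions, such as `up{...} == 1`, pass None: no labels are attached.
    """
    if result:
        return []
    return [(dict(equality_matchers or {}), 1.0)]


up = [({"job": "payment-service", "instance": "10.0.0.7:8080"}, 0.0)]  # scraped, but down
print(absent(up, {"job": "payment-service"}))   # []: the series exists
print(absent([s for s in up if s[1] == 1]))     # [({}, 1.0)]: nothing is up
print(absent([], {"job": "payment-service"}))   # [({'job': 'payment-service'}, 1.0)]
```

The first two calls show why the `== 1` form matters: plain absent(up{...}) stays silent for a target that is scraped but reporting 0, while the filtered form fires.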

Subqueries

# Subqueries evaluate an instant query over a range, creating a range vector
# Syntax: <instant_query>[range:resolution]

# Maximum request rate over the last hour, sampled every minute:
max_over_time(rate(http_requests_total[5m])[1h:1m])
# Inner: rate(http_requests_total[5m]) evaluated at each 1m step over last 1h
# Outer: max across all those 60 evaluations

# Minimum number of healthy pods over last 24 hours:
min_over_time(count(up{job="api-server"} == 1)[24h:5m])
# → smallest fleet size in the last day (capacity planning)

# Detect sustained high latency (P99 > 500ms for more than 50% of last hour):
count_over_time((histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 0.5)[1h:1m]) > 30
# → Alert only if P99 exceeded 500ms for >30 out of 60 minutes
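A subquery re-runs its inner expression at fixed steps across the range and then feeds those results to the outer *_over_time function. This Python sketch models the step timestamps and a max_over_time() over them. It assumes steps are aligned to absolute multiples of the resolution and that the window is left-open as in Prometheus 3.x (2.x could include one extra step on the left edge); the inner query is a stand-in lambda.

```python
import math


def subquery_steps(eval_time, range_seconds, step_seconds):
    """Timestamps at which `expr[range:step]` evaluates its inner expression.

    Steps sit on absolute multiples of step_seconds rather than being counted
    back from eval_time. The window is taken as left-open,
    (eval_time - range, eval_time], as in Prometheus 3.x.
    """
    first = (math.floor((eval_time - range_seconds) / step_seconds) + 1) * step_seconds
    return list(range(first, eval_time + 1, step_seconds))


def max_over_subquery(inner, eval_time, range_seconds, step_seconds):
    """Model max_over_time(inner[range:step]); inner maps a timestamp to a
    value, or None when it has no result at that step."""
    values = [inner(t) for t in subquery_steps(eval_time, range_seconds, step_seconds)]
    values = [v for v in values if v is not None]
    return max(values) if values else None


steps = subquery_steps(eval_time=7230, range_seconds=3600, step_seconds=60)
print(len(steps), steps[0], steps[-1])  # 60 3660 7200
# Hypothetical inner query whose value peaks at t=5400.
print(max_over_subquery(lambda t: 100 - abs(t - 5400) / 60, 7230, 3600, 60))  # 100.0
```

Each step is a full evaluation of the inner query, so a [1h:1m] subquery costs roughly 60 inner evaluations; this is the main reason to prefer a recording rule when the same subquery runs often.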

Recording Rules

Rule Syntax & Best Practices

Recording rules pre-compute expensive queries and store results as new time series. They are essential for dashboard performance and creating aggregated metrics for alerting:

# recording-rules.yaml
# PrometheusRule CRD for Kubernetes
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: http-recording-rules
  namespace: monitoring
  labels:
    release: prometheus-stack
spec:
  groups:
    - name: http.rules
      interval: 30s  # Evaluation interval (default: global eval interval)
      rules:
        # Naming convention: level:metric:operations
        # level = aggregation level (job, namespace, cluster)
        # metric = original metric name
        # operations = what was done (rate5m, sum, etc.)

        - record: job:http_requests_total:rate5m
          expr: sum by (job) (rate(http_requests_total[5m]))
          labels:
            aggregation: "job_level"

        - record: job:http_request_duration_seconds:p99_5m
          expr: |
            histogram_quantile(0.99,
              sum by (job, le) (rate(http_request_duration_seconds_bucket[5m]))
            )

        - record: namespace:http_errors_total:rate5m
          expr: |
            sum by (namespace) (
              rate(http_requests_total{status=~"5.."}[5m])
            )

        - record: namespace:http_error_ratio:rate5m
          expr: |
            namespace:http_errors_total:rate5m
            /
            sum by (namespace) (rate(http_requests_total[5m]))
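The level:metric:operations convention is easy to check mechanically. Below is a minimal Python sketch of a hypothetical lint helper (not part of Prometheus or any tool) that validates a record name against the metric-name character set and splits it into its three parts.

```python
import re

METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def split_rule_name(name):
    """Split a recording-rule name of the form level:metric:operations.

    A hypothetical lint helper for rule files, not a Prometheus API: it checks
    that the name is a legal metric name and has exactly three non-empty parts.
    """
    if not METRIC_NAME.fullmatch(name):
        raise ValueError(f"not a valid metric name: {name!r}")
    parts = name.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected level:metric:operations, got {name!r}")
    return tuple(parts)


for rule in ["job:http_requests_total:rate5m", "namespace:http_error_ratio:rate5m"]:
    print(split_rule_name(rule))
# ('job', 'http_requests_total', 'rate5m')
# ('namespace', 'http_error_ratio', 'rate5m')
```

Running a check like this in CI keeps rule names predictable, so dashboards can find the aggregation level of a series from its name alone.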

Production Rule Examples

# More production recording rules
spec:
  groups:
    - name: kubernetes.resource.rules
      rules:
        # CPU usage (in cores) per pod
        - record: pod:container_cpu_usage_seconds_total:rate5m
          expr: |
            sum by (namespace, pod) (
              rate(container_cpu_usage_seconds_total{container!="", image!=""}[5m])
            )

        # Memory working set per pod (container!="" drops the pod-level cgroup series, avoiding double counting)
        - record: pod:container_memory_working_set_bytes:sum
          expr: |
            sum by (namespace, pod) (
              container_memory_working_set_bytes{container!="", image!=""}
            )

        # CPU request utilization (actual / requested)
        - record: pod:cpu_utilization_ratio:rate5m
          expr: |
            pod:container_cpu_usage_seconds_total:rate5m
            / on(namespace, pod) group_left()
            sum by (namespace, pod) (kube_pod_container_resource_requests{resource="cpu"})

Production Query Patterns

RED Method Queries

RED Method

Rate, Errors, Duration — For Request-Driven Services

Signal	PromQL Query	Dashboard Panel
Rate	`sum by (service) (rate(http_requests_total[5m]))`	Requests/sec graph
Errors	`sum by (service) (rate(http_requests_total{status=~"5.."}[5m])) / sum by (service) (rate(http_requests_total[5m]))`	Error % graph
Duration (P50)	`histogram_quantile(0.50, sum by (le, service) (rate(http_request_duration_seconds_bucket[5m])))`	Latency graph
Duration (P99)	`histogram_quantile(0.99, sum by (le, service) (rate(http_request_duration_seconds_bucket[5m])))`	Latency graph


USE Method Queries

# USE Method: Utilization, Saturation, Errors — For Resources

# CPU Utilization (per node)
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))

# CPU Saturation (load average per core; > 1 means tasks are queuing. Linux load also counts tasks blocked on I/O)
node_load1 / count by (instance) (node_cpu_seconds_total{mode="idle"})

# Memory Utilization
1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)

# Memory Saturation (major page faults; OOM kills are counted separately by node_vmstat_oom_kill)
rate(node_vmstat_pgmajfault[5m])

# Disk Utilization (space used on /; for I/O busy time, use rate(node_disk_io_time_seconds_total[5m]))
1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})

# Disk Saturation (average I/O queue depth, from weighted I/O time)
rate(node_disk_io_time_weighted_seconds_total[5m])

# Network Utilization (transmit throughput in Gbps; divide by the link speed for a % of capacity)
rate(node_network_transmit_bytes_total{device="eth0"}[5m]) * 8 / 1e9  # Gbps

# Network Errors
rate(node_network_receive_errs_total[5m]) + rate(node_network_transmit_errs_total[5m])

PromQL Gotchas & Anti-Patterns

                            
Top PromQL Anti-Patterns:
rate() on a gauge: rate(temperature_celsius[5m]); rate is for counters only! Use deriv() for gauges
Aggregating away le: histogram_quantile(0.99, sum(rate(x_bucket[5m]))); you MUST keep le in the aggregation, e.g. sum by (le)
rate with too-short window: rate(x[15s]) with a 15s scrape interval; usually just 1 sample in the window, so no result
Aggregating summaries: avg(go_gc_duration_seconds{quantile="0.99"}); averaging precomputed quantiles is statistically meaningless
High-cardinality regex: {__name__=~".+"}; matches ALL metrics, extremely expensive
Unnecessary subqueries: max_over_time(x[1h:1m]) when max_over_time(x[1h]) works; reserve subqueries for inner expressions that are not plain selectors (functions, aggregations, arithmetic)

                        

Conclusion & What’s Next

PromQL is a purpose-built language for time series analysis. Its key concepts:

Instant vectors for current state, range vectors for windowed analysis
rate() for counters, direct values for gauges, histogram_quantile() for percentiles
Aggregation with by/without controls dimensional grouping
group_left/group_right enables many-to-one enrichment joins
Recording rules pre-compute expensive queries for dashboard performance
RED method for services, USE method for resources

Next in the Series

In Part 5: Service Discovery, we’ll explore how Prometheus dynamically finds scrape targets — from Kubernetes SD roles and relabeling to building custom HTTP-based service discovery providers.

Previous: Part 3: Data Model & TSDB | Next: Part 5: Service Discovery