
Part 2: Metrics Fundamentals & the Four Golden Signals

May 14, 2026 Wasil Zafar 20 min read

Metrics are the heartbeat of your monitoring system — numerical measurements collected over time that quantify system behaviour. In this part, you will master the four metric types, learn why percentiles matter more than averages, and apply Google SRE's Four Golden Signals to any service you operate.

Table of Contents

  1. What Is a Metric?
  2. The Four Metric Types
  3. Percentiles vs Averages
  4. The Four Golden Signals
  5. The RED Method
  6. Infrastructure Monitoring
  7. Conclusion & Next Steps

What Is a Metric?

A metric is a numeric representation of system state measured over time. Unlike logs (which record discrete events) or traces (which record request journeys), metrics are aggregations — they summarise many events into a single number at a given point in time.

Consider the statement "1,247 HTTP requests per second." This is a metric. It does not tell you what any individual request contained, or which user made it, or what the response was. It tells you something quantitative about the overall system state at that moment.

Anatomy of a Metric

In modern monitoring systems (especially Prometheus-style), a metric has three components:

Component | Description | Example
Name | Identifies what is being measured | http_requests_total
Labels | Key-value pairs that add dimensions | method="GET", status="200", path="/api/users"
Value | The numeric measurement | 12845.0

A complete metric data point also includes a timestamp. Together, a stream of (timestamp, value) pairs for a named metric with given labels forms a time series.

# Example Prometheus metric exposition format
# HELP http_requests_total The total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200",path="/api/users"} 12845
http_requests_total{method="POST",status="201",path="/api/users"} 384
http_requests_total{method="GET",status="404",path="/api/items"} 27
http_requests_total{method="GET",status="500",path="/api/orders"} 3
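As a rough illustration of how name, labels, and value combine into one exposition line, here is a minimal Python sketch. The helper names `format_sample` and `series_key` are hypothetical and written for this example; real client libraries handle this for you and cover more cases (timestamps, special float values, HELP/TYPE lines).

```python
def escape_label_value(value: str) -> str:
    # Backslash, double quote, and newline are escaped inside label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name: str, labels: dict[str, str], value: float) -> str:
    """Render one sample as a line in the Prometheus-style text format."""
    number = int(value) if float(value).is_integer() else value
    if not labels:
        return f"{name} {number}"
    label_text = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in labels.items()
    )
    return f"{name}{{{label_text}}} {number}"


def series_key(name: str, labels: dict[str, str]) -> tuple:
    """Identity of a time series: metric name plus the full label set."""
    return (name, tuple(sorted(labels.items())))


if __name__ == "__main__":
    labels = {"method": "GET", "status": "200", "path": "/api/users"}
    print(format_sample("http_requests_total", labels, 12845.0))
```

Two samples belong to the same time series only when `series_key` matches, which is why changing any label value starts a new series.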

Labels and the Cardinality Trap

Labels are powerful — they let you slice and dice your metrics. Instead of one flat "request count" number, you have a multi-dimensional view: count by method, by status code, by endpoint, by region, by customer tier.

The Cardinality Trap: Every unique combination of label values creates a separate time series in your metrics database. If you add a label with high cardinality — like user_id (millions of unique values) or request_id (billions) — you will create millions or billions of time series. This is called a cardinality explosion, and it is one of the most common production issues with Prometheus. It can crash your monitoring system. Never use high-cardinality values as metric labels.

Good label candidates (low cardinality):

  • method (GET, POST, PUT, DELETE — 4-6 values)
  • status_code (200, 201, 400, 404, 500 — ~10 values)
  • region (us-east-1, eu-west-1 — <20 values)
  • environment (prod, staging, dev — 3 values)

Bad label candidates (high cardinality):

  • user_id, customer_id — millions of values
  • request_id, trace_id — unique per request
  • url with query strings — unbounded
  • error_message — unbounded free text
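The effect of a single high-cardinality label can be estimated with simple multiplication. The sketch below assumes the approximate value counts from the lists above (the `user_id` count of one million is an illustrative assumption) and computes an upper bound, since not every label combination occurs in practice.

```python
from math import prod


def estimate_series(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on time series for one metric: the product of the
    number of distinct values each label can take."""
    return prod(label_cardinalities.values())


# Approximate counts taken from the low-cardinality examples above.
safe_labels = {"method": 5, "status_code": 10, "region": 20, "environment": 3}

# Same metric with one high-cardinality label added (assumed 1M users).
risky_labels = {**safe_labels, "user_id": 1_000_000}

if __name__ == "__main__":
    print(estimate_series(safe_labels))   # 3000
    print(estimate_series(risky_labels))  # 3000000000
```

One extra label turns a few thousand potential series into billions, which is the explosion described above.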

The Four Metric Types

Prometheus defines four core metric types. Understanding them deeply is essential — choosing the wrong type leads to incorrect queries and misleading dashboards.

Counters — Always Going Up

A counter is a metric that only increases. It represents a cumulative count of events. Counters reset to zero only when the process restarts.

Examples:

  • Total HTTP requests served since startup
  • Total bytes sent or received
  • Total errors encountered
  • Total database queries executed
Querying Counters: Raw counter values are rarely useful — what you want is the rate of change. In PromQL: rate(http_requests_total[5m]) gives you requests per second averaged over the last 5 minutes. In NRQL: use derivative() or rate() functions.
# Prometheus: requests per second over 5 minutes
rate(http_requests_total{job="api"}[5m])

# Prometheus: error rate as a percentage
sum(rate(http_requests_total{status=~"5.."}[5m]))
  /
sum(rate(http_requests_total[5m]))
  * 100
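To see why a rate is computed from counter deltas rather than raw values, consider this simplified Python sketch. It is not how Prometheus implements `rate()` (which also extrapolates to the window boundaries); `simple_rate` is a hypothetical helper that only shows the core idea, including the handling of a counter reset after a process restart.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter across the given samples.

    samples: (timestamp_seconds, counter_value) pairs in time order.
    A drop in value is treated as a reset to zero, so the new value
    counts as the increase since the restart.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        increase += current - previous if current >= previous else current
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    return increase / elapsed


if __name__ == "__main__":
    steady = [(0, 100), (60, 160), (120, 220)]
    with_restart = [(0, 100), (60, 160), (120, 20), (180, 80)]
    print(simple_rate(steady))
    print(simple_rate(with_restart))
```

Without the reset handling, the restart in the second series would show up as a large negative rate, which is one reason raw counter subtraction is unreliable.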

Gauges — Snapshots of Current State

A gauge is a metric that can go up or down. It represents the current value of something at a given moment — like a snapshot of system state.

Examples:

  • Current memory usage in bytes
  • Number of active connections
  • Current queue depth
  • CPU utilisation percentage
  • Current number of running goroutines or threads
# Example: Alert when memory usage exceeds 85%
# In Prometheus alerting rule:
# node_memory_Active_bytes / node_memory_MemTotal_bytes * 100 > 85

# Example Prometheus metric
node_memory_Active_bytes 2147483648
node_memory_MemTotal_bytes 8589934592
# Usage: 2147483648 / 8589934592 * 100 = 25%

Histograms — Distributions and Percentiles

A histogram samples observations (typically request durations or response sizes) and counts them in configurable buckets. It enables calculation of approximate percentiles on the server side.

A Prometheus histogram with name http_request_duration_seconds actually creates three kinds of time series (one _bucket series per configured boundary, plus _sum and _count):

  • http_request_duration_seconds_bucket{le="0.1"}: count of requests completing in ≤ 0.1s
  • http_request_duration_seconds_bucket{le="0.5"}: count of requests completing in ≤ 0.5s
  • http_request_duration_seconds_bucket{le="+Inf"}: total count (same value as _count below)
  • http_request_duration_seconds_sum: sum of all observation values
  • http_request_duration_seconds_count: total number of observations
# PromQL: Calculate 95th percentile latency
histogram_quantile(0.95,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
)

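# PromQL: Average request duration over the last 5 minutes
# (dividing the raw _sum by _count would average over the whole process lifetime)
rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])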
Choosing Histogram Buckets: Buckets should be chosen based on your SLO targets. If your latency SLO is p99 < 500ms, include buckets at 100ms, 250ms, 500ms, 1000ms, 2500ms, 5000ms. Without a bucket boundary near your SLO target, you cannot measure compliance accurately.
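To make the "approximate" part concrete, here is a rough Python sketch of estimating a quantile from cumulative bucket counts by linear interpolation, the general approach behind `histogram_quantile`. It is a simplification, not Prometheus's actual implementation, and the bucket counts are invented for illustration.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper bound,
    with the last bound being +Inf. Observations are assumed to be spread
    evenly inside each bucket, which is where the approximation comes from.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # No upper edge to interpolate towards: report the last finite bound.
                return lower_bound
            in_bucket = count - lower_count
            if in_bucket == 0:
                return lower_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Invented data: cumulative counts for buckets at 100ms, 250ms, 500ms, 1s, +Inf.
example_buckets = [(0.1, 900), (0.25, 950), (0.5, 990), (1.0, 998), (math.inf, 1000)]

if __name__ == "__main__":
    for q in (0.5, 0.95, 0.99, 0.999):
        print(q, round(histogram_quantile(q, example_buckets), 4))
```

Note how the p99.9 estimate cannot exceed the last finite bucket boundary. An estimate is only as precise as the bucket containing it, which is why a boundary near the SLO target matters.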

Summaries — Client-Side Percentiles

Summaries are similar to histograms but calculate percentiles on the client side (in the instrumented application) rather than the server side. This can make the reported values more precise for the quantiles you configure, because they do not depend on bucket boundaries, but it also makes them less flexible: you cannot aggregate percentiles from multiple instances of a service.

Histograms vs Summaries: In most modern observability setups, prefer histograms. They can be aggregated across multiple service instances and allow percentile calculation at query time with any quantile value. Summaries require you to specify quantiles at instrumentation time and cannot be meaningfully aggregated.
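A small Python sketch shows why per-instance percentiles cannot be combined while bucket counts can. The two instances and their latencies are invented, and `percentile` is an illustrative nearest-rank helper rather than a library function.

```python
import math


def percentile(values: list[float], q: float) -> float:
    """Nearest-rank percentile: the value at rank ceil(q * n) in sorted order."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def bucket_counts(values: list[float], bounds: list[float]) -> list[int]:
    """Cumulative histogram counts: observations at or below each bound."""
    return [sum(1 for v in values if v <= bound) for bound in bounds]


# Invented example: a busy healthy instance and a quiet struggling one (ms).
instance_a = [10] * 900
instance_b = [1000] * 100

# Averaging each instance's own p99 (what summaries would force you to do).
averaged_p99 = (percentile(instance_a, 0.99) + percentile(instance_b, 0.99)) / 2

# The p99 actually experienced across all requests.
true_p99 = percentile(instance_a + instance_b, 0.99)

# Histogram buckets, by contrast, merge by simple addition.
bounds = [50, 500, 5000]
merged = [a + b for a, b in zip(bucket_counts(instance_a, bounds),
                                bucket_counts(instance_b, bounds))]

if __name__ == "__main__":
    print(averaged_p99, true_p99)
    print(merged == bucket_counts(instance_a + instance_b, bounds))
```

The averaged figure lands between the two instances and describes no real request, whereas the merged bucket counts are identical to those of a histogram built over all requests together.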

Percentiles vs Averages — Why the Mean Misleads You

This is one of the most important concepts in performance monitoring. The arithmetic mean of response times almost always tells you an incomplete — often dangerously misleading — story.

Why Averages Lie

Mathematical Example

The Hidden Tail: When Average = 50ms Means Users Are Suffering

Imagine a service that handles 1,000 requests per minute. In one minute:

  • 990 requests complete in 20ms
  • 10 requests complete in 3,000ms (3 seconds)

Average response time = (990 × 20 + 10 × 3000) / 1000 = (19,800 + 30,000) / 1000 = 49.8ms

Your average latency dashboard shows ~50ms. Everything looks fine. But 10 users per minute (1% of traffic) are waiting 3 full seconds for a response. If this is a checkout flow, that is 10 frustrated, potentially abandoning customers every minute.

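The tail percentiles expose what the mean hides. In this scenario p99 sits exactly on the boundary between the fast and slow groups, and anything above it, such as p99.5 or p99.9, reads 3,000ms. That tail is the signal your SLO should be tracking, not the mean.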
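The arithmetic in this example can be checked with a short Python sketch. The `percentile` helper is an illustrative nearest-rank implementation, not a library function; other percentile definitions interpolate differently right at the boundary between the fast and slow groups.

```python
import math
from statistics import mean


def percentile(values: list[float], q: float) -> float:
    """Nearest-rank percentile: the value at rank ceil(q * n) in sorted order."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


# One minute of traffic from the example: 990 fast requests, 10 slow ones.
latencies_ms = [20] * 990 + [3000] * 10

average_ms = mean(latencies_ms)
median_ms = percentile(latencies_ms, 0.5)
p999_ms = percentile(latencies_ms, 0.999)
slow_share = sum(1 for v in latencies_ms if v >= 1000) / len(latencies_ms)

if __name__ == "__main__":
    print(f"mean={average_ms}ms p50={median_ms}ms p99.9={p999_ms}ms")
    print(f"requests taking 1s or longer: {slow_share:.1%}")
```

The mean stays near 50ms and the median at 20ms, while the tail percentile and the share of slow requests reveal the 3-second experience.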


p50, p95, p99, p99.9 Explained

Percentiles (or quantiles) answer: "Within what response time do X% of requests complete?"

Percentile | Meaning | Typical Use
p50 (median) | Half of requests are faster than this | Typical user experience baseline
p95 | 95% of requests are faster than this; 5% are slower | Common SLO target for non-critical APIs
p99 | 99% of requests are faster than this; 1% are slower | Common SLO target for user-facing APIs
p99.9 | 99.9% of requests are faster; 0.1% are slower | SLO target for critical payment/auth flows
Rule of Thumb: Monitor p99 for user-facing services. By definition, 1 request in every 100 is at or beyond the p99 latency. At 100 requests/second, that is one request every second. At 10,000 requests/second, it is 100 requests every second. At scale, tail latency matters enormously.

The Four Golden Signals

In the Google SRE Book (one of the foundational texts of reliability engineering), the team describes the "Four Golden Signals" — the minimum viable set of metrics that give you meaningful visibility into any service's health. If you can only instrument four things, instrument these.

The Four Golden Signals

flowchart LR
    A[Service Health] --> B[Latency\nHow long?]
    A --> C[Traffic\nHow much?]
    A --> D[Errors\nHow many failing?]
    A --> E[Saturation\nHow full?]
    style B fill:#3B9797,color:#fff
    style C fill:#16476A,color:#fff
    style D fill:#BF092F,color:#fff
    style E fill:#132440,color:#fff

Signal 1: Latency

Latency measures how long it takes to serve a request. It directly correlates with user experience — slow responses frustrate users, cause SLO violations, and can cascade into broader system failures.

Critical Latency Insight: Distinguish between successful request latency and failed request latency. A request that fails instantly (in 1ms with a 500 error) is very different from a request that times out (in 30 seconds with a 504). Tracking error latency separately helps diagnose whether errors are fast-failing or slow-timing-out — the latter is far more damaging to system health.
# PromQL: p99 latency for successful requests only
histogram_quantile(0.99,
  sum(rate(http_request_duration_seconds_bucket{status!~"5.."}[5m])) by (le)
)

# PromQL: p99 latency for failed (5xx) requests, to compare against the query above
histogram_quantile(0.99,
  sum(rate(http_request_duration_seconds_bucket{status=~"5.."}[5m])) by (le)
)

Signal 2: Traffic

Traffic measures how much demand is being placed on the system. For web services this is usually requests per second; for streaming services it might be bytes per second; for databases it could be queries per second or transactions per second.

Traffic metrics are essential for:

  • Capacity planning — understanding growth trends and peak loads
  • Anomaly detection — sudden traffic drops can indicate upstream failures; sudden spikes can indicate attacks or viral events
  • Contextualising other signals — a 5% error rate at 10 RPS is very different from 5% at 10,000 RPS
# PromQL: Requests per second over 5-minute window
sum(rate(http_requests_total[5m]))

# PromQL: Traffic by endpoint and method
sum(rate(http_requests_total[5m])) by (method, path)

Signal 3: Errors

Errors measure the rate at which requests are failing. This includes explicit failures (HTTP 5xx responses, exceptions) and implicit failures (HTTP 200 responses that return wrong data, requests that complete but exceed latency SLOs).

Explicit vs Implicit Errors: HTTP 500 errors are explicit failures — the service knows the request failed. But a service can return HTTP 200 with stale, incorrect, or incomplete data. These "silent errors" are harder to detect but just as damaging. Instrument your application logic to emit metrics for business-level failures, not just HTTP status codes.
# PromQL: Error rate as percentage of total traffic
100 * sum(rate(http_requests_total{status=~"5.."}[5m]))
  /
sum(rate(http_requests_total[5m]))

# Prometheus alerting rule: fire when the error rate stays above 1% for 5 minutes
# expr: 100 * sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 1
# for: 5m
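Counting implicit failures means classifying each request by more than its status code. The Python sketch below is one possible shape for that logic. The `classify_outcome` function, its outcome names, and the 0.5-second latency target are assumptions made for this example, and the plain `Counter` stands in for whatever metrics client you use.

```python
from collections import Counter


def classify_outcome(status: int, body_valid: bool, duration_s: float,
                     slo_s: float = 0.5) -> str:
    """Classify one request as ok, an explicit error, or an implicit error."""
    if status >= 500:
        return "explicit_error"      # the service knows it failed
    if not body_valid:
        return "implicit_bad_data"   # 2xx but wrong, stale, or incomplete data
    if duration_s > slo_s:
        return "implicit_too_slow"   # completed, but beyond the latency target
    return "ok"


def error_ratio(outcomes: Counter) -> float:
    """Share of requests that failed in any way, explicit or implicit."""
    total = sum(outcomes.values())
    return 0.0 if total == 0 else 1 - outcomes["ok"] / total


# Invented sample of (status, body_valid, duration_seconds) observations.
observed = [
    (200, True, 0.04), (200, True, 0.06), (500, False, 0.01),
    (200, False, 0.05), (200, True, 1.20), (200, True, 0.03),
]

outcomes = Counter(classify_outcome(*request) for request in observed)

if __name__ == "__main__":
    print(dict(outcomes))
    print(f"error ratio including implicit failures: {error_ratio(outcomes):.0%}")
```

Here a status-code-only view would report one failure in six, while the broader classification reports three.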

Signal 4: Saturation

Saturation measures how "full" the service is — how close to its resource limits it is operating. A service at 100% CPU saturation cannot handle more traffic. A connection pool at capacity will start queueing or rejecting requests. A disk at 100% utilisation will cause writes to fail.

Saturation metrics are leading indicators — a saturating resource predicts future failures before they occur. This makes saturation the most proactive of the four signals.

Resource | Saturation Metric | Warning Threshold
CPU | CPU utilisation % | > 80% sustained
Memory | Memory utilisation % or swap usage | > 85% or any swap
Disk | Disk space utilisation %, IOPS utilisation | > 80% space, > 70% IOPS
Network | Bandwidth utilisation %, packet loss | > 60% bandwidth, any packet loss
DB Connections | Connection pool utilisation % | > 75%
Thread Pool | Active threads / max threads | > 80%
# PromQL: CPU saturation
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# PromQL: Memory saturation
100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)

# PromQL: Disk space saturation
100 * (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})
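Because saturation is a leading indicator, it is often more useful to ask "when will this resource fill up?" than "how full is it now?". The Python sketch below fits a straight line through recent usage samples and projects the time until capacity is reached, similar in spirit to PromQL's `predict_linear()`. The function name and sample data are illustrative assumptions, and real growth is rarely perfectly linear.

```python
from typing import Optional


def seconds_until_full(samples: list[tuple[float, float]],
                       capacity: float) -> Optional[float]:
    """Project when usage reaches capacity using a least-squares line.

    samples: (timestamp_seconds, used) pairs; capacity is in the same unit
    as used. Returns None when usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    spread_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if spread_t == 0:
        raise ValueError("samples must cover more than one timestamp")
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / spread_t
    if slope <= 0:
        return None
    last_t = samples[-1][0]
    fitted_now = mean_u + slope * (last_t - mean_t)
    return max(0.0, (capacity - fitted_now) / slope)


if __name__ == "__main__":
    # Invented data: disk usage in GB sampled hourly on a 100 GB volume.
    usage = [(0, 70.0), (3600, 72.0), (7200, 74.0)]
    remaining = seconds_until_full(usage, capacity=100.0)
    print(f"projected time until full: {remaining / 3600:.1f} hours")
```

A disk at 74% looks comfortable against an 80% threshold, yet at this growth rate it would be full in about 13 hours, which is the kind of early warning a static threshold misses.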

The RED Method

Coined by Tom Wilkie (while at Weaveworks; he later joined Grafana Labs), the RED method is a simplified framework for monitoring microservices that focuses on the user-observable behaviour of each service. RED stands for:

  • Rate — how many requests per second is the service handling?
  • Errors — how many of those requests are failing?
  • Duration — how long does each request take?

RED is essentially a service-level subset of the Four Golden Signals (omitting Saturation, which is more infrastructure-focused). It maps closely to how users experience a service: they care about throughput (Rate), reliability (Errors), and speed (Duration).

RED vs USE vs Golden Signals: Use RED for service-level monitoring (each microservice endpoint). Use USE (Utilisation, Saturation, Errors) for resource-level monitoring (CPU, memory, disk). Use the Four Golden Signals when you want a unified framework that covers both.
Hands-On Exercise

Applying RED to a Real Service

Take any API you operate (or think about a hypothetical checkout service) and answer:

  • Rate: How many requests per second does it handle at peak? At off-peak? What does a 10x spike look like?
  • Errors: What HTTP status codes does it return? What percentage are 4xx (client errors)? 5xx (server errors)? What non-HTTP errors can occur (timeouts, circuit breaks, partial failures)?
  • Duration: What is the p50, p95, p99 latency? What is your SLO target? Are any endpoints systematically slower?

Just answering these questions for every service you operate puts you ahead of most production teams. Instrument the answers, and you have a solid operational baseline.
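If you have raw request records for a time window (from an access log, for instance), the three RED numbers can be computed directly. This is a minimal Python sketch with invented data. The `Request` record and `red_summary` function are hypothetical names, errors are counted as 5xx only, and percentiles use a simple nearest-rank method.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    status: int
    duration_s: float


def red_summary(requests: list[Request], window_s: float) -> dict[str, float]:
    """Compute Rate, Errors, and Duration for requests seen in one window."""
    count = len(requests)
    if count == 0:
        return {"rate_rps": 0.0, "error_ratio": 0.0}
    durations = sorted(r.duration_s for r in requests)

    def percentile(q: float) -> float:
        # Nearest-rank: the value at rank ceil(q * n) in sorted order.
        return durations[max(1, math.ceil(q * count)) - 1]

    return {
        "rate_rps": count / window_s,
        "error_ratio": sum(1 for r in requests if r.status >= 500) / count,
        "p50_s": percentile(0.50),
        "p95_s": percentile(0.95),
        "p99_s": percentile(0.99),
    }


# Invented window: 90 fast successes and 10 slow server errors in 10 seconds.
window = [Request(200, 0.05)] * 90 + [Request(500, 2.0)] * 10
summary = red_summary(window, window_s=10.0)

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value}")
```

In production these values would normally come from counters and histograms queried over time rather than from raw records, but the definitions are the same.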


Infrastructure Monitoring

Beyond application metrics, you need visibility into the infrastructure your services run on. Compute, storage, and network metrics form the foundation of your operational picture.

Compute Metrics

Key metrics to monitor for every server or container host:

  • CPU utilisation: total % across all cores; distinguish user vs system vs iowait
  • CPU load average: 1/5/15-minute averages; a sustained load above the number of cores suggests saturation
  • Memory utilisation: used, cached, available; watch for OOM (out-of-memory) pressure
  • Process/thread count: unexpected growth can indicate resource leaks

Storage Metrics

  • Disk space: utilisation % per filesystem; alert at 80%, page at 90%
  • Disk IOPS: reads and writes per second; compare against the disk's rated IOPS capacity
  • Disk throughput: bytes read/written per second
  • Disk I/O queue depth: requests waiting for the disk; high queue depth indicates I/O saturation
  • Disk latency: average I/O completion time; as rough healthy baselines, expect HDD <10ms, SSD <1ms, NVMe <0.1ms
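The two-tier disk space rule above (alert at 80%, page at 90%) can be sketched in a few lines of Python using only the standard library. The function names and severity labels are illustrative. A real setup would usually express this as alerting rules in the monitoring system rather than as a script.

```python
import shutil


def disk_used_percent(path: str = "/") -> float:
    """Disk space utilisation for the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return 100 * usage.used / usage.total


def disk_severity(used_percent: float, alert_at: float = 80.0,
                  page_at: float = 90.0) -> str:
    """Map a utilisation percentage to a two-tier severity."""
    if used_percent >= page_at:
        return "page"    # wake someone up
    if used_percent >= alert_at:
        return "alert"   # needs attention during working hours
    return "ok"


if __name__ == "__main__":
    percent = disk_used_percent("/")
    print(f"/ is {percent:.1f}% full: {disk_severity(percent)}")
```

Separating the measurement from the threshold logic keeps the thresholds easy to tune per filesystem.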

Network Metrics

  • Packet loss — any packet loss indicates network problems; even 0.1% loss can devastate TCP performance
  • Bandwidth utilisation — inbound and outbound bytes per second vs link capacity
  • Round-trip time (RTT) — latency between nodes; spikes indicate congestion or routing problems
  • Connection counts — established, TIME_WAIT, CLOSE_WAIT; high TIME_WAIT count can indicate connection handling issues
  • Network errors — dropped packets, retransmits, checksum errors

Conclusion & Next Steps

You now have a solid grounding in metrics — the quantitative backbone of any monitoring system. The key insights from Part 2:

  • Four metric types: counters (always increasing), gauges (current state), histograms (distributions, server-side percentiles), summaries (exact percentiles, client-side)
  • Cardinality matters: Never use high-cardinality values as metric labels; the resulting explosion of time series can crash your monitoring system
  • Percentiles, not averages: p99 latency is the signal that reflects your worst user experiences; averages hide tail latency
  • Four Golden Signals: Latency, Traffic, Errors, Saturation — instrument these for every service
  • RED method: Rate, Errors, Duration — a service-focused framework perfect for microservices monitoring