Introducing Real User Monitoring
Throughout this series, we’ve focused on backend telemetry — metrics from Prometheus, logs from Loki, traces from Tempo. But all of that infrastructure exists to serve users. Real User Monitoring (RUM) closes the observability loop by capturing what users actually experience in their browsers: page load times, interaction responsiveness, visual stability, JavaScript errors, and network request performance.
Why RUM Matters
Consider this scenario: your backend SLOs are green, API latency is under 100ms, and all health checks pass. Yet users are complaining about a “slow” application. What’s happening?
- The CDN serving static assets has a cache miss spike in Southeast Asia
- A third-party analytics script is blocking the main thread for 3 seconds
- A CSS change caused layout shifts that make buttons jump around
- The client-side JavaScript bundle grew 40% in the last release
None of these issues are visible from backend metrics. RUM captures the last mile of user experience — everything between the user’s browser and your load balancer.
RUM vs Synthetic Monitoring
| Aspect | Real User Monitoring (RUM) | Synthetic Monitoring |
|---|---|---|
| Data Source | Actual user browsers | Scripted bots from known locations |
| Coverage | All users, all devices, all networks | Predefined paths only |
| Consistent Baseline | No — varies by user context | Yes — consistent conditions |
| Proactive Detection | No — requires real traffic | Yes — runs 24/7 even with no users |
| Third-Party Impact | Captures (ads, analytics, chat widgets) | May not load third-party scripts |
| Geographic Insight | Wherever users are | Only where probes are deployed |
| Cost Model | Proportional to traffic | Fixed (per probe × frequency) |
The ideal setup uses both: synthetic monitoring for baseline SLA validation and proactive regression detection; RUM for understanding the true distribution of user experience across diverse conditions.
Grafana Faro Overview
Grafana Faro is a lightweight JavaScript agent that collects frontend telemetry and sends it to a Grafana Cloud or self-hosted backend. It captures:
- Performance metrics — Web Vitals (LCP, INP, CLS), navigation timing, resource timing
- Errors — JavaScript exceptions with full stack traces
- Logs — Console messages (configurable levels)
- Traces — Browser-initiated spans that connect to backend traces
- Events — Custom user interactions and business events
- Sessions — User session tracking with page view sequences
```mermaid
flowchart LR
    B["User Browser<br/>Faro Agent"]
    C["Grafana Alloy<br/>(Faro Receiver)"]
    L["Loki<br/>(Logs + Errors)"]
    T["Tempo<br/>(Frontend Traces)"]
    M["Mimir<br/>(Web Vitals Metrics)"]
    G["Grafana<br/>(RUM Dashboards)"]
    B -->|"HTTP POST"| C
    C --> L
    C --> T
    C --> M
    G --> L
    G --> T
    G --> M
```
Setting Up Frontend Observability
Installing Grafana Faro
Faro can be installed via npm for bundled applications or loaded from a CDN for traditional websites:
```bash
# Install via npm (React, Vue, Angular, Next.js, etc.)
npm install @grafana/faro-web-sdk @grafana/faro-web-tracing
```
```javascript
// Initialize Faro in your application entry point
import { initializeFaro, getWebInstrumentations } from '@grafana/faro-web-sdk';
import { TracingInstrumentation } from '@grafana/faro-web-tracing';

const faro = initializeFaro({
  url: 'https://faro-collector.your-domain.com/collect',
  app: {
    name: 'checkout-frontend',
    version: '2.4.1',
    environment: 'production',
  },
  instrumentations: [
    ...getWebInstrumentations({
      captureConsole: true, // Forward console output to Faro as logs...
      captureConsoleDisabledLevels: ['debug', 'trace'], // ...except these levels
    }),
    new TracingInstrumentation({
      instrumentationOptions: {
        // Add traceparent headers to cross-origin requests matching these URLs
        propagateTraceHeaderCorsUrls: [
          /https:\/\/api\.your-domain\.com/, // Your API endpoints
          /https:\/\/checkout\.your-domain\.com/,
        ],
      },
    }),
  ],
  sessionTracking: {
    enabled: true,
    persistent: true, // Survive page reloads
    samplingRate: 1.0, // 100% of sessions
  },
});
```
For non-bundled sites (traditional HTML pages), use the CDN approach:
```html
<!-- CDN installation (add before the closing </head> tag) -->
<script
  src="https://unpkg.com/@grafana/faro-web-sdk@latest/dist/bundle/faro-web-sdk.iife.js"
></script>
<script>
  window.GrafanaFaroWebSdk.initializeFaro({
    url: 'https://faro-collector.your-domain.com/collect',
    app: { name: 'marketing-site', version: '1.0.0', environment: 'production' },
    instrumentations: window.GrafanaFaroWebSdk.getWebInstrumentations(),
  });
</script>
```
Configuration Options
```javascript
// Advanced Faro configuration
import { initializeFaro, getWebInstrumentations } from '@grafana/faro-web-sdk';
import { TracingInstrumentation } from '@grafana/faro-web-tracing';

const faro = initializeFaro({
  url: 'https://faro-collector.your-domain.com/collect',
  app: {
    name: 'checkout-frontend',
    version: '2.4.1',
    environment: 'production',
    namespace: 'checkout-team',
  },

  // Batching configuration for performance
  batching: {
    enabled: true,
    sendTimeout: 250, // ms before sending batch
    itemLimit: 50, // max items per batch
  },

  // Deduplicate identical errors
  dedupe: true,

  // Scrub sensitive data from every item before it is sent
  beforeSend: (item) => {
    // Strip PII from exception messages (the message text is in item.payload.value)
    if (item.type === 'exception' && typeof item.payload?.value === 'string') {
      item.payload.value = item.payload.value.replace(/email=\S+/g, 'email=[REDACTED]');
    }
    return item; // Return null instead to drop the item
  },

  // Session tracking
  // (Faro's defaults end a session after 4 hours, or after 15 minutes of inactivity)
  sessionTracking: {
    enabled: true,
    persistent: true,
    samplingRate: 1.0,
  },

  // User context (set after authentication)
  user: {
    id: undefined, // Set later via faro.api.setUser()
  },

  instrumentations: [
    ...getWebInstrumentations({
      captureConsole: true,
      captureConsoleDisabledLevels: ['debug', 'trace', 'log'],
    }),
    new TracingInstrumentation({
      instrumentationOptions: {
        propagateTraceHeaderCorsUrls: [
          /https:\/\/api\.your-domain\.com/,
        ],
      },
    }),
  ],
});
```
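The batching options trade latency against request count. As a rough mental model (an assumption based on the option names, not Faro's source code), the sketch below flushes a batch either when `itemLimit` items are queued or `sendTimeout` milliseconds after the first item arrives, whichever comes first. `Batcher` and `send` are hypothetical names.

```javascript
// Conceptual sketch of size-or-time batching (hypothetical; not Faro's implementation).
class Batcher {
  constructor({ sendTimeout = 250, itemLimit = 50, send }) {
    this.sendTimeout = sendTimeout; // ms to wait before flushing a partial batch
    this.itemLimit = itemLimit;     // flush immediately once this many items are queued
    this.send = send;               // callback that receives an array of items
    this.queue = [];
    this.timer = null;
  }

  push(item) {
    this.queue.push(item);
    if (this.queue.length >= this.itemLimit) {
      this.flush(); // size limit reached: send now
    } else if (this.timer === null) {
      // First item of a new batch: start the timeout clock
      this.timer = setTimeout(() => this.flush(), this.sendTimeout);
    }
  }

  flush() {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.queue.length === 0) return;
    const batch = this.queue;
    this.queue = [];
    this.send(batch);
  }
}
```

A short timeout keeps data fresh during quiet periods. The item limit caps request size during bursts, such as an error storm.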
Framework Integration
Faro provides framework-specific packages. The React package (@grafana/faro-react) adds an error boundary and router-aware instrumentation:
```jsx
// React integration with error boundary
import {
  initializeFaro,
  getWebInstrumentations,
  FaroErrorBoundary,
  FaroRoutes,
  ReactIntegration,
  ReactRouterVersion,
} from '@grafana/faro-react';
import { TracingInstrumentation } from '@grafana/faro-web-tracing';
import {
  createRoutesFromChildren,
  matchRoutes,
  Routes,
  useLocation,
  useNavigationType,
} from 'react-router-dom';

// Initialize Faro with React-specific instrumentations
const faro = initializeFaro({
  url: 'https://faro-collector.your-domain.com/collect',
  app: { name: 'react-app', version: '3.0.0', environment: 'production' },
  instrumentations: [
    ...getWebInstrumentations(),
    new TracingInstrumentation(),
    new ReactIntegration({
      router: {
        version: ReactRouterVersion.V6,
        dependencies: {
          createRoutesFromChildren,
          matchRoutes,
          Routes,
          useLocation,
          useNavigationType,
        },
      },
    }),
  ],
});

// Wrap your app with FaroErrorBoundary
// (ErrorPage is your own fallback component; FaroRoutes must render inside a router such as BrowserRouter)
function App() {
  return (
    <FaroErrorBoundary fallback={<ErrorPage />}>
      <FaroRoutes>
        {/* Your routes */}
      </FaroRoutes>
    </FaroErrorBoundary>
  );
}
```
Exploring Core Web Vitals
Core Web Vitals are Google’s standardized metrics for measuring user experience. Faro captures all of them automatically:
Largest Contentful Paint (LCP)
LCP measures loading performance — how quickly the largest visible element (image, text block, or video) renders in the viewport.
| Rating | LCP Threshold | Impact |
|---|---|---|
| Good | ≤ 2.5 seconds | Users perceive the page as fast |
| Needs Improvement | 2.5 – 4.0 seconds | Users notice delay but may wait |
| Poor | > 4.0 seconds | High bounce rate, SEO penalty |
Common LCP optimization targets:
- Preload critical images and fonts with `<link rel="preload">`
- Optimize server response time (TTFB < 800ms)
- Eliminate render-blocking JavaScript and CSS
- Use responsive images with `srcset` for appropriate sizes
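The rating bands in this section's tables translate directly into a small classifier. You can use it to tag individual samples or to check a p75 value against the tables. The sketch stores LCP and INP in milliseconds and CLS as a unitless score. A value that sits exactly on a boundary goes into the better band, as the ≤ signs in the tables indicate. `THRESHOLDS` and `rateVital` are hypothetical names.

```javascript
// Rating boundaries taken from the Core Web Vitals tables in this article.
// LCP and INP are in milliseconds; CLS is unitless.
const THRESHOLDS = {
  LCP: { good: 2500, poor: 4000 },
  INP: { good: 200, poor: 500 },
  CLS: { good: 0.1, poor: 0.25 },
};

// Returns 'good', 'needs-improvement', or 'poor' for a single measurement.
function rateVital(name, value) {
  const t = THRESHOLDS[name];
  if (!t) throw new Error(`Unknown metric: ${name}`);
  if (value <= t.good) return 'good';
  if (value <= t.poor) return 'needs-improvement';
  return 'poor';
}
```

For example, `rateVital('LCP', 3100)` falls in the 2.5 to 4.0 second band and returns `'needs-improvement'`.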
Interaction to Next Paint (INP)
INP measures interactivity — the delay between a user action (click, tap, keypress) and the next visual update. It replaced First Input Delay (FID) in March 2024 as a Core Web Vital.
| Rating | INP Threshold | User Perception |
|---|---|---|
| Good | ≤ 200 ms | Responsive, feels instant |
| Needs Improvement | 200 – 500 ms | Noticeable lag |
| Poor | > 500 ms | Feels broken or frozen |
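To improve INP, keep the main thread responsive. Break up long tasks, defer non-critical work with requestIdleCallback(), move heavy computation to Web Workers, and render long lists with virtualization.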
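INP condenses every interaction on a page into a single number. The published definition takes the worst interaction latency, but ignores one highest value for every 50 interactions. A simplified estimate from a list of interaction latencies looks like this. `estimateInp` is a hypothetical name. In practice, measurement is done for you by Faro's Web Vitals instrumentation, which derives latencies from the browser's Event Timing entries.

```javascript
// Approximates INP from the latencies (ms) of all interactions in one page visit.
// Per the published definition, the worst interaction is used, but one highest
// value is skipped for every 50 interactions to discount rare outliers.
function estimateInp(latencies) {
  if (latencies.length === 0) return undefined; // no interactions: INP is not reported
  const sorted = [...latencies].sort((a, b) => b - a); // slowest first
  const skip = Math.min(Math.floor(latencies.length / 50), sorted.length - 1);
  return sorted[skip];
}
```

For a short visit with a handful of clicks, this is simply the slowest interaction. On long-lived pages, one stray outlier does not dominate the score.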
Cumulative Layout Shift (CLS)
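CLS measures visual stability: how much visible content shifts unexpectedly during the page's lifetime. Each unexpected shift gets a score: the fraction of the viewport affected, multiplied by how far the content moved relative to the viewport. CLS reports the largest burst of shifts (a session window). A CLS of 0 means nothing moved, and the score has no fixed upper bound.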
| Rating | CLS Threshold | Typical Contributing Factors |
|---|---|---|
| Good | ≤ 0.1 | Fixed dimensions, font preloading |
| Needs Improvement | 0.1 – 0.25 | Late-loading ads, dynamic content |
| Poor | > 0.25 | Images without dimensions, font swap |
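The session-window rule that combines individual layout shifts into one CLS score can be sketched from the published definition:

- Shifts within 500 ms of user input are excluded.
- Shifts less than 1 second apart join the same window.
- A window spans at most 5 seconds.
- CLS is the largest window total.

The entries below mimic the browser's LayoutShift entries (`startTime`, `value`, `hadRecentInput`) and are assumed to be sorted by `startTime`. `computeCls` is a hypothetical name.

```javascript
// Computes CLS from layout-shift entries shaped like the browser's
// LayoutShift entries: { startTime (ms), value, hadRecentInput }.
// Shifts flagged hadRecentInput are excluded; the remaining shifts are
// grouped into session windows (gap < 1 s, window span < 5 s), and CLS is
// the largest window total.
function computeCls(entries) {
  let cls = 0;
  let windowValue = 0;
  let windowStart = 0;
  let lastTime = 0;
  for (const entry of entries) {
    if (entry.hadRecentInput) continue; // expected shift right after input
    const continuesWindow =
      windowValue > 0 &&
      entry.startTime - lastTime < 1000 &&
      entry.startTime - windowStart < 5000;
    if (continuesWindow) {
      windowValue += entry.value;
    } else {
      windowValue = entry.value; // start a new session window
      windowStart = entry.startTime;
    }
    lastTime = entry.startTime;
    cls = Math.max(cls, windowValue);
  }
  return cls;
}
```

Because only the worst window counts, a long-lived single-page app is not penalized for small shifts spread over hours of use.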
Other Performance Metrics
Beyond Core Web Vitals, Faro captures additional timing data from the browser's Navigation Timing, Paint Timing, and Resource Timing APIs:
- TTFB (Time to First Byte) — server response time including DNS, TCP, TLS
- FCP (First Contentful Paint) — first text/image rendered
- DOM Interactive — HTML parsed, ready for JavaScript
- DOM Complete — all resources (images, styles) loaded
- Resource Timing — individual fetch/XHR/image load durations
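These timings come from the browser's PerformanceNavigationTiming entry, where every field is a millisecond offset from the start of the navigation. The sketch below derives a few of them from an object with that shape, so it also runs outside a browser. In a page, you would pass `performance.getEntriesByType('navigation')[0]`. `navigationSummary` is a hypothetical name, and Faro's own calculations may differ in detail.

```javascript
// Derives headline timings (ms) from a PerformanceNavigationTiming-shaped object.
// In a browser: navigationSummary(performance.getEntriesByType('navigation')[0])
function navigationSummary(nav) {
  return {
    dns: nav.domainLookupEnd - nav.domainLookupStart,
    connect: nav.connectEnd - nav.connectStart,      // TCP plus TLS when HTTPS
    ttfb: nav.responseStart - nav.startTime,         // redirects, DNS, connect, and server time
    domInteractive: nav.domInteractive - nav.startTime,
    domComplete: nav.domComplete - nav.startTime,
  };
}
```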
Pivoting from Frontend to Backend
Trace Context Propagation
The most powerful feature of RUM with Grafana is the ability to follow a user interaction from the browser click through your entire backend stack. This is achieved by propagating W3C Trace Context headers in fetch/XHR requests:
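```javascript
// Faro automatically injects trace headers when configured
// Your API requests will include:
//   traceparent: 00-{trace-id}-{span-id}-01
//   tracestate: (optional vendor-specific data)

// Example: User clicks "Place Order"
// 1. Faro creates a browser span: "user_click: place_order_button"
// 2. fetch() to /api/orders includes traceparent header
// 3. Backend creates child span: "POST /api/orders"
// 4. Backend calls payment service with same trace context
// 5. Complete trace visible in Tempo: browser → API → payment → database

// Manual span creation for complex interactions
import { faro } from '@grafana/faro-web-sdk';

// The cart shape { items, total } is illustrative
async function submitOrder(cart) {
  try {
    const response = await fetch('/api/orders', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(cart),
    });
    // Faro's fetch instrumentation records this request as a span
    return await response.json();
  } catch (error) {
    faro.api.pushError(error);
    throw error;
  }
}

async function placeOrder(cart) {
  // Record cart size and value (pushMeasurement sends a measurement; it does not create a span)
  faro.api.pushMeasurement({
    type: 'place_order',
    values: { cart_items: cart.items.length, cart_total: cart.total },
  });

  // OpenTelemetry API registered by TracingInstrumentation (undefined without it)
  const otel = faro.api.getOTEL();
  if (!otel) return submitOrder(cart);

  // Make "place_order" the active span so the instrumented fetch becomes its child
  const tracer = otel.trace.getTracer('checkout-frontend');
  return tracer.startActiveSpan('place_order', async (span) => {
    try {
      return await submitOrder(cart);
    } finally {
      span.end();
    }
  });
}
```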
```mermaid
sequenceDiagram
    participant U as User Browser
    participant F as Faro Agent
    participant A as API Gateway
    participant O as Order Service
    participant P as Payment Service
    participant D as Database
    U->>F: Click "Place Order"
    F->>F: Create span (trace-id: abc123)
    F->>A: POST /api/orders<br/>traceparent: 00-abc123-span1-01
    A->>O: Forward with trace context
    O->>P: Charge card (child span)
    P-->>O: Payment confirmed
    O->>D: Insert order (child span)
    D-->>O: Order saved
    O-->>A: 201 Created
    A-->>F: Response (with Server-Timing header)
    F->>F: Complete browser span
    Note over F: Full trace: Browser → API → Order → Payment → DB
```
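The traceparent header in the diagram has a fixed layout, defined by the W3C Trace Context specification: a version, a 32-hex-character trace ID, a 16-hex-character parent span ID, and flags (bit 0x01 means sampled). The `abc123` and `span1` values above are shorthand. The helpers below are a minimal, illustrative parser and formatter that accept only version 00. `parseTraceparent` and `formatTraceparent` are hypothetical names. Faro's tracing instrumentation handles this header for you.

```javascript
// Minimal W3C Trace Context helpers (version 00 only), for illustration.
const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header) {
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) return null;
  const [, traceId, parentId, flags] = match;
  // All-zero IDs are invalid per the specification
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { traceId, parentId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

// A service forwarding the request keeps the trace ID and puts its own span ID in the parent slot
function formatTraceparent({ traceId, spanId, sampled }) {
  return `00-${traceId}-${spanId}-${sampled ? '01' : '00'}`;
}
```

Because every hop keeps the same trace ID, Tempo can stitch the browser span and all backend spans into one trace.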
Error Correlation
When a frontend error occurs, Faro captures the full context needed to correlate with backend logs:
```javascript
// Faro captures unhandled errors automatically,
// but you can add context for better correlation.

// Set user context after login
faro.api.setUser({
  id: 'user-12345',
  username: 'jane.doe',
  attributes: {
    plan: 'enterprise',
    region: 'eu-west-1',
  },
});

// Name the current view (the page URL is captured automatically in page metadata)
faro.api.setView({ name: 'checkout-page' });

// Custom error with context (context values must be strings)
try {
  await processPayment(paymentDetails);
} catch (error) {
  faro.api.pushError(error, {
    context: {
      orderId: String(order.id),
      paymentMethod: paymentDetails.method,
      amount: String(paymentDetails.amount),
    },
  });
}
```
In Grafana, you can then query Loki for frontend errors and jump to the correlated backend trace in Tempo:
```logql
# LogQL query for frontend JavaScript errors
{app="checkout-frontend"} | json | level="error"
  | line_format "{{.message}}"
  | label_format traceId="{{.traceID}}"

# Find the correlated backend trace in Tempo:
# use the traceId from the Faro error log to search Tempo
```
Session Context
Faro’s session tracking groups all telemetry from a single user visit, enabling you to reconstruct the sequence of events leading to an error:
```logql
# Query all events from a problematic session in Loki
# (LogQL has no sort stage; switch Grafana's log view to "Oldest first" to read the journey forward)
{app="checkout-frontend"} | json
  | session_id="abc-123-def-456"

# Results show the user's journey:
# 10:01:00 - page_view: /products
# 10:01:15 - page_view: /cart
# 10:01:30 - click: add_to_cart
# 10:01:45 - page_view: /checkout
# 10:02:01 - error: "TypeError: Cannot read property 'address' of undefined"
# 10:02:01 - Associated trace: xyz-789 (payment API returned 500)
```
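You may want to export a session's events outside Grafana, for example to attach them to a bug report. Rebuilding the journey then takes three steps: filter to one session, sort by time, and cut at the first error. The event shape `{ session_id, timestamp, kind, detail }` and the function name are hypothetical.

```javascript
// Hypothetical event shape: { session_id, timestamp (ISO 8601 string), kind, detail }.
// Returns one session's events in time order, ending at its first error.
function journeyToFirstError(events, sessionId) {
  const sessionEvents = events
    .filter((e) => e.session_id === sessionId)
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
  const firstError = sessionEvents.findIndex((e) => e.kind === 'error');
  // No error in this session: return the whole journey
  return firstError === -1 ? sessionEvents : sessionEvents.slice(0, firstError + 1);
}
```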
Enhancements & Custom Configuration
Custom Events & Measurements
Track business-specific interactions beyond automatic instrumentation:
```javascript
// Track custom business events
import { faro } from '@grafana/faro-web-sdk';

// E-commerce: Track add-to-cart
// (event attributes are string key/value pairs, so numbers are converted)
function addToCart(product) {
  faro.api.pushEvent('add_to_cart', {
    product_id: String(product.id),
    product_name: product.name,
    price: String(product.price),
    category: product.category,
  });
}

// SaaS: Track feature usage (currentUser and getUsageCount are app-specific)
function openFeature(featureName) {
  faro.api.pushEvent('feature_used', {
    feature: featureName,
    plan: currentUser.plan,
    usage_count: String(getUsageCount(featureName)),
  });
}

// Performance: Track custom timing
function measureSearchLatency(query) {
  const start = performance.now();
  return fetch(`/api/search?q=${encodeURIComponent(query)}`)
    .then((response) => response.json())
    .then((results) => {
      const duration = performance.now() - start;
      // Measurement values are numeric
      faro.api.pushMeasurement({
        type: 'search_latency',
        values: {
          duration_ms: duration,
          result_count: results.length,
          query_length: query.length,
        },
      });
      return results;
    });
}
```
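The search example hand-codes start and stop timing around one call. A more general option is a wrapper that times any async function and hands the result to a reporter. In an app, the reporter could be `(m) => faro.api.pushMeasurement(m)`. Here it is injected, so the sketch runs without Faro. `withTiming` is a hypothetical helper.

```javascript
// Wraps an async function and reports its duration through `report`,
// e.g. (m) => faro.api.pushMeasurement(m) in a real app.
function withTiming(type, fn, report) {
  return async (...args) => {
    const start = performance.now();
    try {
      return await fn(...args);
    } finally {
      // Reported on success and failure alike
      report({ type, values: { duration_ms: performance.now() - start } });
    }
  };
}
```

Reporting from `finally` means slow failures are measured too. If you only want successful calls in the latency data, move the `report` call into the `try` block.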
User Context & Metadata
```javascript
// Enrich telemetry with user and environment context
import { faro } from '@grafana/faro-web-sdk';

// After user authenticates
function onLogin(user) {
  faro.api.setUser({
    id: user.id,
    username: user.email,
    attributes: {
      plan: user.subscription.plan,
      company: user.company.name,
      role: user.role,
    },
  });
}

// Track A/B test variants
function setExperimentContext(experiments) {
  faro.api.pushEvent('experiments_loaded', {
    // e.g., { "checkout_redesign": "variant_b", "new_search": "control" }
    variants: JSON.stringify(experiments),
  });
}

// Set page-specific metadata on route changes
function onRouteChange(route) {
  // Name the view; Faro records the page URL itself
  faro.api.setView({ name: route.name });

  // Attach route details as event attributes (string values)
  faro.api.pushEvent('route_change', {
    page_type: route.meta.type, // 'product', 'checkout', 'content'
    requires_auth: String(route.meta.auth),
  });
}
```
Sampling & Privacy
For high-traffic sites, implement sampling to control costs while maintaining statistical significance:
```javascript
// Sampling configuration
import { initializeFaro, getWebInstrumentations } from '@grafana/faro-web-sdk';

// Redact card numbers and email addresses from free text
const redact = (text) =>
  text
    .replace(/\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g, '[CARD_REDACTED]')
    .replace(/[\w.-]+@[\w.-]+\.\w+/g, '[EMAIL_REDACTED]');

const faro = initializeFaro({
  url: 'https://faro-collector.your-domain.com/collect',
  app: { name: 'high-traffic-site', version: '5.0.0', environment: 'production' },
  sessionTracking: {
    enabled: true,
    samplingRate: 0.1, // Only 10% of sessions send telemetry
  },

  // Privacy: strip sensitive data before sending
  beforeSend: (item) => {
    // Don't send telemetry from internal/test users (read from the item's own metadata)
    if (item.meta?.user?.attributes?.internal === 'true') {
      return null; // Drop this item
    }
    // Payloads are objects; redact the free-text fields where PII usually appears
    if (item.type === 'exception' && typeof item.payload.value === 'string') {
      item.payload.value = redact(item.payload.value);
    }
    if (item.type === 'log' && typeof item.payload.message === 'string') {
      item.payload.message = redact(item.payload.message);
    }
    return item;
  },

  instrumentations: [
    ...getWebInstrumentations({
      captureConsole: false, // Disable console capture for privacy
    }),
  ],
});
```
Privacy note: use the beforeSend hook to strip PII, honor the browser's Do Not Track setting, and make Faro initialization depend on user consent in regions with opt-in or opt-out requirements (such as the EU and California).
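One way to act on this note is to decide, before calling `initializeFaro()`, whether RUM may run at all. The sketch below makes three assumptions:

- `consent` is a hypothetical value (`'granted'`, `'denied'`, or `undefined`) from your consent-management tool.
- `requiresOptIn` is a flag you derive from the user's region.
- `doNotTrack` mirrors `navigator.doNotTrack`, which is `'1'` when the user has enabled it.

`shouldStartRum` is a hypothetical name.

```javascript
// Decides whether Faro should be initialized for this page view.
// consent: 'granted' | 'denied' | undefined (hypothetical consent-tool output)
// doNotTrack: value of navigator.doNotTrack ('1' when enabled)
// requiresOptIn: true when the user's region needs explicit consent first
function shouldStartRum({ consent, doNotTrack, requiresOptIn }) {
  if (doNotTrack === '1') return false;                     // honor the browser's DNT setting
  if (consent === 'denied') return false;                   // an explicit refusal always wins
  if (requiresOptIn && consent !== 'granted') return false; // opt-in regions need a yes first
  return true;
}
```

In the browser, call `shouldStartRum({ consent, doNotTrack: navigator.doNotTrack, requiresOptIn })` and initialize Faro only when it returns true. If Faro never initializes, nothing is collected at all, which is stronger than filtering data in beforeSend.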
Building RUM Dashboards
Overview Dashboard
A RUM overview dashboard should answer: “How are users experiencing our application right now?”
RUM Overview Dashboard Panels
| Panel | Visualization | Query Source |
|---|---|---|
| Active Sessions (live) | Stat | Mimir: sum(faro_sessions_active) |
| Web Vitals Summary | Gauge (LCP, INP, CLS) | Mimir: histogram_quantile(0.75, ...) |
| Error Rate | Time Series | Mimir: rate(faro_errors_total[5m]) |
| Page Load Distribution | Heatmap | Mimir: faro_page_load_duration_bucket |
| Top Errors | Table | Loki: {app="frontend"} \| json \| level="error" |
| Slowest Pages | Bar Chart | Mimir: p75 LCP grouped by page |
| Geographic Performance | Geomap | Mimir: LCP by country label |
| Browser/OS Breakdown | Pie Chart | Mimir: sessions by user agent |
Page Performance Dashboard
Drill down into individual page performance with percentile breakdowns:
```promql
# PromQL queries for page performance dashboard

# LCP p75 by page (should be ≤ 2.5s)
histogram_quantile(0.75,
  sum by (le, page_name) (
    rate(faro_web_vitals_lcp_bucket{app="checkout-frontend"}[5m])
  )
)

# INP p75 by page (should be ≤ 200ms)
histogram_quantile(0.75,
  sum by (le, page_name) (
    rate(faro_web_vitals_inp_bucket{app="checkout-frontend"}[5m])
  )
)

# Percentage of page loads meeting the "Good" LCP threshold (≤ 2.5s)
sum(rate(faro_web_vitals_lcp_bucket{le="2.5", app="checkout-frontend"}[1h]))
/
sum(rate(faro_web_vitals_lcp_count{app="checkout-frontend"}[1h]))
* 100

# Top 10 slowest resources by average load duration
topk(10,
  avg by (resource_url) (
    faro_resource_duration_seconds{app="checkout-frontend"}
  )
)
```
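The p75 panels rely on histogram_quantile(), which only estimates a percentile from bucket counts. The sketch below shows a simplified version of the linear interpolation it performs. This is why bucket boundaries matter for both the p75 and the "Good" percentage queries; the latter needs an `le="2.5"` bucket to exist. `histogramQuantile` is a hypothetical helper. Buckets are cumulative counts sorted by upper bound, with a final +Inf bucket, and `q` is assumed to be in (0, 1].

```javascript
// Estimates a quantile from cumulative histogram buckets using the same
// linear-interpolation idea as PromQL's histogram_quantile() (simplified).
// buckets: [{ le: upperBound, count: cumulativeCount }], sorted by le, last le = Infinity.
function histogramQuantile(q, buckets) {
  if (buckets.length < 2) return NaN;
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  const i = buckets.findIndex((b) => b.count >= rank);
  const bucket = buckets[i];
  // Rank lands in the +Inf bucket: return the highest finite bound
  if (bucket.le === Infinity) return buckets[i - 1].le;
  const lowerBound = i === 0 ? 0 : buckets[i - 1].le;
  const lowerCount = i === 0 ? 0 : buckets[i - 1].count;
  // Assume observations are spread evenly within the bucket
  return lowerBound + (bucket.le - lowerBound) * ((rank - lowerCount) / (bucket.count - lowerCount));
}
```

With buckets of 10, 60, 90, and 100 cumulative samples at 1, 2.5, 4, and +Inf seconds, the p75 rank (75) falls in the 2.5 to 4 second bucket, and the estimate interpolates to 3.25 seconds.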
Error Tracking Dashboard
```logql
# LogQL queries for error tracking

# Error count by type (grouped)
sum by (error_type) (
  count_over_time(
    {app="checkout-frontend"} | json | level="error" [5m]
  )
)

# Recent errors, formatted for review
# (LogQL cannot compute "first seen" directly; compare against older time ranges
#  to separate new errors from recurring ones)
{app="checkout-frontend"} | json | level="error"
  | line_format "{{.error_type}}: {{.message}}"

# Errors with session context for debugging
{app="checkout-frontend"} | json | level="error"
  | line_format "Session: {{.session_id}} | User: {{.user_id}} | Page: {{.page}} | Error: {{.message}}"

# Error rate as percentage of page views
sum(count_over_time({app="checkout-frontend"} | json | level="error" [5m]))
/
sum(count_over_time({app="checkout-frontend"} | json | type="page_view" [5m]))
* 100
```
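LogQL groups errors by whatever label you extract. Messages that differ only by an ID or a number therefore land in separate groups, and LogQL has no built-in notion of "first seen". A common approach is to normalize messages into fingerprints in code and compare the recent window against a historical baseline. This is sketched here as an assumption; neither Faro nor Loki does it for you. `fingerprint`, `findNewErrors`, and the `{ type, message }` shape are hypothetical.

```javascript
// Normalizes an error message so variants that differ only in IDs or numbers
// share one fingerprint.
function fingerprint(errorType, message) {
  const normalized = message
    .replace(/\b[0-9a-f]{8,}\b/gi, '<id>') // long hex IDs (trace/session IDs, hashes)
    .replace(/\d+/g, '<n>');              // remaining numbers
  return `${errorType}: ${normalized}`;
}

// Returns fingerprints seen in `recent` but absent from `baseline`,
// mapped to their occurrence counts in the recent window.
function findNewErrors(recent, baseline) {
  const known = new Set(baseline.map((e) => fingerprint(e.type, e.message)));
  const counts = new Map();
  for (const e of recent) {
    const fp = fingerprint(e.type, e.message);
    if (!known.has(fp)) counts.set(fp, (counts.get(fp) ?? 0) + 1);
  }
  return counts;
}
```

The normalization rules are deliberately crude. Tune them to your own error messages so that genuinely different bugs do not collapse into one fingerprint.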
Summary & Next Steps
Real User Monitoring with Grafana Faro completes the observability picture by connecting user experience to infrastructure health:
- Grafana Faro — lightweight browser agent capturing Web Vitals, errors, logs, traces, and custom events
- Core Web Vitals — LCP (≤2.5s), INP (≤200ms), CLS (≤0.1) for SEO and UX benchmarking
- Frontend-to-backend correlation — W3C Trace Context propagation connects browser spans to backend traces in Tempo
- Session tracking — reconstruct user journeys leading to errors for faster debugging
- Privacy by design — sampling, PII stripping, and consent-based activation for compliance
- RUM dashboards — overview, page performance, and error tracking views in Grafana
Next in the Series
In Part 13: Application Performance with Pyroscope & k6, we’ll explore continuous profiling with Grafana Pyroscope to identify CPU and memory hotspots, and load testing with k6 to validate performance under stress — proactive performance engineering before issues reach production.