Grafana Deep Dive Part 12: Real User Monitoring with Grafana

Introducing Real User Monitoring

Throughout this series, we’ve focused on backend telemetry — metrics from Prometheus, logs from Loki, traces from Tempo. But all of that infrastructure exists to serve users. Real User Monitoring (RUM) closes the observability loop by capturing what users actually experience in their browsers: page load times, interaction responsiveness, visual stability, JavaScript errors, and network request performance.

Why RUM Matters

Consider this scenario: your backend SLOs are green, API latency is under 100ms, and all health checks pass. Yet users are complaining about a “slow” application. What’s happening?

The CDN serving static assets has a cache miss spike in Southeast Asia
A third-party analytics script is blocking the main thread for 3 seconds
A CSS change caused layout shifts that make buttons jump around
The client-side JavaScript bundle grew 40% in the last release

None of these issues are visible from backend metrics. RUM captures the last mile of user experience — everything between the user’s browser and your load balancer.

Key Insight: Google uses Core Web Vitals as a ranking signal, so poor LCP, INP, or CLS can hurt search visibility. RUM isn’t just operational hygiene: it ties directly to SEO, conversion rates, and revenue. An often-cited Amazon finding is that every 100ms of added latency cost about 1% of sales.

RUM vs Synthetic Monitoring

Aspect	Real User Monitoring (RUM)	Synthetic Monitoring
Data Source	Actual user browsers	Scripted bots from known locations
Coverage	All users, all devices, all networks	Predefined paths only
Baseline	No — varies by user context	Yes — consistent conditions
Proactive Detection	No — requires real traffic	Yes — runs 24/7 even with no users
Third-Party Impact	Captures (ads, analytics, chat widgets)	May not load third-party scripts
Geographic Insight	Wherever users are	Only where probes are deployed
Cost Model	Proportional to traffic	Fixed (per probe × frequency)

The ideal setup uses both: synthetic monitoring for baseline SLA validation and proactive regression detection; RUM for understanding the true distribution of user experience across diverse conditions.

Grafana Faro Overview

Grafana Faro is a lightweight JavaScript agent that collects frontend telemetry and sends it to a Grafana Cloud or self-hosted backend. It captures:

Performance metrics — Web Vitals (LCP, INP, CLS), navigation timing, resource timing
Errors — JavaScript exceptions with full stack traces
Logs — Console messages (configurable levels)
Traces — Browser-initiated spans that connect to backend traces
Events — Custom user interactions and business events
Sessions — User session tracking with page view sequences

Grafana Faro Data Flow

flowchart LR
    B["User Browser<br/>Faro Agent"]
    C["Grafana Alloy<br/>(Faro Receiver)"]
    L["Loki<br/>(Logs + Errors)"]
    T["Tempo<br/>(Frontend Traces)"]
    M["Mimir<br/>(Web Vitals Metrics)"]
    G["Grafana<br/>(RUM Dashboards)"]
    B -->|"HTTP POST"| C
    C --> L
    C --> T
    C --> M
    G --> L
    G --> T
    G --> M

Setting Up Frontend Observability

Installing Grafana Faro

Faro can be installed via npm for bundled applications or loaded from a CDN for traditional websites:

# Install via npm (React, Vue, Angular, Next.js, etc.)
npm install @grafana/faro-web-sdk @grafana/faro-web-tracing

// Initialize Faro in your application entry point
import { initializeFaro, getWebInstrumentations } from '@grafana/faro-web-sdk';
import { TracingInstrumentation } from '@grafana/faro-web-tracing';

const faro = initializeFaro({
  url: 'https://faro-collector.your-domain.com/collect',
  app: {
    name: 'checkout-frontend',
    version: '2.4.1',
    environment: 'production',
  },
  instrumentations: [
    ...getWebInstrumentations({
      captureConsole: true,           // Capture console.error/warn
      captureConsoleDisabledLevels: ['debug', 'trace'],
    }),
    new TracingInstrumentation({
      instrumentationOptions: {
        propagateTraceHeaderCorsUrls: [
          /https:\/\/api\.your-domain\.com/,    // Your API endpoints
          /https:\/\/checkout\.your-domain\.com/,
        ],
      },
    }),
  ],
  sessionTracking: {
    enabled: true,
    persistent: true,               // Survive page reloads
    samplingRate: 1.0,              // 100% of sessions
  },
});
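
The `propagateTraceHeaderCorsUrls` patterns decide which cross-origin requests receive trace headers, so it helps to see how an unanchored pattern behaves. The sketch below is not Faro's or OpenTelemetry's code; it assumes each pattern is either an exact string or a regular expression tested against the full URL. `shouldPropagate` and the example hosts are hypothetical.

```javascript
// Hypothetical helper: does a request URL match any propagation pattern?
// Assumption: strings must match exactly, regular expressions are tested against the full URL.
function shouldPropagate(url, patterns) {
  return patterns.some((pattern) =>
    typeof pattern === 'string' ? url === pattern : pattern.test(url)
  );
}

// Unanchored pattern: matches anywhere in the URL.
const loose = [/https:\/\/api\.your-domain\.com/];

// Anchored pattern: the host must end right after ".com".
const strict = [/^https:\/\/api\.your-domain\.com(?=[\/:?#]|$)/];

console.log(shouldPropagate('https://api.your-domain.com/orders', loose));          // true
console.log(shouldPropagate('https://api.your-domain.com.evil.example/x', loose));  // true (probably unintended)
console.log(shouldPropagate('https://api.your-domain.com.evil.example/x', strict)); // false
```

Anchoring the pattern keeps trace headers from leaking to look-alike hosts that merely contain your API's hostname.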

For non-bundled sites (traditional HTML pages), use the CDN approach:

<!-- CDN installation (add before closing </head>) -->
<script
  src="https://unpkg.com/@grafana/faro-web-sdk@latest/dist/bundle/faro-web-sdk.iife.js"
></script>
<script>
  window.GrafanaFaroWebSdk.initializeFaro({
    url: 'https://faro-collector.your-domain.com/collect',
    app: { name: 'marketing-site', version: '1.0.0', environment: 'production' },
    instrumentations: window.GrafanaFaroWebSdk.getWebInstrumentations(),
  });
</script>

Configuration Options

// Advanced Faro configuration
import { initializeFaro, getWebInstrumentations } from '@grafana/faro-web-sdk';
import { TracingInstrumentation } from '@grafana/faro-web-tracing';

const faro = initializeFaro({
  url: 'https://faro-collector.your-domain.com/collect',

  app: {
    name: 'checkout-frontend',
    version: '2.4.1',
    environment: 'production',
    namespace: 'checkout-team',
  },

  // Batching configuration for performance
  batching: {
    enabled: true,
    sendTimeout: 250,            // ms before sending batch
    itemLimit: 50,               // max items per batch
  },

  // Deduplicate identical errors (boolean flag)
  dedupe: true,

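  // Scrub telemetry before it leaves the browser
  beforeSend: (item) => {
    // Strip PII from error messages (exception text lives in item.payload.value)
    if (item.type === 'exception' && typeof item.payload?.value === 'string') {
      item.payload.value = item.payload.value.replace(/email=\S+/g, 'email=[REDACTED]');
    }
    return item;
  },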

  // Session tracking
  sessionTracking: {
    enabled: true,
    persistent: true,
    samplingRate: 1.0,
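    session: {
      // These values mirror Faro's built-in session limits; confirm that your SDK version accepts them as options
      maxDuration: 4 * 60 * 60 * 1000,  // 4 hours max session
      idleTimeout: 15 * 60 * 1000,       // 15 min idle = new session
    },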
  },

  // User context (set after authentication)
  user: {
    id: undefined,  // Set later via faro.api.setUser()
  },

  instrumentations: [
    ...getWebInstrumentations({
      captureConsole: true,
      captureConsoleDisabledLevels: ['debug', 'trace', 'log'],
    }),
    new TracingInstrumentation({
      instrumentationOptions: {
        propagateTraceHeaderCorsUrls: [
          /https:\/\/api\.your-domain\.com/,
        ],
      },
    }),
  ],
});
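
To make the `batching` options concrete, here is a minimal sketch of a queue that flushes either when `itemLimit` is reached or when `sendTimeout` elapses. It is an illustration of the idea, not Faro's implementation; the `BatchQueue` class and its `send` callback are hypothetical.

```javascript
// Illustrative batch queue, not Faro's implementation.
// Items are flushed when the batch reaches itemLimit or when sendTimeout elapses.
class BatchQueue {
  constructor({ itemLimit, sendTimeout, send }) {
    this.itemLimit = itemLimit;
    this.sendTimeout = sendTimeout;
    this.send = send; // callback that receives an array of items
    this.items = [];
    this.timer = null;
  }

  push(item) {
    this.items.push(item);
    if (this.items.length >= this.itemLimit) {
      this.flush(); // size-based flush
    } else if (this.timer === null) {
      // time-based flush, armed by the first item of a new batch
      this.timer = setTimeout(() => this.flush(), this.sendTimeout);
    }
  }

  flush() {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.items.length === 0) return;
    const batch = this.items;
    this.items = [];
    this.send(batch);
  }
}

// Three items with itemLimit 2: one full batch goes out immediately,
// the third item waits for the timeout or an explicit flush.
const sent = [];
const queue = new BatchQueue({ itemLimit: 2, sendTimeout: 250, send: (batch) => sent.push(batch) });
queue.push('a');
queue.push('b');
queue.push('c');
queue.flush();
console.log(sent); // [ [ 'a', 'b' ], [ 'c' ] ]
```

A shorter timeout lowers the chance of losing data when a tab closes; a larger item limit means fewer HTTP requests per session.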

Framework Integration

Faro provides framework-specific integrations for better error boundary capture:

// React integration with error boundary
import { createRoutesFromChildren, matchRoutes, Routes, useLocation, useNavigationType } from 'react-router-dom';
import { initializeFaro, getWebInstrumentations } from '@grafana/faro-web-sdk';
import { TracingInstrumentation } from '@grafana/faro-web-tracing';
import { FaroErrorBoundary, FaroRoutes, ReactIntegration, ReactRouterVersion } from '@grafana/faro-react';

// Initialize Faro with React-specific instrumentations
const faro = initializeFaro({
  url: 'https://faro-collector.your-domain.com/collect',
  app: { name: 'react-app', version: '3.0.0', environment: 'production' },
  instrumentations: [
    ...getWebInstrumentations(),
    new TracingInstrumentation(),
    new ReactIntegration({
      router: {
        version: ReactRouterVersion.V6,  // enum exported by @grafana/faro-react
        dependencies: {
          createRoutesFromChildren,
          matchRoutes,
          Routes,
          useLocation,
          useNavigationType,
        },
      },
    }),
  ],
});

// Wrap your app with FaroErrorBoundary
function App() {
  return (
    <FaroErrorBoundary fallback={<ErrorPage />}>
      <FaroRoutes>
        {/* Your routes */}
      </FaroRoutes>
    </FaroErrorBoundary>
  );
}

Exploring Core Web Vitals

Core Web Vitals are Google’s standardized metrics for measuring user experience. Faro captures all of them automatically:

Largest Contentful Paint (LCP)

LCP measures loading performance — how quickly the largest visible element (image, text block, or video) renders in the viewport.

Rating	LCP Threshold	Impact
Good	≤ 2.5 seconds	Users perceive the page as fast
Needs Improvement	2.5 – 4.0 seconds	Users notice delay but may wait
Poor	> 4.0 seconds	High bounce rate, SEO penalty

Common LCP optimization targets:

Preload critical images and fonts with <link rel="preload">
Optimize server response time (TTFB < 800ms)
Eliminate render-blocking JavaScript and CSS
Use responsive images with srcset for appropriate sizes

Interaction to Next Paint (INP)

INP measures interactivity — the delay between a user action (click, tap, keypress) and the next visual update. It replaced First Input Delay (FID) in March 2024 as a Core Web Vital.

Rating	INP Threshold	User Perception
Good	≤ 200 ms	Responsive, feels instant
Needs Improvement	200 – 500 ms	Noticeable lag
Poor	> 500 ms	Feels broken or frozen
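
INP is reported per page visit as the slowest interaction, except that on busy pages one worst outlier is ignored for every 50 interactions. The sketch below shows that selection rule on plain arrays of durations; `estimateInp` and `rateInp` are hypothetical helper names, and real measurement relies on the browser's Event Timing API rather than a hand-built list.

```javascript
// Approximate INP from a list of interaction durations (milliseconds).
// Rule assumed here: skip one worst interaction per 50 interactions.
function estimateInp(durationsMs) {
  if (durationsMs.length === 0) return undefined; // no interactions, no INP
  const sorted = [...durationsMs].sort((a, b) => b - a); // longest first
  const skip = Math.min(sorted.length - 1, Math.floor(durationsMs.length / 50));
  return sorted[skip];
}

// Rate a value against the INP thresholds (200 ms and 500 ms).
function rateInp(ms) {
  if (ms <= 200) return 'good';
  if (ms <= 500) return 'needs-improvement';
  return 'poor';
}

const fewInteractions = [40, 620, 90];
console.log(estimateInp(fewInteractions), rateInp(estimateInp(fewInteractions))); // 620 poor

// 60 interactions: 59 fast ones plus a single 900 ms outlier, which is skipped
const manyInteractions = [...Array(59).fill(80), 900];
console.log(estimateInp(manyInteractions), rateInp(estimateInp(manyInteractions))); // 80 good
```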

Common INP Killer: Long-running JavaScript on the main thread. A synchronous JSON parse of a 5MB API response, a complex React re-render, or an unthrottled scroll event handler can all block the main thread and spike INP. Use requestIdleCallback(), Web Workers, or virtualized lists to keep the main thread responsive.

Cumulative Layout Shift (CLS)

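CLS measures visual stability: how much visible content shifts unexpectedly during the page lifecycle. Each shift is scored as the fraction of the viewport affected multiplied by the distance moved (also as a fraction of the viewport), and CLS reports the largest burst of such shifts. A CLS of 0 means nothing moved; there is no fixed upper bound, so a page that keeps shifting can score above 1.0.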

Rating	CLS Threshold	Common Causes
Good	≤ 0.1	Fixed dimensions, font preloading
Needs Improvement	0.1 – 0.25	Late-loading ads, dynamic content
Poor	> 0.25	Images without dimensions, font swap
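
CLS is computed over "session windows": consecutive shifts less than 1 second apart, capped at 5 seconds per window, with the worst window reported. The following sketch applies that rule to plain objects shaped like the browser's layout-shift entries; the sample values are made up, and `computeCls` is a hypothetical name.

```javascript
// Compute CLS from layout-shift entries using session windows.
// Assumption: entries are sorted by startTime (ms) and look like
// { startTime, value, hadRecentInput }.
function computeCls(entries) {
  let cls = 0;
  let windowValue = 0;
  let windowStart = 0;
  let lastShift = 0;

  for (const entry of entries) {
    if (entry.hadRecentInput) continue; // shifts right after user input are expected
    const inWindow =
      windowValue > 0 &&
      entry.startTime - lastShift < 1000 && // gap under 1 s
      entry.startTime - windowStart < 5000; // window under 5 s
    if (inWindow) {
      windowValue += entry.value;
    } else {
      windowValue = entry.value;
      windowStart = entry.startTime;
    }
    lastShift = entry.startTime;
    cls = Math.max(cls, windowValue);
  }
  return cls;
}

const shifts = [
  { startTime: 0, value: 0.05, hadRecentInput: false },
  { startTime: 500, value: 0.04, hadRecentInput: false },  // same window as the first
  { startTime: 3000, value: 0.02, hadRecentInput: false }, // new window (gap over 1 s)
  { startTime: 3200, value: 0.3, hadRecentInput: true },   // ignored: follows user input
];
console.log(computeCls(shifts).toFixed(2)); // 0.09
```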

Other Performance Metrics

Beyond Core Web Vitals, Faro captures additional timing data from the browser's Navigation Timing, Paint Timing, and Resource Timing APIs:

TTFB (Time to First Byte) — server response time including DNS, TCP, TLS
FCP (First Contentful Paint) — first text/image rendered
DOM Interactive — HTML parsed, ready for JavaScript
DOM Complete — all resources (images, styles) loaded
Resource Timing — individual fetch/XHR/image load durations
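
As a rough illustration of where these numbers come from, the sketch below derives them from an object shaped like a `PerformanceNavigationTiming` entry (in a browser, `performance.getEntriesByType('navigation')[0]`). The `summarizeNavigation` helper and the sample values are hypothetical.

```javascript
// Derive common timings from a PerformanceNavigationTiming-like object.
// All inputs are milliseconds relative to the start of the navigation.
function summarizeNavigation(nav) {
  return {
    dnsMs: nav.domainLookupEnd - nav.domainLookupStart,
    connectMs: nav.connectEnd - nav.connectStart, // TCP, plus TLS when present
    tlsMs: nav.secureConnectionStart > 0 ? nav.connectEnd - nav.secureConnectionStart : 0,
    ttfbMs: nav.responseStart - nav.startTime,
    domInteractiveMs: nav.domInteractive - nav.startTime,
    domCompleteMs: nav.domComplete - nav.startTime,
  };
}

// Sample values, made up for illustration
const sample = {
  startTime: 0,
  domainLookupStart: 5, domainLookupEnd: 25,
  connectStart: 25, secureConnectionStart: 45, connectEnd: 85,
  requestStart: 85, responseStart: 205,
  domInteractive: 900, domComplete: 1500,
};

const summary = summarizeNavigation(sample);
console.log(summary.ttfbMs); // 205
console.log(summary.dnsMs, summary.connectMs, summary.tlsMs); // 20 60 40
```

In this sample, the 205 ms TTFB breaks down into 5 ms before DNS starts, 20 ms of DNS, 60 ms of connection setup (40 ms of it TLS), and 120 ms waiting for the first response byte.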

Pivoting from Frontend to Backend

Trace Context Propagation

The most powerful feature of RUM with Grafana is the ability to follow a user interaction from the browser click through your entire backend stack. This is achieved by propagating W3C Trace Context headers in fetch/XHR requests:

// Faro automatically injects trace headers when configured
// Your API requests will include:
// traceparent: 00-{trace-id}-{span-id}-01
// tracestate: (optional vendor-specific data)

// Example: User clicks "Place Order"
// 1. Faro creates a browser span: "user_click: place_order_button"
// 2. fetch() to /api/orders includes traceparent header
// 3. Backend creates child span: "POST /api/orders"
// 4. Backend calls payment service with same trace context
// 5. Complete trace visible in Tempo: browser → API → payment → database

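// Manual span creation for complex interactions
import { faro } from '@grafana/faro-web-sdk';

// Send the order; with TracingInstrumentation enabled, this fetch is traced automatically
const sendOrder = (cart) =>
  fetch('/api/orders', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(cart),
  });

async function placeOrder(cart) {
  // getOTEL() exposes the OpenTelemetry trace and context APIs once tracing is initialized
  const otel = faro.api.getOTEL();
  if (!otel) {
    return (await sendOrder(cart)).json(); // tracing unavailable: send without a custom span
  }

  const { trace, context } = otel;
  const span = trace.getTracer('checkout').startSpan('place_order');
  span.setAttribute('cart.items', cart.length);
  span.setAttribute('cart.total', cart.total);

  try {
    // Start the fetch inside the span's context so it becomes a child span
    const response = await context.with(trace.setSpan(context.active(), span), () => sendOrder(cart));
    return await response.json();
  } catch (error) {
    faro.api.pushError(error);
    throw error;
  } finally {
    span.end();
  }
}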
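
The `traceparent` header has a fixed layout: a 2-digit version, a 32-hex-digit trace ID, a 16-hex-digit parent span ID, and 2-digit flags, all lowercase and separated by hyphens. The sketch below builds and validates such headers; `buildTraceparent` and `parseTraceparent` are hypothetical helper names, and the IDs are the sample values from the W3C specification.

```javascript
// Build and parse W3C traceparent headers: version-traceid-parentid-flags
const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function buildTraceparent(traceId, spanId, sampled = true) {
  return `00-${traceId}-${spanId}-${sampled ? '01' : '00'}`;
}

function parseTraceparent(header) {
  const match = TRACEPARENT.exec(header);
  if (!match) return null;
  const [, version, traceId, parentId, flags] = match;
  // Version ff is forbidden and all-zero IDs are invalid
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { version, traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

const header = buildTraceparent('4bf92f3577b34da6a3ce929d0e0e4736', '00f067aa0ba902b7');
console.log(header); // 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
console.log(parseTraceparent(header).sampled); // true
console.log(parseTraceparent('00-abc123-span1-01')); // null (shortened IDs are not valid)
```

A backend that receives a malformed header should ignore it and start a new trace, which is one reason frontend and backend traces sometimes fail to join.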

End-to-End Trace: Browser to Backend

sequenceDiagram
    participant U as User Browser
    participant F as Faro Agent
    participant A as API Gateway
    participant O as Order Service
    participant P as Payment Service
    participant D as Database
    U->>F: Click "Place Order"
    F->>F: Create span (trace-id: abc123)
    F->>A: POST /api/orders<br/>traceparent: 00-abc123-span1-01
    A->>O: Forward with trace context
    O->>P: Charge card (child span)
    P-->>O: Payment confirmed
    O->>D: Insert order (child span)
    D-->>O: Order saved
    O-->>A: 201 Created
    A-->>F: Response (with Server-Timing header)
    F->>F: Complete browser span
    Note over F: Full trace: Browser → API → Order → Payment → DB

Error Correlation

When a frontend error occurs, Faro captures the full context needed to correlate with backend logs:

// Faro captures unhandled errors automatically
// But you can add context for better correlation:

// Set user context after login
faro.api.setUser({
  id: 'user-12345',
  username: 'jane.doe',
  attributes: {
    plan: 'enterprise',
    region: 'eu-west-1',
  },
});

// Add page context
faro.api.setView({
  name: 'checkout-page',
  url: window.location.href,
});

// Custom error with context
try {
  await processPayment(paymentDetails);
} catch (error) {
  faro.api.pushError(error, {
    // Context values must be strings
    context: {
      orderId: String(order.id),
      paymentMethod: paymentDetails.method,
      amount: String(paymentDetails.amount),
    },
  });
}

In Grafana, you can then query Loki for frontend errors and jump to the correlated backend trace in Tempo:

# LogQL query for frontend JavaScript errors
{app="checkout-frontend"} | json | level="error"
  | line_format "{{.message}}"
  | label_format traceId="{{.traceID}}"

# Find the correlated backend trace in Tempo
# Use the traceId from the Faro error log to search Tempo

Session Context

Faro’s session tracking groups all telemetry from a single user visit, enabling you to reconstruct the sequence of events leading to an error:

# Query all events from a problematic session in Loki
# (set the query direction to "forward" to read them oldest-first; LogQL has no sort stage)
{app="checkout-frontend"} | json
  | session_id="abc-123-def-456"

# Results show the user's journey:
# 10:01:00 - page_view: /products
# 10:01:15 - page_view: /cart
# 10:01:30 - click: add_to_cart
# 10:01:45 - page_view: /checkout
# 10:02:01 - error: "TypeError: Cannot read property 'address' of undefined"
# 10:02:01 - Associated trace: xyz-789 (payment API returned 500)
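
Once the session's log lines are exported (for example as JSON from a Loki query), the journey can be rebuilt with a few lines of code. The sketch below groups records by session and orders them by time; the field names `session_id`, `timestamp`, `kind`, and `detail` are assumptions for illustration and depend on how your pipeline labels Faro data.

```javascript
// Group parsed log records by session and order each session's events by time.
// Assumption: each record has session_id, timestamp (ISO 8601), kind, and detail.
function buildTimelines(records) {
  const bySession = new Map();
  for (const record of records) {
    if (!bySession.has(record.session_id)) bySession.set(record.session_id, []);
    bySession.get(record.session_id).push(record);
  }
  for (const events of bySession.values()) {
    events.sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
  }
  return bySession;
}

// Return the events that preceded the first error in a session
function eventsBeforeFirstError(events) {
  const index = events.findIndex((e) => e.kind === 'error');
  return index === -1 ? [] : events.slice(0, index);
}

const records = [
  { session_id: 's1', timestamp: '2024-05-01T10:02:01Z', kind: 'error', detail: 'TypeError' },
  { session_id: 's1', timestamp: '2024-05-01T10:01:00Z', kind: 'page_view', detail: '/products' },
  { session_id: 's2', timestamp: '2024-05-01T10:01:10Z', kind: 'page_view', detail: '/home' },
  { session_id: 's1', timestamp: '2024-05-01T10:01:45Z', kind: 'page_view', detail: '/checkout' },
];

const timeline = buildTimelines(records).get('s1');
console.log(eventsBeforeFirstError(timeline).map((e) => e.detail)); // [ '/products', '/checkout' ]
```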

Enhancements & Custom Configuration

Custom Events & Measurements

Track business-specific interactions beyond automatic instrumentation:

// Track custom business events
import { faro } from '@grafana/faro-web-sdk';

// E-commerce: Track add-to-cart
function addToCart(product) {
  faro.api.pushEvent('add_to_cart', {
    product_id: String(product.id),
    product_name: product.name,
    price: String(product.price),    // event attributes are string values
    category: product.category,
  });
}

// SaaS: Track feature usage
function openFeature(featureName) {
  faro.api.pushEvent('feature_used', {
    feature: featureName,
    plan: currentUser.plan,
    usage_count: String(getUsageCount(featureName)),  // event attributes are string values
  });
}

// Performance: Track custom timing
function measureSearchLatency(query) {
  const start = performance.now();
  return fetch(`/api/search?q=${encodeURIComponent(query)}`)
    .then(response => response.json())
    .then(results => {
      const duration = performance.now() - start;
      faro.api.pushMeasurement({
        type: 'search_latency',
        values: {
          duration_ms: duration,
          result_count: results.length,
          query_length: query.length,
        },
      });
      return results;
    });
}
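
Event attributes are typed as string-to-string maps in the SDK, as far as its TypeScript definitions go, so numbers, booleans, and nested objects need converting before they are passed to `pushEvent`. A small helper can do this in one place; `toStringAttributes` is a hypothetical name, not part of Faro.

```javascript
// Convert arbitrary values to a flat map of strings.
function toStringAttributes(input) {
  const out = {};
  for (const [key, value] of Object.entries(input)) {
    if (value === undefined || value === null) continue; // drop empty values
    out[key] = typeof value === 'object' ? JSON.stringify(value) : String(value);
  }
  return out;
}

console.log(toStringAttributes({ price: 19.99, in_stock: true, tags: ['a', 'b'], coupon: null }));
// { price: '19.99', in_stock: 'true', tags: '["a","b"]' }
```

It would be used as `faro.api.pushEvent('add_to_cart', toStringAttributes({ ... }))`.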

User Context & Metadata

// Enrich telemetry with user and environment context
import { faro } from '@grafana/faro-web-sdk';

// After user authenticates
function onLogin(user) {
  faro.api.setUser({
    id: user.id,
    username: user.email,
    attributes: {
      plan: user.subscription.plan,
      company: user.company.name,
      role: user.role,
    },
  });
}

// Track A/B test variants
function setExperimentContext(experiments) {
  faro.api.pushEvent('experiments_loaded', {
    variants: JSON.stringify(experiments),
    // e.g., { "checkout_redesign": "variant_b", "new_search": "control" }
  });
}

// Set page-specific metadata
function onRouteChange(route) {
  faro.api.setView({
    name: route.name,
    url: window.location.href,
    attributes: {
      page_type: route.meta.type,     // 'product', 'checkout', 'content'
      requires_auth: route.meta.auth,
    },
  });
}

Sampling & Privacy

For high-traffic sites, implement sampling to control costs while maintaining statistical significance:

// Sampling configuration
import { initializeFaro, getWebInstrumentations } from '@grafana/faro-web-sdk';

// Redact card numbers and email addresses from free text
const scrub = (text) =>
  text
    .replace(/\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g, '[CARD_REDACTED]')
    .replace(/[\w.-]+@[\w.-]+\.\w+/g, '[EMAIL_REDACTED]');

const faro = initializeFaro({
  url: 'https://faro-collector.your-domain.com/collect',
  app: { name: 'high-traffic-site', version: '5.0.0', environment: 'production' },

  sessionTracking: {
    enabled: true,
    samplingRate: 0.1,  // Only 10% of sessions send telemetry
  },

  // Privacy: Strip sensitive data before sending
  beforeSend: (item) => {
    // Don't send telemetry from internal/test users
    if (item.meta?.user?.attributes?.internal === 'true') {
      return null;  // Drop this item
    }

    // The payload is an object: log text is in `message`, exception text in `value`
    for (const field of ['message', 'value']) {
      if (typeof item.payload?.[field] === 'string') {
        item.payload[field] = scrub(item.payload[field]);
      }
    }

    return item;
  },

  instrumentations: [
    ...getWebInstrumentations({
      captureConsole: false,  // Disable console capture for privacy
    }),
  ],
});
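
Sampling works per session rather than per event, so a sampled session still yields a complete journey. The sketch below shows one way such a decision could be made deterministically from the session ID; it is not Faro's algorithm (which is not described here), and `hashToUnitInterval` and `isSessionSampled` are hypothetical names.

```javascript
// Deterministic session sampling sketch (not Faro's algorithm).
// FNV-1a hash of the session ID, mapped to the interval [0, 1).
function hashToUnitInterval(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return hash / 0x100000000;
}

// A session is kept when its hash falls below the sampling rate.
function isSessionSampled(sessionId, samplingRate) {
  return hashToUnitInterval(sessionId) < samplingRate;
}

console.log(isSessionSampled('abc-123-def-456', 1.0)); // true
console.log(isSessionSampled('abc-123-def-456', 0));   // false
```

When dashboards count sessions or errors from sampled data, multiply by 1 / samplingRate (10 for a 0.1 rate) to estimate totals; percentiles such as p75 LCP need no correction as long as sampling is unbiased.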

GDPR & Privacy: Faro collects browser metadata (user agent, screen size, URL) by default. Ensure your privacy policy covers RUM data collection. Use the beforeSend hook to strip PII, honor Do Not Track headers, and implement consent-based activation in regions requiring explicit opt-in (EU, California).

Building RUM Dashboards

Overview Dashboard

A RUM overview dashboard should answer: “How are users experiencing our application right now?”


RUM Overview Dashboard Panels

Panel	Visualization	Query Source
Active Sessions (live)	Stat	Mimir: `sum(faro_sessions_active)`
Web Vitals Summary	Gauge (LCP, INP, CLS)	Mimir: `histogram_quantile(0.75, ...)`
Error Rate	Time Series	Mimir: `rate(faro_errors_total[5m])`
Page Load Distribution	Heatmap	Mimir: `faro_page_load_duration_bucket`
Top Errors	Table	Loki: `{app="frontend"} | json | level="error"`
Slowest Pages	Bar Chart	Mimir: p75 LCP grouped by page
Geographic Performance	Geomap	Mimir: LCP by country label
Browser/OS Breakdown	Pie Chart	Mimir: sessions by user agent


Page Performance Dashboard

Drill down into individual page performance with percentile breakdowns:

# PromQL queries for page performance dashboard

# LCP p75 by page (should be < 2.5s)
histogram_quantile(0.75,
  sum by (le, page_name) (
    rate(faro_web_vitals_lcp_bucket{app="checkout-frontend"}[5m])
  )
)

# INP p75 by page (should be < 200ms)
histogram_quantile(0.75,
  sum by (le, page_name) (
    rate(faro_web_vitals_inp_bucket{app="checkout-frontend"}[5m])
  )
)

# Percentage of page loads meeting "Good" thresholds
sum(rate(faro_web_vitals_lcp_bucket{le="2.5", app="checkout-frontend"}[1h]))
/
sum(rate(faro_web_vitals_lcp_count{app="checkout-frontend"}[1h]))
* 100

# Resource loading waterfall (top 10 slowest resources)
topk(10,
  avg by (resource_url) (
    faro_resource_duration_seconds{app="checkout-frontend"}
  )
)
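
Web Vitals are assessed at the 75th percentile, which is why the queries above use `histogram_quantile(0.75, ...)`. For intuition, the same two numbers can be computed from raw samples; the sketch below uses a nearest-rank percentile and made-up LCP values, with `percentile` and `shareWithin` as hypothetical helper names.

```javascript
// Nearest-rank percentile over raw samples
function percentile(values, p) {
  if (values.length === 0) return undefined;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Percentage of samples at or below a threshold
function shareWithin(values, threshold) {
  if (values.length === 0) return undefined;
  return (values.filter((v) => v <= threshold).length / values.length) * 100;
}

const lcpSeconds = [1.2, 1.8, 2.1, 2.4, 2.6, 3.9, 1.1, 2.0];
console.log(percentile(lcpSeconds, 75));   // 2.4
console.log(shareWithin(lcpSeconds, 2.5)); // 75
```

Results from `histogram_quantile` will differ slightly from a raw-sample percentile because PromQL interpolates within histogram buckets.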

Error Tracking Dashboard

# LogQL queries for error tracking

# Error count by type (grouped)
sum by (error_type) (
  count_over_time(
    {app="checkout-frontend"} | json | level="error" [5m]
  )
)

# New errors (first seen in last 24h)
{app="checkout-frontend"} | json | level="error"
  | line_format "{{.error_type}}: {{.message}}"
  # Cross-reference with historical data to identify new vs recurring

# Errors with session context for debugging
{app="checkout-frontend"} | json | level="error"
  | line_format "Session: {{.session_id}} | User: {{.user_id}} | Page: {{.page}} | Error: {{.message}}"

# Error rate as percentage of page views
sum(count_over_time(
  {app="checkout-frontend"} | json | level="error" [5m]
))
/
sum(count_over_time(
  {app="checkout-frontend"} | json | type="page_view" [5m]
)) * 100

Summary & Next Steps

Real User Monitoring with Grafana Faro completes the observability picture by connecting user experience to infrastructure health:

Grafana Faro — lightweight browser agent capturing Web Vitals, errors, logs, traces, and custom events
Core Web Vitals — LCP (≤2.5s), INP (≤200ms), CLS (≤0.1) for SEO and UX benchmarking
Frontend-to-backend correlation — W3C Trace Context propagation connects browser spans to backend traces in Tempo
Session tracking — reconstruct user journeys leading to errors for faster debugging
Privacy by design — sampling, PII stripping, and consent-based activation for compliance
RUM dashboards — overview, page performance, and error tracking views in Grafana

Next in the Series

In Part 13: Application Performance with Pyroscope & k6, we’ll explore continuous profiling with Grafana Pyroscope to identify CPU and memory hotspots, and load testing with k6 to validate performance under stress — proactive performance engineering before issues reach production.
