Grafana Deep Dive Part 13: Application Performance with Pyroscope & k6

Pyroscope for Continuous Profiling

What Is Continuous Profiling?

Traditional profiling is something developers do locally: attach a profiler, reproduce an issue, collect samples, analyze. Continuous profiling instead runs in production around the clock at low overhead (roughly 2–5% CPU, depending on language runtime and sample rate). This means you can answer questions like “what function consumed the most CPU during yesterday’s latency spike?” without reproducing anything.

                            
Key Insight: Continuous profiling is to CPU/memory what distributed tracing is to request flow. Traces tell you where time is spent across services; profiles tell you where time is spent within a service at the function and line level. Together, they pinpoint the exact code responsible for performance issues.
                        

Pyroscope supports multiple profile types:

Profile Type	What It Measures	When to Use
CPU	Time spent executing code	High latency, CPU-bound workloads
Memory (Alloc)	Bytes allocated over time	High GC pressure, allocation churn
Memory (Inuse)	Bytes currently held	Memory leaks, OOM risks, memory bloat
Goroutines (Go)	Active goroutine count by stack	Goroutine leaks, deadlocks
Lock Contention	Time waiting on mutexes	Concurrency bottlenecks
Block (Go)	Time blocked on channels and other synchronization primitives	Channel starvation, goroutines stalled waiting on each other
Wall Clock	Real elapsed time (including waits)	Overall request latency breakdown

Pyroscope Architecture

Pyroscope Data Flow

flowchart LR
    A1["Go Service<br/>(pprof SDK)"]
    A2["Java Service<br/>(async-profiler)"]
    A3["Python Service<br/>(py-spy)"]
    AL["Grafana Alloy<br/>(Pyroscope receiver)"]
    PY["Pyroscope Server<br/>(Ingester + Storage)"]
    S3["Object Storage<br/>(S3/GCS)"]
    GR["Grafana<br/>(Flame Graph UI)"]
    A1 --> AL
    A2 --> AL
    A3 --> AL
    AL --> PY
    PY --> S3
    GR --> PY

Reading Flame Graphs

A flame graph is a visualization where:

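The x-axis represents the proportion of total time (or allocations): wider = more time
The y-axis represents the call stack: in the classic layout callers sit below their callees, while Grafana's flame graph panel draws the same stacks inverted, with the root at the top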
Each bar is a function — its width shows how much of the total resource it consumed
Self time is time spent in that function itself (not in callees)

                            
Reading Tips: Look for wide leaf bars, the frames farthest from the root (at the top in the classic layout, at the bottom when the root is drawn at the top, as in Grafana). Those are the functions consuming the most resources themselves. A wide bar next to the root just means it’s a common entry point (like main() or http.Handler). The actionable insight is usually in the widest self-time bars, which are easiest to spot by sorting the “Top Table” view in Grafana by self time.
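
To make the difference between self time and total time concrete, here is a minimal Python sketch that computes both from folded stack samples (the `root;caller;callee count` text format that many flame graph tools accept). The sample stacks and function names are invented for illustration, and this is not a description of how Pyroscope stores profiles internally.

```python
from collections import Counter

def self_and_total(folded_lines):
    """Return (self_counts, total_counts) from 'a;b;c N' folded stack lines."""
    self_counts, total_counts = Counter(), Counter()
    for line in folded_lines:
        stack, count = line.rsplit(" ", 1)
        frames = stack.split(";")
        n = int(count)
        # The last frame is the leaf: it was running when the sample was taken.
        self_counts[frames[-1]] += n
        # Every distinct frame on the stack counts the sample toward its total.
        for name in set(frames):
            total_counts[name] += n
    return self_counts, total_counts

# Hypothetical samples: 100 in total, all passing through main and handleOrder.
samples = [
    "main;handleOrder;json.Marshal 60",
    "main;handleOrder;db.Query 30",
    "main;handleOrder 10",
]
self_counts, total_counts = self_and_total(samples)
```

Here `main` has the widest bar (100 samples in total) but zero self time, while `json.Marshal` is narrower yet holds 60 samples of self time, which is why it is the better optimization candidate.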
                        

Setting Up Pyroscope

SDK Instrumentation

// Go: Continuous profiling with Pyroscope SDK
package main

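import (
    "os"
    "runtime"

    "github.com/grafana/pyroscope-go"
)

func main() {
    // Mutex and block profiling are off in the Go runtime by default;
    // enable them, otherwise those profile types stay empty.
    runtime.SetMutexProfileFraction(5)
    runtime.SetBlockProfileRate(5)

    // Start Pyroscope agent
    pyroscope.Start(pyroscope.Config{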
        ApplicationName: "order-service",
        ServerAddress:   "http://pyroscope.observability:4040",
        Logger:          pyroscope.StandardLogger,

        // Profile types to collect
        ProfileTypes: []pyroscope.ProfileType{
            pyroscope.ProfileCPU,
            pyroscope.ProfileAllocObjects,
            pyroscope.ProfileAllocSpace,
            pyroscope.ProfileInuseObjects,
            pyroscope.ProfileInuseSpace,
            pyroscope.ProfileGoroutines,
            pyroscope.ProfileMutexCount,
            pyroscope.ProfileMutexDuration,
            pyroscope.ProfileBlockCount,
            pyroscope.ProfileBlockDuration,
        },

        // Add labels for filtering in Grafana
        Tags: map[string]string{
            "env":     os.Getenv("ENVIRONMENT"),
            "region":  os.Getenv("REGION"),
            "version": os.Getenv("APP_VERSION"),
        },
    })

    // Your application code...
    startHTTPServer()
}

# Python: Continuous profiling with Pyroscope
import pyroscope

pyroscope.configure(
    application_name="recommendation-service",
    server_address="http://pyroscope.observability:4040",
    sample_rate=100,  # Hz (samples per second)
    detect_subprocesses=True,
    tags={
        "env": "production",
        "region": "us-east-1",
        "version": "2.1.0",
    },
)

# Tag specific code paths for focused profiling
with pyroscope.tag_wrapper({"endpoint": "/api/recommend"}):
    recommendations = compute_recommendations(user_id)

# Profile specific functions by tagging the function body
def run_model_inference(features):
    with pyroscope.tag_wrapper({"operation": "ml_inference"}):
        return model.predict(features)

// Java: Continuous profiling with Pyroscope
// Add to JVM startup flags:
// -javaagent:pyroscope.jar
// Or use programmatic initialization:

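import java.util.Map;

import io.pyroscope.javaagent.EventType;
import io.pyroscope.javaagent.PyroscopeAgent;
import io.pyroscope.javaagent.config.Config;
import org.springframework.boot.SpringApplication;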

public class Application {
    public static void main(String[] args) {
        PyroscopeAgent.start(
            new Config.Builder()
                .setApplicationName("payment-service")
                .setServerAddress("http://pyroscope.observability:4040")
                .setProfilingEvent(EventType.ITIMER)  // CPU profiling
                .setProfilingAlloc("512k")             // Allocation profiling
                .setProfilingLock("10ms")              // Lock contention
                .setLabels(Map.of(
                    "env", "production",
                    "region", System.getenv("REGION")
                ))
                .build()
        );

        SpringApplication.run(Application.class, args);
    }
}

Auto-Instrumentation with Alloy

For environments where modifying application code isn’t feasible, Grafana Alloy can scrape pprof endpoints or use eBPF-based profiling:

// Alloy configuration for Pyroscope scraping
// config.alloy (Alloy's configuration syntax uses // for comments)

pyroscope.scrape "default" {
  targets = [
    // Scrape Go services exposing pprof endpoints
    {"__address__" = "order-service:6060", "service_name" = "order-service"},
    {"__address__" = "inventory-service:6060", "service_name" = "inventory-service"},
  ]
  profiling_config {
    profile.process_cpu { enabled = true }
    profile.memory { enabled = true }
    profile.goroutine { enabled = true }
    profile.mutex { enabled = true }
    profile.block { enabled = true }
  }
  forward_to = [pyroscope.write.default.receiver]
}

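// Discover pods so the eBPF profiler has targets to attach to
// (assumes Alloy runs in-cluster with permission to list pods)
discovery.kubernetes "pods" {
  role = "pod"
}

// eBPF-based profiling (no code changes needed)
pyroscope.ebpf "kubernetes" {
  targets = discovery.kubernetes.pods.targets
  forward_to = [pyroscope.write.default.receiver]
  demangle = "full"
}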

pyroscope.write "default" {
  endpoint {
    url = "http://pyroscope.observability:4040"
  }
}

Querying Profiles in Grafana

# Profile queries in Grafana Explore (Pyroscope data source)

# CPU profile for a specific service
process_cpu:cpu:nanoseconds:cpu:nanoseconds{service_name="order-service"}

# Memory allocations during a time range
memory:alloc_space:bytes:space:bytes{service_name="order-service", env="production"}

# Compare two time ranges (before/after deploy)
# Left panel: select "2 hours ago" range
# Right panel: select "now" range
# Grafana shows a diff flame graph highlighting regressions

# Filter by span (when using span profiles)
process_cpu:cpu:nanoseconds:cpu:nanoseconds{
  service_name="order-service",
  span_name="POST /api/orders"
}
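
The queries above all share one shape: a profile type ID followed by a label selector in braces. As a rough sketch of that shape, the hypothetical Python helper below assembles such a string from a profile type and labels. The helper name is an assumption for illustration, it is not part of any Pyroscope client, and it does not escape quotes inside label values.

```python
def profile_query(profile_type, **labels):
    """Build a selector string such as 'type{a="b", c="d"}' (labels sorted by name)."""
    selector = ", ".join(
        f'{key}="{value}"' for key, value in sorted(labels.items())
    )
    return f"{profile_type}{{{selector}}}"

# Profile type ID taken from the CPU query shown above.
CPU = "process_cpu:cpu:nanoseconds:cpu:nanoseconds"
query = profile_query(CPU, service_name="order-service", env="production")
print(query)
```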

Case Study: Production Debugging

Finding a Memory Leak with Pyroscope

Symptom: Order service memory growing 500MB/hour, requiring restarts every 4 hours.

Investigation:

Open Pyroscope in Grafana and select the memory:inuse_space profile type
Set the time range to a 4-hour window covering the growth period
Sort by the “Self” column in the Top Table: json.Unmarshal accounts for 80% of in-use memory
Drill into the call stack: cacheStore.Set() stores full JSON responses without size limits
Root cause: the cache, intended as an LRU, had no maximum size configured, so nothing was ever evicted and it grew unbounded

Fix: Added maxSize: 100MB to the cache configuration. Memory stabilized at 200MB.
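
The fix amounts to giving the cache a byte budget and evicting the least recently used entries once it is exceeded. The following Python sketch illustrates that idea only. The class name and the byte-based accounting are assumptions for illustration, and the service in the case study is not claimed to be implemented this way.

```python
from collections import OrderedDict

class BoundedLRUCache:
    """LRU cache that evicts least recently used entries once max_bytes is exceeded."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.current_bytes = 0
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value: bytes):
        # Replacing an existing key must not double-count its size.
        if key in self._items:
            self.current_bytes -= len(self._items.pop(key))
        self._items[key] = value
        self.current_bytes += len(value)
        # Evict from the least recently used end until the budget is met.
        while self.current_bytes > self.max_bytes and self._items:
            _, evicted = self._items.popitem(last=False)
            self.current_bytes -= len(evicted)

cache = BoundedLRUCache(max_bytes=10)
cache.set("a", b"12345")
cache.set("b", b"12345")
cache.get("a")            # "a" is now more recently used than "b"
cache.set("c", b"12345")  # budget exceeded, so "b" is evicted
```

With a bound like this, the in-use memory attributed to the cache in the profile should plateau near the configured budget instead of growing for as long as the process runs.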


k6 for Load Testing

Core Concepts

Grafana k6 is a developer-centric load testing tool written in Go with a JavaScript scripting API. Unlike traditional load testing tools (JMeter, Gatling), k6 is designed for:

Developer experience: write tests in JavaScript, version in Git, run in CI/CD
Performance: single binary generates thousands of VUs (virtual users) with minimal resources
Extensibility: protocol support beyond HTTP, with gRPC and WebSocket built in and others such as SQL or Redis available through experimental modules or xk6 extensions
Grafana integration: native output to Prometheus/Mimir for unified dashboards

Installing k6

# Install k6 on various platforms

# macOS (Homebrew)
brew install k6

# Windows (Chocolatey)
choco install k6

# Linux (apt)
sudo gpg -k
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg \
  --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" \
  | sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update && sudo apt-get install k6

# Docker
docker run --rm -i grafana/k6 run - < script.js

# Verify installation
k6 version

Writing Your First Test

// basic-load-test.js — Your first k6 test
import http from 'k6/http';
import { check, sleep } from 'k6';

// Test configuration
export const options = {
  // Ramp up to 50 virtual users over 30 seconds,
  // hold for 2 minutes, then ramp down
  stages: [
    { duration: '30s', target: 50 },   // Ramp up
    { duration: '2m', target: 50 },    // Sustained load
    { duration: '10s', target: 0 },    // Ramp down
  ],
};

// Default function — executed once per VU iteration
export default function () {
  // Make an HTTP GET request
  const response = http.get('https://api.your-service.com/products');

  // Validate the response
  check(response, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
    'body contains products': (r) => r.json().products.length > 0,
  });

  // Simulate user think time (1-3 seconds)
  sleep(Math.random() * 2 + 1);
}

// Run with: k6 run basic-load-test.js
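
The stages array describes a piecewise-linear VU target over time. As a way to reason about what load a stage list produces at a given moment, here is a small Python sketch that interpolates the target. It only understands simple durations such as '30s' or '2m', and the function names are made up for this example. k6's own executor adjusts VUs in discrete steps, so treat the output as an approximation.

```python
def parse_duration(text):
    """Convert simple durations like '30s' or '2m' to seconds."""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(text[:-1]) * units[text[-1]]

def target_vus(stages, t, start_vus=0):
    """Linearly interpolate the VU target at t seconds into the test."""
    elapsed, current = 0, start_vus
    for stage in stages:
        length = parse_duration(stage["duration"])
        if t <= elapsed + length:
            progress = (t - elapsed) / length
            return round(current + (stage["target"] - current) * progress)
        elapsed += length
        current = stage["target"]
    return current  # after the last stage the target stays where it ended

# Same stages as the test above.
stages = [
    {"duration": "30s", "target": 50},
    {"duration": "2m", "target": 50},
    {"duration": "10s", "target": 0},
]
for t in (15, 90, 155):
    print(t, target_vus(stages, t))
```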

Advanced k6 Testing

Checks & Thresholds

Checks validate response correctness; thresholds define pass/fail criteria for the entire test run:

// advanced-thresholds.js — Production-grade k6 test
import http from 'k6/http';
import { check, sleep, group } from 'k6';
import { Rate, Trend } from 'k6/metrics';

// Custom metrics
const errorRate = new Rate('errors');
const orderLatency = new Trend('order_latency');

export const options = {
  stages: [
    { duration: '1m', target: 100 },
    { duration: '5m', target: 100 },
    { duration: '30s', target: 0 },
  ],

  // Thresholds — test FAILS if any threshold is violated
  thresholds: {
    // Response time: p95 must be < 800ms and p99 < 1500ms
    http_req_duration: ['p(95)<800', 'p(99)<1500'],

    // Error rate must be below 1%
    errors: ['rate<0.01'],

    // Custom metric: order creation < 2s at p95
    order_latency: ['p(95)<2000'],

    // At least 99% of checks must pass
    checks: ['rate>0.99'],

    // Specific endpoint thresholds
    'http_req_duration{name:get_products}': ['p(95)<300'],
    'http_req_duration{name:create_order}': ['p(95)<1500'],
  },
};

export default function () {
  group('Browse Products', function () {
    const res = http.get('https://api.example.com/products', {
      tags: { name: 'get_products' },
    });
    const browseOk = check(res, {
      'products returned': (r) => r.status === 200,
      'has items': (r) => r.json().length > 0,
    });
    // Record successes as well as failures so the rate is failures / total
    errorRate.add(!browseOk);
    sleep(1);
  });

  group('Place Order', function () {
    const payload = JSON.stringify({
      product_id: 'prod-123',
      quantity: 1,
    });

    const start = Date.now();
    const res = http.post('https://api.example.com/orders', payload, {
      headers: { 'Content-Type': 'application/json' },
      tags: { name: 'create_order' },
    });
    orderLatency.add(Date.now() - start);

    const orderOk = check(res, {
      'order created': (r) => r.status === 201,
      'has order id': (r) => r.json().order_id !== undefined,
    });
    // Record successes as well as failures so the rate is failures / total
    errorRate.add(!orderOk);
    sleep(2);
  });
}
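
To see what a threshold such as p(95)<800 checks, the Python sketch below evaluates a percentile threshold against a list of raw durations. It uses the nearest-rank percentile definition as a simplifying assumption. k6 may interpolate between samples, so its reported percentiles can differ slightly, and the sample data is invented.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def threshold_passes(values, p, limit_ms):
    """Evaluate a 'p(P)<limit' style threshold against raw samples."""
    return percentile(values, p) < limit_ms

# Hypothetical request durations in milliseconds.
durations_ms = [120, 150, 180, 200, 220, 250, 300, 400, 650, 900]
print(percentile(durations_ms, 95), threshold_passes(durations_ms, 95, 800))
```

A single slow request is enough to push p95 over the limit in a sample this small, which is one reason to run long enough to collect a meaningful number of requests before trusting a percentile threshold.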

Scenarios for Realistic Load

Scenarios model different user behaviors simultaneously:

// multi-scenario.js — Realistic traffic patterns
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  scenarios: {
    // Scenario 1: Constant browsing traffic
    browsers: {
      executor: 'constant-vus',
      vus: 200,
      duration: '10m',
      exec: 'browsing',
    },

    // Scenario 2: Spike of buyers during flash sale
    flash_sale_buyers: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '30s', target: 500 },  // Sudden spike
        { duration: '2m', target: 500 },   // Sustained
        { duration: '30s', target: 0 },    // Drop off
      ],
      startTime: '2m',  // Starts 2 minutes into the test
      exec: 'purchasing',
    },

    // Scenario 3: API integrations (constant rate)
    api_partners: {
      executor: 'constant-arrival-rate',
      rate: 100,          // 100 iterations per timeUnit (one request each here)
      timeUnit: '1s',
      duration: '10m',
      preAllocatedVUs: 50,
      maxVUs: 200,
      exec: 'apiCalls',
    },
  },
};

// Browsing behavior
export function browsing() {
  http.get('https://api.example.com/products');
  sleep(Math.random() * 3 + 2);  // 2-5 second think time
  http.get('https://api.example.com/products/' + Math.floor(Math.random() * 1000));
  sleep(Math.random() * 5 + 3);
}

// Purchasing behavior (more expensive)
export function purchasing() {
  const product = http.get('https://api.example.com/products/featured').json();
  sleep(1);
  http.post('https://api.example.com/cart', JSON.stringify({ product_id: product.id }));
  sleep(2);
  const orderRes = http.post('https://api.example.com/orders', JSON.stringify({ cart: 'current' }));
  check(orderRes, { 'order placed': (r) => r.status === 201 });
  sleep(5);
}

// API partner traffic (no think time, high throughput)
export function apiCalls() {
  http.get('https://api.example.com/inventory/bulk');
}
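
For the constant-arrival-rate scenario, the VU pool has to be large enough to sustain the requested rate. A common back-of-the-envelope estimate is Little's law: concurrent iterations equal arrival rate times iteration duration. The Python sketch below applies it. The function name and the headroom parameter are illustrative assumptions, not k6 settings.

```python
import math

def vus_needed(rate_per_second, iteration_seconds, headroom=1.0):
    """Little's law: concurrent iterations = arrival rate x time per iteration."""
    return math.ceil(rate_per_second * iteration_seconds * headroom)

# At 100 iterations/s, a 0.5 s iteration keeps about 50 VUs busy,
# and a 2 s iteration would need about 200.
print(vus_needed(100, 0.5), vus_needed(100, 2.0))
```

Under this estimate, the preAllocatedVUs and maxVUs values in the api_partners scenario correspond to iterations taking roughly 0.5 s and 2 s respectively. If the service slows beyond that, k6 cannot start iterations on schedule and the achieved rate falls below the target.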

Test Life Cycle

k6 tests have distinct phases for setup and teardown:

// lifecycle.js — Full k6 test lifecycle
import http from 'k6/http';
import { check } from 'k6';

// 1. init code — runs once per VU during initialization
const BASE_URL = __ENV.BASE_URL || 'https://api.staging.example.com';

// 2. setup() — runs once before all VUs start
export function setup() {
  // Create test data, authenticate, etc.
  const loginRes = http.post(`${BASE_URL}/auth/login`, JSON.stringify({
    username: 'loadtest-user',
    password: __ENV.TEST_PASSWORD,
  }), { headers: { 'Content-Type': 'application/json' } });

  const token = loginRes.json().access_token;
  return { token: token };  // Passed to default() and teardown()
}

// 3. default() — the main test function, runs per VU iteration
export default function (data) {
  const headers = {
    'Authorization': `Bearer ${data.token}`,
    'Content-Type': 'application/json',
  };
  const res = http.get(`${BASE_URL}/products`, { headers });
  check(res, { 'authenticated': (r) => r.status === 200 });
}

// 4. teardown() — runs once after all VUs finish
export function teardown(data) {
  // Clean up test data
  http.del(`${BASE_URL}/test-data/cleanup`, null, {
    headers: { 'Authorization': `Bearer ${data.token}` },
  });
}

// Run with environment variable:
// k6 run -e BASE_URL=https://api.staging.example.com lifecycle.js

CI/CD Integration

Pipeline Integration

# GitHub Actions: k6 performance gate
name: Performance Test
on:
  pull_request:
    branches: [main]

jobs:
  k6-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Deploy to staging
        run: |
          kubectl apply -f k8s/staging/

      - name: Wait for rollout
        run: |
          kubectl rollout status deployment/api-server -n staging --timeout=120s

      - name: Run k6 load test
        uses: grafana/k6-action@v0.3.1
        env:
          # The remote-write output reads its target URL from this variable
          K6_PROMETHEUS_RW_SERVER_URL: http://mimir:8080/api/v1/push
        with:
          filename: tests/performance/api-load-test.js
          flags: >-
            -e BASE_URL=https://api.staging.example.com
            --out experimental-prometheus-rw

      - name: Check results
        if: failure()
        run: |
          echo "Performance test failed! Check thresholds in k6 output."
          echo "Dashboard: https://grafana.example.com/d/k6-results"
          exit 1

Visualizing k6 Results in Grafana

Send k6 metrics directly to Prometheus/Mimir for unified observability dashboards:

# Output k6 metrics to Prometheus remote write (Mimir)
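# The output reads its target URL from K6_PROMETHEUS_RW_SERVER_URL
K6_PROMETHEUS_RW_SERVER_URL=http://mimir-distributor:8080/api/v1/push \
k6 run \
  --out experimental-prometheus-rw \
  --tag testid=$(date +%s) \
  load-test.js

# Key k6 series available in Grafana after the run
# (names follow the output's default naming; verify them in your metrics store):
# k6_http_req_duration_p99    : Response time, one gauge per configured trend stat
# k6_http_reqs_total          : Request count
# k6_vus                      : Active virtual users
# k6_iterations_total         : Completed iterations
# k6_checks_rate              : Check pass rate
# k6_data_sent_total / k6_data_received_total : Network throughput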

Dashboard Design: k6 + Grafana

k6 Performance Dashboard Panels

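Panel	Query	Purpose
Active VUs	`k6_vus`	Load profile shape
Request Rate	`sum(rate(k6_http_reqs_total[1m]))`	Throughput
Response Time (p95)	`k6_http_req_duration_p95` (requires `K6_PROMETHEUS_RW_TREND_STATS` to include `p(95)`)	Latency SLO
Error Rate	`sum(rate(k6_http_reqs_total{status!~"2.."}[1m])) / sum(rate(k6_http_reqs_total[1m]))`	Reliability
Check Pass Rate	`k6_checks_rate`	Correctness
Data Transfer	`rate(k6_data_received_total[1m])`	Bandwidth consumption

Series names assume the default naming of k6's Prometheus remote write output and can vary with k6 version and output settings, so confirm them in Explore before building panels. The error-rate query aggregates both sides with sum() because dividing the raw vectors would match each series only against itself.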


Profiling Under Load

The most powerful workflow combines k6 and Pyroscope: run a load test while continuous profiling is active, then correlate the flame graph with the load test timeline:

Combined k6 + Pyroscope Workflow

sequenceDiagram
    participant E as Engineer
    participant K as k6 Load Test
    participant S as Target Service
    participant P as Pyroscope
    participant G as Grafana
    E->>K: Start load test (500 VUs)
    K->>S: Sustained HTTP requests
    S->>P: Continuous CPU + Memory profiles
    Note over K,P: 5 minute load test window
    K-->>E: Test complete (p95 = 1200ms, threshold: 800ms)
    E->>G: Open Pyroscope, filter to test window
    G-->>E: Flame graph shows 40% CPU in JSON serialization
    E->>E: Optimize serialization → p95 drops to 400ms

# Workflow: Profile under load
# 1. Start your k6 test
#    (the remote-write URL is read from K6_PROMETHEUS_RW_SERVER_URL)
K6_PROMETHEUS_RW_SERVER_URL=http://mimir:8080/api/v1/push \
  k6 run --out experimental-prometheus-rw load-test.js &

# 2. Note the start/end timestamps of the test run
# 3. In Grafana → Explore → Pyroscope data source:
#    - Select service: "order-service"
#    - Profile type: process_cpu
#    - Time range: match k6 test duration
#    - Compare with "before load" baseline

# 4. The diff flame graph highlights functions that consume
#    MORE CPU under load vs baseline — these are your bottlenecks

                            
Pro Tip: Always compare profiles under load versus at idle. A function that uses 5% CPU at idle but 60% under load is your scaling bottleneck. The diff flame graph in Grafana makes this comparison straightforward: red bars show functions that grew, green bars show functions that shrank.
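
The idea behind a diff view can be sketched as a comparison of each function's share of samples in two profiles. The Python example below ranks functions by how much their share grew under load. The function names and percentages are invented, and this is an illustration of the concept rather than Grafana's actual diff algorithm.

```python
def profile_diff(baseline, under_load):
    """Compare per-function shares (fractions of total samples) and rank by growth."""
    names = set(baseline) | set(under_load)
    deltas = {
        name: round(under_load.get(name, 0.0) - baseline.get(name, 0.0), 4)
        for name in names
    }
    # Largest increase first: these functions scale worst with load.
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)

# Hypothetical self-time shares at idle and during the load test.
baseline = {"json.Marshal": 0.05, "db.Query": 0.30, "runtime.gcBgMarkWorker": 0.10}
under_load = {"json.Marshal": 0.40, "db.Query": 0.25, "runtime.gcBgMarkWorker": 0.12}

ranked = profile_diff(baseline, under_load)
for name, delta in ranked:
    print(f"{name}: {delta:+.2f}")
```

Comparing shares rather than absolute sample counts keeps the result meaningful even though the loaded window contains far more samples than the idle one.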
                        

Summary & Next Steps

Pyroscope enables continuous profiling with <5% overhead, providing function-level CPU/memory visibility in production 24/7
Flame graphs visualize where time and resources are spent; wide leaf frames with high self time are the optimization targets
k6 provides developer-friendly load testing with JavaScript scripts, checks, thresholds, and multi-scenario support
Thresholds make tests pass/fail automatically — critical for CI/CD performance gates
Scenarios model realistic traffic patterns by combining different user behaviors simultaneously
Grafana integration unifies k6 metrics, application profiles, and infrastructure metrics in a single dashboard
Profile under load is the most powerful workflow — combine k6 + Pyroscope to pinpoint exactly which functions degrade at scale

Next in the Series

In Part 14: Supporting DevOps Processes with Observability, we’ll explore how observability integrates across the entire DevOps lifecycle — from code and test to deploy, operate, and monitor — building feedback loops that accelerate delivery while maintaining reliability.
