Pyroscope for Continuous Profiling
What Is Continuous Profiling?
Traditional profiling is something developers do locally: attach a profiler, reproduce an issue, collect samples, analyze. Continuous profiling changes this by running the profiler in production 24/7 at low overhead (typically ~2–5% CPU). This means you can answer questions like “what function consumed the most CPU during yesterday’s latency spike?” without reproducing anything.
Pyroscope supports multiple profile types:
| Profile Type | What It Measures | When to Use |
|---|---|---|
| CPU | Time spent executing code | High latency, CPU-bound workloads |
| Memory (Alloc) | Bytes allocated | High GC pressure, memory leaks |
| Memory (Inuse) | Bytes currently held | OOM risks, memory bloat |
| Goroutines (Go) | Active goroutine count by stack | Goroutine leaks, deadlocks |
| Lock Contention | Time waiting on mutexes | Concurrency bottlenecks |
| Block (Go) | Time blocked on channels and other synchronization primitives | Channel starvation, stalled goroutines |
| Wall Clock | Real elapsed time (including waits) | Overall request latency breakdown |
Pyroscope Architecture
flowchart LR
    A1["Go Service<br/>(pprof SDK)"]
    A2["Java Service<br/>(async-profiler)"]
    A3["Python Service<br/>(py-spy)"]
    AL["Grafana Alloy<br/>(Pyroscope receiver)"]
    PY["Pyroscope Server<br/>(Ingester + Storage)"]
    S3["Object Storage<br/>(S3/GCS)"]
    GR["Grafana<br/>(Flame Graph UI)"]
    A1 --> AL
    A2 --> AL
    A3 --> AL
    AL --> PY
    PY --> S3
    GR --> PY
Reading Flame Graphs
A flame graph is a visualization where:
- The x-axis represents the proportion of total time (or allocations) — wider = more time
- The y-axis represents the call stack — callers below, callees above
- Each bar is a function — its width shows how much of the total resource it consumed
- Self time is time spent in that function itself (not in callees)
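Wide bars near the bottom of the graph are usually just entry points that every request passes through (such as main() or http.Handler). The actionable insight is usually in the widest self-time bars, typically visible in the “Top Table” view in Grafana.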
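To make the difference between total time and self time concrete, here is a minimal Python sketch that computes both from folded stack samples. The folded input format (semicolon-separated frames, root first, mapped to a sample count) and the sample data are assumptions for illustration; this is not how Pyroscope stores or aggregates profiles internally.

```python
from collections import Counter

def summarize(folded_stacks):
    """Return (total, self) sample counts per function.

    folded_stacks maps "root;caller;callee" strings to sample counts.
    """
    total = Counter()
    self_time = Counter()
    for stack, samples in folded_stacks.items():
        frames = stack.split(";")
        # Every distinct function on the stack was active for these samples.
        for fn in set(frames):
            total[fn] += samples
        # Only the leaf frame was actually executing.
        self_time[frames[-1]] += samples
    return total, self_time

# Hypothetical samples for an order-handling request path.
samples = {
    "main;handleOrder;json.Marshal": 60,
    "main;handleOrder;db.Query": 30,
    "main;handleOrder": 10,
}
total, self_time = summarize(samples)
print(total["main"], self_time["main"])   # 100 0
print(self_time.most_common(1))           # [('json.Marshal', 60)]
```

Here main appears in every sample, so its bar spans the full width, yet it has no self time. The function worth optimizing is the one with the largest self count.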
Setting Up Pyroscope
SDK Instrumentation
// Go: Continuous profiling with Pyroscope SDK
package main
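import (
"os"
"runtime"
"github.com/grafana/pyroscope-go"
)
func main() {
// Mutex and block profiling are off in the Go runtime by default;
// enable them, otherwise those profile types stay empty
runtime.SetMutexProfileFraction(5)
runtime.SetBlockProfileRate(5)
// Start Pyroscope agent
pyroscope.Start(pyroscope.Config{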
ApplicationName: "order-service",
ServerAddress: "http://pyroscope.observability:4040",
Logger: pyroscope.StandardLogger,
// Profile types to collect
ProfileTypes: []pyroscope.ProfileType{
pyroscope.ProfileCPU,
pyroscope.ProfileAllocObjects,
pyroscope.ProfileAllocSpace,
pyroscope.ProfileInuseObjects,
pyroscope.ProfileInuseSpace,
pyroscope.ProfileGoroutines,
pyroscope.ProfileMutexCount,
pyroscope.ProfileMutexDuration,
pyroscope.ProfileBlockCount,
pyroscope.ProfileBlockDuration,
},
// Add labels for filtering in Grafana
Tags: map[string]string{
"env": os.Getenv("ENVIRONMENT"),
"region": os.Getenv("REGION"),
"version": os.Getenv("APP_VERSION"),
},
})
// Your application code...
startHTTPServer()
}
# Python: Continuous profiling with Pyroscope
import pyroscope

pyroscope.configure(
    application_name="recommendation-service",
    server_address="http://pyroscope.observability:4040",
    sample_rate=100,  # Hz (samples per second)
    detect_subprocesses=True,
    tags={
        "env": "production",
        "region": "us-east-1",
        "version": "2.1.0",
    },
)

# Tag specific code paths for focused profiling
with pyroscope.tag_wrapper({"endpoint": "/api/recommend"}):
    recommendations = compute_recommendations(user_id)

# Tag everything executed inside a specific function
# (tag_wrapper is a context manager, not a decorator)
def run_model_inference(features):
    with pyroscope.tag_wrapper({"operation": "ml_inference"}):
        return model.predict(features)
// Java: Continuous profiling with Pyroscope
// Add to JVM startup flags:
// -javaagent:pyroscope.jar
// Or use programmatic initialization:
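import io.pyroscope.javaagent.EventType;
import io.pyroscope.javaagent.PyroscopeAgent;
import io.pyroscope.javaagent.config.Config;
import java.util.Map;
import org.springframework.boot.SpringApplication;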
public class Application {
public static void main(String[] args) {
PyroscopeAgent.start(
new Config.Builder()
.setApplicationName("payment-service")
.setServerAddress("http://pyroscope.observability:4040")
.setProfilingEvent(EventType.ITIMER) // CPU profiling
.setProfilingAlloc("512k") // Allocation profiling
.setProfilingLock("10ms") // Lock contention
.setLabels(Map.of(
"env", "production",
"region", System.getenv("REGION")
))
.build()
);
SpringApplication.run(Application.class, args);
}
}
Auto-Instrumentation with Alloy
For environments where modifying application code isn’t feasible, Grafana Alloy can scrape pprof endpoints or use eBPF-based profiling:
// Alloy configuration for Pyroscope scraping
// config.alloy (Alloy syntax uses // comments, not #)
pyroscope.scrape "default" {
targets = [
// Scrape Go services exposing pprof endpoints
{"__address__" = "order-service:6060", "service_name" = "order-service"},
{"__address__" = "inventory-service:6060", "service_name" = "inventory-service"},
]
profiling_config {
profile.process_cpu { enabled = true }
profile.memory { enabled = true }
profile.goroutine { enabled = true }
profile.mutex { enabled = true }
profile.block { enabled = true }
}
forward_to = [pyroscope.write.default.receiver]
}
// eBPF-based profiling (no code changes needed)
pyroscope.ebpf "kubernetes" {
targets = discovery.kubernetes.pods.targets
forward_to = [pyroscope.write.default.receiver]
demangle = "full"
}
pyroscope.write "default" {
endpoint {
url = "http://pyroscope.observability:4040"
}
}
Querying Profiles in Grafana
# Profile queries in Grafana Explore (Pyroscope data source)
# CPU profile for a specific service
process_cpu:cpu:nanoseconds:cpu:nanoseconds{service_name="order-service"}
# Memory allocations during a time range
memory:alloc_space:bytes:space:bytes{service_name="order-service", env="production"}
# Compare two time ranges (before/after deploy)
# Left panel: select "2 hours ago" range
# Right panel: select "now" range
# Grafana shows a diff flame graph highlighting regressions
# Filter by span (when using span profiles)
process_cpu:cpu:nanoseconds:cpu:nanoseconds{
service_name="order-service",
span_name="POST /api/orders"
}
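The query strings above follow a regular shape: a five-part profile type ID (name, sample type, sample unit, period type, period unit) followed by a label selector. As a minimal sketch, a small Python helper can assemble them. The function name and its signature are hypothetical and not part of any Pyroscope client library.

```python
def profile_query(name, sample_type, sample_unit, period_type, period_unit, **labels):
    """Build a Pyroscope-style query: '<profile type id>{label="value", ...}'."""
    profile_type = ":".join([name, sample_type, sample_unit, period_type, period_unit])
    # Sort labels so the generated query is stable across runs.
    selector = ", ".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return f"{profile_type}{{{selector}}}"

cpu_query = profile_query(
    "process_cpu", "cpu", "nanoseconds", "cpu", "nanoseconds",
    service_name="order-service",
)
print(cpu_query)
# process_cpu:cpu:nanoseconds:cpu:nanoseconds{service_name="order-service"}
```

Generating the string this way can be handy when the same query needs to be issued for many services or span names in a script.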
Finding a Memory Leak with Pyroscope
Symptom: Order service memory growing 500MB/hour, requiring restarts every 4 hours.
Investigation:
- Open Pyroscope in Grafana and select the memory:inuse_space profile type
- Set the time range to a 4-hour window covering the growth period
- Sort by the “Self” column in the Top Table: json.Unmarshal accounted for 80% of in-use memory
- Drill into the call stack: cacheStore.Set() was storing full JSON responses without size limits
- Root cause: the LRU cache was configured without a size limit, so eviction never triggered and it grew unbounded
Fix: Added maxSize: 100MB to the cache configuration. Memory stabilized at 200MB.
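To illustrate the kind of fix described here, the following is a minimal Python sketch of a byte-bounded LRU cache. It is not the service's actual cache: the class name, the max_bytes parameter, and the choice to measure size with len() on raw bytes are all assumptions made for the example.

```python
from collections import OrderedDict

class BoundedCache:
    """LRU cache that evicts least recently used entries once max_bytes is exceeded."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._items = OrderedDict()

    def set(self, key, value: bytes):
        # Replacing an existing key must not double-count its size.
        if key in self._items:
            self.used_bytes -= len(self._items.pop(key))
        self._items[key] = value
        self.used_bytes += len(value)
        # Evict the oldest entries until the cache fits the budget again.
        while self.used_bytes > self.max_bytes and self._items:
            _, evicted = self._items.popitem(last=False)
            self.used_bytes -= len(evicted)

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

cache = BoundedCache(max_bytes=10)
cache.set("a", b"12345")
cache.set("b", b"12345")
cache.get("a")             # "a" becomes the most recently used entry
cache.set("c", b"12345")   # exceeds the budget, so "b" is evicted
print(cache.get("b"), cache.used_bytes)   # None 10
```

The important property is that memory use has a hard ceiling regardless of how many distinct responses pass through the cache.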
k6 for Load Testing
Core Concepts
Grafana k6 is a developer-centric load testing tool written in Go with a JavaScript scripting API. Unlike traditional load testing tools (JMeter, Gatling), k6 is designed for:
- Developer experience — write tests in JavaScript, version in Git, run in CI/CD
- Performance — single binary generates thousands of VUs (virtual users) with minimal resources
- Extensibility — protocol support beyond HTTP (gRPC, WebSocket, SQL, Redis)
- Grafana integration — native output to Prometheus/Mimir for unified dashboards
Installing k6
# Install k6 on various platforms
# macOS (Homebrew)
brew install k6
# Windows (Chocolatey)
choco install k6
# Linux (apt)
sudo gpg -k
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg \
--keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" \
| sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update && sudo apt-get install k6
# Docker
docker run --rm -i grafana/k6 run - < script.js
# Verify installation
k6 version
Writing Your First Test
// basic-load-test.js — Your first k6 test
import http from 'k6/http';
import { check, sleep } from 'k6';
// Test configuration
export const options = {
// Ramp up to 50 virtual users over 30 seconds,
// hold for 2 minutes, then ramp down
stages: [
{ duration: '30s', target: 50 }, // Ramp up
{ duration: '2m', target: 50 }, // Sustained load
{ duration: '10s', target: 0 }, // Ramp down
],
};
// Default function — executed once per VU iteration
export default function () {
// Make an HTTP GET request
const response = http.get('https://api.your-service.com/products');
// Validate the response
check(response, {
'status is 200': (r) => r.status === 200,
'response time < 500ms': (r) => r.timings.duration < 500,
'body contains products': (r) => r.json().products.length > 0,
});
// Simulate user think time (1-3 seconds)
sleep(Math.random() * 2 + 1);
}
// Run with: k6 run basic-load-test.js
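The stages array describes a piecewise-linear load profile. As a rough sketch of how the target VU count evolves over time, the Python helper below interpolates between stage targets. The function names are hypothetical, only the s/m/h duration suffixes are handled, and the starting VU count is a parameter here rather than something read from k6 options.

```python
def parse_duration(text):
    """Convert simple durations such as '30s' or '2m' to seconds."""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(text[:-1]) * units[text[-1]]

def vus_at(stages, t, start_vus=0):
    """Approximate target VU count t seconds into a ramping profile."""
    current, elapsed = start_vus, 0
    for stage in stages:
        length = parse_duration(stage["duration"])
        target = stage["target"]
        if length > 0 and t < elapsed + length:
            # Linear ramp from the previous target to this stage's target.
            progress = (t - elapsed) / length
            return round(current + (target - current) * progress)
        elapsed += length
        current = target
    return current  # past the last stage

stages = [
    {"duration": "30s", "target": 50},
    {"duration": "2m", "target": 50},
    {"duration": "10s", "target": 0},
]
print(vus_at(stages, 15))    # 25 (halfway through the ramp up)
print(vus_at(stages, 90))    # 50 (sustained load)
print(vus_at(stages, 155))   # 25 (halfway through the ramp down)
```

Plotting these values before a run is a quick way to confirm that the load shape matches what you intended to test.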
Advanced k6 Testing
Checks & Thresholds
Checks validate response correctness; thresholds define pass/fail criteria for the entire test run:
// advanced-thresholds.js — Production-grade k6 test
import http from 'k6/http';
import { check, sleep, group } from 'k6';
import { Rate, Trend } from 'k6/metrics';
// Custom metrics
const errorRate = new Rate('errors');
const orderLatency = new Trend('order_latency');
export const options = {
stages: [
{ duration: '1m', target: 100 },
{ duration: '5m', target: 100 },
{ duration: '30s', target: 0 },
],
// Thresholds — test FAILS if any threshold is violated
thresholds: {
// 95th percentile response time must be < 800ms
http_req_duration: ['p(95)<800', 'p(99)<1500'],
// Error rate must be below 1%
errors: ['rate<0.01'],
// Custom metric: order creation < 2s at p95
order_latency: ['p(95)<2000'],
// At least 99% of checks must pass
checks: ['rate>0.99'],
// Specific endpoint thresholds
'http_req_duration{name:get_products}': ['p(95)<300'],
'http_req_duration{name:create_order}': ['p(95)<1500'],
},
};
export default function () {
group('Browse Products', function () {
const res = http.get('https://api.example.com/products', {
tags: { name: 'get_products' },
});
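const browseOk = check(res, {
'products returned': (r) => r.status === 200,
'has items': (r) => r.json().length > 0,
});
// Record every outcome: a Rate is failures / total samples, so adding
// only failures would make it read 100% after the first error
errorRate.add(!browseOk);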
sleep(1);
});
group('Place Order', function () {
const payload = JSON.stringify({
product_id: 'prod-123',
quantity: 1,
});
const start = Date.now();
const res = http.post('https://api.example.com/orders', payload, {
headers: { 'Content-Type': 'application/json' },
tags: { name: 'create_order' },
});
orderLatency.add(Date.now() - start);
const orderOk = check(res, {
'order created': (r) => r.status === 201,
'has order id': (r) => r.json().order_id !== undefined,
});
// Record successes as well as failures so the error rate is meaningful
errorRate.add(!orderOk);
sleep(2);
});
}
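To show what threshold expressions such as p(95)<800 and rate<0.01 mean in terms of raw samples, here is a small Python sketch. It uses the nearest-rank percentile method as a simplifying assumption, so k6's own results may differ slightly. The sample data is invented.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile of a list of samples."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

def rate(outcomes):
    """Share of truthy samples, the way a rate-style metric is aggregated."""
    return sum(1 for outcome in outcomes if outcome) / len(outcomes)

# Hypothetical request durations in milliseconds.
durations_ms = [120, 150, 180, 200, 220, 250, 300, 400, 650, 900]
# Hypothetical error samples: 99 successes and 1 failure.
failed = [False] * 99 + [True]

p95 = percentile(durations_ms, 95)
print(p95, p95 < 800)         # 900 False -> 'p(95)<800' is violated
print(rate(failed) < 0.01)    # False -> exactly 1% does not satisfy 'rate<0.01'
print(rate([True]))           # 1.0 -> recording only failures always yields 100%
```

The last line shows why a rate metric needs a sample for every outcome, not just for errors: without the successes in the denominator, a single failure reads as a 100% error rate.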
Scenarios for Realistic Load
Scenarios model different user behaviors simultaneously:
// multi-scenario.js — Realistic traffic patterns
import http from 'k6/http';
import { check, sleep } from 'k6';
export const options = {
scenarios: {
// Scenario 1: Constant browsing traffic
browsers: {
executor: 'constant-vus',
vus: 200,
duration: '10m',
exec: 'browsing',
},
// Scenario 2: Spike of buyers during flash sale
flash_sale_buyers: {
executor: 'ramping-vus',
startVUs: 0,
stages: [
{ duration: '30s', target: 500 }, // Sudden spike
{ duration: '2m', target: 500 }, // Sustained
{ duration: '30s', target: 0 }, // Drop off
],
startTime: '2m', // Starts 2 minutes into the test
exec: 'purchasing',
},
// Scenario 3: API integrations (constant rate)
api_partners: {
executor: 'constant-arrival-rate',
rate: 100, // 100 iterations per second (one request each in apiCalls)
timeUnit: '1s',
duration: '10m',
preAllocatedVUs: 50,
maxVUs: 200,
exec: 'apiCalls',
},
},
};
// Browsing behavior
export function browsing() {
http.get('https://api.example.com/products');
sleep(Math.random() * 3 + 2); // 2-5 second think time
http.get('https://api.example.com/products/' + Math.floor(Math.random() * 1000));
sleep(Math.random() * 5 + 3);
}
// Purchasing behavior (more expensive)
export function purchasing() {
const product = http.get('https://api.example.com/products/featured').json();
sleep(1);
http.post('https://api.example.com/cart', JSON.stringify({ product_id: product.id }));
sleep(2);
const orderRes = http.post('https://api.example.com/orders', JSON.stringify({ cart: 'current' }));
check(orderRes, { 'order placed': (r) => r.status === 201 });
sleep(5);
}
// API partner traffic (no think time, high throughput)
export function apiCalls() {
http.get('https://api.example.com/inventory/bulk');
}
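With an arrival-rate executor, the number of VUs needed depends on how long each iteration takes. A back-of-the-envelope estimate follows from Little's law (concurrency = arrival rate × time per iteration). The Python sketch below applies it to the api_partners numbers; the function name and the iteration durations are assumptions for illustration.

```python
import math

def required_vus(rate_per_second, iteration_seconds):
    """Estimate concurrent VUs needed to sustain a constant arrival rate."""
    # Round first to keep floating-point noise from bumping the result up.
    return math.ceil(round(rate_per_second * iteration_seconds, 6))

print(required_vus(100, 0.25))   # 25  -> fits within preAllocatedVUs: 50
print(required_vus(100, 0.5))    # 50  -> exactly the pre-allocated pool
print(required_vus(100, 2.5))    # 250 -> exceeds maxVUs: 200
```

In other words, with maxVUs set to 200 and a rate of 100 iterations per second, the scenario can only hold its target rate while iterations complete in about 2 seconds or less. If the service slows beyond that, the achieved rate falls below the configured one.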
Test Life Cycle
k6 tests have distinct phases for setup and teardown:
// lifecycle.js — Full k6 test lifecycle
import http from 'k6/http';
import { check } from 'k6';
// 1. init code — runs once per VU during initialization
const BASE_URL = __ENV.BASE_URL || 'https://api.staging.example.com';
// 2. setup() — runs once before all VUs start
export function setup() {
// Create test data, authenticate, etc.
const loginRes = http.post(`${BASE_URL}/auth/login`, JSON.stringify({
username: 'loadtest-user',
password: __ENV.TEST_PASSWORD,
}), { headers: { 'Content-Type': 'application/json' } });
const token = loginRes.json().access_token;
return { token: token }; // Passed to default() and teardown()
}
// 3. default() — the main test function, runs per VU iteration
export default function (data) {
const headers = {
'Authorization': `Bearer ${data.token}`,
'Content-Type': 'application/json',
};
const res = http.get(`${BASE_URL}/products`, { headers });
check(res, { 'authenticated': (r) => r.status === 200 });
}
// 4. teardown() — runs once after all VUs finish
export function teardown(data) {
// Clean up test data
http.del(`${BASE_URL}/test-data/cleanup`, null, {
headers: { 'Authorization': `Bearer ${data.token}` },
});
}
// Run with environment variable:
// k6 run -e BASE_URL=https://api.staging.example.com lifecycle.js
CI/CD Integration
Pipeline Integration
# GitHub Actions: k6 performance gate
name: Performance Test
on:
  pull_request:
    branches: [main]
jobs:
  k6-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to staging
        run: |
          kubectl apply -f k8s/staging/
      - name: Wait for rollout
        run: |
          kubectl rollout status deployment/api-server -n staging --timeout=120s
      - name: Run k6 load test
        uses: grafana/k6-action@v0.3.1
        env:
          # Remote-write endpoint read by the experimental-prometheus-rw output
          K6_PROMETHEUS_RW_SERVER_URL: http://mimir:8080/api/v1/push
        with:
          filename: tests/performance/api-load-test.js
          flags: >-
            -e BASE_URL=https://api.staging.example.com
            --out experimental-prometheus-rw
      - name: Check results
        if: failure()
        run: |
          echo "Performance test failed! Check thresholds in k6 output."
          echo "Dashboard: https://grafana.example.com/d/k6-results"
          exit 1
Visualizing k6 Results in Grafana
Send k6 metrics directly to Prometheus/Mimir for unified observability dashboards:
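# Output k6 metrics to Prometheus remote write (Mimir)
# The endpoint is read from K6_PROMETHEUS_RW_SERVER_URL;
# K6_PROMETHEUS_RW_TREND_STATS selects which trend statistics are exported
K6_PROMETHEUS_RW_SERVER_URL=http://mimir-distributor:8080/api/v1/push \
K6_PROMETHEUS_RW_TREND_STATS="p(95),p(99),avg" \
k6 run \
  --out experimental-prometheus-rw \
  --tag testid=$(date +%s) \
  load-test.js
# Key k6 metrics available in Grafana after the run:
# k6_http_req_duration_p95: Response time (one gauge per exported trend stat)
# k6_http_reqs_total: Request count
# k6_vus: Active virtual users
# k6_iterations_total: Completed iterations
# k6_checks_rate: Check pass rate
# k6_data_sent_total / k6_data_received_total: Network throughput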
k6 Performance Dashboard Panels
| Panel | Query | Purpose |
|---|---|---|
| Active VUs | k6_vus{} | Load profile shape |
| Request Rate | rate(k6_http_reqs_total[1m]) | Throughput |
| Response Time (p95) | k6_http_req_duration_p95 (requires p(95) in K6_PROMETHEUS_RW_TREND_STATS) | Latency SLO |
| Error Rate | sum(rate(k6_http_reqs_total{status!~"2.."}[1m])) / sum(rate(k6_http_reqs_total[1m])) | Reliability |
| Check Pass Rate | avg(k6_checks_rate) | Correctness |
| Data Transfer | rate(k6_data_received_total[1m]) | Bandwidth consumption |
Profiling Under Load
The most powerful workflow combines k6 and Pyroscope: run a load test while continuous profiling is active, then correlate the flame graph with the load test timeline:
sequenceDiagram
participant E as Engineer
participant K as k6 Load Test
participant S as Target Service
participant P as Pyroscope
participant G as Grafana
E->>K: Start load test (500 VUs)
K->>S: Sustained HTTP requests
S->>P: Continuous CPU + Memory profiles
Note over K,P: 5 minute load test window
K-->>E: Test complete (p95 = 1200ms, threshold: 800ms)
E->>G: Open Pyroscope, filter to test window
G-->>E: Flame graph shows 40% CPU in JSON serialization
E->>E: Optimize serialization → p95 drops to 400ms
# Workflow: Profile under load
# 1. Start your k6 test
K6_PROMETHEUS_RW_SERVER_URL=http://mimir:8080/api/v1/push k6 run --out experimental-prometheus-rw load-test.js &
# 2. Note the start/end timestamps of the test run
# 3. In Grafana → Explore → Pyroscope data source:
# - Select service: "order-service"
# - Profile type: process_cpu
# - Time range: match k6 test duration
# - Compare with "before load" baseline
# 4. The diff flame graph highlights functions that consume
# MORE CPU under load vs baseline — these are your bottlenecks
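The idea behind a diff flame graph can be sketched in a few lines of Python: normalize each profile to percentages, then compare function by function. This is a simplified illustration that works on flat self-sample counts with invented data. Grafana's diff view compares whole call trees, and the function names here are hypothetical.

```python
def self_share(profile):
    """Convert self-sample counts into percentages of the whole profile."""
    total = sum(profile.values())
    return {fn: 100 * count / total for fn, count in profile.items()}

def diff_profiles(baseline, under_load):
    """Percentage-point change in self time per function, largest increase first."""
    before, after = self_share(baseline), self_share(under_load)
    functions = set(before) | set(after)
    deltas = {fn: after.get(fn, 0.0) - before.get(fn, 0.0) for fn in functions}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)

# Hypothetical self-sample counts before and during the load test.
baseline = {"json.Marshal": 10, "db.Query": 50, "http.Serve": 40}
under_load = {"json.Marshal": 40, "db.Query": 35, "http.Serve": 25}

worst = diff_profiles(baseline, under_load)[0]
print(worst)   # ('json.Marshal', 30.0)
```

Comparing shares rather than raw sample counts matters because a load test produces far more samples overall. A function only stands out if its proportion of the total grows.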
Summary & Next Steps
- Pyroscope enables continuous profiling with <5% overhead, providing function-level CPU/memory visibility in production 24/7
- Flame graphs visualize where time and resources are spent — wide bars at the top are optimization targets
- k6 provides developer-friendly load testing with JavaScript scripts, checks, thresholds, and multi-scenario support
- Thresholds make tests pass/fail automatically — critical for CI/CD performance gates
- Scenarios model realistic traffic patterns by combining different user behaviors simultaneously
- Grafana integration unifies k6 metrics, application profiles, and infrastructure metrics in a single dashboard
- Profiling under load is the most powerful workflow: combine k6 + Pyroscope to pinpoint exactly which functions degrade at scale
Next in the Series
In Part 14: Supporting DevOps Processes with Observability, we’ll explore how observability integrates across the entire DevOps lifecycle — from code and test to deploy, operate, and monitor — building feedback loops that accelerate delivery while maintaining reliability.