
Grafana Deep Dive Part 3: Setting Up a Learning Environment with Demo Applications

June 15, 2026 Wasil Zafar 30 min read

Build a complete observability lab from scratch — set up Grafana Cloud, install a local Kubernetes cluster, deploy the OpenTelemetry Demo application with dozens of microservices, configure the OTel Collector, and explore real telemetry data flowing through Loki, Mimir, and Tempo.

Table of Contents

  1. Grafana Cloud
  2. Installing Prerequisites
  3. OpenTelemetry Demo Application
  4. OpenTelemetry Collector
  5. Deploying on Kubernetes
  6. Exploring Telemetry
  7. Adding Your Own Applications
  8. Troubleshooting
  9. Summary & Next Steps

Introducing Grafana Cloud

Grafana Cloud is Grafana Labs' fully managed observability platform that provides hosted instances of Grafana, Mimir (metrics), Loki (logs), Tempo (traces), and additional services like Alerting, Incident, OnCall, and Synthetic Monitoring. For learning purposes, the free tier is more than sufficient — it includes 10,000 series for metrics, 50 GB of logs, and 50 GB of traces per month.

Using Grafana Cloud for learning has several advantages over running everything locally:

  • Zero infrastructure management — no need to configure storage, retention, or scaling for the backend databases
  • Pre-configured data sources — Loki, Mimir, and Tempo are already wired into your Grafana instance
  • Always-on availability — your telemetry data persists between learning sessions
  • Production-like experience — the same APIs and query languages used in enterprise environments

Learning Environment Architecture

flowchart LR
    subgraph Local["Local Machine / WSL2"]
        K8s["Kubernetes Cluster<br/>kind / k3d / minikube"]
        Demo["OTel Demo App<br/>~15 microservices"]
        Coll[OTel Collector]
    end
    subgraph Cloud["Grafana Cloud"]
        Mimir["Mimir<br/>Metrics"]
        Loki["Loki<br/>Logs"]
        Tempo["Tempo<br/>Traces"]
        Grafana["Grafana<br/>Dashboards"]
    end
    Demo --> Coll
    K8s --> Coll
    Coll -->|OTLP metrics| Mimir
    Coll -->|OTLP logs| Loki
    Coll -->|OTLP traces| Tempo
    Grafana --> Mimir
    Grafana --> Loki
    Grafana --> Tempo

Creating an Account

Navigate to grafana.com and click Create free account. You can sign up with an email address or use SSO via Google, GitHub, or Microsoft. After email verification, you'll be prompted to name your organization (this becomes your stack URL slug, e.g., yourorg.grafana.net).

Tip: Choose a short, memorable organization name — it becomes part of all your endpoint URLs. You can't change it later without creating a new stack.

Once your account is created, Grafana Cloud automatically provisions:

  • A Grafana instance (your dashboards and exploration UI)
  • A Prometheus-compatible metrics endpoint (powered by Mimir)
  • A Loki logs endpoint
  • A Tempo traces endpoint
  • A Grafana Alloy configuration (the recommended collector)

Exploring the Portal

The Grafana Cloud portal (grafana.com/orgs/yourorg) is your management console. From here you can view your stack details, manage API keys, check usage against free-tier limits, and access documentation. Key sections include:

  • Stack Management — View your hosted Grafana URL, Prometheus remote-write endpoint, Loki push endpoint, and Tempo OTLP endpoint
  • Access Policies — Create scoped API tokens for programmatic access (metrics write, logs write, traces write)
  • Usage & Billing — Monitor your consumption against free-tier quotas in real time
  • Integrations — One-click setup for common data sources (Linux, Docker, Kubernetes, databases)

The Grafana Instance

Click Launch Grafana to open your hosted Grafana instance. This is a full-featured Grafana installation with pre-configured data sources. Take a moment to explore:

  • Explore (compass icon) — Free-form querying of Loki, Mimir, and Tempo
  • Dashboards — Pre-built dashboards for common integrations
  • Alerting — Rule-based alerting with notification channels
  • Connections → Data sources — Your pre-configured Prometheus, Loki, and Tempo connections
Important: Note down your Prometheus remote-write URL, Loki URL, Tempo URL, and your Grafana Cloud instance ID (a numeric identifier). You'll need these when configuring the OpenTelemetry Collector later.

Installing Prerequisite Tools

Before deploying the demo application, you need a Linux-compatible environment with container orchestration tools. This section covers setup on Windows (via WSL2) and macOS. If you're already on Linux, skip directly to the container tools section.

WSL2 Setup (Windows Only)

Windows Subsystem for Linux 2 provides a real Linux kernel running inside a lightweight VM. It's required for running Docker and Kubernetes tooling natively on Windows.

# Install WSL2 with Ubuntu (run in PowerShell as Administrator)
wsl --install -d Ubuntu

# After installation completes and you've set up your Linux user, verify:
wsl --list --verbose
# NAME      STATE           VERSION
# Ubuntu    Running         2

# Update the distribution
sudo apt update && sudo apt upgrade -y

# Install essential build tools
sudo apt install -y build-essential curl git wget unzip jq
Windows Users: All subsequent commands in this guide should be run inside WSL2 (your Ubuntu terminal), not in PowerShell or CMD. The only exception is Docker Desktop, which integrates with WSL2 automatically.

Homebrew

Homebrew works on both macOS and Linux (including WSL2) and provides a consistent way to install development tools. It simplifies installing kubectl, helm, kind, and other CLI tools.

# Install Homebrew (works on macOS and Linux/WSL2)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Add Homebrew to your PATH (Linux/WSL2 — follow the post-install instructions)
echo 'eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"' >> ~/.bashrc
eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"

# Verify installation
brew --version
# Homebrew 4.x.x

Container Orchestration Tools

You need Docker (or a compatible container runtime) and kubectl for managing your local Kubernetes cluster.

# Option A: Docker Desktop (macOS / Windows with WSL2 integration)
# Download from https://www.docker.com/products/docker-desktop/
# Enable "Use the WSL 2 based engine" in Settings > General
# Enable Kubernetes in Settings > Kubernetes (optional — we'll use kind instead)

# Option B: Docker Engine on Linux/WSL2 (without Desktop)
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker

# Verify Docker is running
docker run --rm hello-world

# Install kubectl
brew install kubectl

# Verify kubectl
kubectl version --client
# Client Version: v1.30.x

Single-Node Kubernetes Cluster

For a learning environment, you need a lightweight single-node Kubernetes cluster. There are three popular options — kind (Kubernetes IN Docker) is recommended for its speed and low resource usage.

Hands-On: Cluster Options Comparison

Tool      Speed           RAM Usage  Best For
kind      Fast (30s)      ~500 MB    CI/CD, quick iteration, multiple clusters
k3d       Fast (20s)      ~512 MB    Lightweight k3s, built-in registry, load balancer
minikube  Moderate (60s)  ~2 GB      Feature-rich, add-ons ecosystem, ingress

# Install kind (recommended)
brew install kind

# Create a cluster with extra port mappings for the demo frontend
cat <<EOF | kind create cluster --name observability-lab --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 30080
    hostPort: 8080
    protocol: TCP
  - containerPort: 30088
    hostPort: 8088
    protocol: TCP
EOF

# Verify the cluster is running
kubectl cluster-info --context kind-observability-lab
# Kubernetes control plane is running at https://127.0.0.1:xxxxx

# Check nodes are Ready
kubectl get nodes
# NAME                              STATUS   ROLES           AGE   VERSION
# observability-lab-control-plane   Ready    control-plane   30s   v1.30.x

If you prefer k3d or minikube:

# Alternative: k3d
brew install k3d
k3d cluster create observability-lab \
  -p "8080:30080@server:0" \
  -p "8088:30088@server:0" \
  --agents 0

# Alternative: minikube
brew install minikube
minikube start --cpus=4 --memory=4096 --driver=docker --profile=observability-lab
# Note: minikube uses 'minikube tunnel' for LoadBalancer access

Helm

Helm is the package manager for Kubernetes. The OpenTelemetry Demo and the OTel Collector are both distributed as Helm charts, making installation straightforward.

# Install Helm
brew install helm

# Verify installation
helm version
# version.BuildInfo{Version:"v3.15.x", ...}

# Add the OpenTelemetry Helm repository
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

# Verify the repo was added
helm search repo open-telemetry
# NAME                                         CHART VERSION  APP VERSION
# open-telemetry/opentelemetry-demo            0.32.x         1.11.x
# open-telemetry/opentelemetry-collector       0.97.x         0.104.x
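
Before moving on, it can help to confirm that every CLI tool from this section is actually on your PATH. The helper below is a minimal sketch: `missing_tools` is a hypothetical function name introduced here for illustration, and it only checks that each command exists, not which version is installed.

```shell
# Print the name of every command in the argument list that is not on PATH.
# Returns non-zero if at least one command is missing.
missing_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example: check the tools this guide installs
# missing_tools docker kubectl kind helm || echo "install the tools listed above first"
```

If the function prints nothing and returns zero, all listed tools were found.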

Installing the OpenTelemetry Demo Application

The OpenTelemetry Demo is a distributed e-commerce application maintained by the OpenTelemetry community. It includes approximately 15 microservices written in different languages (Go, Python, Java, .NET, Node.js, Rust, C++, Ruby, PHP, Kotlin, Erlang) that communicate via gRPC and HTTP. Every service is instrumented with OpenTelemetry, generating real logs, metrics, and traces.

OpenTelemetry Demo Microservices
flowchart TD
    FE["Frontend<br/>TypeScript"] --> CS["Cart Service<br/>C#/.NET"]
    FE --> PS["Product Catalog<br/>Go"]
    FE --> RS["Recommendation<br/>Python"]
    FE --> CKS["Checkout Service<br/>Go"]
    CKS --> PAY["Payment Service<br/>Node.js"]
    CKS --> SHIP["Shipping Service<br/>Rust"]
    CKS --> EMAIL["Email Service<br/>Ruby"]
    CKS --> CS
    CKS --> CUR["Currency Service<br/>C++"]
    RS --> PS
    FE --> ADS["Ad Service<br/>Java"]
    FE --> CUR
    LG["Load Generator<br/>Python/Locust"] --> FE

Setting Up Access Credentials

Before deploying the demo, you need API credentials to send telemetry to Grafana Cloud. Navigate to your Grafana Cloud portal and create an Access Policy token.

# In Grafana Cloud Portal:
# 1. Go to grafana.com → your organization → Access Policies
# 2. Click "Create access policy"
# 3. Name it: "otel-demo-write"
# 4. Add scopes:
#    - metrics:write
#    - logs:write
#    - traces:write
# 5. Click "Create token" and copy the generated token

# Store credentials as environment variables (add to ~/.bashrc for persistence)
export GRAFANA_CLOUD_INSTANCE_ID="123456"           # Your numeric instance ID
export GRAFANA_CLOUD_API_KEY="glc_eyJ..."           # The token you just created
export GRAFANA_CLOUD_PROM_URL="https://prometheus-prod-xx-xxx.grafana.net/api/prom/push"
export GRAFANA_CLOUD_LOKI_URL="https://logs-prod-xxx.grafana.net/loki/api/v1/push"
export GRAFANA_CLOUD_TEMPO_URL="https://tempo-prod-xx-xxx.grafana.net/tempo"
export GRAFANA_CLOUD_OTLP_URL="https://otlp-gateway-prod-xx-xxx.grafana.net/otlp"
Security: Never commit API keys to version control. Use environment variables or Kubernetes secrets. The GRAFANA_CLOUD_API_KEY grants write access to your metrics, logs, and traces — treat it like a password.
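
A missing or misspelled variable is an easy mistake to make at this step and only shows up later as an authentication failure. The sketch below checks that the variables are set without printing their values. `require_vars` is a hypothetical helper name, and it should only be called with literal variable names because it uses `eval`.

```shell
# Report every named variable that is unset or empty (names only, never values).
# Returns non-zero if any are missing.
require_vars() {
  status=0
  for name in "$@"; do
    eval "value=\${$name:-}"   # indirect lookup that works in POSIX sh
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      status=1
    fi
  done
  unset value                   # do not leave a copy of the last value behind
  return "$status"
}

# Example:
# require_vars GRAFANA_CLOUD_INSTANCE_ID GRAFANA_CLOUD_API_KEY GRAFANA_CLOUD_OTLP_URL
```

Running the check before each deployment step keeps credential problems separate from cluster problems.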

Downloading the Repository

While Helm is the primary deployment method, cloning the repository gives you access to the full source code, Dockerfiles, and configuration examples for reference.

# Clone the OpenTelemetry Demo repository (for reference)
git clone https://github.com/open-telemetry/opentelemetry-demo.git
cd opentelemetry-demo

# Check the current version
git describe --tags
# v1.11.x

# Explore the structure
ls src/
# (abbreviated; the exact directory names depend on the release you cloned)
# adservice/  cartservice/  checkoutservice/  currencyservice/  emailservice/
# featureflagservice/  frontend/  frauddetectionservice/  loadgenerator/
# paymentservice/  productcatalogservice/  recommendationservice/  shippingservice/

Adding Credentials and Endpoints

Create a Helm values file that configures the demo to send telemetry to Grafana Cloud via the OTLP gateway. This is the simplest approach — Grafana Cloud's OTLP endpoint accepts metrics, logs, and traces over a single connection.

# otel-demo-values.yaml — Helm values for OpenTelemetry Demo with Grafana Cloud
# Save this file in your working directory

default:
  env:
    - name: OTEL_SERVICE_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.labels['app.kubernetes.io/component']

opentelemetry-collector:
  mode: deployment
  config:
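    exporters:
      otlphttp/grafana:
        # Helm does not expand shell variables in a static values file. The
        # collector resolves ${env:...} at startup, so these variables must be
        # set in the collector pod's environment.
        endpoint: "${env:GRAFANA_CLOUD_OTLP_URL}"
        headers:
          # Basic auth expects base64("<instance-id>:<token>"), not the raw pair.
          # GRAFANA_CLOUD_BASIC_AUTH is an extra variable (not defined elsewhere
          # in this guide) that must hold that encoded value.
          Authorization: "Basic ${env:GRAFANA_CLOUD_BASIC_AUTH}"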

    service:
      pipelines:
        traces:
          exporters: [otlphttp/grafana]
        metrics:
          exporters: [otlphttp/grafana]
        logs:
          exporters: [otlphttp/grafana]
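
HTTP Basic authentication sends the base64 encoding of `username:password`, so the instance ID and token have to be encoded before they go into an Authorization header. The following is a minimal sketch of producing that value in the shell, assuming the `base64` utility is installed. `basic_auth_value` is a hypothetical helper name introduced for illustration.

```shell
# Encode "<instance-id>:<token>" as an HTTP Basic credential.
# tr removes the line wraps that some base64 implementations add to long output,
# which would otherwise corrupt the header for long tokens.
basic_auth_value() {
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# Example:
# basic_auth_value "$GRAFANA_CLOUD_INSTANCE_ID" "$GRAFANA_CLOUD_API_KEY"
```

The output can be stored in an environment variable of your choice, for example `GRAFANA_CLOUD_BASIC_AUTH` (a name chosen here for illustration), and sent as `Authorization: Basic <value>`.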

Installing the OpenTelemetry Collector

The OpenTelemetry Collector is the central hub for receiving, processing, and exporting telemetry data. It sits between your applications and Grafana Cloud, handling batching, retry, authentication, and format conversion. The demo includes a bundled collector, but understanding its configuration is essential for production use.

Collector Pipeline Architecture
flowchart LR
    subgraph Receivers
        OTLP["OTLP<br/>gRPC :4317<br/>HTTP :4318"]
        PROM["Prometheus<br/>Scrape"]
    end
    subgraph Processors
        BATCH["Batch<br/>200ms / 8192"]
        RES["Resource<br/>Attributes"]
        MEM["Memory<br/>Limiter"]
    end
    subgraph Exporters
        GRAFANA["OTLP/HTTP<br/>Grafana Cloud"]
        DEBUG["Debug<br/>stdout"]
    end
    OTLP --> MEM
    PROM --> MEM
    MEM --> RES
    RES --> BATCH
    BATCH --> GRAFANA
    BATCH --> DEBUG

Configuration

The collector configuration defines receivers (how data enters), processors (transformations), exporters (where data goes), and service pipelines (which connect them). Here's a complete configuration for sending all three signal types to Grafana Cloud:

# otel-collector-config.yaml — Full collector configuration for Grafana Cloud
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

  # Scrape the collector's own internal metrics (exposed on port 8888 below)
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 30s
          static_configs:
            - targets: ['localhost:8888']

processors:
  # Prevent OOM kills
  memory_limiter:
    check_interval: 5s
    limit_mib: 512
    spike_limit_mib: 128

  # Add resource attributes to all signals
  resource:
    attributes:
      - key: deployment.environment
        value: "learning-lab"
        action: upsert
      - key: service.namespace
        value: "otel-demo"
        action: upsert

  # Batch telemetry for efficient export
  batch:
    send_batch_size: 8192
    send_batch_max_size: 16384
    timeout: 200ms

exporters:
  # Grafana Cloud OTLP endpoint (all signals over one connection)
  otlphttp/grafana:
    endpoint: "${env:GRAFANA_CLOUD_OTLP_URL}"
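    headers:
      # Basic auth expects base64("<instance-id>:<token>"), not the raw pair.
      # GRAFANA_CLOUD_BASIC_AUTH is an extra variable (not defined elsewhere in
      # this guide) that must hold that encoded value.
      Authorization: "Basic ${env:GRAFANA_CLOUD_BASIC_AUTH}"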
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s

  # Debug exporter for troubleshooting (writes to stdout)
  debug:
    verbosity: basic
    sampling_initial: 5
    sampling_thereafter: 200

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  zpages:
    endpoint: 0.0.0.0:55679

service:
  extensions: [health_check, zpages]
  telemetry:
    logs:
      level: info
    metrics:
      address: 0.0.0.0:8888
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [otlphttp/grafana, debug]
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, resource, batch]
      exporters: [otlphttp/grafana]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [otlphttp/grafana]
Key Design Decisions: The memory_limiter processor prevents the collector from consuming unbounded memory. The batch processor groups telemetry for efficient network transfer. The resource processor adds environment labels that help filter data in Grafana.
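
The two memory_limiter settings interact: the processor starts refusing data at a soft limit equal to limit_mib minus spike_limit_mib, and limit_mib itself acts as the hard limit. A quick worked calculation with the values from the configuration above (`soft_limit_mib` is a hypothetical helper name used only for this illustration):

```shell
# Soft limit in MiB at which memory_limiter starts refusing data:
# limit_mib minus spike_limit_mib.
soft_limit_mib() {
  echo $(( $1 - $2 ))
}

soft_limit_mib 512 128   # prints 384
```

With these values the collector begins applying back-pressure at 384 MiB, which leaves headroom below the 512 MiB hard limit.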

Deployment

Create a namespace and a Kubernetes Secret for your Grafana Cloud credentials. The collector itself is installed with the demo's Helm chart in the next section:

# Create a namespace for the demo
kubectl create namespace otel-demo

# Create a secret with Grafana Cloud credentials
kubectl create secret generic grafana-cloud-credentials \
  --namespace otel-demo \
  --from-literal=instance-id="${GRAFANA_CLOUD_INSTANCE_ID}" \
  --from-literal=api-key="${GRAFANA_CLOUD_API_KEY}" \
  --from-literal=otlp-endpoint="${GRAFANA_CLOUD_OTLP_URL}"

# Verify the secret was created
kubectl get secrets -n otel-demo
# NAME                          TYPE     DATA   AGE
# grafana-cloud-credentials     Opaque   3      5s

Installing the OpenTelemetry Demo on Kubernetes

With prerequisites in place and credentials configured, deploy the full demo application using Helm. The chart installs all microservices, a load generator, and an embedded OpenTelemetry Collector.

Helm Installation

# Create the final values file with your actual credentials
cat <<EOF > otel-demo-grafana-values.yaml
default:
  envOverrides:
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: "http://otel-demo-otelcol:4317"

components:
  frontendProxy:
    service:
      type: NodePort
      ports:
        - name: http
          port: 8080
          targetPort: 8080
          nodePort: 30080

opentelemetry-collector:
  mode: deployment
  resources:
    limits:
      memory: 1Gi
      cpu: 500m
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      memory_limiter:
        check_interval: 5s
        limit_mib: 512
        spike_limit_mib: 128
      resource:
        attributes:
          - key: deployment.environment
            value: "learning-lab"
            action: upsert
      batch:
        send_batch_size: 8192
        timeout: 200ms
    exporters:
      otlphttp/grafana:
        endpoint: "${GRAFANA_CLOUD_OTLP_URL}"
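        headers:
          # Basic auth expects base64("<instance-id>:<token>"). The shell encodes
          # it while writing this file, because the heredoc runs command substitutions.
          Authorization: "Basic $(printf '%s' "${GRAFANA_CLOUD_INSTANCE_ID}:${GRAFANA_CLOUD_API_KEY}" | base64 | tr -d '\n')"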
      debug:
        verbosity: basic
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, resource, batch]
          exporters: [otlphttp/grafana, debug]
        metrics:
          receivers: [otlp]
          processors: [memory_limiter, resource, batch]
          exporters: [otlphttp/grafana]
        logs:
          receivers: [otlp]
          processors: [memory_limiter, resource, batch]
          exporters: [otlphttp/grafana]
EOF

# Install the demo with Grafana Cloud configuration
helm install otel-demo open-telemetry/opentelemetry-demo \
  --namespace otel-demo \
  --values otel-demo-grafana-values.yaml \
  --wait \
  --timeout 10m

# In a second terminal, watch the pods start (helm --wait blocks until they are ready)
kubectl get pods -n otel-demo -w

Verifying the Deployment

After a few minutes, all pods should be running. The demo includes approximately 15-20 pods depending on optional components.

# Check all pods are running (may take 3-5 minutes for image pulls)
kubectl get pods -n otel-demo
# NAME                                          READY   STATUS    RESTARTS   AGE
# otel-demo-adservice-xxxxx                     1/1     Running   0          2m
# otel-demo-cartservice-xxxxx                   1/1     Running   0          2m
# otel-demo-checkoutservice-xxxxx               1/1     Running   0          2m
# otel-demo-currencyservice-xxxxx               1/1     Running   0          2m
# otel-demo-emailservice-xxxxx                  1/1     Running   0          2m
# otel-demo-frontend-xxxxx                      1/1     Running   0          2m
# otel-demo-frontendproxy-xxxxx                 1/1     Running   0          2m
# otel-demo-loadgenerator-xxxxx                 1/1     Running   0          2m
# otel-demo-otelcol-xxxxx                       1/1     Running   0          2m
# otel-demo-paymentservice-xxxxx                1/1     Running   0          2m
# otel-demo-productcatalogservice-xxxxx         1/1     Running   0          2m
# otel-demo-recommendationservice-xxxxx         1/1     Running   0          2m
# otel-demo-shippingservice-xxxxx               1/1     Running   0          2m

# Check for any issues
kubectl get pods -n otel-demo --field-selector=status.phase!=Running
# No resources found — all healthy!

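# Check the collector logs for export activity
kubectl logs -n otel-demo deployment/otel-demo-otelcol --tail=20 | grep -i "export"
# The debug exporter logs one summary line per trace batch (wording varies by
# collector version). Successful otlphttp exports are not logged at info level,
# so also confirm that no export errors appear.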

Accessing the Frontend

The demo includes an e-commerce web store frontend. Access it to generate realistic user traffic alongside the automated load generator.

# For kind clusters with NodePort configured:
# Open http://localhost:8080 in your browser

# For minikube:
minikube service otel-demo-frontendproxy -n otel-demo --profile=observability-lab

# For k3d:
# The port mapping was configured at cluster creation, use http://localhost:8080

# Alternatively, use port-forward (works with any cluster):
kubectl port-forward -n otel-demo svc/otel-demo-frontendproxy 8080:8080 &
echo "Demo frontend available at http://localhost:8080"
Try it: Browse products, add items to your cart, and complete a checkout. Each user action generates traces spanning multiple microservices, structured logs, and latency metrics — all visible in Grafana within seconds.
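
The frontend can take a little while to answer after the pods report Running, so a scripted check is more reliable with a few retries. The sketch below is a generic retry loop. `retry` is a hypothetical helper name, and the attempt count and delay in the example are arbitrary choices.

```shell
# Run a command until it succeeds, up to max_attempts times,
# sleeping delay_seconds between attempts.
retry() {
  max_attempts=$1
  delay_seconds=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1               # give up after the last allowed attempt
    fi
    attempt=$((attempt + 1))
    sleep "$delay_seconds"
  done
  return 0
}

# Example: wait up to about a minute for the frontend to respond
# retry 30 2 curl -fsS -o /dev/null http://localhost:8080
```

The same helper works for any command that signals readiness through its exit status.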

Exploring Telemetry from the Demo Application

With the demo running and sending data to Grafana Cloud, you can now explore all three pillars of observability. Open your Grafana instance and navigate to the Explore view.

Logs in Loki

Select the Loki data source in Explore. The demo services emit structured JSON logs with OpenTelemetry context (trace IDs, span IDs). Try these LogQL queries:

# View all logs from the checkout service
{service_name="checkoutservice"}

# Filter for errors across all services
{deployment_environment="learning-lab"} |= "error" | json

# Filter on a parsed duration field (assumes the JSON log body contains a duration field)
{service_name="cartservice"} | json | duration > 500ms

# Parse structured fields and filter
{service_name="paymentservice"} | json | line_format "{{.severity}} - {{.body}}"

# Count errors by service over time
sum by (service_name) (count_over_time({deployment_environment="learning-lab"} |= "ERROR" [5m]))

Exercise: Log Exploration

Task: Find all checkout failures in the last hour and identify which downstream service caused the failure.

  1. Open Explore with Loki data source
  2. Query: {service_name="checkoutservice"} |= "failed" | json
  3. Expand a log line and find the trace_id field
  4. Click the trace ID to jump to the correlated trace in Tempo

Metrics in Mimir

Switch to the Prometheus data source (backed by Mimir). The demo generates hundreds of metrics including HTTP request durations, gRPC call counts, runtime metrics, and custom business metrics.

# HTTP request duration histogram (95th percentile)
histogram_quantile(0.95, 
  sum by (le, service_name) (
    rate(http_server_request_duration_seconds_bucket{deployment_environment="learning-lab"}[5m])
  )
)

# Request rate by service
sum by (service_name) (
  rate(http_server_request_duration_seconds_count{deployment_environment="learning-lab"}[5m])
)

# 5xx responses per second by service (divide by the request rate above for an error ratio)
sum by (service_name) (
  rate(http_server_request_duration_seconds_count{
    deployment_environment="learning-lab",
    http_response_status_code=~"5.."
  }[5m])
)

# gRPC call duration by method
histogram_quantile(0.99,
  sum by (le, rpc_method) (
    rate(rpc_server_duration_milliseconds_bucket{deployment_environment="learning-lab"}[5m])
  )
)

Traces in Tempo

Switch to the Tempo data source. The demo generates distributed traces that span multiple services as requests flow through the e-commerce system. Use TraceQL to search for traces:

# Find slow checkout traces (> 2 seconds)
{resource.service.name = "checkoutservice" && duration > 2s}

# Find traces with errors
{status = error && resource.deployment.environment = "learning-lab"}

# Find traces for a specific HTTP endpoint
{span.http.route = "/api/cart" && span.http.method = "POST"}

# Find traces where a frontend span has a cartservice descendant
{resource.service.name = "frontend"} >> {resource.service.name = "cartservice"}

# Aggregate: p95 latency by service (TraceQL metrics query)
{resource.deployment.environment = "learning-lab"} | quantile_over_time(duration, .95) by (resource.service.name)
Correlation Power: Grafana automatically links between signals. From a trace, click "Logs for this span" to see correlated log entries. From a log line, click the trace ID to see the full distributed trace. From a metric spike, click "Exemplars" to jump to representative traces.

Adding Your Own Applications

The demo environment isn't just for the pre-built services — you can deploy your own applications alongside it and have them report telemetry through the same collector to Grafana Cloud.

Instrumenting a Custom App

Here's a minimal Node.js Express application instrumented with OpenTelemetry. It demonstrates automatic HTTP instrumentation, with traces and metrics exported over OTLP/gRPC:

// app.js — Minimal instrumented Express app
// Run: npm init -y && npm install express @opentelemetry/sdk-node \
//   @opentelemetry/auto-instrumentations-node @opentelemetry/exporter-trace-otlp-grpc \
//   @opentelemetry/exporter-metrics-otlp-grpc @opentelemetry/sdk-metrics

const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { OTLPMetricExporter } = require('@opentelemetry/exporter-metrics-otlp-grpc');
const { PeriodicExportingMetricReader } = require('@opentelemetry/sdk-metrics');

// Initialize OpenTelemetry SDK — MUST be done before importing other modules
const sdk = new NodeSDK({
  serviceName: 'my-custom-service',
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://otel-demo-otelcol:4317'
  }),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({
      url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://otel-demo-otelcol:4317'
    }),
    exportIntervalMillis: 30000
  }),
  instrumentations: [getNodeAutoInstrumentations()]
});
sdk.start();

// Now import Express (after SDK initialization)
const express = require('express');
const app = express();

app.get('/hello', (req, res) => {
  res.json({ message: 'Hello from my custom service!', timestamp: new Date().toISOString() });
});

app.get('/health', (req, res) => {
  res.status(200).json({ status: 'healthy' });
});

const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
  console.log(`Custom service listening on port ${PORT}`);
});

Deploying Alongside the Demo

Create a Kubernetes deployment for your custom application in the same namespace, pointing its OTLP endpoint at the demo's collector:

# my-custom-app.yaml — Deploy alongside the OTel Demo
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-custom-service
  namespace: otel-demo
  labels:
    app: my-custom-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-custom-service
  template:
    metadata:
      labels:
        app: my-custom-service
    spec:
      containers:
        - name: app
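          image: my-custom-service:latest
          # A :latest tag defaults to imagePullPolicy: Always, which would try a
          # registry pull and ignore the image loaded into kind.
          imagePullPolicy: IfNotPresent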
          ports:
            - containerPort: 3000
          env:
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://otel-demo-otelcol:4317"
            - name: OTEL_SERVICE_NAME
              value: "my-custom-service"
            - name: OTEL_RESOURCE_ATTRIBUTES
              value: "deployment.environment=learning-lab,service.namespace=custom"
          resources:
            limits:
              memory: 256Mi
              cpu: 200m
            requests:
              memory: 128Mi
              cpu: 100m
---
apiVersion: v1
kind: Service
metadata:
  name: my-custom-service
  namespace: otel-demo
spec:
  selector:
    app: my-custom-service
  ports:
    - port: 3000
      targetPort: 3000

# Build and load the image into kind (assumes a Dockerfile in the current directory)
docker build -t my-custom-service:latest .
kind load docker-image my-custom-service:latest --name observability-lab

# Deploy
kubectl apply -f my-custom-app.yaml

# Verify it's running
kubectl get pods -n otel-demo -l app=my-custom-service
# NAME                                 READY   STATUS    RESTARTS   AGE
# my-custom-service-xxxxx              1/1     Running   0          30s

# Generate some traffic
kubectl run curl-test --rm -it --image=curlimages/curl --restart=Never -n otel-demo -- \
  sh -c "for i in \$(seq 1 50); do curl -s http://my-custom-service:3000/hello; sleep 0.5; done"

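Shortly afterwards (the sample app exports metrics every 30 seconds), your custom service's telemetry appears in Grafana alongside the demo data:

# In Grafana Explore (Loki): returns results only if the app exports logs over OTLP;
# the sample app above configures trace and metric exporters only
{service_name="my-custom-service"}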

# In Grafana Explore (Prometheus/Mimir):
rate(http_server_request_duration_seconds_count{service_name="my-custom-service"}[5m])

# In Grafana Explore (Tempo):
{resource.service.name = "my-custom-service"}
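
The Loki and Mimir queries in this guide use label names such as deployment_environment, while the resource attributes are set with dotted keys such as deployment.environment. The dots become underscores when the attributes are turned into labels. The sketch below applies that renaming to an OTEL_RESOURCE_ATTRIBUTES-style string. `attrs_to_matchers` is a hypothetical helper name, and the output format is only meant as a reading aid for building queries.

```shell
# Convert "key=value,key=value" resource attributes into label matchers,
# replacing dots in the keys with underscores.
attrs_to_matchers() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS='=' read -r key value; do
    printf '%s="%s"\n' "$(printf '%s' "$key" | tr '.' '_')" "$value"
  done
}

# Example:
# attrs_to_matchers "deployment.environment=learning-lab,service.namespace=custom"
```

TraceQL is the exception: it keeps the dotted attribute names, as in resource.service.name.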

Troubleshooting

When telemetry isn't appearing in Grafana Cloud, systematically work through these debugging steps. Most issues come down to credentials, network connectivity, or collector configuration errors.

Checking Credentials

# Verify your credentials are set correctly
echo "Instance ID: ${GRAFANA_CLOUD_INSTANCE_ID}"
echo "OTLP URL: ${GRAFANA_CLOUD_OTLP_URL}"
echo "API Key (first 20 chars): ${GRAFANA_CLOUD_API_KEY:0:20}..."

# Test the OTLP endpoint directly with curl
curl -v -X POST "${GRAFANA_CLOUD_OTLP_URL}/v1/traces" \
  -H "Authorization: Basic $(printf '%s' "${GRAFANA_CLOUD_INSTANCE_ID}:${GRAFANA_CLOUD_API_KEY}" | base64 | tr -d '\n')" \
  -H "Content-Type: application/json" \
  -d '{}'
# Expected: HTTP 200 or 400 (bad request but auth succeeded)
# If 401: credentials are wrong
# If the host cannot be resolved or the connection is refused: URL is wrong

# Verify the Kubernetes secret matches your env vars
kubectl get secret grafana-cloud-credentials -n otel-demo -o jsonpath='{.data.instance-id}' | base64 -d
kubectl get secret grafana-cloud-credentials -n otel-demo -o jsonpath='{.data.api-key}' | base64 -d | head -c 20
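
An empty JSON body only proves that authentication works. To check the whole path into Tempo, you can send a single hand-built span instead. The sketch below prints a minimal OTLP/JSON trace payload. `random_hex` and `otlp_test_payload` are hypothetical helper names, and the service, scope, and span names are placeholders.

```shell
# Generate N random bytes as lowercase hex (OTLP/JSON encodes IDs as hex strings).
random_hex() {
  od -An -N"$1" -tx1 /dev/urandom | tr -d ' \n'
}

# Print a minimal OTLP/JSON trace payload containing one one-second span.
# traceId is 16 bytes (32 hex chars), spanId is 8 bytes (16 hex chars).
otlp_test_payload() {
  now_s=$(date +%s)
  cat <<EOF
{"resourceSpans":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"curl-smoke-test"}}]},"scopeSpans":[{"scope":{"name":"manual-test"},"spans":[{"traceId":"$(random_hex 16)","spanId":"$(random_hex 8)","name":"smoke-test-span","kind":1,"startTimeUnixNano":"$((now_s - 1))000000000","endTimeUnixNano":"${now_s}000000000"}]}]}]}
EOF
}

# Example: pipe the payload into the curl command above by replacing -d '{}' with -d @-
# otlp_test_payload | curl -X POST "${GRAFANA_CLOUD_OTLP_URL}/v1/traces" ... -d @-
```

If the request is accepted, searching Tempo for the placeholder service name should show the span after a short delay.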

Reading Collector Logs

# View recent collector logs
kubectl logs -n otel-demo deployment/otel-demo-otelcol --tail=50

# Filter for errors
kubectl logs -n otel-demo deployment/otel-demo-otelcol --tail=100 | grep -i "error\|failed\|denied"

# Watch logs in real time
kubectl logs -n otel-demo deployment/otel-demo-otelcol -f | grep -v "debug"

# Common error messages and their causes:
# "401 Unauthorized" → Wrong API key or instance ID
# "connection refused" → Wrong endpoint URL or network issue
# "context deadline exceeded" → Timeout reaching Grafana Cloud (DNS or firewall)
# "dropping data" → Memory limiter triggered (increase limits or reduce volume)
# "queue full" → Exporter can't keep up (check network throughput, or enlarge the exporter's sending_queue)

Debugging the Collector

The collector exposes health check and debug endpoints that help diagnose pipeline issues:

# Check collector health
kubectl port-forward -n otel-demo deployment/otel-demo-otelcol 13133:13133 &
curl -s http://localhost:13133/health | jq .
# {"status":"Server available","upSince":"2026-06-15T10:00:00Z","uptime":"2h30m"}

# Access zPages for pipeline debugging (requires the zpages extension to be enabled,
# as in otel-collector-config.yaml above)
kubectl port-forward -n otel-demo deployment/otel-demo-otelcol 55679:55679 &
# Open http://localhost:55679/debug/tracez — shows recent traces through the collector
# Open http://localhost:55679/debug/pipelinez — shows pipeline topology and stats

# Check collector metrics (self-monitoring)
kubectl port-forward -n otel-demo deployment/otel-demo-otelcol 8888:8888 &
curl -s http://localhost:8888/metrics | grep otelcol_exporter

# Key metrics to check:
# otelcol_exporter_sent_spans — successfully exported traces
# otelcol_exporter_send_failed_spans — failed trace exports
# otelcol_exporter_sent_metric_points — successfully exported metrics
# otelcol_receiver_accepted_spans — traces received by the collector
# otelcol_processor_dropped_spans — traces dropped (memory limiter)

# If no data is being received, check the application pods:
kubectl logs -n otel-demo deployment/otel-demo-frontend --tail=20 | grep -i "otel\|export\|telemetry"

# Restart the collector after config changes
kubectl rollout restart deployment/otel-demo-otelcol -n otel-demo
kubectl rollout status deployment/otel-demo-otelcol -n otel-demo
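
Reading the raw /metrics output by eye gets tedious when a counter is split across several label sets. The sketch below sums every sample of one metric from Prometheus text format on stdin. `sum_metric` is a hypothetical helper name, and the metric name must be passed exactly as it appears in your collector's output, which may carry a suffix such as _total in some versions.

```shell
# Sum all samples of one metric from Prometheus text format read on stdin.
# Handles both "name{labels} value" and "name value" lines; skips comments.
sum_metric() {
  awk -v name="$1" '
    /^#/ { next }
    {
      metric = $1
      sub(/[{].*/, "", metric)           # strip the label set, if any
      if (metric == name) total += $NF   # last field is the sample value
    }
    END { printf "%d\n", total }
  '
}

# Example (with the port-forward to 8888 running):
# curl -s http://localhost:8888/metrics | sum_metric otelcol_exporter_send_failed_spans
```

Comparing the sent and failed totals a minute apart shows whether exports are progressing or stalling.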

Troubleshooting Checklist: When Telemetry Doesn't Appear
  1. Are pods running? Run kubectl get pods -n otel-demo
  2. Is the collector healthy? — Check /health endpoint (port 13133)
  3. Are credentials correct? — Test with curl against OTLP endpoint
  4. Is data reaching the collector? — Check otelcol_receiver_accepted_* metrics
  5. Is data being exported? — Check otelcol_exporter_sent_* metrics
  6. Are there export errors? — Check otelcol_exporter_send_failed_* metrics
  7. Is the debug exporter showing data? — Check collector stdout logs
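  8. Is DNS resolution working? Run kubectl run dns-test --rm -it --image=busybox --restart=Never -n otel-demo -- nslookup otlp-gateway-prod-xx-xxx.grafana.net (a throwaway pod is used because the collector image typically ships without a shell or nslookup)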

Summary & Next Steps

You now have a fully functional observability learning environment with:

  • Grafana Cloud — hosted dashboards, alerting, and query interfaces for all three signal types
  • Local Kubernetes cluster — a lightweight kind/k3d/minikube cluster running on your machine
  • OpenTelemetry Demo — ~15 microservices generating realistic e-commerce telemetry in multiple languages
  • OpenTelemetry Collector — receiving, processing, and exporting all signals to Grafana Cloud
  • Your own applications — the ability to deploy custom services and see their telemetry alongside the demo

This environment will serve as the foundation for all remaining parts of the Grafana Deep Dive track. In subsequent articles, you'll write LogQL queries against the demo's logs, build PromQL dashboards from its metrics, trace requests across services with TraceQL, and configure alerts based on real application behavior.

Cleanup & Resume: To pause your lab, run kind delete cluster --name observability-lab. To resume, re-run the kind create cluster command, recreate the otel-demo namespace and the credentials secret, and then re-run helm install. Your Grafana Cloud data persists between sessions, subject to your plan's retention period; only the local cluster state is ephemeral.

Next in the Grafana Track

In Part 4: Looking at Logs with Grafana Loki, we'll dive deep into LogQL — from basic label matchers and line filters to complex aggregations, pattern detection, and building log-based dashboards and alerts using data from our demo environment.