Introducing Grafana Cloud
Grafana Cloud is Grafana Labs' fully managed observability platform that provides hosted instances of Grafana, Mimir (metrics), Loki (logs), Tempo (traces), and additional services like Alerting, Incident, OnCall, and Synthetic Monitoring. For learning purposes, the free tier is more than sufficient — it includes 10,000 series for metrics, 50 GB of logs, and 50 GB of traces per month.
Using Grafana Cloud for learning has several advantages over running everything locally:
- Zero infrastructure management — no need to configure storage, retention, or scaling for the backend databases
- Pre-configured data sources — Loki, Mimir, and Tempo are already wired into your Grafana instance
- Always-on availability — your telemetry data persists between learning sessions
- Production-like experience — the same APIs and query languages used in enterprise environments
flowchart LR
    subgraph Local["Local Machine / WSL2"]
        K8s[Kubernetes Cluster<br/>kind / k3d / minikube]
        Demo[OTel Demo App<br/>~15 microservices]
        Coll[OTel Collector]
    end
    subgraph Cloud["Grafana Cloud"]
        Mimir[Mimir<br/>Metrics]
        Loki[Loki<br/>Logs]
        Tempo[Tempo<br/>Traces]
        Grafana[Grafana<br/>Dashboards]
    end
    Demo --> Coll
    K8s --> Coll
    Coll -->|OTLP metrics| Mimir
    Coll -->|OTLP logs| Loki
    Coll -->|OTLP traces| Tempo
    Grafana --> Mimir
    Grafana --> Loki
    Grafana --> Tempo
Creating an Account
Navigate to grafana.com and click Create free account. You can sign up with an email address or use SSO via Google, GitHub, or Microsoft. After email verification, you'll be prompted to name your organization (this becomes your stack URL slug, e.g., yourorg.grafana.net).
Once your account is created, Grafana Cloud automatically provisions:
- A Grafana instance (your dashboards and exploration UI)
- A Prometheus-compatible metrics endpoint (powered by Mimir)
- A Loki logs endpoint
- A Tempo traces endpoint
- A Grafana Alloy configuration (the recommended collector)
Exploring the Portal
The Grafana Cloud portal (grafana.com/orgs/yourorg) is your management console. From here you can view your stack details, manage API keys, check usage against free-tier limits, and access documentation. Key sections include:
- Stack Management — View your hosted Grafana URL, Prometheus remote-write endpoint, Loki push endpoint, and Tempo OTLP endpoint
- Access Policies — Create scoped API tokens for programmatic access (metrics write, logs write, traces write)
- Usage & Billing — Monitor your consumption against free-tier quotas in real time
- Integrations — One-click setup for common data sources (Linux, Docker, Kubernetes, databases)
The Grafana Instance
Click Launch Grafana to open your hosted Grafana instance. This is a full-featured Grafana installation with pre-configured data sources. Take a moment to explore:
- Explore (compass icon) — Free-form querying of Loki, Mimir, and Tempo
- Dashboards — Pre-built dashboards for common integrations
- Alerting — Rule-based alerting with notification channels
- Connections → Data sources — Your pre-configured Prometheus, Loki, and Tempo connections
Installing Prerequisite Tools
Before deploying the demo application, you need a Linux-compatible environment with container orchestration tools. This section covers setup on Windows (via WSL2) and macOS. If you're already on Linux, skip directly to the container tools section.
WSL2 Setup (Windows Only)
Windows Subsystem for Linux 2 provides a real Linux kernel running inside a lightweight VM. It's required for running Docker and Kubernetes tooling natively on Windows.
# Install WSL2 with Ubuntu (run in PowerShell as Administrator)
wsl --install -d Ubuntu
# After installation completes and you've set up your Linux user, verify:
wsl --list --verbose
# NAME STATE VERSION
# Ubuntu Running 2
# Update the distribution (run this and the following command inside the Ubuntu shell, not PowerShell)
sudo apt update && sudo apt upgrade -y
# Install essential build tools
sudo apt install -y build-essential curl git wget unzip jq
Homebrew
Homebrew works on both macOS and Linux (including WSL2) and provides a consistent way to install development tools. It simplifies installing kubectl, helm, kind, and other CLI tools.
# Install Homebrew (works on macOS and Linux/WSL2)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Add Homebrew to your PATH (Linux/WSL2 — follow the post-install instructions)
echo 'eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"' >> ~/.bashrc
eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"
# Verify installation
brew --version
# Homebrew 4.x.x
Container Orchestration Tools
You need Docker (or a compatible container runtime) and kubectl for managing your local Kubernetes cluster.
# Option A: Docker Desktop (macOS / Windows with WSL2 integration)
# Download from https://www.docker.com/products/docker-desktop/
# Enable "Use the WSL 2 based engine" in Settings > General
# Enable Kubernetes in Settings > Kubernetes (optional — we'll use kind instead)
# Option B: Docker Engine on Linux/WSL2 (without Desktop)
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker
# Verify Docker is running
docker run --rm hello-world
# Install kubectl
brew install kubectl
# Verify kubectl
kubectl version --client
# Client Version: v1.30.x
Single-Node Kubernetes Cluster
For a learning environment, you need a lightweight single-node Kubernetes cluster. There are three popular options — kind (Kubernetes IN Docker) is recommended for its speed and low resource usage.
| Tool | Speed | RAM Usage | Best For |
|---|---|---|---|
| kind | Fast (30s) | ~500 MB | CI/CD, quick iteration, multiple clusters |
| k3d | Fast (20s) | ~512 MB | Lightweight k3s, built-in registry, load balancer |
| minikube | Moderate (60s) | ~2 GB | Feature-rich, add-ons ecosystem, ingress |
# Install kind (recommended)
brew install kind
# Create a cluster with extra port mappings for the demo frontend
cat <<EOF | kind create cluster --name observability-lab --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    extraPortMappings:
      - containerPort: 30080
        hostPort: 8080
        protocol: TCP
      - containerPort: 30088
        hostPort: 8088
        protocol: TCP
EOF
# Verify the cluster is running
kubectl cluster-info --context kind-observability-lab
# Kubernetes control plane is running at https://127.0.0.1:xxxxx
# Check nodes are Ready
kubectl get nodes
# NAME STATUS ROLES AGE VERSION
# observability-lab-control-plane Ready control-plane 30s v1.30.x
If you prefer k3d or minikube:
# Alternative: k3d
brew install k3d
k3d cluster create observability-lab \
-p "8080:30080@server:0" \
-p "8088:30088@server:0" \
--agents 0
# Alternative: minikube
brew install minikube
minikube start --cpus=4 --memory=4096 --driver=docker --profile=observability-lab
# Note: minikube uses 'minikube tunnel' for LoadBalancer access
Helm
Helm is the package manager for Kubernetes. The OpenTelemetry Demo and the OTel Collector are both distributed as Helm charts, making installation straightforward.
# Install Helm
brew install helm
# Verify installation
helm version
# version.BuildInfo{Version:"v3.15.x", ...}
# Add the OpenTelemetry Helm repository
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
# Verify the repo was added
helm search repo open-telemetry
# NAME CHART VERSION APP VERSION
# open-telemetry/opentelemetry-demo 0.32.x 1.11.x
# open-telemetry/opentelemetry-collector 0.97.x 0.104.x
Installing the OpenTelemetry Demo Application
The OpenTelemetry Demo is a distributed e-commerce application maintained by the OpenTelemetry community. It includes approximately 15 microservices written in different languages (Go, Python, Java, .NET, Node.js, Rust, C++, Ruby, PHP, Kotlin, Erlang) that communicate via gRPC and HTTP. Every service is instrumented with OpenTelemetry, generating real logs, metrics, and traces.
flowchart TD
    FE[Frontend<br/>TypeScript] --> CS[Cart Service<br/>C#/.NET]
    FE --> PS[Product Catalog<br/>Go]
    FE --> RS[Recommendation<br/>Python]
    FE --> CKS[Checkout Service<br/>Go]
    CKS --> PAY[Payment Service<br/>Node.js]
    CKS --> SHIP[Shipping Service<br/>Rust]
    CKS --> EMAIL[Email Service<br/>Ruby]
    CKS --> CS
    CKS --> CUR[Currency Service<br/>C++]
    RS --> PS
    FE --> ADS[Ad Service<br/>Java]
    FE --> CUR
    LG[Load Generator<br/>Python/Locust] --> FE
Setting Up Access Credentials
Before deploying the demo, you need API credentials to send telemetry to Grafana Cloud. Navigate to your Grafana Cloud portal and create an Access Policy token.
# In Grafana Cloud Portal:
# 1. Go to grafana.com → your organization → Access Policies
# 2. Click "Create access policy"
# 3. Name it: "otel-demo-write"
# 4. Add scopes:
# - metrics:write
# - logs:write
# - traces:write
# 5. Click "Create token" and copy the generated token
# Store credentials as environment variables (add to ~/.bashrc for persistence)
export GRAFANA_CLOUD_INSTANCE_ID="123456" # Your numeric instance ID
export GRAFANA_CLOUD_API_KEY="glc_eyJ..." # The token you just created
export GRAFANA_CLOUD_PROM_URL="https://prometheus-prod-xx-xxx.grafana.net/api/prom/push"
export GRAFANA_CLOUD_LOKI_URL="https://logs-prod-xxx.grafana.net/loki/api/v1/push"
export GRAFANA_CLOUD_TEMPO_URL="https://tempo-prod-xx-xxx.grafana.net/tempo"
export GRAFANA_CLOUD_OTLP_URL="https://otlp-gateway-prod-xx-xxx.grafana.net/otlp"
GRAFANA_CLOUD_API_KEY grants write access to your metrics, logs, and traces — treat it like a password.
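The OTLP gateway authenticates with HTTP Basic auth, so the instance ID and token are not sent as a raw `id:token` pair but as its base64 encoding. The following is a minimal Node.js sketch of that encoding; the function name `basicAuthHeader` is illustrative, and the sample values are placeholders rather than real credentials.

```javascript
// Build the value of an HTTP Basic "Authorization" header.
// instanceId and token stand in for GRAFANA_CLOUD_INSTANCE_ID and
// GRAFANA_CLOUD_API_KEY; the helper name is hypothetical.
function basicAuthHeader(instanceId, token) {
  const pair = `${instanceId}:${token}`;
  return `Basic ${Buffer.from(pair, 'utf8').toString('base64')}`;
}

// Placeholder credentials, for illustration only.
console.log(basicAuthHeader('user', 'pass')); // prints "Basic dXNlcjpwYXNz"
```

Any client that talks to the gateway directly needs to send the header in this encoded form.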
Downloading the Repository
While Helm is the primary deployment method, cloning the repository gives you access to the full source code, Dockerfiles, and configuration examples for reference.
# Clone the OpenTelemetry Demo repository (for reference)
git clone https://github.com/open-telemetry/opentelemetry-demo.git
cd opentelemetry-demo
# Check the current version
git describe --tags
# v1.11.x
# Explore the structure
ls src/
# adservice/ cartservice/ checkoutservice/ currencyservice/ emailservice/
# featureflagservice/ frontend/ frauddetectionservice/ loadgenerator/
# paymentservice/ productcatalogservice/ recommendationservice/ shippingservice/
Adding Credentials and Endpoints
Create a Helm values file that configures the demo to send telemetry to Grafana Cloud via the OTLP gateway. This is the simplest approach — Grafana Cloud's OTLP endpoint accepts metrics, logs, and traces over a single connection.
# otel-demo-values.yaml: Helm values for OpenTelemetry Demo with Grafana Cloud
# Save this file in your working directory
# Template only: Helm does not expand ${...} placeholders. The "Helm Installation"
# step later generates the final file with real values through a shell heredoc.
default:
  env:
    - name: OTEL_SERVICE_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.labels['app.kubernetes.io/component']
opentelemetry-collector:
  mode: deployment
  config:
    exporters:
      otlphttp/grafana:
        endpoint: "${GRAFANA_CLOUD_OTLP_URL}"
        headers:
          # HTTP Basic auth expects base64("<instance-id>:<token>"), not the raw pair
          Authorization: "Basic <base64 of ${GRAFANA_CLOUD_INSTANCE_ID}:${GRAFANA_CLOUD_API_KEY}>"
    service:
      pipelines:
        traces:
          exporters: [otlphttp/grafana]
        metrics:
          exporters: [otlphttp/grafana]
        logs:
          exporters: [otlphttp/grafana]
Installing the OpenTelemetry Collector
The OpenTelemetry Collector is the central hub for receiving, processing, and exporting telemetry data. It sits between your applications and Grafana Cloud, handling batching, retry, authentication, and format conversion. The demo includes a bundled collector, but understanding its configuration is essential for production use.
flowchart LR
    subgraph Receivers
        OTLP[OTLP<br/>gRPC :4317<br/>HTTP :4318]
        PROM[Prometheus<br/>Scrape]
    end
    subgraph Processors
        BATCH[Batch<br/>200ms / 8192]
        RES[Resource<br/>Attributes]
        MEM[Memory<br/>Limiter]
    end
    subgraph Exporters
        GRAFANA[OTLP/HTTP<br/>Grafana Cloud]
        DEBUG[Debug<br/>stdout]
    end
    OTLP --> MEM
    PROM --> MEM
    MEM --> RES
    RES --> BATCH
    BATCH --> GRAFANA
    BATCH --> DEBUG
Configuration
The collector configuration defines receivers (how data enters), processors (transformations), exporters (where data goes), and service pipelines (which connect them). Here's a complete configuration for sending all three signal types to Grafana Cloud:
# otel-collector-config.yaml: full collector configuration for Grafana Cloud
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  # Scrape the collector's own internal metrics (exposed on :8888 under service.telemetry)
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 30s
          static_configs:
            - targets: ['localhost:8888']
processors:
  # Prevent OOM kills
  memory_limiter:
    check_interval: 5s
    limit_mib: 512
    spike_limit_mib: 128
  # Add resource attributes to all signals
  resource:
    attributes:
      - key: deployment.environment
        value: "learning-lab"
        action: upsert
      - key: service.namespace
        value: "otel-demo"
        action: upsert
  # Batch telemetry for efficient export
  batch:
    send_batch_size: 8192
    send_batch_max_size: 16384
    timeout: 200ms
exporters:
  # Grafana Cloud OTLP endpoint (all signals over one connection)
  otlphttp/grafana:
    endpoint: "${env:GRAFANA_CLOUD_OTLP_URL}"
    # Authenticate through the basicauth extension, which base64-encodes
    # "<instance-id>:<token>" as HTTP Basic auth requires
    auth:
      authenticator: basicauth/grafana
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
  # Debug exporter for troubleshooting (writes to stdout)
  debug:
    verbosity: basic
    sampling_initial: 5
    sampling_thereafter: 200
extensions:
  # Needs a collector distribution that bundles the basicauth extension (for example, contrib)
  basicauth/grafana:
    client_auth:
      username: "${env:GRAFANA_CLOUD_INSTANCE_ID}"
      password: "${env:GRAFANA_CLOUD_API_KEY}"
  health_check:
    endpoint: 0.0.0.0:13133
  zpages:
    endpoint: 0.0.0.0:55679
service:
  extensions: [basicauth/grafana, health_check, zpages]
  telemetry:
    logs:
      level: info
    metrics:
      address: 0.0.0.0:8888
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [otlphttp/grafana, debug]
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, resource, batch]
      exporters: [otlphttp/grafana]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [otlphttp/grafana]
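The memory_limiter processor prevents the collector from consuming unbounded memory, the batch processor groups telemetry for efficient network transfer, and the resource processor adds environment labels that help filter data in Grafana. Order matters in each pipeline: memory_limiter comes first so it can refuse data before later stages allocate memory for it, and batch comes last so that already-processed data is grouped just before export.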
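To make the batching behaviour concrete, here is a toy model of a size-or-timeout batcher. It is a sketch of the general idea only, not the collector's implementation: the `Batcher` class, its method names, and the explicit clock passed to `add` and `tick` are all assumptions made for illustration, and the timeout is measured from the first buffered item as a simplification.

```javascript
// Toy size-or-timeout batcher (illustrative; not the collector's code).
class Batcher {
  constructor({ sendBatchSize, timeoutMs, onFlush }) {
    this.sendBatchSize = sendBatchSize; // flush once this many items are buffered
    this.timeoutMs = timeoutMs;         // or once the oldest item has waited this long
    this.onFlush = onFlush;
    this.items = [];
    this.firstAt = null;
  }

  // Buffer one item; nowMs is an explicit clock so the example is deterministic.
  add(item, nowMs) {
    if (this.items.length === 0) this.firstAt = nowMs;
    this.items.push(item);
    if (this.items.length >= this.sendBatchSize) this.flush();
  }

  // Called periodically; flushes a partial batch whose timeout has elapsed.
  tick(nowMs) {
    if (this.items.length > 0 && nowMs - this.firstAt >= this.timeoutMs) this.flush();
  }

  flush() {
    const batch = this.items;
    this.items = [];
    this.firstAt = null;
    this.onFlush(batch);
  }
}

const flushed = [];
const batcher = new Batcher({ sendBatchSize: 3, timeoutMs: 200, onFlush: (b) => flushed.push(b) });
batcher.add('span-1', 0);
batcher.add('span-2', 10);
batcher.add('span-3', 20); // size threshold reached: a batch of 3 is flushed
batcher.add('span-4', 30);
batcher.tick(100);         // only 70 ms since span-4 arrived: nothing is flushed
batcher.tick(230);         // 200 ms since span-4 arrived: a batch of 1 is flushed
console.log(flushed.map((b) => b.length)); // prints [ 3, 1 ]
```

With the values used in this article (8192 items, 200ms), a busy pipeline flushes mostly on size and a quiet one mostly on the timeout.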
Deployment
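Create a namespace and a Kubernetes Secret that holds your Grafana Cloud credentials. The collector itself is deployed by the demo's Helm chart in the next section, so this step only prepares the namespace and stores a reference copy of the credentials that the Troubleshooting section checks later: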
# Create a namespace for the demo
kubectl create namespace otel-demo
# Create a secret with Grafana Cloud credentials
kubectl create secret generic grafana-cloud-credentials \
--namespace otel-demo \
--from-literal=instance-id="${GRAFANA_CLOUD_INSTANCE_ID}" \
--from-literal=api-key="${GRAFANA_CLOUD_API_KEY}" \
--from-literal=otlp-endpoint="${GRAFANA_CLOUD_OTLP_URL}"
# Verify the secret was created
kubectl get secrets -n otel-demo
# NAME TYPE DATA AGE
# grafana-cloud-credentials Opaque 3 5s
Installing the OpenTelemetry Demo on Kubernetes
With prerequisites in place and credentials configured, deploy the full demo application using Helm. The chart installs all microservices, a load generator, and an embedded OpenTelemetry Collector.
Helm Installation
# Create the final values file with your actual credentials
# (the generated file contains your token in plain text, so keep it out of version control)
# HTTP Basic auth needs base64("<instance-id>:<token>"); tr strips line wraps added by base64
GRAFANA_CLOUD_BASIC_AUTH="$(printf '%s:%s' "${GRAFANA_CLOUD_INSTANCE_ID}" "${GRAFANA_CLOUD_API_KEY}" | base64 | tr -d '\n')"
cat <<EOF > otel-demo-grafana-values.yaml
default:
  envOverrides:
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: "http://otel-demo-otelcol:4317"
components:
  frontendProxy:
    service:
      type: NodePort
    ports:
      - name: http
        port: 8080
        targetPort: 8080
        nodePort: 30080
opentelemetry-collector:
  mode: deployment
  resources:
    limits:
      memory: 1Gi
      cpu: 500m
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      memory_limiter:
        check_interval: 5s
        limit_mib: 512
        spike_limit_mib: 128
      resource:
        attributes:
          - key: deployment.environment
            value: "learning-lab"
            action: upsert
      batch:
        send_batch_size: 8192
        timeout: 200ms
    exporters:
      otlphttp/grafana:
        endpoint: "${GRAFANA_CLOUD_OTLP_URL}"
        headers:
          Authorization: "Basic ${GRAFANA_CLOUD_BASIC_AUTH}"
      debug:
        verbosity: basic
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, resource, batch]
          exporters: [otlphttp/grafana, debug]
        metrics:
          receivers: [otlp]
          processors: [memory_limiter, resource, batch]
          exporters: [otlphttp/grafana]
        logs:
          receivers: [otlp]
          processors: [memory_limiter, resource, batch]
          exporters: [otlphttp/grafana]
EOF
# Install the demo with Grafana Cloud configuration
helm install otel-demo open-telemetry/opentelemetry-demo \
--namespace otel-demo \
--values otel-demo-grafana-values.yaml \
--wait \
--timeout 10m
# Watch the installation progress (run in a second terminal, since helm --wait blocks until the pods are ready)
kubectl get pods -n otel-demo -w
Verifying the Deployment
After a few minutes, all pods should be running. The demo includes approximately 15-20 pods depending on optional components.
# Check all pods are running (may take 3-5 minutes for image pulls)
kubectl get pods -n otel-demo
# NAME READY STATUS RESTARTS AGE
# otel-demo-adservice-xxxxx 1/1 Running 0 2m
# otel-demo-cartservice-xxxxx 1/1 Running 0 2m
# otel-demo-checkoutservice-xxxxx 1/1 Running 0 2m
# otel-demo-currencyservice-xxxxx 1/1 Running 0 2m
# otel-demo-emailservice-xxxxx 1/1 Running 0 2m
# otel-demo-frontend-xxxxx 1/1 Running 0 2m
# otel-demo-frontendproxy-xxxxx 1/1 Running 0 2m
# otel-demo-loadgenerator-xxxxx 1/1 Running 0 2m
# otel-demo-otelcol-xxxxx 1/1 Running 0 2m
# otel-demo-paymentservice-xxxxx 1/1 Running 0 2m
# otel-demo-productcatalogservice-xxxxx 1/1 Running 0 2m
# otel-demo-recommendationservice-xxxxx 1/1 Running 0 2m
# otel-demo-shippingservice-xxxxx 1/1 Running 0 2m
# Check for any issues
kubectl get pods -n otel-demo --field-selector=status.phase!=Running
# No resources found — all healthy!
# Check the collector logs for successful exports
kubectl logs -n otel-demo deployment/otel-demo-otelcol --tail=20 | grep -i "export"
# Exporting spans {"kind": "exporter", "data_type": "traces", "name": "otlphttp/grafana"}
Accessing the Frontend
The demo includes an e-commerce web store frontend. Access it to generate realistic user traffic alongside the automated load generator.
# For kind clusters with NodePort configured:
# Open http://localhost:8080 in your browser
# For minikube:
minikube service otel-demo-frontendproxy -n otel-demo --profile=observability-lab
# For k3d:
# The port mapping was configured at cluster creation, use http://localhost:8080
# Alternatively, use port-forward (works with any cluster):
kubectl port-forward -n otel-demo svc/otel-demo-frontendproxy 8080:8080 &
echo "Demo frontend available at http://localhost:8080"
Exploring Telemetry from the Demo Application
With the demo running and sending data to Grafana Cloud, you can now explore all three pillars of observability. Open your Grafana instance and navigate to the Explore view.
Logs in Loki
Select the Loki data source in Explore. The demo services emit structured JSON logs with OpenTelemetry context (trace IDs, span IDs). Try these LogQL queries:
# View all logs from the checkout service
{service_name="checkoutservice"}
# Filter for errors across all services
{deployment_environment="learning-lab"} |= "error" | json
# Find logs correlated with slow traces
{service_name="cartservice"} | json | duration > 500ms
# Parse structured fields and filter
{service_name="paymentservice"} | json | line_format "{{.severity}} - {{.body}}"
# Count errors by service over time
sum by (service_name) (count_over_time({deployment_environment="learning-lab"} |= "ERROR" [5m]))
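Two details of these queries are easy to miss: the `|=` line filter is a case-sensitive substring match, and `| json` flattens nested JSON objects into label names joined with underscores. The sketch below imitates both stages in plain JavaScript; it is a toy stand-in rather than Loki's implementation, and the helper names and sample log lines are made up for illustration.

```javascript
// Toy stand-ins for the LogQL stages `|= "text"` and `| json`.
// Helper names and sample lines are hypothetical.

// Line filter: keep lines that contain the needle (case-sensitive).
function lineContains(lines, needle) {
  return lines.filter((line) => line.includes(needle));
}

// JSON stage: flatten nested objects into underscore-joined keys; arrays are skipped.
function flattenJson(value, prefix = '', out = {}) {
  for (const [key, v] of Object.entries(value)) {
    const name = prefix ? `${prefix}_${key}` : key;
    if (Array.isArray(v)) continue;
    if (v !== null && typeof v === 'object') flattenJson(v, name, out);
    else out[name] = v;
  }
  return out;
}

const lines = [
  '{"severity":"ERROR","body":"payment failed","attributes":{"order_id":"42"}}',
  '{"severity":"INFO","body":"no error here"}',
  '{"severity":"INFO","body":"order placed"}',
];

console.log(lineContains(lines, 'error').length); // prints 1 (only the INFO line mentions lowercase "error")
console.log(lineContains(lines, 'ERROR').length); // prints 1 (only the line with severity ERROR)
console.log(flattenJson(JSON.parse(lines[0])));
```

This is why the queries above use `"error"` and `"ERROR"` as separate filters: each matches a different set of lines.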
Task: Find all checkout failures in the last hour and identify which downstream service caused the failure.
- Open Explore with the Loki data source
- Query: {service_name="checkoutservice"} |= "failed" | json
- Expand a log line and find the trace_id field
- Click the trace ID to jump to the correlated trace in Tempo
Metrics in Mimir
Switch to the Prometheus data source (backed by Mimir). The demo generates hundreds of metrics including HTTP request durations, gRPC call counts, runtime metrics, and custom business metrics.
# HTTP request duration histogram (95th percentile)
histogram_quantile(0.95,
sum by (le, service_name) (
rate(http_server_request_duration_seconds_bucket{deployment_environment="learning-lab"}[5m])
)
)
# Request rate by service
sum by (service_name) (
rate(http_server_request_duration_seconds_count{deployment_environment="learning-lab"}[5m])
)
# Error rate (5xx responses)
sum by (service_name) (
rate(http_server_request_duration_seconds_count{
deployment_environment="learning-lab",
http_response_status_code=~"5.."
}[5m])
)
# gRPC call duration by method
histogram_quantile(0.99,
sum by (le, rpc_method) (
rate(rpc_server_duration_milliseconds_bucket{deployment_environment="learning-lab"}[5m])
)
)
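The `histogram_quantile` calls above estimate a percentile from cumulative bucket counts by linear interpolation inside the bucket that contains the target rank. The following sketch reproduces that arithmetic for a single series; it is a simplified model rather than Prometheus's code, and the function name and bucket values are invented for the example.

```javascript
// Simplified model of histogram_quantile for one series (illustrative only).
// buckets: cumulative counts sorted by upper bound `le`, ending with le = Infinity.
// Assumes 0 < q <= 1.
function histogramQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  // First bucket whose cumulative count reaches the target rank.
  const i = buckets.findIndex((b) => b.count >= rank);
  // If the rank falls in the +Inf bucket, report the highest finite bound.
  if (i === buckets.length - 1) return buckets[buckets.length - 2].le;
  const lower = i === 0 ? 0 : buckets[i - 1].le;
  const below = i === 0 ? 0 : buckets[i - 1].count;
  const inBucket = buckets[i].count - below;
  // Assume observations are spread evenly within the bucket.
  return lower + (buckets[i].le - lower) * ((rank - below) / inBucket);
}

// Made-up latency histogram: 100 requests in total.
const buckets = [
  { le: 0.1, count: 50 },
  { le: 0.5, count: 90 },
  { le: 1, count: 100 },
  { le: Infinity, count: 100 },
];

console.log(histogramQuantile(0.95, buckets)); // about 0.75 seconds
console.log(histogramQuantile(0.5, buckets));  // about 0.1 seconds
```

Because of the interpolation, the reported p95 can never be more precise than the bucket boundaries allow, which is worth remembering when reading these panels.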
Traces in Tempo
Switch to the Tempo data source. The demo generates distributed traces that span multiple services as requests flow through the e-commerce system. Use TraceQL to search for traces:
# Find slow checkout traces (> 2 seconds)
{resource.service.name = "checkoutservice" && duration > 2s}
# Find traces with errors
{status = error && resource.deployment.environment = "learning-lab"}
# Find traces for a specific HTTP endpoint
{span.http.route = "/api/cart" && span.http.method = "POST"}
# Find traces spanning multiple services
{resource.service.name = "frontend"} >> {resource.service.name = "cartservice"}
# Aggregate: p95 span duration by service (a TraceQL metrics query; availability depends on your stack)
{resource.deployment.environment = "learning-lab"} | quantile_over_time(duration, .95) by (resource.service.name)
Adding Your Own Applications
The demo environment isn't just for the pre-built services — you can deploy your own applications alongside it and have them report telemetry through the same collector to Grafana Cloud.
Instrumenting a Custom App
Here's a minimal Node.js Express application instrumented with OpenTelemetry. It relies on automatic HTTP instrumentation and exports traces and metrics to the collector over OTLP/gRPC:
// app.js: minimal instrumented Express app
// Run: npm init -y && npm install express @opentelemetry/sdk-node \
//   @opentelemetry/auto-instrumentations-node @opentelemetry/exporter-trace-otlp-grpc \
//   @opentelemetry/exporter-metrics-otlp-grpc @opentelemetry/sdk-metrics
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { OTLPMetricExporter } = require('@opentelemetry/exporter-metrics-otlp-grpc');
const { PeriodicExportingMetricReader } = require('@opentelemetry/sdk-metrics');

// Initialize OpenTelemetry SDK (MUST be done before importing other modules)
const sdk = new NodeSDK({
  serviceName: 'my-custom-service',
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://otel-demo-otelcol:4317'
  }),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({
      url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://otel-demo-otelcol:4317'
    }),
    // Metrics are pushed every 30 seconds
    exportIntervalMillis: 30000
  }),
  instrumentations: [getNodeAutoInstrumentations()]
});
sdk.start();

// Now import Express (after SDK initialization)
const express = require('express');
const app = express();

app.get('/hello', (req, res) => {
  res.json({ message: 'Hello from my custom service!', timestamp: new Date().toISOString() });
});

app.get('/health', (req, res) => {
  res.status(200).json({ status: 'healthy' });
});

const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
  console.log(`Custom service listening on port ${PORT}`);
});
Deploying Alongside the Demo
Create a Kubernetes deployment for your custom application in the same namespace, pointing its OTLP endpoint at the demo's collector:
# my-custom-app.yaml: deploy alongside the OTel Demo
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-custom-service
  namespace: otel-demo
  labels:
    app: my-custom-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-custom-service
  template:
    metadata:
      labels:
        app: my-custom-service
    spec:
      containers:
        - name: app
          image: my-custom-service:latest
          # The image exists only inside the kind node; with a :latest tag the default
          # policy is Always, which would try (and fail) to pull it from a registry
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 3000
          env:
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: "http://otel-demo-otelcol:4317"
            - name: OTEL_SERVICE_NAME
              value: "my-custom-service"
            - name: OTEL_RESOURCE_ATTRIBUTES
              value: "deployment.environment=learning-lab,service.namespace=custom"
          resources:
            limits:
              memory: 256Mi
              cpu: 200m
            requests:
              memory: 128Mi
              cpu: 100m
---
apiVersion: v1
kind: Service
metadata:
  name: my-custom-service
  namespace: otel-demo
spec:
  selector:
    app: my-custom-service
  ports:
    - port: 3000
      targetPort: 3000
# Build and load the image into kind (assumes a Dockerfile for app.js in the current directory)
docker build -t my-custom-service:latest .
kind load docker-image my-custom-service:latest --name observability-lab
# Deploy
kubectl apply -f my-custom-app.yaml
# Verify it's running
kubectl get pods -n otel-demo -l app=my-custom-service
# NAME READY STATUS RESTARTS AGE
# my-custom-service-xxxxx 1/1 Running 0 30s
# Generate some traffic
kubectl run curl-test --rm -it --image=curlimages/curl --restart=Never -n otel-demo -- \
sh -c "for i in \$(seq 1 50); do curl -s http://my-custom-service:3000/hello; sleep 0.5; done"
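Within a minute or so, your custom service's traces and metrics appear in Grafana alongside the demo data (the metric reader above exports every 30 seconds). The Loki query returns results only once the service also exports logs over OTLP; this minimal app writes its log line to stdout only.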
# In Grafana Explore (Loki):
{service_name="my-custom-service"}
# In Grafana Explore (Prometheus/Mimir):
rate(http_server_request_duration_seconds_count{service_name="my-custom-service"}[5m])
# In Grafana Explore (Tempo):
{resource.service.name = "my-custom-service"}
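The Deployment sets attributes with dotted names (`deployment.environment`, `service.namespace`), while the LogQL and PromQL queries in this article address them with underscores (`deployment_environment`, `service_name`). The sketch below shows that relationship: it parses an `OTEL_RESOURCE_ATTRIBUTES` string and derives the underscore form. Both helper names are hypothetical, and the character replacement is a simplified approximation of label-name sanitising, not the exact rule any backend applies.

```javascript
// Parse "key=value,key=value" as used by OTEL_RESOURCE_ATTRIBUTES.
// Helper names are illustrative, not part of any SDK.
function parseResourceAttributes(raw) {
  const attrs = {};
  for (const pair of raw.split(',')) {
    const idx = pair.indexOf('=');
    if (idx <= 0) continue; // skip empty or malformed entries
    attrs[pair.slice(0, idx).trim()] = pair.slice(idx + 1).trim();
  }
  return attrs;
}

// Approximate the label name seen in LogQL/PromQL: characters outside
// [a-zA-Z0-9_] become underscores.
function toLabelName(attributeKey) {
  return attributeKey.replace(/[^a-zA-Z0-9_]/g, '_');
}

const attrs = parseResourceAttributes('deployment.environment=learning-lab,service.namespace=custom');
for (const [key, value] of Object.entries(attrs)) {
  console.log(`${key} -> ${toLabelName(key)}="${value}"`);
}
```

TraceQL keeps the dotted form, which is why the Tempo query uses `resource.service.name` while the Loki and Mimir queries use `service_name`.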
Troubleshooting
When telemetry isn't appearing in Grafana Cloud, systematically work through these debugging steps. Most issues come down to credentials, network connectivity, or collector configuration errors.
Checking Credentials
# Verify your credentials are set correctly
echo "Instance ID: ${GRAFANA_CLOUD_INSTANCE_ID}"
echo "OTLP URL: ${GRAFANA_CLOUD_OTLP_URL}"
echo "API Key (first 20 chars): ${GRAFANA_CLOUD_API_KEY:0:20}..."
# Test the OTLP endpoint directly with curl
curl -v -X POST "${GRAFANA_CLOUD_OTLP_URL}/v1/traces" \
-H "Authorization: Basic $(printf '%s' "${GRAFANA_CLOUD_INSTANCE_ID}:${GRAFANA_CLOUD_API_KEY}" | base64 | tr -d '\n')" \
-H "Content-Type: application/json" \
-d '{}'
# Expected: HTTP 200 or 400 (bad request but auth succeeded)
# If 401: credentials are wrong
# If connection refused: URL is wrong
# Verify the Kubernetes secret matches your env vars
kubectl get secret grafana-cloud-credentials -n otel-demo -o jsonpath='{.data.instance-id}' | base64 -d
kubectl get secret grafana-cloud-credentials -n otel-demo -o jsonpath='{.data.api-key}' | base64 -d | head -c 20
Reading Collector Logs
# View recent collector logs
kubectl logs -n otel-demo deployment/otel-demo-otelcol --tail=50
# Filter for errors
kubectl logs -n otel-demo deployment/otel-demo-otelcol --tail=100 | grep -i "error\|failed\|denied"
# Watch logs in real time
kubectl logs -n otel-demo deployment/otel-demo-otelcol -f | grep -v "debug"
# Common error messages and their causes:
# "401 Unauthorized" → Wrong API key or instance ID
# "connection refused" → Wrong endpoint URL or network issue
# "context deadline exceeded" → Timeout reaching Grafana Cloud (DNS or firewall)
# "dropping data" → Memory limiter triggered (increase limits or reduce volume)
# "queue full" → Exporter can't keep up (check network, increase batch size)
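When the logs are long, it can help to count how often each of these messages occurs instead of reading line by line. The sketch below tallies the patterns from the table above; the `diagnose` function and the sample log text are illustrative, and the matching is plain substring-style regex matching, so it will miss messages that a given collector version words differently.

```javascript
// Map the error messages listed above to their likely causes.
const KNOWN_ERRORS = [
  { pattern: /401 Unauthorized/i, cause: 'Wrong API key or instance ID' },
  { pattern: /connection refused/i, cause: 'Wrong endpoint URL or network issue' },
  { pattern: /context deadline exceeded/i, cause: 'Timeout reaching Grafana Cloud (DNS or firewall)' },
  { pattern: /dropping data/i, cause: 'Memory limiter triggered' },
  { pattern: /queue full/i, cause: 'Exporter cannot keep up' },
];

// Count matching log lines per cause, most frequent first.
// The function name is hypothetical.
function diagnose(logText) {
  const counts = new Map();
  for (const line of logText.split('\n')) {
    for (const { pattern, cause } of KNOWN_ERRORS) {
      if (pattern.test(line)) counts.set(cause, (counts.get(cause) || 0) + 1);
    }
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Made-up log excerpt for illustration.
const sampleLog = [
  'error exporter failed: 401 Unauthorized',
  'warn dial tcp: connection refused',
  'error exporter failed: 401 Unauthorized',
  'info everything is fine',
].join('\n');

console.log(diagnose(sampleLog));
```

The most frequent cause in the output is usually the right place to start.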
Debugging the Collector
The collector exposes health check and debug endpoints that help diagnose pipeline issues:
# Check collector health
kubectl port-forward -n otel-demo deployment/otel-demo-otelcol 13133:13133 &
curl -s http://localhost:13133/health | jq .
# {"status":"Server available","upSince":"2026-06-15T10:00:00Z","uptime":"2h30m"}
# Access zPages for pipeline debugging
kubectl port-forward -n otel-demo deployment/otel-demo-otelcol 55679:55679 &
# Open http://localhost:55679/debug/tracez — shows recent traces through the collector
# Open http://localhost:55679/debug/pipelinez — shows pipeline topology and stats
# Check collector metrics (self-monitoring)
kubectl port-forward -n otel-demo deployment/otel-demo-otelcol 8888:8888 &
curl -s http://localhost:8888/metrics | grep otelcol_exporter
# Key metrics to check:
# otelcol_exporter_sent_spans — successfully exported traces
# otelcol_exporter_send_failed_spans — failed trace exports
# otelcol_exporter_sent_metric_points — successfully exported metrics
# otelcol_receiver_accepted_spans — traces received by the collector
# otelcol_processor_refused_spans — spans refused by a processor (for example, the memory limiter)
# If no data is being received, check the application pods:
kubectl logs -n otel-demo deployment/otel-demo-frontend --tail=20 | grep -i "otel\|export\|telemetry"
# Restart the collector after config changes
kubectl rollout restart deployment/otel-demo-otelcol -n otel-demo
kubectl rollout status deployment/otel-demo-otelcol -n otel-demo
- Are pods running? Run kubectl get pods -n otel-demo
- Is the collector healthy? Check the /health endpoint (port 13133)
- Are credentials correct? Test with curl against the OTLP endpoint
- Is data reaching the collector? Check otelcol_receiver_accepted_* metrics
- Is data being exported? Check otelcol_exporter_sent_* metrics
- Are there export errors? Check otelcol_exporter_send_failed_* metrics
- Is the debug exporter showing data? Check collector stdout logs
- DNS resolution working? Run kubectl exec -it -n otel-demo [collector-pod] -- nslookup otlp-gateway-prod-xx-xxx.grafana.net
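The three metric checks in this list can be combined into one pass over the collector's `/metrics` output to see at which stage spans stop flowing. The following is a rough sketch under stated assumptions: the function names are hypothetical, the sample text is invented, and metric names are matched by prefix because some collector versions append a `_total` suffix.

```javascript
// Sum every sample of a metric family in Prometheus text format.
// Matches by prefix so that "..._spans" and "..._spans_total" both count.
function sumMetric(metricsText, namePrefix) {
  let sum = 0;
  for (const rawLine of metricsText.split('\n')) {
    const line = rawLine.trim();
    if (line.startsWith('#') || !line.startsWith(namePrefix)) continue;
    const value = Number(line.split(/\s+/).pop()); // the value is the last field
    if (!Number.isNaN(value)) sum += value;
  }
  return sum;
}

// Decide which stage of the trace pipeline looks unhealthy.
function locateProblem(metricsText) {
  const accepted = sumMetric(metricsText, 'otelcol_receiver_accepted_spans');
  const sent = sumMetric(metricsText, 'otelcol_exporter_sent_spans');
  const failed = sumMetric(metricsText, 'otelcol_exporter_send_failed_spans');
  if (accepted === 0) return 'no spans received: check the application pods and their OTLP endpoint';
  if (failed > 0) return 'exports are failing: check credentials, endpoint URL, and collector logs';
  if (sent === 0) return 'spans received but none exported: check the pipeline configuration';
  return 'spans are flowing through the collector';
}

// Invented sample, shaped like the collector's self-monitoring output.
const sampleMetrics = [
  '# TYPE otelcol_receiver_accepted_spans counter',
  'otelcol_receiver_accepted_spans{receiver="otlp",transport="grpc"} 1500',
  'otelcol_exporter_sent_spans{exporter="otlphttp/grafana"} 0',
  'otelcol_exporter_send_failed_spans{exporter="otlphttp/grafana"} 1500',
].join('\n');

console.log(locateProblem(sampleMetrics));
```

The same approach extends to metric points and log records by swapping the `_spans` suffix for the corresponding metric names.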
Summary & Next Steps
You now have a fully functional observability learning environment with:
- Grafana Cloud — hosted dashboards, alerting, and query interfaces for all three signal types
- Local Kubernetes cluster — a lightweight kind/k3d/minikube cluster running on your machine
- OpenTelemetry Demo — ~15 microservices generating realistic e-commerce telemetry in multiple languages
- OpenTelemetry Collector — receiving, processing, and exporting all signals to Grafana Cloud
- Your own applications — the ability to deploy custom services and see their telemetry alongside the demo
This environment will serve as the foundation for all remaining parts of the Grafana Deep Dive track. In subsequent articles, you'll write LogQL queries against the demo's logs, build PromQL dashboards from its metrics, trace requests across services with TraceQL, and configure alerts based on real application behavior.
To tear down the local cluster when you're done, run kind delete cluster --name observability-lab. To resume, re-run the kind create cluster and helm install commands. Your Grafana Cloud data persists between sessions; only the local cluster state is ephemeral.
Next in the Grafana Track
In Part 4: Looking at Logs with Grafana Loki, we'll dive deep into LogQL — from basic label matchers and line filters to complex aggregations, pattern detection, and building log-based dashboards and alerts using data from our demo environment.