Prometheus Deep Dive Part 2: Deploying Prometheus to Kubernetes

Lab Environment Setup

Before deploying Prometheus, we need a Kubernetes cluster. For this track, we’ll use kind (Kubernetes in Docker) as our primary lab environment — it’s lightweight, fast to create, and closely mirrors production clusters. All examples in Parts 2–12 are tested against this lab setup.

Creating a kind Cluster

Our lab cluster needs multiple nodes to demonstrate real-world scenarios like node-level metrics, anti-affinity, and pod distribution:

# kind-cluster-config.yaml
# A multi-node cluster for Prometheus lab exercises
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: prometheus-lab
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "topology.kubernetes.io/zone=us-east-1a"
    extraPortMappings:
      - containerPort: 30090
        hostPort: 9090
        protocol: TCP
      - containerPort: 30093
        hostPort: 9093
        protocol: TCP
      - containerPort: 30030
        hostPort: 3000
        protocol: TCP
  - role: worker
    kubeadmConfigPatches:
      - |
        kind: JoinConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "topology.kubernetes.io/zone=us-east-1a"
  - role: worker
    kubeadmConfigPatches:
      - |
        kind: JoinConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "topology.kubernetes.io/zone=us-east-1b"
  - role: worker
    kubeadmConfigPatches:
      - |
        kind: JoinConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "topology.kubernetes.io/zone=us-east-1b"

# Create the cluster
kind create cluster --config kind-cluster-config.yaml

# Verify nodes are ready
kubectl get nodes -o wide
# NAME                          STATUS   ROLES           AGE   VERSION
# prometheus-lab-control-plane  Ready    control-plane   45s   v1.30.0
# prometheus-lab-worker         Ready    <none>          30s   v1.30.0
# prometheus-lab-worker2        Ready    <none>          30s   v1.30.0
# prometheus-lab-worker3        Ready    <none>          30s   v1.30.0

# Install metrics-server for resource metrics
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl patch deployment metrics-server -n kube-system --type='json' \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
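
Before moving on, it can help to confirm that the zone labels from the kind config were applied and that metrics-server rolled out. The following is a minimal sketch, not part of the original lab scripts. It assumes the cluster name prometheus-lab from above, for which kind creates the kubectl context kind-prometheus-lab; the function name verify_lab_cluster is made up for this example.

```shell
# Hypothetical helper: sanity-check the kind lab cluster.
# Assumes the kubectl context created by kind: kind-prometheus-lab.
verify_lab_cluster() {
  ctx="kind-prometheus-lab"
  # Block until every node reports Ready (give up after 2 minutes)
  kubectl --context "$ctx" wait --for=condition=Ready nodes --all --timeout=120s || return 1
  # Show each node with its zone label as an extra column
  kubectl --context "$ctx" get nodes -L topology.kubernetes.io/zone || return 1
  # Wait for the patched metrics-server Deployment to finish rolling out
  kubectl --context "$ctx" -n kube-system rollout status deployment/metrics-server --timeout=120s
}
```

Run verify_lab_cluster after the commands above. Once it returns, kubectl top nodes should start reporting CPU and memory after a short delay.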

Minikube Alternative

If you prefer minikube, the following starts a comparable three-node cluster. On a resource-constrained machine, reduce --nodes, --cpus, or --memory to fit:

# Start minikube with sufficient resources for the Prometheus stack
minikube start \
  --cpus=4 \
  --memory=8192 \
  --disk-size=40g \
  --kubernetes-version=v1.30.0 \
  --nodes=3 \
  --driver=docker

# Enable required addons
minikube addons enable metrics-server
minikube addons enable default-storageclass
minikube addons enable storage-provisioner

Prerequisites & Tools

Required Tools for This Track

Tool	Version	Purpose	Install
kubectl	≥ 1.28	Kubernetes CLI	`brew install kubectl`
helm	≥ 3.14	Helm chart management	`brew install helm`
kind	≥ 0.22	Local K8s clusters	`brew install kind`
jq	≥ 1.7	JSON processing	`brew install jq`
promtool	≥ 2.53	Prometheus rule validation	Bundled with Prometheus binary
Docker	≥ 24.0	Container runtime for kind	`brew install --cask docker`

Understanding the Prometheus Operator

The Operator Pattern

The Prometheus Operator extends Kubernetes with Custom Resource Definitions (CRDs) that let you declare your monitoring configuration as Kubernetes-native YAML. Instead of manually editing prometheus.yml, you create Kubernetes resources that the operator watches and automatically translates into Prometheus configuration.

Prometheus Operator Reconciliation Flow

flowchart TD
    subgraph CRDs["Custom Resources (You Write)"]
        SM["ServiceMonitor"]
        PM["PodMonitor"]
        PR["PrometheusRule"]
        P["Prometheus"]
        AM["AlertmanagerConfig"]
    end

    subgraph Operator["Prometheus Operator (Watches)"]
        RC["Reconciliation Controller"]
    end

    subgraph Generated["Generated Artifacts"]
        CFG["prometheus.yml<br/>(scrape configs)"]
        RULES["rule_files/<br/>(alert + recording rules)"]
        SEC["secrets/<br/>(TLS certs, bearer tokens)"]
    end

    subgraph Running["Running Workloads"]
        PROM["Prometheus StatefulSet"]
        AMGR["Alertmanager StatefulSet"]
    end

    SM --> RC
    PM --> RC
    PR --> RC
    P --> RC
    AM --> RC
    RC --> CFG
    RC --> RULES
    RC --> SEC
    CFG --> PROM
    RULES --> PROM
    SEC --> PROM
    AM --> AMGR

Custom Resource Definitions

The Prometheus Operator introduces these CRDs to your cluster:

CRD	Purpose	Generates
Prometheus	Defines a Prometheus server instance	StatefulSet + configuration Secret + governing Service
ServiceMonitor	Declares scrape targets via Kubernetes Services	`scrape_configs` entries
PodMonitor	Declares scrape targets directly from Pods	`scrape_configs` entries
PrometheusRule	Recording rules and alerting rules	Rule-file ConfigMaps mounted into Prometheus
Alertmanager	Defines an Alertmanager instance	StatefulSet + configuration Secret
AlertmanagerConfig	Per-namespace alert routing config	Alertmanager configuration sections
ScrapeConfig	Generic scrape targets (static, DNS, HTTP SD)	`scrape_configs` entries
PrometheusAgent	Prometheus in agent mode (remote-write only)	StatefulSet in agent mode

Reconciliation Loop

                            
How the Operator Works: The operator runs a continuous reconciliation loop. When you create, update, or delete a ServiceMonitor, the operator detects the change and regenerates the Prometheus configuration. A config-reloader sidecar in the Prometheus pod then triggers a reload via Prometheus’ /-/reload HTTP endpoint, all without restarting the pod. The reload itself typically takes 1–3 seconds, although the regenerated configuration can take longer to appear in the pod’s mounted volume.
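
One way to observe the result of a reload is to read Prometheus’ own prometheus_config_last_reload_successful metric over the HTTP API. The sketch below assumes Prometheus is reachable at http://localhost:9090 (as set up later in this part) and that jq is installed; the function name last_reload_status is hypothetical.

```shell
# Hypothetical helper: report whether Prometheus' last config reload succeeded.
# Assumes Prometheus answers on http://localhost:9090 unless a base URL is passed.
last_reload_status() {
  base="${1:-http://localhost:9090}"
  # The metric is 1 after a successful reload and 0 after a failed one
  curl -fsS -G "$base/api/v1/query" \
    --data-urlencode 'query=prometheus_config_last_reload_successful' |
    jq -r '.data.result[0].value[1]
           | if . == "1" then "reload OK" else "reload FAILED or unknown" end'
}
```

Calling last_reload_status right after applying or deleting a ServiceMonitor gives a quick signal that the generated configuration was accepted.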
                        

Deploying with Helm

Key Helm Values

The kube-prometheus-stack chart bundles the Prometheus Operator, Prometheus, Alertmanager, Grafana, Node Exporter, and kube-state-metrics into a single Helm release. Here’s a production-oriented values file:

# values-prometheus-lab.yaml
# Production-oriented values for kube-prometheus-stack
# Adjust resource limits based on your cluster size

prometheus:
  prometheusSpec:
    # Retention and storage
    retention: 15d
    retentionSize: "45GB"

    # Resource limits - sized for ~50,000 active series
    resources:
      requests:
        cpu: 500m
        memory: 2Gi
      limits:
        cpu: 2000m
        memory: 4Gi

    # Persistent storage
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi

    # Selectors: pick up all ServiceMonitors, PodMonitors, and rules,
    # not only those labeled with this Helm release
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
    ruleSelectorNilUsesHelmValues: false

    # Scrape interval and timeout
    scrapeInterval: "30s"
    scrapeTimeout: "10s"
    evaluationInterval: "30s"

    # Enable remote write for future long-term storage
    # remoteWrite:
    #   - url: "http://mimir-distributor:8080/api/v1/push"

    # Additional scrape configs (for targets without ServiceMonitors)
    additionalScrapeConfigs: []

  # Expose via NodePort for lab access
  service:
    type: NodePort
    nodePort: 30090

alertmanager:
  alertmanagerSpec:
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 5Gi
  service:
    type: NodePort
    nodePort: 30093

grafana:
  adminPassword: "prom-lab-2026"
  persistence:
    enabled: true
    size: 5Gi
  service:
    type: NodePort
    nodePort: 30030
  sidecar:
    dashboards:
      enabled: true
      searchNamespace: ALL
    datasources:
      enabled: true

# Node Exporter - deploy on all nodes
nodeExporter:
  enabled: true

# kube-state-metrics - Kubernetes object metrics
kubeStateMetrics:
  enabled: true

# Component scraping configuration
kubeApiServer:
  enabled: true
kubeControllerManager:
  enabled: true
kubeScheduler:
  enabled: true
kubeEtcd:
  enabled: true
kubelet:
  enabled: true
  serviceMonitor:
    metricRelabelings:
      # Drop high-cardinality kubelet metrics in lab environment
      - sourceLabels: [__name__]
        regex: 'kubelet_runtime_operations_duration_seconds_bucket'
        action: drop

Installation Steps

# Add the Prometheus community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Create a dedicated monitoring namespace
kubectl create namespace monitoring

# Install kube-prometheus-stack with our custom values
helm install prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values values-prometheus-lab.yaml \
  --version 62.3.0 \
  --wait --timeout 10m

# Verify all pods are running (one node-exporter pod per node, so four in this lab)
kubectl get pods -n monitoring
# NAME                                                     READY   STATUS    RESTARTS
# alertmanager-prometheus-stack-kube-prom-alertmanager-0    2/2     Running   0
# prometheus-prometheus-stack-kube-prom-prometheus-0        2/2     Running   0
# prometheus-stack-grafana-7d9f4c8b4-x2k9m                 3/3     Running   0
# prometheus-stack-kube-prom-operator-5c8b9f6d4-p7h2n      1/1     Running   0
# prometheus-stack-kube-state-metrics-6c8b5d6f4-r9t3k      1/1     Running   0
# prometheus-stack-prometheus-node-exporter-abcde           1/1     Running   0
# prometheus-stack-prometheus-node-exporter-fghij           1/1     Running   0
# prometheus-stack-prometheus-node-exporter-klmno           1/1     Running   0
# prometheus-stack-prometheus-node-exporter-pqrst           1/1     Running   0
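
Beyond the pod list, it is worth looking at what the release registered with the API server. This sketch assumes the release name prometheus-stack and the monitoring namespace used above; inspect_stack is a made-up helper name.

```shell
# Hypothetical helper: list what the Helm release and the operator created.
# Assumes release "prometheus-stack"; the namespace defaults to "monitoring".
inspect_stack() {
  ns="${1:-monitoring}"
  # Release status as Helm sees it
  helm status prometheus-stack --namespace "$ns" || return 1
  # CRDs registered under the operator's API group
  kubectl api-resources --api-group=monitoring.coreos.com || return 1
  # Custom resources the chart created for the operator to reconcile
  kubectl get prometheus,alertmanager,servicemonitor,prometheusrule --namespace "$ns"
}
```

The last command should list one Prometheus and one Alertmanager object plus the chart’s bundled ServiceMonitors and PrometheusRules.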

What Gets Deployed

kube-prometheus-stack Components

flowchart TD
    subgraph Helm["kube-prometheus-stack Chart"]
        subgraph Core["Core Components"]
            OP["Prometheus Operator<br/>(Deployment)"]
            PROM["Prometheus Server<br/>(StatefulSet, 1 replica)"]
            AM["Alertmanager<br/>(StatefulSet, 1 replica)"]
        end
        subgraph Collectors["Metric Collectors"]
            NE["Node Exporter<br/>(DaemonSet, all nodes)"]
            KSM["kube-state-metrics<br/>(Deployment, 1 replica)"]
        end
        subgraph Viz["Visualization"]
            GF["Grafana<br/>(Deployment + dashboards)"]
        end
        subgraph CRDS["Pre-configured CRDs"]
            SM1["ServiceMonitors<br/>(API server, kubelet,<br/>etcd, node-exporter, KSM)"]
            PR1["PrometheusRules<br/>(K8s alerts, node alerts,<br/>Prometheus self-monitoring)"]
        end
    end

    OP -->|manages| PROM
    OP -->|manages| AM
    SM1 -->|discovered by| PROM
    PR1 -->|loaded by| PROM
    NE -->|scraped by| PROM
    KSM -->|scraped by| PROM
    PROM -->|data source| GF
    PROM -->|fires alerts| AM

ServiceMonitor & PodMonitor CRDs

ServiceMonitor Specification

A ServiceMonitor tells Prometheus to scrape pods behind a Kubernetes Service. It’s the most common way to add custom scrape targets:

# servicemonitor-example.yaml
# Monitor a custom application exposing metrics on port 8080
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-metrics
  namespace: monitoring
  labels:
    # Required only when Prometheus selects ServiceMonitors by label
    # (the chart default); harmless with the lab values above
    release: prometheus-stack
spec:
  # Which namespaces to find the target Services in
  namespaceSelector:
    matchNames:
      - production
      - staging

  # Select Services with these labels
  selector:
    matchLabels:
      app.kubernetes.io/name: my-app
      metrics: "enabled"

  # Endpoint configuration (how to scrape)
  endpoints:
    - port: metrics          # Named port on the Service
      path: /metrics         # Metrics endpoint path
      interval: 30s          # Override global scrape interval
      scrapeTimeout: 10s     # Timeout per scrape
      scheme: https          # Use HTTPS
      tlsConfig:
        insecureSkipVerify: false
        caFile: /etc/prometheus/certs/ca.crt
      bearerTokenSecret:
        name: prometheus-token
        key: token
      # Relabeling - add custom labels to all metrics from this target
      relabelings:
        - sourceLabels: [__meta_kubernetes_namespace]
          targetLabel: namespace
        - sourceLabels: [__meta_kubernetes_pod_name]
          targetLabel: pod
        - sourceLabels: [__meta_kubernetes_pod_label_version]
          targetLabel: app_version
      # Metric relabeling - filter or modify metrics after scraping
      metricRelabelings:
        - sourceLabels: [__name__]
          regex: 'go_gc_.*'
          action: drop

PodMonitor Specification

Use PodMonitor when your pods don’t have a Service in front of them (sidecar containers, batch jobs, DaemonSets with host-network):

# podmonitor-example.yaml
# Monitor pods directly without a Service
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: envoy-sidecar-metrics
  namespace: monitoring
  labels:
    release: prometheus-stack
spec:
  namespaceSelector:
    matchNames:
      - production

  # Select pods directly by label
  selector:
    matchLabels:
      sidecar.istio.io/inject: "true"

  podMetricsEndpoints:
    - port: http-envoy-prom   # Named port on the Pod spec
      path: /stats/prometheus
      interval: 15s
      relabelings:
        - sourceLabels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          regex: "true"
          action: keep

Namespace Selection Strategies

                            
Common Pitfall: By default, the kube-prometheus-stack chart configures Prometheus to select only ServiceMonitors that carry the chart’s release label (release: prometheus-stack in this lab). Set serviceMonitorSelectorNilUsesHelmValues: false in the Helm values to select ServiceMonitors regardless of their labels. Without this, ServiceMonitors that lack the label are silently ignored, whichever namespace they live in.
                        

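# Three namespace selection strategies:

# Strategy 1: Monitor ALL namespaces (recommended for most clusters)
spec:
  namespaceSelector:
    any: true

# Strategy 2: Specific namespaces (multi-tenant isolation)
spec:
  namespaceSelector:
    matchNames:
      - team-a-production
      - team-a-staging

# Strategy 3: Namespace labels (dynamic, scales with new namespaces)
# A ServiceMonitor's namespaceSelector accepts only `any` and `matchNames`.
# Label-based selection is set on the Prometheus resource instead (Helm values
# shown here) and controls which namespaces are searched for ServiceMonitors:
prometheus:
  prometheusSpec:
    serviceMonitorNamespaceSelector:
      matchLabels:
        monitoring: enabled
# Then label your namespaces:
# kubectl label namespace production monitoring=enabled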

Storage & Resource Sizing

Persistent Volume Configuration

Without persistent storage, Prometheus loses all data when the pod restarts. For any environment beyond throwaway testing, configure a PersistentVolumeClaim:

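# storageSpec in the Helm values (passed through to the Prometheus resource)
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: gp3    # AWS EBS gp3 for production
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi
          # Optional: bind only to pre-provisioned PersistentVolumes carrying
          # this label. A PVC selector belongs under spec and selects volumes,
          # not nodes; omit it when relying on dynamic provisioning.
          # selector:
          #   matchLabels:
          #     prometheus-storage: "true"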

Resource Calculation Formula

                            
Prometheus Resource Sizing Formulas:
Memory: ~3KB per active time series × 2 (headroom) = active_series × 6KB
Disk (per day): ~1.5 bytes per sample × samples/day = series × (86400 / scrape_interval) × 1.5B
Disk (total): daily_bytes × retention_days × 1.2 (compaction overhead)
CPU: Driven by scrape frequency, PromQL query complexity, and rule evaluation
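
The memory and disk formulas above can be turned into a small calculator. This is a rough sketch under stated assumptions: decimal units (1 KB = 1000 bytes, 1 GB = 10^9 bytes), the scrape interval given in seconds, and an illustrative workload of 500,000 series at a 15s interval with 15 days of retention. The function name prom_sizing is made up.

```shell
# Hypothetical helper: apply the sizing formulas above.
# Arguments: active series, scrape interval in seconds, retention in days.
prom_sizing() {
  awk -v series="$1" -v interval="$2" -v days="$3" 'BEGIN {
    gb       = 1000000000                                  # decimal gigabyte
    mem_gb   = series * 6000 / gb                          # 6 KB per active series
    daily_gb = series * (86400 / interval) * 1.5 / gb      # 1.5 bytes per sample
    total_gb = daily_gb * days * 1.2                       # 20% compaction overhead
    printf "memory_gb=%.2f disk_per_day_gb=%.2f disk_total_gb=%.2f\n", mem_gb, daily_gb, total_gb
  }'
}

prom_sizing 500000 15 15
# memory_gb=3.00 disk_per_day_gb=4.32 disk_total_gb=77.76
```

These are lower bounds derived from the formulas alone; queries, rule evaluation, and series churn add to real memory use, which is why practical guidelines leave more headroom.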

                        

Sizing Guidelines by Scale

Resource Sizing Guidelines

Scale	Active Series	Memory	CPU	Disk (15d)	Scrape Interval
Small (dev/staging)	< 50K	2–4 GB	0.5–1 core	20–50 GB	30s
Medium (single team)	50K–500K	4–16 GB	1–4 cores	50–200 GB	15–30s
Large (platform)	500K–5M	16–64 GB	4–16 cores	200 GB–1 TB	15s
XL (requires sharding)	> 5M	64+ GB per shard	16+ cores	1+ TB per shard	15s

RBAC Configuration

Prometheus RBAC Requirements

Prometheus needs specific RBAC permissions to discover and scrape targets via the Kubernetes API. The Helm chart creates these automatically, but understanding them is critical for troubleshooting:

# ClusterRole created by kube-prometheus-stack
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-stack-kube-prom-prometheus
rules:
  # Service discovery - find endpoints to scrape
  - apiGroups: [""]
    resources: ["nodes", "nodes/metrics", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  # Read configmaps for service discovery
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get"]
  # Access to networking resources for ingress SD
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch"]
  # Non-resource URLs (kubelet /metrics, API server /metrics)
  - nonResourceURLs: ["/metrics", "/metrics/cadvisor"]
    verbs: ["get"]

Cross-Namespace Scraping

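If Prometheus runs in the monitoring namespace but needs to scrape pods in production, its ServiceAccount needs permissions in that namespace: either a ClusterRoleBinding (which the chart creates) or a RoleBinding in each target namespace. A RoleBinding in monitoring alone is not enough: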

# Verify the ClusterRoleBinding exists
kubectl get clusterrolebinding | grep prometheus

# If scrape targets show "403 Forbidden" in Prometheus targets page:
kubectl auth can-i get pods --as=system:serviceaccount:monitoring:prometheus-stack-kube-prom-prometheus -n production
# Should return: yes
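
To check more than one permission at a time, the same kubectl auth can-i call can be looped over the namespaced resources and verbs from the ClusterRole. A minimal sketch, assuming the ServiceAccount name generated for the prometheus-stack release; check_prom_rbac is a hypothetical helper.

```shell
# Hypothetical helper: check discovery permissions of the Prometheus ServiceAccount.
# Assumes the ServiceAccount name the chart generates for release "prometheus-stack".
check_prom_rbac() {
  ns="${1:-production}"
  sa="system:serviceaccount:monitoring:prometheus-stack-kube-prom-prometheus"
  for resource in services endpoints pods; do
    for verb in get list watch; do
      # kubectl prints "yes" or "no" for each verb/resource pair
      answer="$(kubectl auth can-i "$verb" "$resource" --as="$sa" -n "$ns")"
      echo "$verb $resource in $ns: $answer"
    done
  done
}
```

Run check_prom_rbac production (or pass another namespace). Any line ending in "no" points at a missing binding.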

Accessing & Verifying

Port Forwarding

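# On the kind lab cluster, the extraPortMappings and NodePort services above
# already expose these UIs on localhost:9090, :9093, and :3000, so no
# port-forward is needed (binding the same host ports again will typically fail).
# Use the commands below on minikube or any cluster without those mappings.

# Port-forward Prometheus UI
kubectl port-forward -n monitoring svc/prometheus-stack-kube-prom-prometheus 9090:9090 &

# Port-forward Alertmanager UI
kubectl port-forward -n monitoring svc/prometheus-stack-kube-prom-alertmanager 9093:9093 &

# Port-forward Grafana
kubectl port-forward -n monitoring svc/prometheus-stack-grafana 3000:80 &

# Access in browser:
# Prometheus: http://localhost:9090
# Alertmanager: http://localhost:9093
# Grafana: http://localhost:3000 (admin / prom-lab-2026)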
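
A quick way to confirm that all three endpoints answer, without opening a browser, is to probe their health URLs. This sketch assumes the localhost ports listed above; check_stack_health is a hypothetical helper name.

```shell
# Hypothetical helper: probe the health endpoints of the three UIs.
# Assumes the localhost ports used above (9090, 9093, 3000).
check_stack_health() {
  for url in \
    "http://localhost:9090/-/ready" \
    "http://localhost:9093/-/ready" \
    "http://localhost:3000/api/health"; do
    # -f makes curl fail on HTTP errors; -o /dev/null discards the body
    if curl -fsS -o /dev/null "$url"; then
      echo "OK   $url"
    else
      echo "FAIL $url"
    fi
  done
}
```

Three OK lines from check_stack_health mean Prometheus, Alertmanager, and Grafana are all reachable from the host.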

Verifying Scrape Targets

Navigate to Status → Targets in the Prometheus UI. You should see all configured targets with their state:

Expected Target States After Deployment

flowchart LR
    subgraph Healthy["UP (Healthy Targets)"]
        N["node-exporter<br/>4/4 targets"]
        K["kubelet<br/>4/4 targets"]
        KSM["kube-state-metrics<br/>1/1 targets"]
        API["apiserver<br/>1/1 targets"]
        AM["alertmanager<br/>1/1 targets"]
        P["prometheus<br/>1/1 targets"]
    end

    subgraph Issues["Common Issues"]
        ETD["etcd<br/>0/1 DOWN"]
        SCH["scheduler<br/>0/1 DOWN"]
        CM["controller-mgr<br/>0/1 DOWN"]
    end

    Issues -.->|"kind/minikube:<br/>bind to 127.0.0.1"| FIX["Fix: expose on<br/>0.0.0.0 or skip"]

                            
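Expected in kind/minikube: The etcd, scheduler, and controller-manager targets may show as DOWN because kubeadm-based clusters bind these metrics endpoints to 127.0.0.1 by default. This is normal for lab environments. In production, these targets are UP only if the components expose metrics on an address Prometheus can reach; on managed control planes they are usually not reachable at all, in which case the kubeEtcd, kubeScheduler, and kubeControllerManager scrapers can be disabled in the Helm values.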
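
The same up/total view is available from the command line through Prometheus’ /api/v1/targets endpoint. The sketch below assumes jq is installed and reads the API response on stdin; summarize_targets is a made-up helper name.

```shell
# Hypothetical helper: summarize target health per job as "job up/total".
# Reads the JSON returned by Prometheus' /api/v1/targets endpoint on stdin.
summarize_targets() {
  jq -r '.data.activeTargets
         | group_by(.labels.job)[]
         | "\(.[0].labels.job) \(map(select(.health == "up")) | length)/\(length)"'
}
```

Usage, assuming Prometheus is reachable on localhost:9090: curl -s http://localhost:9090/api/v1/targets | summarize_targets. Jobs whose count reads 0/N are the ones to investigate.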
                        

Your First PromQL Query

# Verify Prometheus is collecting data - run these in the Prometheus UI Expression Browser

# Count total active time series
prometheus_tsdb_head_series

# List all scrape jobs
count by (job) (up)

# Check scrape durations
scrape_duration_seconds{job="node-exporter"}

# Node memory usage in percent (from node-exporter)
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

# Container CPU usage (from kubelet/cAdvisor)
rate(container_cpu_usage_seconds_total{container!=""}[5m])

If these queries return data, your Prometheus deployment is working correctly. We’ll explore PromQL in much greater depth in Part 4.

Conclusion & What’s Next

You now have a fully operational Prometheus stack running in Kubernetes with:

The Prometheus Operator managing configuration as Kubernetes-native CRDs
Persistent storage protecting metrics data across pod restarts
ServiceMonitors auto-discovering workloads across all namespaces
Grafana pre-configured with Kubernetes dashboards
Node Exporter and kube-state-metrics providing infrastructure and object-level metrics

This lab environment will be our foundation for the rest of the Prometheus deep dive track. Keep it running — we’ll add to it incrementally.

Next in the Series

In Part 3: The Prometheus Data Model & TSDB, we’ll open the hood on the Prometheus Time Series Database — understanding the WAL, head blocks, chunk encoding, compaction, and the index structure that makes PromQL queries fast.
