Node Exporter Overview
The Prometheus Node Exporter exposes hardware and OS-level metrics from *nix kernels. It reads from /proc, /sys, and other kernel pseudo-filesystems to provide hundreds of metrics covering CPU, memory, disk, network, filesystem, and more.
Architecture & Deployment
# Kubernetes DaemonSet — runs on every node
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '9100'
    spec:
      hostPID: true
      hostNetwork: true # Access host network metrics
      containers:
        - name: node-exporter
          image: prom/node-exporter:v1.8.1
          args:
            - '--path.rootfs=/host'
            - '--path.procfs=/host/proc'
            - '--path.sysfs=/host/sys'
            - '--collector.textfile.directory=/host/var/lib/node_exporter/textfile'
            - '--collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)'
            - '--collector.netclass.ignored-devices=^(veth.*|docker.*|br-.*)$'
            - '--collector.systemd'
            - '--no-collector.mdadm' # Disable unused collectors
            - '--no-collector.infiniband'
          ports:
            - containerPort: 9100
              hostPort: 9100
          volumeMounts:
            - name: rootfs
              mountPath: /host
              readOnly: true
              mountPropagation: HostToContainer
          resources:
            limits:
              cpu: 250m
              memory: 180Mi
            requests:
              cpu: 100m
              memory: 128Mi
      volumes:
        - name: rootfs
          hostPath:
            path: /
      tolerations:
        - effect: NoSchedule
          operator: Exists
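The two exclusion patterns in the DaemonSet args are easy to get subtly wrong, so it can help to exercise them offline before rolling them out. The sketch below uses Python's re module as a stand-in for the Go regexp engine the exporter uses; for simple patterns like these the two should agree, but that equivalence is an assumption to re-check for more unusual syntax. The function names and the sample mount points and devices are illustrative only.

```python
import re

# Patterns copied from the DaemonSet args above.
MOUNT_EXCLUDE = re.compile(
    r"^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)"
)
DEVICE_IGNORE = re.compile(r"^(veth.*|docker.*|br-.*)$")


def is_excluded_mount(mount_point: str) -> bool:
    """Return True if the mount-points-exclude pattern matches this path."""
    return MOUNT_EXCLUDE.search(mount_point) is not None


def is_ignored_device(device: str) -> bool:
    """Return True if the ignored-devices pattern matches this interface."""
    return DEVICE_IGNORE.search(device) is not None


if __name__ == "__main__":
    mounts = ["/", "/data", "/proc", "/sys/fs/cgroup",
              "/var/lib/docker/overlay2/abc/merged"]
    for mount in mounts:
        print(f"{mount:40} excluded={is_excluded_mount(mount)}")
    for dev in ["eth0", "veth1a2b3c", "docker0", "br-5f2e"]:
        print(f"{dev:40} ignored={is_ignored_device(dev)}")
```

Note that /var/lib/docker itself is not excluded, only paths beneath it, because the pattern requires at least one character after the trailing slash.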
Enabling/Disabling Collectors
Default Collectors (Enabled)
| Collector | Metrics Prefix | Source |
|---|---|---|
| cpu | node_cpu_* | /proc/stat |
| meminfo | node_memory_* | /proc/meminfo |
| diskstats | node_disk_* | /proc/diskstats |
| filesystem | node_filesystem_* | statfs() |
| netdev | node_network_* | /proc/net/dev |
| loadavg | node_load* | /proc/loadavg |
| textfile | (custom) | *.prom files |
| uname | node_uname_info | uname() |
| time | node_time_* | clock_gettime() |
| conntrack | node_nf_conntrack* | /proc/sys/net/netfilter |
CPU Collector
Key Metrics & Modes
The CPU collector exposes node_cpu_seconds_total — a counter tracking cumulative CPU time spent in each mode per CPU core:
# CPU modes exposed by node_cpu_seconds_total{mode="..."}
# user: time in user space (applications)
# system: time in kernel space (syscalls, drivers)
# idle: idle time (waiting for work)
# iowait: idle time while waiting for I/O completion
# irq: servicing hardware interrupts
# softirq: servicing software interrupts
# steal: time stolen by the hypervisor (VMs)
# nice: user-space time of niced (low-priority) processes
# Guest time is not a mode of this metric; it is exposed separately as
# node_cpu_guest_seconds_total{mode="user"|"nice"}
Essential PromQL Queries
# Overall CPU utilization (all cores, all modes except idle)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Per-mode breakdown (useful for identifying bottleneck type)
avg by (instance, mode) (rate(node_cpu_seconds_total[5m])) * 100
# CPU saturation — load average vs CPU count
node_load1 / count without (cpu, mode) (node_cpu_seconds_total{mode="idle"})
# iowait specifically (indicates disk bottleneck)
avg by (instance) (rate(node_cpu_seconds_total{mode="iowait"}[5m])) * 100
# Steal time (VM neighbor noise / overcommit)
avg by (instance) (rate(node_cpu_seconds_total{mode="steal"}[5m])) * 100
# Number of CPUs per node
count without (cpu, mode) (node_cpu_seconds_total{mode="idle"})
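To make the arithmetic behind the utilization and saturation queries concrete, here is a minimal sketch that applies the same formulas to two hand-written samples of the idle counter. It is not how Prometheus evaluates rate() (which also handles counter resets and extrapolation); the function names and sample values are assumptions for illustration.

```python
from typing import Dict


def cpu_utilization_percent(
    idle_prev: Dict[str, float],
    idle_curr: Dict[str, float],
    elapsed_seconds: float,
) -> float:
    """Mirror 100 - avg(rate(idle)) * 100 for two samples of the idle counter.

    idle_prev and idle_curr map a core label (e.g. "0") to cumulative idle
    seconds, as in node_cpu_seconds_total{mode="idle"}.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # Per-core idle rate: idle seconds accumulated per wall-clock second.
    idle_rates = [
        (idle_curr[core] - idle_prev[core]) / elapsed_seconds
        for core in idle_curr
    ]
    avg_idle = sum(idle_rates) / len(idle_rates)
    return 100.0 - avg_idle * 100.0


def load_per_core(load_average: float, cpu_count: int) -> float:
    """Mirror node_load1 / count(idle series): runnable tasks per core."""
    return load_average / cpu_count


if __name__ == "__main__":
    prev = {"0": 1000.0, "1": 2000.0}
    curr = {"0": 1240.0, "1": 2150.0}  # sampled 300 seconds later
    print(f"utilization: {cpu_utilization_percent(prev, curr, 300.0):.1f}%")
    print(f"load per core: {load_per_core(8.0, 4):.1f}")
```

In this example core 0 was idle 80% of the window and core 1 was idle 50%, so the average idle fraction is 65% and utilization is 35%.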
Alerting Rules
# Production alerting rules for CPU
groups:
  - name: node_cpu_alerts
    rules:
      - alert: HighCpuUsage
        expr: |
          100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[10m])) * 100) > 85
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          description: "CPU usage above 85% for 15 minutes (current: {{ $value | printf \"%.1f\" }}%)"
      - alert: CpuSaturation
        expr: |
          node_load15 / count without (cpu, mode) (node_cpu_seconds_total{mode="idle"}) > 2
        for: 30m
        labels:
          severity: critical
        annotations:
          summary: "CPU saturated on {{ $labels.instance }}"
          description: "15-min load average is {{ $value | printf \"%.1f\" }}x the CPU count"
      - alert: HighStealTime
        expr: |
          avg by (instance) (rate(node_cpu_seconds_total{mode="steal"}[5m])) * 100 > 10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High steal time on {{ $labels.instance }}"
          description: "{{ $value | printf \"%.1f\" }}% steal: noisy neighbor or overcommitted host"
Memory Collector
Key Metrics
# Memory metrics from /proc/meminfo
node_memory_MemTotal_bytes # Total physical RAM
node_memory_MemFree_bytes # Completely free (unused)
node_memory_MemAvailable_bytes # Available for allocation (includes reclaimable)
node_memory_Buffers_bytes # Disk buffer cache
node_memory_Cached_bytes # Page cache
node_memory_SwapTotal_bytes # Total swap space
node_memory_SwapFree_bytes # Free swap
node_memory_Slab_bytes # Kernel slab allocator
node_memory_SReclaimable_bytes # Reclaimable slab memory
node_memory_CommitLimit_bytes # Overcommit limit
node_memory_Committed_AS_bytes # Memory committed by all processes
PromQL Patterns
# Actual memory usage (most accurate)
# Uses MemAvailable which accounts for reclaimable cache
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
# Used memory in bytes, excluding free, buffers, page cache, and reclaimable slab
node_memory_MemTotal_bytes
- node_memory_MemFree_bytes
- node_memory_Buffers_bytes
- node_memory_Cached_bytes
- node_memory_SReclaimable_bytes
# Swap usage (any swap usage may indicate memory pressure)
(1 - node_memory_SwapFree_bytes / node_memory_SwapTotal_bytes) * 100
# OOM kills per second (node_vmstat_oom_kill requires kernel 4.13+)
rate(node_vmstat_oom_kill[5m])
# Memory pressure — major page faults (require disk I/O)
rate(node_vmstat_pgmajfault[5m])
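The memory and swap percentages above can be reproduced directly from /proc/meminfo, which is useful when checking a dashboard against a host. The following is a minimal sketch, not the exporter's own parser; the function names and the sample text are assumptions for illustration.

```python
from typing import Dict, Optional

# Hand-written sample in /proc/meminfo format (values are made up).
SAMPLE_MEMINFO = """MemTotal:       16384000 kB
MemFree:         1024000 kB
MemAvailable:    4096000 kB
SwapTotal:             0 kB
SwapFree:              0 kB
HugePages_Total:       0
"""


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo content into field name -> value (bytes or count)."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if not parts:
            continue
        amount = int(parts[0])
        # Sizes are labelled kB but are multiples of 1024 bytes;
        # fields without a unit are plain counts.
        if len(parts) > 1 and parts[1] == "kB":
            amount *= 1024
        values[name.strip()] = amount
    return values


def memory_used_percent(mem: Dict[str, int]) -> float:
    """Mirror (1 - MemAvailable / MemTotal) * 100."""
    return (1 - mem["MemAvailable"] / mem["MemTotal"]) * 100


def swap_used_percent(mem: Dict[str, int]) -> Optional[float]:
    """Mirror (1 - SwapFree / SwapTotal) * 100, or None without swap."""
    total = mem.get("SwapTotal", 0)
    if total == 0:
        return None  # the PromQL expression evaluates to NaN in this case
    return (1 - mem["SwapFree"] / total) * 100


if __name__ == "__main__":
    mem = parse_meminfo(SAMPLE_MEMINFO)
    print(f"memory used: {memory_used_percent(mem):.1f}%")
    print(f"swap used: {swap_used_percent(mem)}")
```

On hosts without swap the PromQL swap expression divides zero by zero and yields NaN, so the SwapUsageHigh comparison never matches there; the sketch returns None for the same case.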
# Memory alerting rules
groups:
  - name: node_memory_alerts
    rules:
      - alert: HighMemoryUsage
        expr: |
          (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High memory on {{ $labels.instance }}"
          description: "Memory usage {{ $value | printf \"%.1f\" }}%, available: {{ with printf \"node_memory_MemAvailable_bytes{instance='%s'}\" $labels.instance | query }}{{ . | first | value | humanize1024 }}{{ end }}"
      - alert: SwapUsageHigh
        expr: |
          (1 - node_memory_SwapFree_bytes / node_memory_SwapTotal_bytes) * 100 > 50
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Swap usage on {{ $labels.instance }}"
          description: "{{ $value | printf \"%.0f\" }}% swap in use: memory pressure likely"
Disk & Filesystem Collectors
Disk I/O Metrics
# Disk I/O metrics from /proc/diskstats
node_disk_reads_completed_total # Completed read operations
node_disk_writes_completed_total # Completed write operations
node_disk_read_bytes_total # Bytes read
node_disk_written_bytes_total # Bytes written
node_disk_read_time_seconds_total # Time spent reading
node_disk_write_time_seconds_total # Time spent writing
node_disk_io_time_seconds_total # Time spent doing I/O (utilization)
node_disk_io_time_weighted_seconds_total # Weighted I/O time (queue depth)
# Disk utilization (% of time doing I/O)
rate(node_disk_io_time_seconds_total{device!~"dm-.*"}[5m]) * 100
# Average read latency in seconds per read (use the write_time and writes_completed counters for write latency)
rate(node_disk_read_time_seconds_total[5m])
/ rate(node_disk_reads_completed_total[5m])
# IOPS
rate(node_disk_reads_completed_total[5m])
+ rate(node_disk_writes_completed_total[5m])
# Throughput (bytes/second)
rate(node_disk_read_bytes_total[5m]) + rate(node_disk_written_bytes_total[5m])
# Average queue depth (saturation indicator)
rate(node_disk_io_time_weighted_seconds_total[5m])
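As a worked example of the disk queries above, the sketch below derives utilization, IOPS, read latency, and queue depth from two samples of the cumulative counters. It assumes no counter resets between samples, which rate() would otherwise handle; the class, function, and sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DiskSample:
    """Cumulative counters for one device, named after the node_disk_* metrics."""
    reads_completed: float
    writes_completed: float
    read_time_seconds: float
    io_time_seconds: float
    io_time_weighted_seconds: float


def disk_rates(
    prev: DiskSample, curr: DiskSample, elapsed_seconds: float
) -> Dict[str, Optional[float]]:
    """Apply the same formulas as the PromQL queries to two samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    reads = curr.reads_completed - prev.reads_completed
    writes = curr.writes_completed - prev.writes_completed
    read_time = curr.read_time_seconds - prev.read_time_seconds
    busy = curr.io_time_seconds - prev.io_time_seconds
    weighted = curr.io_time_weighted_seconds - prev.io_time_weighted_seconds
    return {
        # Share of wall-clock time with at least one I/O in flight.
        "utilization_percent": busy / elapsed_seconds * 100,
        "iops": (reads + writes) / elapsed_seconds,
        # Undefined when no reads completed (PromQL gives NaN for 0/0).
        "avg_read_latency_seconds": read_time / reads if reads > 0 else None,
        # Weighted I/O time per second equals the average number of queued I/Os.
        "avg_queue_depth": weighted / elapsed_seconds,
    }


if __name__ == "__main__":
    before = DiskSample(0, 0, 0.0, 0.0, 0.0)
    after = DiskSample(6000, 3000, 12.0, 30.0, 90.0)  # 60 seconds later
    for key, value in disk_rates(before, after, 60.0).items():
        print(f"{key}: {value}")
```

A device can show 100% utilization while still having headroom if it serves requests in parallel (SSDs, RAID arrays), which is why queue depth and latency are worth reading alongside it.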
Filesystem Metrics
# Filesystem metrics from statfs()
node_filesystem_size_bytes # Total filesystem size
node_filesystem_avail_bytes # Available space (non-root)
node_filesystem_free_bytes # Free space (includes root reserved)
node_filesystem_files # Total inodes
node_filesystem_files_free # Free inodes
node_filesystem_readonly # Read-only flag
# Filesystem usage percentage
(1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100
# Predict when filesystem will be full (linear extrapolation)
predict_linear(node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}[6h], 24*3600) < 0
# Inode usage (often overlooked until 100%)
(1 - node_filesystem_files_free / node_filesystem_files) * 100
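The predict_linear query fits a straight line through the samples in the range and extends it forward. A minimal sketch of that idea, using ordinary least squares, is shown below; Prometheus's implementation may differ in numerical details, and the function signature and sample data here are assumptions.

```python
from typing import List, Tuple


def predict_linear(
    samples: List[Tuple[float, float]],
    eval_time: float,
    horizon_seconds: float,
) -> float:
    """Least-squares fit over (timestamp, value) samples, extrapolated
    horizon_seconds past eval_time."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    # Shift timestamps so eval_time is zero; the intercept is then the
    # fitted value at evaluation time.
    xs = [t - eval_time for t, _ in samples]
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    if variance == 0:
        raise ValueError("samples must have distinct timestamps")
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept + slope * horizon_seconds


if __name__ == "__main__":
    GIB = 2**30
    # Hourly samples over 6 hours: 20 GiB available, shrinking by 1 GiB/hour.
    history = [(hour * 3600.0, (20 - hour) * GIB) for hour in range(7)]
    predicted = predict_linear(history, eval_time=6 * 3600.0,
                               horizon_seconds=24 * 3600.0)
    print(f"predicted avail in 24h: {predicted / GIB:.1f} GiB")
    print("would alert:", predicted < 0)
```

With 14 GiB left and a steady loss of 1 GiB per hour, the projection 24 hours ahead is about 10 GiB below zero, so the `< 0` condition holds.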
# Disk & filesystem alerting
groups:
  - name: node_disk_alerts
    rules:
      - alert: DiskWillFillIn24h
        expr: |
          predict_linear(node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}[6h], 24*3600) < 0
          and node_filesystem_avail_bytes / node_filesystem_size_bytes < 0.2
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Disk filling on {{ $labels.instance }}:{{ $labels.mountpoint }}"
          description: "Filesystem {{ $labels.mountpoint }} predicted to fill within 24 hours"
      - alert: DiskSpaceCritical
        expr: |
          (1 - node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes) > 0.95
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Disk 95%+ full on {{ $labels.instance }}:{{ $labels.mountpoint }}"
      - alert: InodeExhaustion
        expr: |
          (1 - node_filesystem_files_free / node_filesystem_files) > 0.90
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Inode exhaustion on {{ $labels.instance }}:{{ $labels.mountpoint }}"
Network Collector
Network Device Metrics
# Network interface metrics from /proc/net/dev
node_network_receive_bytes_total # Bytes received
node_network_transmit_bytes_total # Bytes transmitted
node_network_receive_packets_total # Packets received
node_network_transmit_packets_total # Packets transmitted
node_network_receive_errs_total # Receive errors
node_network_transmit_errs_total # Transmit errors
node_network_receive_drop_total # Dropped incoming
node_network_transmit_drop_total # Dropped outgoing
# Throughput in bits/sec (divide by the link speed to get utilization)
rate(node_network_receive_bytes_total{device!~"lo|veth.*|docker.*"}[5m]) * 8
rate(node_network_transmit_bytes_total{device!~"lo|veth.*|docker.*"}[5m]) * 8
# Packet error rate
rate(node_network_receive_errs_total[5m])
/ rate(node_network_receive_packets_total[5m]) * 100
# Network interface speed and state
node_network_speed_bytes # Negotiated link speed in bytes/sec
node_network_up # Interface operational state (1=up)
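Throughput on its own says little without the link capacity, so one common derived figure is throughput divided by node_network_speed_bytes. The sketch below works that out for two samples of a byte counter; the function names and numbers are illustrative assumptions, and interfaces that report no usable speed (many virtual devices) are treated as unknown.

```python
from typing import Optional


def throughput_bits_per_second(
    bytes_prev: float, bytes_curr: float, elapsed_seconds: float
) -> float:
    """Mirror rate(node_network_receive_bytes_total[...]) * 8 for two samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (bytes_curr - bytes_prev) / elapsed_seconds * 8


def link_utilization_percent(
    bytes_prev: float,
    bytes_curr: float,
    elapsed_seconds: float,
    speed_bytes_per_second: float,
) -> Optional[float]:
    """Byte rate as a percentage of the link speed (both in bytes/sec)."""
    if speed_bytes_per_second <= 0:
        return None  # speed unknown, utilization cannot be computed
    byte_rate = throughput_bits_per_second(bytes_prev, bytes_curr, elapsed_seconds) / 8
    return byte_rate / speed_bytes_per_second * 100


if __name__ == "__main__":
    one_gbit_in_bytes = 125_000_000  # 1 Gbit/s expressed in bytes/sec
    bps = throughput_bits_per_second(0, 3_750_000_000, 60.0)
    util = link_utilization_percent(0, 3_750_000_000, 60.0, one_gbit_in_bytes)
    print(f"throughput: {bps / 1e6:.0f} Mbit/s, utilization: {util:.0f}%")
```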
Conntrack & Sockets
# Connection tracking (critical for firewalls/load balancers)
node_nf_conntrack_entries # Current tracked connections
node_nf_conntrack_entries_limit # Maximum connections allowed
# Conntrack utilization (approaching 100% = dropped connections)
node_nf_conntrack_entries / node_nf_conntrack_entries_limit * 100
# TCP socket state (from /proc/net/sockstat)
node_sockstat_TCP_tw # TIME_WAIT sockets
node_sockstat_TCP_alloc # Allocated sockets
node_sockstat_sockets_used # Total sockets in use
Textfile Collector
The textfile collector reads .prom files from a configured directory, exposing their contents as Prometheus metrics. This is the primary mechanism for exposing custom metrics from cron jobs, scripts, or applications that can’t serve an HTTP endpoint.
Setup & Configuration
# Enable with directory flag
--collector.textfile.directory=/var/lib/node_exporter/textfile
# Create the directory
mkdir -p /var/lib/node_exporter/textfile
# Write metrics in Prometheus exposition format
# File MUST have .prom extension
cat > /var/lib/node_exporter/textfile/backup_status.prom << 'EOF'
# HELP backup_last_success_timestamp_seconds Unix timestamp of last successful backup
# TYPE backup_last_success_timestamp_seconds gauge
backup_last_success_timestamp_seconds{job="database",target="postgres-main"} 1718452800
# HELP backup_size_bytes Size of last backup in bytes
# TYPE backup_size_bytes gauge
backup_size_bytes{job="database",target="postgres-main"} 5368709120
# HELP backup_duration_seconds Duration of last backup
# TYPE backup_duration_seconds gauge
backup_duration_seconds{job="database",target="postgres-main"} 342.5
EOF
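When the values come from a program rather than a shell heredoc, the same file content can be generated in code. The following is a minimal sketch of rendering gauge samples in the text exposition format, including escaping of label values; the helper names are hypothetical, and a real project would more likely use an existing Prometheus client library.

```python
from typing import Dict, Iterable, Tuple


def escape_label_value(value: str) -> str:
    """Escape backslash, newline, and double quote inside a label value."""
    return (
        value.replace("\\", "\\\\").replace("\n", "\\n").replace('"', '\\"')
    )


def render_gauge(
    name: str,
    help_text: str,
    samples: Iterable[Tuple[Dict[str, str], float]],
) -> str:
    """Render one gauge metric family as exposition-format text."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(
            f'{key}="{escape_label_value(val)}"'
            for key, val in sorted(labels.items())
        )
        selector = f"{{{label_str}}}" if label_str else ""
        lines.append(f"{name}{selector} {value}")
    # The format is line-based; end with a trailing newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_gauge(
        "backup_duration_seconds",
        "Duration of last backup",
        [({"job": "database", "target": "postgres-main"}, 342.5)],
    ), end="")
```

Help text is passed through unescaped here, which assumes it contains no backslashes or newlines.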
Common Patterns
#!/bin/bash
# Run from cron every 5 minutes (for example via an entry in /etc/cron.d/node-exporter-textfile)
# Each file is written under a temporary name and renamed into place,
# so the exporter never reads a half-written file
TEXTFILE_DIR=/var/lib/node_exporter/textfile
# SSL certificate expiry
CERT_EXPIRY=$(echo | openssl s_client -connect myapp.example.com:443 2>/dev/null | \
  openssl x509 -noout -enddate | cut -d= -f2)
CERT_EPOCH=$(date -d "${CERT_EXPIRY}" +%s)
cat > "${TEXTFILE_DIR}/ssl_expiry.prom.$$" << EOF
# HELP ssl_certificate_expiry_seconds Unix timestamp when cert expires
# TYPE ssl_certificate_expiry_seconds gauge
ssl_certificate_expiry_seconds{domain="myapp.example.com"} ${CERT_EPOCH}
EOF
mv "${TEXTFILE_DIR}/ssl_expiry.prom.$$" "${TEXTFILE_DIR}/ssl_expiry.prom"
# Package update count
UPDATES=$(apt list --upgradable 2>/dev/null | grep -c upgradable)
cat > "${TEXTFILE_DIR}/apt_updates.prom.$$" << EOF
# HELP node_apt_upgradable_packages Number of packages with available updates
# TYPE node_apt_upgradable_packages gauge
node_apt_upgradable_packages ${UPDATES}
EOF
mv "${TEXTFILE_DIR}/apt_updates.prom.$$" "${TEXTFILE_DIR}/apt_updates.prom"
# Custom application health (from script/API call)
HTTP_CODE=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8080/health)
cat > "${TEXTFILE_DIR}/app_health.prom.$$" << EOF
# HELP app_health_check_status HTTP status code from health endpoint
# TYPE app_health_check_status gauge
app_health_check_status{app="my-service"} ${HTTP_CODE}
EOF
mv "${TEXTFILE_DIR}/app_health.prom.$$" "${TEXTFILE_DIR}/app_health.prom"
Write each file under a temporary name in the same directory, then mv it into place; the rename is atomic, which avoids partial reads. Never use timestamps in textfile metrics (Prometheus adds scrape time). If the file is stale, the metric node_textfile_mtime_seconds will show when it was last modified, so alert on staleness rather than checking the metric value.
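For programs that write their own textfile metrics, the atomic-rename advice can be sketched as follows. The function names are hypothetical; the staleness helper applies locally the same comparison as an alert on time() - node_textfile_mtime_seconds.

```python
import os
import tempfile
import time


def write_textfile_atomically(directory: str, filename: str, content: str) -> str:
    """Write content to directory/filename via a temp file plus rename."""
    if not filename.endswith(".prom"):
        raise ValueError("the textfile collector only reads *.prom files")
    final_path = os.path.join(directory, filename)
    # The temp file lives in the same directory so the rename stays on one
    # filesystem; its .tmp suffix keeps the collector from reading it.
    fd, tmp_path = tempfile.mkstemp(
        dir=directory, prefix=filename + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(content)
        # mkstemp creates the file with mode 0600; the exporter may run as
        # a different user and needs read access.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, final_path)  # atomic on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path


def is_stale(path: str, max_age_seconds: float) -> bool:
    """Return True if the file was last modified more than max_age_seconds ago."""
    return time.time() - os.path.getmtime(path) > max_age_seconds


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as textfile_dir:
        path = write_textfile_atomically(
            textfile_dir, "backup_status.prom", "backup_size_bytes 5368709120\n"
        )
        print(path, "stale:", is_stale(path, 600))
```

Readers of the directory see either the previous complete file or the new complete file, never a partial one.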
Advanced Collectors
systemd Collector
# Enable systemd collector
--collector.systemd
--collector.systemd.unit-include="(nginx|postgresql|redis|docker)\.service"
# Metrics exposed:
node_systemd_unit_state{name="nginx.service", state="active"} # 1 if in this state
node_systemd_unit_state{name="nginx.service", state="failed"} # 1 if failed
node_systemd_timer_last_trigger_seconds # Last timer trigger time
# Alert on service failure
node_systemd_unit_state{state="failed"} == 1
Hardware (IPMI, hwmon, thermal)
# Hardware temperature monitoring
node_hwmon_temp_celsius # Temperature sensors
node_thermal_zone_temp # CPU thermal zones
node_cooling_device_cur_state # Cooling device state
# Power supply (laptop/UPS)
node_power_supply_energy_watthour
node_power_supply_online
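# IPMI: node_exporter has no built-in IPMI collector and no --collector.ipmi flag
# Metrics like these usually come from an ipmitool-based textfile collector script
# (needs ipmitool and root access) or from the separate ipmi_exporter, which uses its own metric names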
node_ipmi_temperature_celsius{name="CPU1 Temp"}
node_ipmi_fan_speed_rpm{name="FAN1"}
node_ipmi_power_watts{name="System Board"}
Conclusion
- Deploy as DaemonSet with hostNetwork: true and hostPID: true for complete visibility
- Filter filesystem mounts: exclude tmpfs, overlay, and container-internal mounts
- Use textfile collector for custom metrics (backup status, cert expiry, package updates)
- Enable systemd collector to monitor critical services
- Disable unused collectors to reduce scrape time and cardinality
- Use recording rules for common dashboard queries (CPU %, memory %, disk predictions)
- Alert on predictions (predict_linear), not just thresholds, for disk and memory