Service Mesh Concept
A service mesh is a dedicated infrastructure layer for handling service-to-service communication. It makes communication reliable, secure, and observable without requiring changes to application code. The mesh intercepts all network traffic between services and applies policies transparently.
The architecture follows the control plane/data plane pattern:
- Control plane — centralized management (Istiod) that decides what policies to apply
- Data plane — distributed proxies (Envoy sidecars) that execute those policies on every request
flowchart TD
subgraph "Control Plane"
ISTIOD["Istiod\n(Pilot + Citadel + Galley)"]
end
subgraph "Data Plane — Pod A"
APP_A["App Container\n(Service A)"] <-->|"localhost"| ENVOY_A["Envoy Proxy\n(Sidecar)"]
end
subgraph "Data Plane — Pod B"
APP_B["App Container\n(Service B)"] <-->|"localhost"| ENVOY_B["Envoy Proxy\n(Sidecar)"]
end
subgraph "Data Plane — Pod C"
APP_C["App Container\n(Service C)"] <-->|"localhost"| ENVOY_C["Envoy Proxy\n(Sidecar)"]
end
ISTIOD -->|"xDS config push"| ENVOY_A
ISTIOD -->|"xDS config push"| ENVOY_B
ISTIOD -->|"xDS config push"| ENVOY_C
ENVOY_A <-->|"mTLS"| ENVOY_B
ENVOY_B <-->|"mTLS"| ENVOY_C
style ISTIOD fill:#BF092F,color:#fff
style ENVOY_A fill:#3B9797,color:#fff
style ENVOY_B fill:#3B9797,color:#fff
style ENVOY_C fill:#3B9797,color:#fff
Application containers exchange traffic with their sidecar over localhost; they don't know the mesh exists. The Envoy sidecar transparently intercepts all inbound and outbound traffic via iptables rules that an init container sets up during pod startup. This is why a service mesh is "infrastructure": it is invisible to the application layer.
Control Plane — Istiod
Istiod is the unified control plane binary that combines three formerly separate components: Pilot, Citadel, and Galley. It's the brain that translates high-level intent (VirtualService, DestinationRule) into low-level Envoy configuration.
Pilot — Traffic Rules to Envoy Config
Pilot watches Kubernetes resources (Services, Endpoints, VirtualServices) and translates them into Envoy-native configuration distributed via the xDS API. When you create a VirtualService splitting traffic 90/10, Pilot computes the route configuration and pushes it to every relevant Envoy proxy.
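As a rough illustration of that translation step, the sketch below turns a VirtualService-style weighted route into an Envoy-style weighted-cluster entry. The function names and the dictionary layout are hypothetical and far simpler than what Pilot really emits; the only detail taken from this article is the `outbound|port|subset|fqdn` cluster naming that appears in the access-log example in the Observability section.

```python
def cluster_name(host, namespace, port, subset):
    """Build a cluster name in the outbound|port|subset|fqdn form."""
    fqdn = f"{host}.{namespace}.svc.cluster.local"
    return f"outbound|{port}|{subset}|{fqdn}"


def to_weighted_clusters(host, namespace, port, destinations):
    """Translate [(subset, weight), ...] into an Envoy-like route action.

    Toy model: real control-plane output carries many more fields.
    """
    return {
        "weighted_clusters": {
            "clusters": [
                {"name": cluster_name(host, namespace, port, subset), "weight": weight}
                for subset, weight in destinations
            ]
        }
    }


# The 90/10 split from the text, for a hypothetical reviews service on port 9080.
route = to_weighted_clusters("reviews", "production", 9080, [("v1", 90), ("v2", 10)])
for cluster in route["weighted_clusters"]["clusters"]:
    print(cluster["weight"], cluster["name"])
```

The point of the sketch is the direction of the mapping: one high-level rule fans out into per-subset cluster references that each proxy can act on without knowing about Kubernetes resources.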
Citadel — Certificate Authority & mTLS
Citadel acts as the mesh's internal Certificate Authority. It issues SPIFFE-based identity certificates to every workload, enabling mutual TLS between services without application changes. Certificates are rotated automatically (default 24-hour lifetime).
sequenceDiagram
participant Envoy as Envoy Proxy
participant Agent as Istio Agent (pilot-agent)
participant Istiod as Istiod (CA)
Note over Agent: Certificate approaching expiry
Agent->>Istiod: CSR (Certificate Signing Request)
Note over Istiod: Validate pod identity via K8s TokenReview
Istiod->>Agent: Signed certificate (SPIFFE ID)
Agent->>Envoy: Hot-reload new cert via SDS
Note over Envoy: Zero-downtime rotation
Envoy->>Envoy: Use new cert for all new connections
Note over Envoy: Old connections drain gracefully
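To make the identity format concrete, here is a minimal sketch that builds and parses a SPIFFE ID of the form `spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>` and decides when a certificate is due for renewal. The class and function names are hypothetical, and the `renew_fraction` threshold is an assumption chosen for illustration, not a documented Istio default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadIdentity:
    trust_domain: str
    namespace: str
    service_account: str

    def spiffe_id(self):
        return (f"spiffe://{self.trust_domain}"
                f"/ns/{self.namespace}/sa/{self.service_account}")


def parse_spiffe_id(uri):
    """Split a SPIFFE URI into trust domain, namespace, and service account."""
    prefix = "spiffe://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a SPIFFE URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 5 or parts[1] != "ns" or parts[3] != "sa":
        raise ValueError(f"unexpected SPIFFE path: {uri!r}")
    return WorkloadIdentity(parts[0], parts[2], parts[4])


def needs_rotation(issued_at, lifetime_s, now, renew_fraction=0.5):
    """True once the chosen fraction of the lifetime has elapsed.

    renew_fraction is a hypothetical knob for this sketch.
    """
    return now - issued_at >= lifetime_s * renew_fraction


identity = parse_spiffe_id("spiffe://cluster.local/ns/production/sa/frontend")
print(identity.namespace, identity.service_account)

DAY = 24 * 60 * 60  # the 24-hour default lifetime mentioned above
print(needs_rotation(issued_at=0, lifetime_s=DAY, now=13 * 60 * 60))
```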
Galley — Configuration Validation
Galley validates Istio configuration before it reaches Pilot. It acts as a webhook admission controller, rejecting invalid VirtualServices or DestinationRules before they can cause routing failures in the data plane.
# Validate Istio configuration before applying
istioctl analyze
# Example output for common misconfigurations (illustrative; exact wording varies by version)
# Error [IST0101] (VirtualService default/reviews)
# Referenced host not found: "reviews.default.svc.cluster.local"
#
# Error [IST0101] (VirtualService default/reviews)
# Referenced subset "v3" not found in any matching DestinationRule
# Check what Pilot has pushed to a specific Envoy
istioctl proxy-config routes deploy/productpage -o json
# View listener configuration on a specific pod's sidecar
istioctl proxy-config listeners deploy/productpage
# Check certificate status
istioctl proxy-config secret deploy/productpage
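The cross-reference check behind a "referenced subset not found" finding can be sketched in a few lines. This is a simplified stand-in, not how `istioctl analyze` is implemented: the function name is hypothetical, resources are plain dictionaries, and hosts are compared literally with no short-name or namespace resolution.

```python
def find_missing_subsets(virtual_service, destination_rules):
    """Return (host, subset) pairs that routes reference but no rule defines."""
    defined = {
        (rule["spec"]["host"], subset["name"])
        for rule in destination_rules
        for subset in rule["spec"].get("subsets", [])
    }
    missing = []
    for http_route in virtual_service["spec"].get("http", []):
        for route in http_route.get("route", []):
            dest = route["destination"]
            subset = dest.get("subset")
            if subset is not None and (dest["host"], subset) not in defined:
                missing.append((dest["host"], subset))
    return missing


# Hypothetical resources: the route points at v3, but only v1 and v2 exist.
vs = {"spec": {"http": [{"route": [
    {"destination": {"host": "reviews", "subset": "v1"}, "weight": 90},
    {"destination": {"host": "reviews", "subset": "v3"}, "weight": 10},
]}]}}
dr = {"spec": {"host": "reviews", "subsets": [{"name": "v1"}, {"name": "v2"}]}}

print(find_missing_subsets(vs, [dr]))
```

Catching this before the configuration is applied matters because a route to an undefined subset has no endpoints to send traffic to.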
Data Plane — Envoy Proxies
Envoy is a high-performance L4/L7 proxy written in C++. In Istio, every pod gets an Envoy sidecar that intercepts all traffic. The sidecar handles:
- Traffic interception — iptables rules redirect all pod traffic through Envoy
- Request routing — route selection based on headers, paths, weights
- Load balancing — round-robin, least request, random, ring hash
- Retries & timeouts — automatic retry with exponential backoff
- Circuit breaking — stop sending traffic to failing upstream services
- mTLS termination — encrypt/decrypt all inter-service traffic
- Telemetry generation — metrics, traces, access logs for every request
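The retry behavior in the list above can be pictured with a small jittered exponential backoff model. The 25 ms base and the cap at ten times the base reflect my understanding of Envoy's documented defaults and should be treated as assumptions; the function names are hypothetical.

```python
import random


def backoff_bounds(attempts, base_ms=25.0, max_ms=250.0):
    """Upper bound of the jitter window before retry 1..attempts."""
    return [min((2 ** n - 1) * base_ms, max_ms) for n in range(1, attempts + 1)]


def jittered_backoffs(attempts, base_ms=25.0, max_ms=250.0, rng=None):
    """Draw one random delay per retry from [0, bound]."""
    rng = rng or random.Random()
    return [rng.uniform(0.0, bound) for bound in backoff_bounds(attempts, base_ms, max_ms)]


print(backoff_bounds(5))
print(jittered_backoffs(3, rng=random.Random(42)))
```

The jitter is the important part: if every client retried after exactly the same delay, a struggling upstream would receive synchronized bursts instead of a spread-out trickle.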
Sidecar Injection
Istio uses a Kubernetes mutating webhook to automatically inject the Envoy sidecar container into every pod in labeled namespaces. The injection also adds an init container that configures iptables rules to redirect traffic:
# Enable automatic sidecar injection for a namespace
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    istio-injection: enabled      # Webhook injects sidecar into all pods
---
# What the webhook adds to every pod:
# 1. Init container: istio-init (configures iptables)
# 2. Sidecar container: istio-proxy (Envoy)
# 3. Volumes: istio-envoy, istio-token, istiod-ca-cert (among others)
# Resulting pod spec (simplified):
apiVersion: v1
kind: Pod
metadata:
  name: reviews-v1
  annotations:
    sidecar.istio.io/status: '{"initContainers":["istio-init"]}'
spec:
  initContainers:
    - name: istio-init
      image: istio/proxyv2:1.20
      # Passed as args to the image's pilot-agent entrypoint
      args: ['istio-iptables', '-p', '15001', '-z', '15006']
      # Redirects all inbound traffic to port 15006
      # Redirects all outbound traffic to port 15001
  containers:
    - name: reviews               # Application container (unchanged)
      image: reviews:v1
      ports:
        - containerPort: 9080
    - name: istio-proxy           # Envoy sidecar (injected)
      image: istio/proxyv2:1.20
      ports:
        - containerPort: 15090    # Prometheus metrics
        - containerPort: 15021    # Health check
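The effect of those iptables rules can be modeled as a small decision function. This is a toy model, not the real rule set: ports 15001 and 15006 come from the pod spec above, while the proxy UID of 1337 and the idea that the proxy's own ports bypass redirection are assumptions labeled in the code.

```python
from typing import Optional

ENVOY_UID = 1337                               # assumption: UID the istio-proxy container runs as
OUTBOUND_PORT = 15001                          # Envoy outbound listener (from the pod spec)
INBOUND_PORT = 15006                           # Envoy inbound listener (from the pod spec)
EXCLUDED_INBOUND = frozenset({15090, 15021})   # assumption: proxy's own ports are not redirected


def delivered_port(direction: str, dest_port: int, sender_uid: Optional[int] = None) -> int:
    """Return the local port a new TCP connection is actually delivered to."""
    if direction == "outbound":
        # Traffic sent by Envoy itself must not loop back into Envoy.
        return dest_port if sender_uid == ENVOY_UID else OUTBOUND_PORT
    if direction == "inbound":
        return dest_port if dest_port in EXCLUDED_INBOUND else INBOUND_PORT
    raise ValueError(f"unknown direction: {direction!r}")


# The app (hypothetical UID 1000) dials reviews:9080 but lands on Envoy's outbound listener.
print(delivered_port("outbound", 9080, sender_uid=1000))
# Envoy's own upstream connection goes straight out.
print(delivered_port("outbound", 9080, sender_uid=ENVOY_UID))
# A caller hitting the pod on 9080 is handed to Envoy's inbound listener first.
print(delivered_port("inbound", 9080))
```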
xDS Protocol Family
The xDS (x Discovery Service) protocol is how the control plane communicates configuration to data plane proxies. It's a gRPC-based streaming API that pushes updates in real-time:
flowchart LR
ISTIOD["Istiod\n(Control Plane)"] -->|"CDS\nCluster Discovery"| ENVOY["Envoy Proxy"]
ISTIOD -->|"EDS\nEndpoint Discovery"| ENVOY
ISTIOD -->|"LDS\nListener Discovery"| ENVOY
ISTIOD -->|"RDS\nRoute Discovery"| ENVOY
ISTIOD -->|"SDS\nSecret Discovery"| ENVOY
style ISTIOD fill:#BF092F,color:#fff
style ENVOY fill:#3B9797,color:#fff
- CDS (Cluster Discovery Service) — upstream service clusters (what services exist)
- EDS (Endpoint Discovery Service) — individual endpoints per cluster (pod IPs + ports)
- LDS (Listener Discovery Service) — network listeners and filter chains (what ports to bind)
- RDS (Route Discovery Service) — routing rules (how to route requests to clusters)
- SDS (Secret Discovery Service) — TLS certificates and keys (hot-reloadable)
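One way to see how these resource types fit together is to walk a request through them in order: listener, route, cluster, endpoints. The sketch below uses plain dictionaries as a stand-in for the pushed configuration; the layout, names, and IP addresses are hypothetical and much simpler than real Envoy resources.

```python
CONFIG = {
    "listeners": {9080: "reviews-routes"},                            # LDS: port -> route table
    "routes": {"reviews-routes": [("/api/reviews", "reviews-v1")]},   # RDS: path prefix -> cluster
    "clusters": {"reviews-v1": {"lb_policy": "ROUND_ROBIN"}},         # CDS: known upstreams
    "endpoints": {"reviews-v1": ["10.244.1.7:9080", "10.244.2.3:9080"]},  # EDS: pod IPs
}


def resolve(config, port, path):
    """Follow listener -> route -> cluster -> endpoints for one request."""
    route_table = config["listeners"].get(port)
    if route_table is None:
        return []
    for prefix, cluster in config["routes"].get(route_table, []):
        if path.startswith(prefix):
            if cluster not in config["clusters"]:
                return []
            return list(config["endpoints"].get(cluster, []))
    return []


print(resolve(CONFIG, 9080, "/api/reviews/1"))

# An EDS-only update (a pod was rescheduled) changes the result
# without touching listeners, routes, or clusters.
CONFIG["endpoints"]["reviews-v1"] = ["10.244.3.9:9080"]
print(resolve(CONFIG, 9080, "/api/reviews/1"))
```

Splitting configuration this way means a frequent change, such as pod IPs churning, only requires resending the small endpoint resource.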
Sidecar vs Ambient Mesh
The sidecar model has drawbacks: resource overhead (each proxy consumes ~50MB RAM + ~50m CPU), application startup dependency, and operational complexity. Istio's ambient mesh mode offers an alternative architecture:
flowchart TD
subgraph "Sidecar Mode (Traditional)"
S_APP["App"] <--> S_ENVOY["Envoy Sidecar"]
S_ENVOY <-->|"mTLS"| S_NET["Network"]
end
subgraph "Ambient Mode (Sidecar-free)"
A_APP["App"] <--> A_ZTUN["ztunnel\n(per-node L4)"]
A_ZTUN <-->|"mTLS + HBONE"| A_NET["Network"]
A_NET <--> A_WP["Waypoint Proxy\n(per-namespace L7)"]
end
style S_ENVOY fill:#3B9797,color:#fff
style A_ZTUN fill:#16476A,color:#fff
style A_WP fill:#BF092F,color:#fff
Ambient mesh splits the data plane into two layers:
- ztunnel — a shared per-node L4 proxy that handles mTLS, L4 authorization, and telemetry. One instance serves every pod on the node, so its overhead is amortized across workloads.
- Waypoint proxy — an optional per-namespace L7 proxy (still Envoy) deployed only when L7 features (traffic splitting, header-based routing, retries) are needed.
Sidecar vs Ambient — When to Choose
Choose Sidecar when you need per-pod L7 policy, fine-grained traffic control per workload, or when pods have heterogeneous security requirements.
Choose Ambient for large-scale deployments where resource efficiency matters, when L4 security (mTLS + network policy) is sufficient for most services, or when you want mesh benefits without modifying pod specs. For L4-only workloads, Ambient can reduce mesh resource consumption by 90% or more while maintaining full mTLS coverage; the actual saving depends on how many pods share each node.
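A back-of-the-envelope calculation shows where savings of that size could come from. The ~50 MB per sidecar figure is the one quoted earlier in this section; the per-ztunnel memory, the cluster size, and the function names are made-up assumptions for illustration, so the result is an example of the arithmetic, not a benchmark.

```python
def sidecar_memory_mb(pods, per_proxy_mb=50.0):
    """One proxy per pod."""
    return pods * per_proxy_mb


def ambient_memory_mb(nodes, per_ztunnel_mb, waypoints=0, per_waypoint_mb=0.0):
    """One ztunnel per node, plus optional waypoint proxies."""
    return nodes * per_ztunnel_mb + waypoints * per_waypoint_mb


def reduction_percent(before, after):
    return 100.0 * (before - after) / before


# Hypothetical cluster: 1000 pods on 20 nodes, L4-only (no waypoints),
# and an assumed 60 MB per ztunnel.
before = sidecar_memory_mb(1000)
after = ambient_memory_mb(nodes=20, per_ztunnel_mb=60.0)
print(before, after, round(reduction_percent(before, after), 1))
```

The saving scales with pod density: the same arithmetic with only a handful of pods per node gives a far smaller reduction, and every waypoint added for L7 features eats into it.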
Traffic Management
Traffic management, meaning control over how requests flow between services, is one of the main use cases for a service mesh. Istio provides two key resources:
VirtualService — Traffic Splitting & Routing
# Canary deployment: 90% to v1, 10% to v2
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
  namespace: production
spec:
  hosts:
    - reviews                     # Kubernetes service name
  http:
    - match:
        - headers:
            x-canary-user:
              exact: "true"       # Internal testers get v2 always
      route:
        - destination:
            host: reviews
            subset: v2
    - route:                      # Default: traffic split
        - destination:
            host: reviews
            subset: v1
          weight: 90
        - destination:
            host: reviews
            subset: v2
          weight: 10
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: 5xx,reset,connect-failure
      timeout: 10s
      # Note: the Istio API reference states that timeouts and retries are
      # not enabled on a route while a fault is configured on it
      fault:
        delay:                    # Fault injection for chaos testing
          percentage:
            value: 1.0            # 1% of requests get 5s delay
          fixedDelay: 5s
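The routing decision this VirtualService describes (header match first, weighted split otherwise) can be sketched as follows. It is a simplified model of the rule order and weights shown above, not Envoy's implementation; the function names are hypothetical, and the random draw is passed in as `point` so the behavior is easy to test.

```python
import bisect
import itertools


def pick_subset(weighted, point):
    """Pick a subset for a draw `point` in [0, total weight)."""
    cumulative = list(itertools.accumulate(weight for _, weight in weighted))
    if not 0 <= point < cumulative[-1]:
        raise ValueError(f"point {point} outside [0, {cumulative[-1]})")
    return weighted[bisect.bisect_right(cumulative, point)][0]


def route_request(headers, point):
    """First matching rule wins, mirroring the order of the http entries."""
    if headers.get("x-canary-user") == "true":
        return "v2"                       # internal testers always get v2
    return pick_subset([("v1", 90), ("v2", 10)], point)


print(route_request({"x-canary-user": "true"}, point=5))
print(route_request({}, point=5))
print(route_request({}, point=95))
```

In a real proxy `point` would come from a random number generator, so over many requests roughly one in ten lands on v2.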
DestinationRule — Load Balancing & Circuit Breaking
# Define subsets and circuit breaker settings
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
  namespace: production
spec:
  host: reviews
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: DEFAULT
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
    outlierDetection:             # Circuit breaker
      consecutive5xxErrors: 5     # 5 errors triggers ejection
      interval: 10s               # Check every 10s
      baseEjectionTime: 30s       # Eject for 30s minimum
      maxEjectionPercent: 50      # Never eject more than 50%
    loadBalancer:
      simple: LEAST_REQUEST       # Route to least-loaded endpoint
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
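To show how the outlierDetection settings interact, here is a toy ejection model using the same numbers (5 consecutive 5xx errors, 30 s base ejection, 50% cap). It is a simplification under stated assumptions: it ignores the `interval` sweep, and it lengthens each ejection by the number of times the host has been ejected, which is my understanding of Envoy's behavior and not something the configuration above states. The class name is hypothetical.

```python
class OutlierDetector:
    def __init__(self, hosts, consecutive_5xx=5, base_ejection_s=30.0,
                 max_ejection_percent=50):
        self.hosts = list(hosts)
        self.consecutive_5xx = consecutive_5xx
        self.base_ejection_s = base_ejection_s
        self.max_ejected = len(self.hosts) * max_ejection_percent // 100
        self.errors = {h: 0 for h in self.hosts}
        self.ejections = {h: 0 for h in self.hosts}
        self.ejected_until = {h: 0.0 for h in self.hosts}

    def healthy(self, now):
        """Hosts currently eligible to receive traffic."""
        return [h for h in self.hosts if self.ejected_until[h] <= now]

    def record(self, host, status, now):
        """Record one response; return True if it caused an ejection."""
        if self.ejected_until[host] > now:
            return False                  # already ejected
        if status < 500:
            self.errors[host] = 0         # any success resets the streak
            return False
        self.errors[host] += 1
        if self.errors[host] < self.consecutive_5xx:
            return False
        ejected_now = len(self.hosts) - len(self.healthy(now))
        if ejected_now + 1 > self.max_ejected:
            return False                  # cap reached: keep the host in rotation
        self.ejections[host] += 1
        self.ejected_until[host] = now + self.base_ejection_s * self.ejections[host]
        self.errors[host] = 0
        return True


detector = OutlierDetector(["a", "b", "c", "d"])
for host in ("a", "b", "c"):
    for _ in range(5):
        detector.record(host, 503, now=0.0)
print(detector.healthy(now=1.0))    # c stays in rotation because of the 50% cap
print(detector.healthy(now=30.0))   # ejections have expired
```

The cap exists so that a widespread failure cannot empty the load-balancing pool entirely.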
Security — mTLS & Authorization
Service mesh security operates at three levels: transport encryption (mTLS), peer authentication (identity verification), and request authorization (access control).
PeerAuthentication — mTLS Mode
# Enforce strict mTLS for entire mesh
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system         # Mesh-wide policy
spec:
  mtls:
    mode: STRICT                  # Reject any non-mTLS traffic
    # Options: DISABLE, PERMISSIVE (accept both), STRICT (mTLS only)
---
# Per-namespace override: permissive for legacy services
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: legacy-namespace
  namespace: legacy
spec:
  mtls:
    mode: PERMISSIVE              # Accept both plaintext and mTLS
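How the mesh-wide policy and the namespace override combine can be sketched as a precedence lookup: the most specific policy wins (workload selector, then namespace, then mesh-wide), and PERMISSIVE applies when nothing is configured. The function and the dictionary layout are hypothetical simplifications; port-level settings and mode inheritance are left out.

```python
def effective_mtls_mode(policies, namespace, labels, root_namespace="istio-system"):
    """Resolve the mTLS mode for one workload from a list of policy dicts."""
    def selects(policy):
        selector = policy.get("selector") or {}
        return all(labels.get(k) == v for k, v in selector.items())

    # 1. Workload-specific policy in the workload's own namespace.
    for p in policies:
        if p["namespace"] == namespace and p.get("selector") and selects(p):
            return p["mode"]
    # 2. Namespace-wide policy (no selector).
    for p in policies:
        if p["namespace"] == namespace and not p.get("selector"):
            return p["mode"]
    # 3. Mesh-wide policy in the root namespace.
    for p in policies:
        if p["namespace"] == root_namespace and not p.get("selector"):
            return p["mode"]
    return "PERMISSIVE"


POLICIES = [
    {"namespace": "istio-system", "mode": "STRICT"},   # mesh-wide policy from above
    {"namespace": "legacy", "mode": "PERMISSIVE"},     # namespace override from above
]
print(effective_mtls_mode(POLICIES, "production", {"app": "reviews"}))
print(effective_mtls_mode(POLICIES, "legacy", {"app": "billing"}))
```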
AuthorizationPolicy — Fine-Grained Access Control
# Only allow frontend to call reviews service
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: reviews-access
  namespace: production
spec:
  selector:
    matchLabels:
      app: reviews
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - "cluster.local/ns/production/sa/frontend"
              # SPIFFE identity: only frontend service account
      to:
        - operation:
            methods: ["GET"]
            paths: ["/api/reviews/*"]
      when:
        - key: request.headers[x-request-id]
          values: ["*"]           # Presence match: must have trace correlation header
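The intent of this policy can be expressed as a small evaluation function: a request is allowed only if the caller identity, method, path, and header condition all match, and anything else sent to the selected workload is denied. This is a sketch of the matching logic under simplifying assumptions (a single rule, a trailing `*` treated as a prefix match); the function names and request layout are hypothetical.

```python
def path_matches(pattern, path):
    """Exact match, or prefix match when the pattern ends with '*'."""
    if pattern.endswith("*"):
        return path.startswith(pattern[:-1])
    return path == pattern


def is_allowed(request):
    """Evaluate the single ALLOW rule for the reviews workload."""
    return (
        request.get("principal") == "cluster.local/ns/production/sa/frontend"
        and request.get("method") in {"GET"}
        and path_matches("/api/reviews/*", request.get("path", ""))
        and bool(request.get("headers", {}).get("x-request-id"))
    )


request = {
    "principal": "cluster.local/ns/production/sa/frontend",
    "method": "GET",
    "path": "/api/reviews/1",
    "headers": {"x-request-id": "abc-123"},
}
print(is_allowed(request))
print(is_allowed({**request, "method": "POST"}))
```

Because the principal comes from the peer's mTLS certificate, the caller cannot claim another identity simply by setting a header.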
Observability
Because Envoy proxies handle every request, the mesh generates telemetry with little or no application instrumentation:
- Metrics — request count, duration, size per source/destination/response code (Prometheus-compatible)
- Distributed traces — Envoy creates spans and trace headers for each hop, but the application must forward the incoming trace headers (for example x-request-id and the B3 or W3C traceparent headers) on its outbound calls so the spans join into one trace (Jaeger/Zipkin compatible)
- Access logs — structured logs for every request with timing, headers, response codes
# View real-time metrics from Envoy sidecars
kubectl exec deploy/productpage -c istio-proxy -- \
curl -s localhost:15090/stats/prometheus | grep istio_requests_total
# Example output:
# istio_requests_total{
# reporter="source",
# source_workload="productpage-v1",
# destination_workload="reviews-v2",
# response_code="200",
# request_protocol="http"
# } 15234
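# View Envoy access logs (structured JSON)
# Requires access logging to be enabled with JSON encoding, e.g.
# meshConfig.accessLogFile=/dev/stdout and meshConfig.accessLogEncoding=JSON
kubectl logs deploy/productpage -c istio-proxy | grep '^{' | tail -1 | jq .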
# {
# "authority": "reviews:9080",
# "bytes_received": 0,
# "bytes_sent": 178,
# "duration": 12,
# "method": "GET",
# "path": "/api/reviews/1",
# "protocol": "HTTP/1.1",
# "response_code": 200,
# "upstream_cluster": "outbound|9080|v2|reviews.production.svc.cluster.local",
# "x_forwarded_for": "10.244.1.5"
# }
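Once the metrics are scraped, a typical next step is to derive a rate from them. The sketch below parses Prometheus text-format lines for `istio_requests_total` and computes a 5xx error ratio for one destination workload. The parser handles only the simple `name{labels} value` shape, the function names are hypothetical, and the 503 sample line is made-up data added for the example.

```python
import re

SAMPLE = """\
istio_requests_total{reporter="source",destination_workload="reviews-v2",response_code="200"} 15234
istio_requests_total{reporter="source",destination_workload="reviews-v2",response_code="503"} 12
"""

LINE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return (metric name, label dict, value) for each labeled sample line."""
    samples = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            labels = dict(LABEL.findall(match.group("labels")))
            samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def error_ratio(samples, workload):
    """Share of requests to `workload` that returned a 5xx response code."""
    total = errors = 0.0
    for name, labels, value in samples:
        if name != "istio_requests_total":
            continue
        if labels.get("destination_workload") != workload:
            continue
        total += value
        if labels.get("response_code", "").startswith("5"):
            errors += value
    return errors / total if total else 0.0


print(f"{error_ratio(parse_samples(SAMPLE), 'reviews-v2'):.4%}")
```

In practice this aggregation would usually be done with a query in Prometheus itself; the sketch only shows what the labels make possible.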
Service Mesh as the Universal Data Plane
The service mesh exemplifies control/data plane separation at the application networking layer. The control plane (Istiod) holds the policy: what should happen to traffic. The data plane (Envoy proxies) executes that policy on every request. This separation enables powerful abstractions: traffic splitting without code changes, mTLS without application crypto, observability without instrumentation libraries. The application only knows about its business logic; the mesh handles everything else.