
Service Mesh Architecture — Istio & Envoy

May 15, 2026 · Wasil Zafar · 26 min read

"Service mesh control planes configure proxies. The proxies ARE the data plane." — Understanding this separation unlocks traffic management, security, and observability without changing a single line of application code.

Table of Contents

  1. Service Mesh Concept
  2. Control Plane — Istiod
  3. Data Plane — Envoy Proxies
  4. xDS Protocol Family
  5. Sidecar vs Ambient Mesh
  6. Traffic Management
  7. Security — mTLS & Authorization
  8. Observability

Service Mesh Concept

A service mesh is a dedicated infrastructure layer for handling service-to-service communication. It makes communication reliable, secure, and observable without requiring changes to application code. The mesh intercepts all network traffic between services and applies policies transparently.

The architecture follows the control/data plane pattern precisely:

  • Control plane — centralized management (Istiod) that decides what policies to apply
  • Data plane — distributed proxies (Envoy sidecars) that execute those policies on every request
Istio Service Mesh Architecture
flowchart TD
    subgraph "Control Plane"
        ISTIOD["Istiod\n(Pilot + Citadel + Galley)"]
    end

    subgraph "Data Plane — Pod A"
        APP_A["App Container\n(Service A)"] <-->|"localhost"| ENVOY_A["Envoy Proxy\n(Sidecar)"]
    end

    subgraph "Data Plane — Pod B"
        APP_B["App Container\n(Service B)"] <-->|"localhost"| ENVOY_B["Envoy Proxy\n(Sidecar)"]
    end

    subgraph "Data Plane — Pod C"
        APP_C["App Container\n(Service C)"] <-->|"localhost"| ENVOY_C["Envoy Proxy\n(Sidecar)"]
    end

    ISTIOD -->|"xDS config push"| ENVOY_A
    ISTIOD -->|"xDS config push"| ENVOY_B
    ISTIOD -->|"xDS config push"| ENVOY_C

    ENVOY_A <-->|"mTLS"| ENVOY_B
    ENVOY_B <-->|"mTLS"| ENVOY_C

    style ISTIOD fill:#BF092F,color:#fff
    style ENVOY_A fill:#3B9797,color:#fff
    style ENVOY_B fill:#3B9797,color:#fff
    style ENVOY_C fill:#3B9797,color:#fff
                            
Key Insight: Applications address each other by ordinary service names and never talk to the mesh directly, so they don't know it exists. The Envoy sidecar transparently intercepts all inbound and outbound traffic via iptables rules installed during pod startup, and the hop between app and proxy stays inside the pod's own network namespace. This is why a service mesh counts as "infrastructure": it is invisible to the application layer.

Control Plane — Istiod

Istiod is the unified control plane binary that combines three formerly separate components: Pilot, Citadel, and Galley. It's the brain that translates high-level intent (VirtualService, DestinationRule) into low-level Envoy configuration.

Pilot — Traffic Rules to Envoy Config

Pilot watches Kubernetes resources (Services, Endpoints, VirtualServices) and translates them into Envoy-native configuration distributed via the xDS API. When you create a VirtualService splitting traffic 90/10, Pilot computes the route configuration and pushes it to every relevant Envoy proxy.
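As a rough illustration of that translation step, the sketch below turns a pared-down VirtualService-like dictionary into an Envoy-style weighted route. The function name `to_envoy_route` and the simplified output shape are assumptions made for this example; the route configuration Istiod actually generates carries many more fields. The cluster naming pattern `outbound|<port>|<subset>|<fqdn>` is the one Istio proxies report in their access logs.

```python
def to_envoy_route(virtual_service: dict, namespace: str, port: int) -> dict:
    """Translate a simplified VirtualService route into an Envoy-like weighted route.

    Hypothetical helper for illustration only.
    """
    clusters = []
    for dest in virtual_service["route"]:
        host = dest["destination"]["host"]
        subset = dest["destination"].get("subset", "")
        fqdn = f"{host}.{namespace}.svc.cluster.local"
        clusters.append({
            "name": f"outbound|{port}|{subset}|{fqdn}",
            "weight": dest.get("weight", 100),  # a lone destination gets all traffic
        })
    return {
        "match": {"prefix": "/"},
        "route": {"weighted_clusters": {"clusters": clusters}},
    }


canary = {
    "route": [
        {"destination": {"host": "reviews", "subset": "v1"}, "weight": 90},
        {"destination": {"host": "reviews", "subset": "v2"}, "weight": 10},
    ]
}
route = to_envoy_route(canary, namespace="production", port=9080)
for cluster in route["route"]["weighted_clusters"]["clusters"]:
    print(cluster["name"], cluster["weight"])
```

The point of the sketch is the direction of the mapping: operators write intent in terms of hosts and subsets, and the proxy only ever sees named clusters with weights.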

Citadel — Certificate Authority & mTLS

Citadel acts as the mesh's internal Certificate Authority. It issues SPIFFE-based identity certificates to every workload, enabling mutual TLS between services without application changes. Certificates are rotated automatically (default 24-hour lifetime).

mTLS Certificate Rotation Flow
sequenceDiagram
    participant Envoy as Envoy Proxy
    participant Agent as Istio Agent (pilot-agent)
    participant Istiod as Istiod (CA)

    Note over Agent: Certificate approaching expiry
    Agent->>Istiod: CSR (Certificate Signing Request)
    Note over Istiod: Validate pod identity via K8s TokenReview
    Istiod->>Agent: Signed certificate (SPIFFE ID)
    Agent->>Envoy: Hot-reload new cert via SDS
    Note over Envoy: Zero-downtime rotation
    Envoy->>Envoy: Use new cert for all new connections
    Note over Envoy: Old connections drain gracefully
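
The rotation flow can be sketched in a few lines of Python. The SPIFFE ID layout (`spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>`) is the format Istio uses for workload identities; the renewal rule, which rotates once a fixed fraction of the lifetime has elapsed, and the `renew_fraction` value of 0.5 are assumptions for illustration rather than a statement of the agent's exact policy.

```python
from datetime import datetime, timedelta, timezone


def spiffe_id(trust_domain: str, namespace: str, service_account: str) -> str:
    """Build the SPIFFE identity carried in a workload certificate."""
    return f"spiffe://{trust_domain}/ns/{namespace}/sa/{service_account}"


def should_rotate(issued_at: datetime, lifetime: timedelta, now: datetime,
                  renew_fraction: float = 0.5) -> bool:
    """Return True once `renew_fraction` of the certificate lifetime has elapsed.

    `renew_fraction` is an assumed policy knob for this sketch.
    """
    return now >= issued_at + lifetime * renew_fraction


issued = datetime(2026, 5, 15, 0, 0, tzinfo=timezone.utc)
lifetime = timedelta(hours=24)  # default workload certificate lifetime

print(spiffe_id("cluster.local", "production", "frontend"))
print(should_rotate(issued, lifetime, issued + timedelta(hours=6)))
print(should_rotate(issued, lifetime, issued + timedelta(hours=13)))
```

Renewing well before expiry is what makes the rotation zero-downtime: the old certificate is still valid while the new one is being issued and loaded.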
                            

Galley — Configuration Validation

Galley validates Istio configuration before it reaches Pilot. It acts as a webhook admission controller, rejecting invalid VirtualServices or DestinationRules before they can cause routing failures in the data plane.

# Validate Istio configuration before applying
istioctl analyze

# Example output for common misconfigurations
# (illustrative; exact wording varies by Istio version)
# Error [IST0101] (VirtualService default/reviews)
#   Referenced host not found: "reviews.default.svc.cluster.local"
#
# Error [IST0101] (VirtualService default/reviews)
#   Referenced host+subset in destinationrule not found: "reviews+v3"

# Check what Pilot has pushed to a specific Envoy
istioctl proxy-config routes deploy/productpage -o json

# View listener configuration on a specific pod's sidecar
istioctl proxy-config listeners deploy/productpage

# Check certificate status
istioctl proxy-config secret deploy/productpage

Data Plane — Envoy Proxies

Envoy is a high-performance L4/L7 proxy written in C++. In Istio's sidecar mode, every pod in an injection-enabled namespace gets an Envoy sidecar that intercepts the pod's traffic. The sidecar handles:

  • Traffic interception: iptables rules redirect all pod traffic through Envoy
  • Request routing: route selection based on headers, paths, weights
  • Load balancing: round-robin, least request, random, ring hash
  • Retries & timeouts: automatic retry with jittered exponential backoff
  • Circuit breaking: stop sending traffic to failing upstream services
  • mTLS origination and termination: encrypt/decrypt all inter-service traffic
  • Telemetry generation: metrics, traces, access logs for every request
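
To make the retry behaviour concrete, here is a minimal sketch of jittered exponential backoff. The 25 ms base interval and the 250 ms cap are assumptions chosen to resemble Envoy's documented defaults, and the function names are hypothetical; treat it as a model of the idea rather than Envoy's implementation.

```python
import random


def backoff_upper_bound_ms(retry_number: int, base_ms: float = 25.0,
                           max_ms: float = 250.0) -> float:
    """Upper bound of the backoff window for the given retry (1-based)."""
    return min((2 ** retry_number - 1) * base_ms, max_ms)


def jittered_backoff_ms(retry_number: int, rng: random.Random) -> float:
    """Pick a random delay between 0 and the upper bound (full jitter)."""
    return rng.uniform(0.0, backoff_upper_bound_ms(retry_number))


rng = random.Random(7)  # seeded so repeated runs pick the same delays
for n in range(1, 5):
    print(n, backoff_upper_bound_ms(n), round(jittered_backoff_ms(n, rng), 1))
```

The jitter matters as much as the exponent: without it, many clients that failed at the same moment would all retry at the same moment too.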

Sidecar Injection

Istio uses a Kubernetes mutating webhook to automatically inject the Envoy sidecar container into every pod in labeled namespaces. The injection also adds an init container that configures iptables rules to redirect traffic:

# Enable automatic sidecar injection for a namespace
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    istio-injection: enabled    # Webhook injects sidecar into all pods

---
# What the webhook adds to every pod:
# 1. Init container: istio-init (configures iptables)
# 2. Sidecar container: istio-proxy (Envoy)
# 3. Volumes: istio-certs, istio-envoy

# Resulting pod spec (simplified):
apiVersion: v1
kind: Pod
metadata:
  name: reviews-v1
  annotations:
    sidecar.istio.io/status: '{"initContainers":["istio-init"]}'
spec:
  initContainers:
  - name: istio-init
    image: istio/proxyv2:1.20
    command: ['istio-iptables', '-p', '15001', '-z', '15006']
    # Redirects all inbound traffic to port 15006
    # Redirects all outbound traffic to port 15001
  containers:
  - name: reviews              # Application container (unchanged)
    image: reviews:v1
    ports:
    - containerPort: 9080
  - name: istio-proxy          # Envoy sidecar (injected)
    image: istio/proxyv2:1.20
    ports:
    - containerPort: 15090     # Prometheus metrics
    - containerPort: 15021     # Health check
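
The effect of those iptables rules can be modelled as a small decision function. This is a simplified sketch: the ports come from the `istio-iptables` flags shown above, UID 1337 is the user the `istio-proxy` container runs as, and the function name `landing_port` is hypothetical. Real rule sets also honour port and CIDR exclusions, which are omitted here.

```python
from typing import Optional

ENVOY_OUTBOUND_PORT = 15001  # from the '-p' flag above
ENVOY_INBOUND_PORT = 15006   # from the '-z' flag above
PROXY_UID = 1337             # UID the istio-proxy container runs as


def landing_port(direction: str, original_port: int,
                 sender_uid: Optional[int] = None) -> int:
    """Return the port a connection actually lands on after redirection."""
    if direction == "outbound":
        if sender_uid == PROXY_UID:
            # Envoy's own upstream traffic is exempt, otherwise it would loop.
            return original_port
        return ENVOY_OUTBOUND_PORT
    if direction == "inbound":
        return ENVOY_INBOUND_PORT
    raise ValueError(f"unknown direction: {direction}")


print(landing_port("outbound", 9080, sender_uid=1000))       # app calling a service
print(landing_port("outbound", 9080, sender_uid=PROXY_UID))  # Envoy forwarding upstream
print(landing_port("inbound", 9080))                         # traffic arriving at the pod
```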

xDS Protocol Family

The xDS (x Discovery Service) protocol is how the control plane communicates configuration to data plane proxies. It's a gRPC-based streaming API that pushes updates in real-time:

xDS Protocol — Control Plane to Data Plane Communication
flowchart LR
    ISTIOD["Istiod\n(Control Plane)"] -->|"CDS\nCluster Discovery"| ENVOY["Envoy Proxy"]
    ISTIOD -->|"EDS\nEndpoint Discovery"| ENVOY
    ISTIOD -->|"LDS\nListener Discovery"| ENVOY
    ISTIOD -->|"RDS\nRoute Discovery"| ENVOY
    ISTIOD -->|"SDS\nSecret Discovery"| ENVOY

    style ISTIOD fill:#BF092F,color:#fff
    style ENVOY fill:#3B9797,color:#fff
                            
  • CDS (Cluster Discovery Service) — upstream service clusters (what services exist)
  • EDS (Endpoint Discovery Service) — individual endpoints per cluster (pod IPs + ports)
  • LDS (Listener Discovery Service) — network listeners and filter chains (what ports to bind)
  • RDS (Route Discovery Service) — routing rules (how to route requests to clusters)
  • SDS (Secret Discovery Service) — TLS certificates and keys (hot-reloadable)
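Incremental xDS: Istio supports Delta xDS (incremental updates) as an alternative to State-of-the-World xDS (a full resync of a resource type on every change), and recent releases enable it by default. When one endpoint changes, only that delta needs to be pushed, not the entire set of resources. In meshes with thousands of services this can cut control plane CPU and push latency substantially; figures such as a 90%+ CPU reduction or a drop from seconds to milliseconds are best read as upper-end cases that depend on mesh size and churn.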
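
The difference between the two modes is easy to see in a toy model. The sketch below is an assumption-laden simplification: each "resource" is just a cluster name mapped to a list of endpoint IPs, and the function names are hypothetical. The `resources` and `removed_resources` keys echo the fields of a delta discovery response.

```python
def state_of_the_world(current: dict) -> dict:
    """SotW push: resend every resource, changed or not."""
    return {name: list(ips) for name, ips in current.items()}


def delta(previous: dict, current: dict) -> dict:
    """Delta push: send only resources that changed, plus names that vanished."""
    changed = {name: list(ips) for name, ips in current.items()
               if previous.get(name) != ips}
    removed = sorted(set(previous) - set(current))
    return {"resources": changed, "removed_resources": removed}


# 1,000 clusters with one endpoint each; then a single pod is added to svc-42.
before = {f"svc-{i}": [f"10.0.{i // 256}.{i % 256}"] for i in range(1000)}
after = {name: list(ips) for name, ips in before.items()}
after["svc-42"].append("10.9.9.9")

full_push = state_of_the_world(after)
delta_push = delta(before, after)
print(len(full_push), "resources in the full push")
print(len(delta_push["resources"]), "resource in the delta push")
```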

Sidecar vs Ambient Mesh

The sidecar model has drawbacks: resource overhead (each proxy consumes ~50MB RAM + ~50m CPU), application startup dependency, and operational complexity. Istio's ambient mesh mode offers an alternative architecture:

Sidecar Mode vs Ambient Mode
flowchart TD
    subgraph "Sidecar Mode (Traditional)"
        S_APP["App"] <--> S_ENVOY["Envoy Sidecar"]
        S_ENVOY <-->|"mTLS"| S_NET["Network"]
    end

    subgraph "Ambient Mode (Sidecar-free)"
        A_APP["App"] <--> A_ZTUN["ztunnel\n(per-node L4)"]
        A_ZTUN <-->|"mTLS + HBONE"| A_NET["Network"]
        A_NET <--> A_WP["Waypoint Proxy\n(per-namespace L7)"]
    end

    style S_ENVOY fill:#3B9797,color:#fff
    style A_ZTUN fill:#16476A,color:#fff
    style A_WP fill:#BF092F,color:#fff
                            

Ambient mesh splits the data plane into two layers:

  • ztunnel: a shared per-node L4 proxy that handles mTLS, L4 authorization, and L4 telemetry. It is a purpose-built, lightweight proxy written in Rust, not a full Envoy instance.
  • Waypoint proxy: an optional L7 proxy (still Envoy), typically deployed per namespace and only when L7 features (traffic splitting, header-based routing, retries) are needed.
Architecture Comparison
Sidecar vs Ambient — When to Choose

Choose Sidecar when you need per-pod L7 policy, fine-grained traffic control per workload, or when pods have heterogeneous security requirements. Choose Ambient for large-scale deployments where resource efficiency matters, when L4 security (mTLS plus L4 authorization) is sufficient for most services, or when you want mesh benefits without modifying pod specs. Because one ztunnel per node replaces one Envoy per pod, Ambient can reduce mesh resource consumption by 90% or more for L4-only workloads in favorable cases, while maintaining full mTLS coverage.

Sidecar · Ambient · Trade-offs
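
A back-of-the-envelope calculation shows where the savings come from. The 50 MB per sidecar is the rough figure quoted earlier in this article; the cluster size (2,000 pods on 50 nodes) and the 100 MB per ztunnel are hypothetical inputs chosen only to make the arithmetic visible, not measurements.

```python
def sidecar_memory_mb(pods: int, per_sidecar_mb: float = 50.0) -> float:
    """Total proxy memory when every pod carries its own Envoy."""
    return pods * per_sidecar_mb


def ambient_memory_mb(nodes: int, per_ztunnel_mb: float) -> float:
    """Total proxy memory with one ztunnel per node and no waypoints."""
    return nodes * per_ztunnel_mb


def reduction_percent(before: float, after: float) -> float:
    """Relative saving of `after` compared with `before`."""
    return 100.0 * (before - after) / before


# Hypothetical cluster: 2,000 pods on 50 nodes, ztunnel assumed at 100 MB each.
sidecar_total = sidecar_memory_mb(pods=2000)
ambient_total = ambient_memory_mb(nodes=50, per_ztunnel_mb=100.0)
print(sidecar_total, ambient_total, reduction_percent(sidecar_total, ambient_total))
```

The saving shrinks as soon as waypoint proxies are added for L7 features, so the estimate only holds for workloads that stay at L4.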

Traffic Management

Traffic management is the primary use case for service mesh — controlling how requests flow between services. Istio provides two key resources:

VirtualService — Traffic Splitting & Routing

# Canary deployment: 90% to v1, 10% to v2
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
  namespace: production
spec:
  hosts:
  - reviews                    # Kubernetes service name
  http:
  - match:
    - headers:
        x-canary-user:
          exact: "true"        # Internal testers get v2 always
    route:
    - destination:
        host: reviews
        subset: v2
  - route:                     # Default: traffic split
    - destination:
        host: reviews
        subset: v1
      weight: 90
    - destination:
        host: reviews
        subset: v2
      weight: 10
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: 5xx,reset,connect-failure
    timeout: 10s
    fault:
      delay:                   # Fault injection for chaos testing
        percentage:
          value: 1.0           # 1% of requests get 5s delay
        fixedDelay: 5s
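
The routing logic in this VirtualService can be modelled in a few lines: header matches are evaluated first, and only unmatched requests fall through to the weighted split. This is a sketch of the decision order, not of Envoy's internals; `pick_subset` and the `roll` parameter (standing in for the proxy's random draw) are hypothetical names.

```python
import random

WEIGHTED_SUBSETS = (("v1", 90), ("v2", 10))  # the default route above


def pick_subset(headers: dict, roll: float) -> str:
    """Choose a subset for one request; `roll` must lie in [0, 100)."""
    # Rule 1: internal testers are pinned to v2.
    if headers.get("x-canary-user") == "true":
        return "v2"
    # Rule 2: everyone else is split by cumulative weight.
    cumulative = 0
    for subset, weight in WEIGHTED_SUBSETS:
        cumulative += weight
        if roll < cumulative:
            return subset
    raise ValueError("roll must be in [0, 100)")


rng = random.Random(42)  # seeded so the simulation is repeatable
picks = [pick_subset({}, rng.random() * 100) for _ in range(10_000)]
v2_share = picks.count("v2") / len(picks)
print(f"v2 share of untagged traffic: {v2_share:.1%}")
```

Note that the split is probabilistic per request, so the observed share only approaches 10% over a large number of requests.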

DestinationRule — Load Balancing & Circuit Breaking

# Define subsets and circuit breaker settings
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
  namespace: production
spec:
  host: reviews
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: DEFAULT
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
    outlierDetection:          # Circuit breaker
      consecutive5xxErrors: 5  # 5 errors triggers ejection
      interval: 10s            # Check every 10s
      baseEjectionTime: 30s    # Eject for 30s minimum
      maxEjectionPercent: 50   # Never eject more than 50%
    loadBalancer:
      simple: LEAST_REQUEST    # Route to least-loaded endpoint
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
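
The outlier detection settings above behave roughly like the following sketch. It models only two of the knobs, `consecutive5xxErrors` and `maxEjectionPercent`, and deliberately leaves out time (`interval`, `baseEjectionTime`), so ejected hosts never return. The class and method names are hypothetical.

```python
class OutlierDetector:
    """Toy model of consecutive-5xx ejection with an ejection cap."""

    def __init__(self, hosts, consecutive_5xx=5, max_ejection_percent=50):
        self.failures = {host: 0 for host in hosts}
        self.ejected = set()
        self.consecutive_5xx = consecutive_5xx
        self.max_ejection_percent = max_ejection_percent

    def record(self, host: str, status: int) -> bool:
        """Record one response; return True if this call ejected the host."""
        # Any non-5xx response resets the streak.
        self.failures[host] = self.failures[host] + 1 if status >= 500 else 0
        if host in self.ejected or self.failures[host] < self.consecutive_5xx:
            return False
        # Cap the ejected share of the pool, but always allow at least one host.
        limit = max(1, len(self.failures) * self.max_ejection_percent // 100)
        if len(self.ejected) >= limit:
            return False
        self.ejected.add(host)
        return True

    def healthy_hosts(self):
        return sorted(set(self.failures) - self.ejected)


detector = OutlierDetector(["a", "b", "c", "d"])
for host in ("a", "b", "c"):
    for _ in range(5):
        detector.record(host, 503)
print(detector.healthy_hosts())
```

The cap is what keeps a widespread failure from emptying the load-balancing pool: with four hosts and a 50% limit, the third failing host stays in rotation.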

Security — mTLS & Authorization

Service mesh security operates at three levels: transport encryption (mTLS), peer authentication (identity verification), and request authorization (access control).

PeerAuthentication — mTLS Mode

# Enforce strict mTLS for entire mesh
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system      # Mesh-wide policy
spec:
  mtls:
    mode: STRICT               # Reject any non-mTLS traffic
    # Options: DISABLE, PERMISSIVE (accept both), STRICT (mTLS only)

---
# Per-namespace override: permissive for legacy services
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: legacy-namespace
  namespace: legacy
spec:
  mtls:
    mode: PERMISSIVE           # Accept both plaintext and mTLS
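
How the two policies above combine can be sketched as a precedence lookup: the narrowest scope that sets a mode wins, in the order workload, namespace, mesh. The function names are hypothetical, and only STRICT and PERMISSIVE are modelled.

```python
from typing import Optional


def effective_mtls_mode(mesh_mode: str,
                        namespace_mode: Optional[str] = None,
                        workload_mode: Optional[str] = None) -> str:
    """Return the mode that applies to a workload; None means 'not set'."""
    for mode in (workload_mode, namespace_mode):
        if mode is not None:
            return mode
    return mesh_mode


def accepts(mode: str, is_mtls: bool) -> bool:
    """Would a sidecar in this mode accept the incoming connection?"""
    if mode == "STRICT":
        return is_mtls
    if mode == "PERMISSIVE":
        return True
    raise ValueError(f"mode not modelled in this sketch: {mode}")


legacy_mode = effective_mtls_mode("STRICT", namespace_mode="PERMISSIVE")
production_mode = effective_mtls_mode("STRICT")
print(legacy_mode, accepts(legacy_mode, is_mtls=False))
print(production_mode, accepts(production_mode, is_mtls=False))
```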

AuthorizationPolicy — Fine-Grained Access Control

# Only allow frontend to call reviews service
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: reviews-access
  namespace: production
spec:
  selector:
    matchLabels:
      app: reviews
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - "cluster.local/ns/production/sa/frontend"
        # SPIFFE identity: only frontend service account
    to:
    - operation:
        methods: ["GET"]
        paths: ["/api/reviews/*"]
    when:
    - key: request.headers[x-request-id]
      values: ["*"]            # Presence match: must have trace correlation header
Zero-Trust Architecture: With mTLS STRICT mode and deny-by-default AuthorizationPolicy, the service mesh implements zero-trust networking. Every request is authenticated (mTLS proves identity), authorized (policy checks SPIFFE ID + request attributes), and encrypted, regardless of network location. Trust is never derived from an IP address, which reduces reliance on VPNs and perimeter firewall rules for service-to-service traffic.
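
The ALLOW rule for the reviews service can be read as a conjunction of four checks. The sketch below hard-codes that one rule for illustration; `path_matches` and `is_allowed` are hypothetical helpers, and the path matcher covers only exact, prefix (`/x/*`), and suffix (`*/x`) patterns. Once an ALLOW policy selects a workload, any request that matches none of its rules is denied.

```python
def path_matches(pattern: str, path: str) -> bool:
    """Exact match, or prefix/suffix match when the pattern has a '*' at one end."""
    if pattern.endswith("*"):
        return path.startswith(pattern[:-1])
    if pattern.startswith("*"):
        return path.endswith(pattern[1:])
    return path == pattern


def is_allowed(principal: str, method: str, path: str, headers: dict) -> bool:
    """Evaluate the reviews-access rule against one request."""
    return (
        principal == "cluster.local/ns/production/sa/frontend"  # from.source
        and method in {"GET"}                                     # to.operation.methods
        and path_matches("/api/reviews/*", path)                  # to.operation.paths
        and bool(headers.get("x-request-id"))                     # when: header present
    )


frontend = "cluster.local/ns/production/sa/frontend"
print(is_allowed(frontend, "GET", "/api/reviews/1", {"x-request-id": "abc"}))
print(is_allowed(frontend, "POST", "/api/reviews/1", {"x-request-id": "abc"}))
```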

Observability

Because Envoy proxies handle every request, the mesh generates comprehensive telemetry with little or no application instrumentation:

  • Metrics: request count, duration, size per source/destination/response code (Prometheus-compatible)
  • Distributed traces: Envoy generates spans and trace headers (Jaeger/Zipkin compatible); the application still has to forward incoming trace headers on its outbound calls so that spans join into one trace
  • Access logs: structured logs for every request with timing, headers, response codes
# View real-time metrics from Envoy sidecars
kubectl exec deploy/productpage -c istio-proxy -- \
  curl -s localhost:15090/stats/prometheus | grep istio_requests_total

# Example output:
# istio_requests_total{
#   reporter="source",
#   source_workload="productpage-v1",
#   destination_workload="reviews-v2",
#   response_code="200",
#   request_protocol="http"
# } 15234

# View the most recent Envoy access log entry
# (assumes access logging is enabled with JSON encoding)
kubectl logs deploy/productpage -c istio-proxy | tail -n 1 | jq .
# {
#   "authority": "reviews:9080",
#   "bytes_received": 0,
#   "bytes_sent": 178,
#   "duration": 12,
#   "method": "GET",
#   "path": "/api/reviews/1",
#   "protocol": "HTTP/1.1",
#   "response_code": 200,
#   "upstream_cluster": "outbound|9080|v2|reviews.production.svc.cluster.local",
#   "x_forwarded_for": "10.244.1.5"
# }
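
Both outputs are easy to post-process. The sketch below splits the `upstream_cluster` value from the access log into its parts and derives a 5xx error rate from `istio_requests_total` samples. The helper names are hypothetical, and the 503 count in the usage lines is a made-up sample value used only to show the calculation.

```python
def parse_upstream_cluster(name: str) -> dict:
    """Split 'direction|port|subset|host' into named fields."""
    direction, port, subset, host = name.split("|")
    return {"direction": direction, "port": int(port), "subset": subset, "host": host}


def error_rate(samples) -> float:
    """Share of 5xx responses given (response_code, count) pairs."""
    total = sum(count for _, count in samples)
    errors = sum(count for code, count in samples if code.startswith("5"))
    return errors / total if total else 0.0


cluster = parse_upstream_cluster(
    "outbound|9080|v2|reviews.production.svc.cluster.local")
print(cluster["subset"], cluster["host"])

# 15234 comes from the example output above; the 503 count is hypothetical.
print(f"{error_rate([('200', 15234), ('503', 766)]):.2%}")
```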
Key Takeaway
Service Mesh as the Universal Data Plane

The service mesh exemplifies control/data plane separation at the application networking layer. The control plane (Istiod) holds the policy: what should happen to traffic. The data plane (Envoy proxies) executes it on every request and connection. This separation enables powerful abstractions: traffic splitting without code changes, mTLS without application crypto, and observability with little or no instrumentation code. The application only needs to know its business logic; the mesh handles the rest.

Architecture · Separation of Concerns · Infrastructure