
Istio Service Mesh — Complete Tool Reference Guide

May 15, 2026 Wasil Zafar 30 min read

A comprehensive reference to Istio — the CNCF graduated service mesh that adds traffic management, identity-based security, and rich observability to Kubernetes workloads, in both sidecar and ambient modes.

Table of Contents

  1. Overview & History
  2. Architecture
  3. Sidecar vs Ambient Mode
  4. Installation
  5. Traffic Management
  6. mTLS & Identity
  7. Authorization Policies
  8. Ingress & Gateway
  9. Observability
  10. Multi-Cluster
  11. Troubleshooting

Overview & History

Istio is a service mesh — a dedicated infrastructure layer that handles service-to-service communication transparently. It moves cross-cutting networking concerns (TLS, retries, circuit breaking, traffic splitting, authentication, authorisation, telemetry) out of every microservice and into a shared, configurable mesh.

Announced by Google, IBM, and Lyft in 2017, Istio combines a control plane (originally separate components such as Pilot, Citadel, and Galley, consolidated into a single istiod binary in Istio 1.5) with Envoy proxies as the data plane. It joined CNCF as an incubating project in 2022 and graduated in 2023, one of the fastest paths to graduation in CNCF history. Today it powers production meshes at eBay, IBM, Salesforce, T-Mobile, and Atlassian, among many others.

Key Insight: A service mesh's job is not "to do networking" — Kubernetes already does that. Its job is to make cross-service policies declarative and uniform. Without a mesh, every microservice ships its own retry logic, mTLS code, and tracing instrumentation, and these inevitably diverge. With a mesh, those policies become YAML you write once and apply consistently across hundreds of services in any language.

Architecture

Istio has two layers: a data plane of proxies that sits in the request path, and a control plane that configures those proxies dynamically.

Istio Architecture (Sidecar Mode)
flowchart TD
    Operator["Operator / Platform Team"] --> CRDs["Istio CRDs<br/>VirtualService · DestinationRule<br/>AuthorizationPolicy · PeerAuthentication"]
    CRDs --> Istiod["istiod<br/>(Control Plane)"]
    Istiod -.xDS over gRPC.-> Proxy1["Envoy Sidecar<br/>(Service A pod)"]
    Istiod -.xDS over gRPC.-> Proxy2["Envoy Sidecar<br/>(Service B pod)"]
    Istiod -.xDS over gRPC.-> Gateway["istio-ingressgateway<br/>Edge Envoy"]
    Client["External Client"] --> Gateway
    Gateway --> Proxy1
    Proxy1 -.mTLS.-> Proxy2
    Proxy1 --> AppA["App A<br/>(localhost)"]
    Proxy2 --> AppB["App B<br/>(localhost)"]
    style Operator fill:#e8f4f4,stroke:#3B9797,color:#132440
    style CRDs fill:#e8f4f4,stroke:#3B9797,color:#132440
    style Istiod fill:#f0f4f8,stroke:#16476A,color:#132440
    style Proxy1 fill:#f0f4f8,stroke:#16476A,color:#132440
    style Proxy2 fill:#f0f4f8,stroke:#16476A,color:#132440
    style Gateway fill:#f0f4f8,stroke:#16476A,color:#132440
    style Client fill:#fff5f5,stroke:#BF092F,color:#132440
    style AppA fill:#132440,stroke:#132440,color:#ffffff
    style AppB fill:#132440,stroke:#132440,color:#ffffff

Sidecar vs Ambient Mode

For its first several years, Istio used the sidecar pattern exclusively: every pod gets an Envoy proxy injected as an additional container that intercepts all traffic via iptables. Effective, but resource-heavy: each sidecar carries its own CPU and memory request, so large meshes can pay a 20-30% infrastructure tax just to run the proxies.

Ambient mode, announced in 2022 and shipped as alpha in Istio 1.18 (2023), is a sidecar-less architecture that splits the mesh into two layers: a per-node ztunnel (zero-trust tunnel) handling L4 + mTLS, and an optional per-namespace waypoint proxy handling L7 features (HTTP routing, AuthorizationPolicy on path/method). Ambient reached beta in Istio 1.22 and GA in Istio 1.24 (both 2024) and is now the recommended mode for new deployments.

| Aspect | Sidecar Mode | Ambient Mode |
| --- | --- | --- |
| Proxy granularity | One Envoy per pod | One ztunnel per node + optional waypoint per namespace |
| Resource overhead | ~100m CPU, ~128Mi RAM per pod | ~100m CPU, ~128Mi RAM per node (shared) |
| Pod restarts on mesh upgrade | Yes (sidecar version change) | No (only ztunnel restarts) |
| L7 features | Always available | Opt-in via waypoint per namespace |
| Maturity | Production for 8+ years | GA since 1.24 (2024) |
| Best fit | Existing meshes; per-pod policy isolation | New meshes; large fleets; FinOps-conscious |
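
The resource-overhead row can be turned into a rough back-of-the-envelope comparison. The sketch below uses the per-proxy figures from the table; the fleet size (3,000 pods on 300 nodes) is a made-up example, and waypoint proxies are deliberately left out, so treat the result as an order-of-magnitude estimate rather than a sizing guide.

```python
# Rough proxy overhead comparison using the per-proxy figures from the table.
# The fleet size used in the demo (3,000 pods on 300 nodes) is hypothetical.

SIDECAR_CPU_M = 100    # millicores requested per sidecar (table figure)
SIDECAR_MEM_MI = 128   # MiB requested per sidecar (table figure)
ZTUNNEL_CPU_M = 100    # millicores requested per ztunnel (table figure)
ZTUNNEL_MEM_MI = 128   # MiB requested per ztunnel (table figure)


def sidecar_overhead(pods: int) -> tuple[float, float]:
    """Return (CPU cores, GiB of memory) requested with one sidecar per pod."""
    return pods * SIDECAR_CPU_M / 1000, pods * SIDECAR_MEM_MI / 1024


def ambient_overhead(nodes: int) -> tuple[float, float]:
    """Return (CPU cores, GiB of memory) requested with one ztunnel per node.

    Waypoint proxies are not counted; they add to this total when L7
    features are enabled for a namespace.
    """
    return nodes * ZTUNNEL_CPU_M / 1000, nodes * ZTUNNEL_MEM_MI / 1024


if __name__ == "__main__":
    pods, nodes = 3000, 300  # hypothetical fleet
    print("sidecar (cores, GiB):", sidecar_overhead(pods))
    print("ambient (cores, GiB):", ambient_overhead(nodes))
```

With these assumptions the proxy footprint scales with node count instead of pod count, which is where the savings in ambient mode come from.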

Installation

# Download istioctl
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.22.3 sh -
cd istio-1.22.3
export PATH=$PWD/bin:$PATH

# Install with the default profile (sidecar mode)
istioctl install --set profile=default -y

# Or install in ambient mode
istioctl install --set profile=ambient --skip-confirmation

# Enable sidecar injection on a namespace (sidecar mode)
kubectl label namespace payments istio-injection=enabled

# Enable ambient mode on a namespace (ambient mode)
kubectl label namespace payments istio.io/dataplane-mode=ambient

# Verify the installation
istioctl verify-install
kubectl get pods -n istio-system

# Install observability addons (Kiali, Prometheus, Grafana, Jaeger)
kubectl apply -f samples/addons

Traffic Management

Traffic management is configured by combining three CRDs: Gateway (which ports the mesh listens on), VirtualService (routing rules — host, path, header, weight), and DestinationRule (per-backend policies — load balancing, circuit breaking, subsets).

# traffic-canary-payments.yaml
# Splits 90% of payments-api traffic to v1, 10% to v2 (canary)
# Header-based override sends test traffic exclusively to v2
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: payments-api
  namespace: payments
spec:
  host: payments-api.payments.svc.cluster.local
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        http2MaxRequests: 1000
        maxRequestsPerConnection: 10
    outlierDetection:        # Circuit breaking
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
---
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: payments-api
  namespace: payments
spec:
  hosts:
    - payments-api.payments.svc.cluster.local
  http:
    # Header-based override for test traffic
    - match:
        - headers:
            x-test-cohort:
              exact: canary
      route:
        - destination:
            host: payments-api
            subset: v2
    # Default 90/10 split
    - route:
        - destination:
            host: payments-api
            subset: v1
          weight: 90
        - destination:
            host: payments-api
            subset: v2
          weight: 10
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: 5xx,reset,connect-failure
      timeout: 10s
      fault:
        delay:                  # Chaos engineering: 0.1% requests get +5s delay
          percentage:
            value: 0.1
          fixedDelay: 5s
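
To make the routing behaviour of the VirtualService above concrete, here is a small simulation of how a request might be assigned to a subset: rules are tried in order, the first match wins, and weighted destinations are chosen proportionally. This is an illustrative model only, not Envoy's implementation; `ROUTES` and `pick_subset` are hypothetical names, and header matching is simplified to an exact, case-sensitive lookup.

```python
import random

# Routes mirror the VirtualService above: rules are tried top to bottom.
ROUTES = [
    {"match_header": ("x-test-cohort", "canary"), "destinations": [("v2", 100)]},
    {"match_header": None, "destinations": [("v1", 90), ("v2", 10)]},
]


def pick_subset(headers: dict[str, str], rng: random.Random) -> str:
    """Return the subset a request would be routed to in this simplified model."""
    for route in ROUTES:
        match = route["match_header"]
        if match is not None and headers.get(match[0]) != match[1]:
            continue  # exact header match failed; try the next rule
        subsets = [name for name, _ in route["destinations"]]
        weights = [weight for _, weight in route["destinations"]]
        return rng.choices(subsets, weights=weights, k=1)[0]
    raise LookupError("no route matched")


if __name__ == "__main__":
    rng = random.Random(42)
    sample = [pick_subset({}, rng) for _ in range(10_000)]
    print("share of v2 on the default route:", sample.count("v2") / len(sample))
    print("canary header goes to:", pick_subset({"x-test-cohort": "canary"}, rng))
```

Because the header rule is listed first, canary-tagged requests never reach the weighted split; reversing the order of the two rules would make the override unreachable.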

mTLS & Identity

Every pod in the mesh receives a SPIFFE identity (spiffe://cluster.local/ns/{namespace}/sa/{serviceaccount}) and a short-lived X.509 certificate. Service-to-service communication is automatically encrypted and mutually authenticated.

# peer-authentication-strict.yaml
# Mesh-wide STRICT mTLS — refuses any plaintext traffic
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system    # Mesh-wide when in root namespace
spec:
  mtls:
    mode: STRICT
---
# Namespace-level PERMISSIVE allows mTLS-aware and plaintext during migration
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: legacy-permissive
  namespace: legacy-system
spec:
  mtls:
    mode: PERMISSIVE

Migration Hazard: Switching directly from no mesh to STRICT mTLS will break every call from clients that are not yet in the mesh, because they can only send plaintext. Always stage as PERMISSIVE mesh-wide first, onboard namespaces one at a time, then flip to STRICT once Kiali shows zero plaintext traffic for 7+ days.
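
The SPIFFE identity format described at the start of this section is easy to build and take apart by hand, which helps when reading certificates or writing policies. The helpers below are a minimal sketch with hypothetical function names; they assume the default `cluster.local` trust domain and the `ns/.../sa/...` path layout shown above.

```python
from urllib.parse import urlsplit


def spiffe_id(namespace: str, service_account: str,
              trust_domain: str = "cluster.local") -> str:
    """Build the SPIFFE URI for a workload's service account."""
    return f"spiffe://{trust_domain}/ns/{namespace}/sa/{service_account}"


def parse_spiffe_id(uri: str) -> dict[str, str]:
    """Split a SPIFFE URI of the form spiffe://<td>/ns/<ns>/sa/<sa>."""
    parts = urlsplit(uri)
    segments = parts.path.strip("/").split("/")
    if (parts.scheme != "spiffe" or len(segments) != 4
            or segments[0] != "ns" or segments[2] != "sa"):
        raise ValueError(f"not a workload SPIFFE ID: {uri!r}")
    return {
        "trust_domain": parts.netloc,
        "namespace": segments[1],
        "service_account": segments[3],
    }


def authz_principal(uri: str) -> str:
    """Return the identity without the spiffe:// scheme, the form used
    in the principals field of an AuthorizationPolicy."""
    return uri.removeprefix("spiffe://")


if __name__ == "__main__":
    uri = spiffe_id("orders", "orders-svc")
    print(uri)
    print(parse_spiffe_id(uri))
    print(authz_principal(uri))
```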

Authorization Policies

AuthorizationPolicy is Istio's identity-aware ACL. It evaluates after mTLS authentication and can match on source identity, destination service, HTTP method/path, header, and JWT claims.

# authz-payments-strict.yaml
# Default deny everything in the payments namespace
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: payments
spec:
  {}    # Empty spec = deny all
---
# Allow only the orders SA to call POST /charge
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-orders-to-charge
  namespace: payments
spec:
  selector:
    matchLabels:
      app: payments-api
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - "cluster.local/ns/orders/sa/orders-svc"
      to:
        - operation:
            methods: ["POST"]
            paths: ["/v1/charge"]
      when:
        - key: request.headers[x-tenant]
          values: ["acme", "globex"]
---
# Allow GET requests that carry a valid JWT with the payments:read scope
apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: jwt-auth
  namespace: payments
spec:
  selector:
    matchLabels:
      app: payments-api
  jwtRules:
    - issuer: "https://auth.corp.com"
      jwksUri: "https://auth.corp.com/.well-known/jwks.json"
      audiences: ["payments-api"]
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-jwt-readonly
  namespace: payments
spec:
  selector:
    matchLabels:
      app: payments-api
  action: ALLOW
  rules:
    - from:
        - source:
            requestPrincipals: ["https://auth.corp.com/*"]
      to:
        - operation:
            methods: ["GET"]
      when:
        - key: request.auth.claims[scope]
          values: ["payments:read"]
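
How several policies combine is the part that most often surprises people, so a small model may help. The sketch below assumes the documented evaluation order for DENY and ALLOW (CUSTOM and AUDIT actions are left out): a matching DENY rejects the request, a workload with no ALLOW policies admits it, and otherwise at least one ALLOW policy must match. `Policy` and `is_allowed` are hypothetical names, and each rule is reduced to a plain Python predicate.

```python
from dataclasses import dataclass, field
from typing import Callable

Request = dict  # e.g. {"principal": ..., "method": ..., "path": ...}
Rule = Callable[[Request], bool]


@dataclass
class Policy:
    name: str
    action: str = "ALLOW"  # "ALLOW" or "DENY"; an empty spec is ALLOW with no rules
    rules: list[Rule] = field(default_factory=list)

    def matches(self, request: Request) -> bool:
        # A policy with no rules never matches any request.
        return any(rule(request) for rule in self.rules)


def is_allowed(policies: list[Policy], request: Request) -> bool:
    """Simplified DENY-then-ALLOW evaluation for one workload."""
    if any(p.action == "DENY" and p.matches(request) for p in policies):
        return False
    allow_policies = [p for p in policies if p.action == "ALLOW"]
    if not allow_policies:
        return True  # no ALLOW policies apply, so traffic is admitted
    return any(p.matches(request) for p in allow_policies)


# Mirrors deny-all and allow-orders-to-charge from the YAML above.
deny_all = Policy("deny-all")
allow_orders = Policy("allow-orders-to-charge", rules=[
    lambda r: r.get("principal") == "cluster.local/ns/orders/sa/orders-svc"
    and r.get("method") == "POST"
    and r.get("path") == "/v1/charge"
    and r.get("x-tenant") in {"acme", "globex"},
])

if __name__ == "__main__":
    ok = {"principal": "cluster.local/ns/orders/sa/orders-svc",
          "method": "POST", "path": "/v1/charge", "x-tenant": "acme"}
    print(is_allowed([deny_all, allow_orders], ok))
    print(is_allowed([deny_all, allow_orders], {**ok, "method": "GET"}))
```

Note that the "deny-all" policy works only because it is an ALLOW policy that matches nothing; it does not override the more specific ALLOW next to it.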

Ingress & Gateway

Istio supports both its native Gateway CRD and the upstream Kubernetes Gateway API. New deployments should prefer the Gateway API.

# gateway-api-payments.yaml
# Kubernetes Gateway API (preferred for new deployments)
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: payments-gateway
  namespace: payments
spec:
  gatewayClassName: istio
  listeners:
    - name: https
      port: 443
      protocol: HTTPS
      tls:
        mode: Terminate
        certificateRefs:
          - name: payments-tls
      allowedRoutes:
        namespaces:
          from: Same
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: payments-api
  namespace: payments
spec:
  parentRefs:
    - name: payments-gateway
  hostnames:
    - api.payments.corp.com
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1
      backendRefs:
        - name: payments-api
          port: 8080
          weight: 100
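
The `PathPrefix` match in the HTTPRoute above compares whole path segments, not raw string prefixes, so `/v1` matches `/v1/charge` but not `/v10`. The function below is a simplified illustration of that rule under the Gateway API's element-by-element matching; `path_prefix_matches` is a hypothetical helper, and it ignores query strings and collapses repeated slashes.

```python
def path_prefix_matches(prefix: str, path: str) -> bool:
    """Return True if path starts with prefix, compared segment by segment."""
    prefix_parts = [part for part in prefix.split("/") if part]
    path_parts = [part for part in path.split("/") if part]
    # The request path must begin with every segment of the prefix, in order.
    return path_parts[: len(prefix_parts)] == prefix_parts


if __name__ == "__main__":
    for candidate in ("/v1", "/v1/", "/v1/charge", "/v10", "/v2/charge"):
        print(candidate, path_prefix_matches("/v1", candidate))
```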

Observability

Istio's "free observability" is one of its most-cited adoption drivers. With little or no application code, you get:

  • Metrics: request count, duration percentiles, request/response sizes, TCP bytes — exported to Prometheus with rich labels (source, destination, response code, mTLS status).
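  • Distributed traces: the proxies generate spans and export them to Jaeger, Zipkin, or any OpenTelemetry collector. The application still has to forward the B3/W3C trace headers from inbound to outbound requests so that the spans join into a single trace.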
  • Access logs: structured JSON logs of every request (toggleable, since they are expensive at scale).
  • Topology graphs: Kiali renders the live service graph from telemetry, including request rate, error rate, and mTLS state per edge.
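
For the proxy-generated spans to line up into one trace, a service usually needs to copy the trace context from each inbound request onto the outbound calls it makes while handling it. The helper below is a minimal, framework-agnostic sketch of that step; the function name is hypothetical, and the header list covers the B3 and W3C Trace Context headers plus `x-request-id`.

```python
# Headers that carry trace context between services.
TRACE_HEADERS = (
    "x-request-id",
    "traceparent", "tracestate",                  # W3C Trace Context
    "x-b3-traceid", "x-b3-spanid", "x-b3-parentspanid",
    "x-b3-sampled", "x-b3-flags", "b3",           # Zipkin B3
)


def propagate_trace_headers(incoming: dict[str, str]) -> dict[str, str]:
    """Return the trace headers from an inbound request, ready to be
    attached to any outbound request made while handling it."""
    lowered = {name.lower(): value for name, value in incoming.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}


if __name__ == "__main__":
    inbound = {
        "Host": "payments-api",
        "X-Request-Id": "abc-123",
        "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
    }
    print(propagate_trace_headers(inbound))
```

In practice an OpenTelemetry or Zipkin client library typically does this for you; the point is that some code inside the application has to carry the headers across.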

Case Study: eBay's Mesh Migration
From 3,000 Sidecars to Ambient

eBay migrated their Istio mesh from sidecar to ambient mode in 2024 across ~3,000 services. The motivation was not L7 features — those were already in place — but operational. Sidecar restarts during Istio upgrades cascaded across the fleet (an hour-long affair) and the per-pod resource cost was ~$1.2M/year. Post-migration, ztunnel runs as a single DaemonSet (~340 nodes), upgrades complete in 12 minutes with zero application restarts, and the resource footprint dropped 65%. The lesson: ambient is most valuable not for what it adds, but for what it removes.

Tags: Ambient Mode, Migration, FinOps

Multi-Cluster

Istio supports two multi-cluster control-plane topologies: multi-primary (each cluster has its own istiod, and the clusters share a root of trust) and primary-remote (one istiod controls multiple clusters). Either topology can run on a single network or across multiple networks, in which case the clusters are connected through east-west gateways. For most enterprises the multi-primary topology with a shared trust domain is the right default.

# Multi-primary setup with a shared root CA
# Step 1: Install the CA material as the cacerts secret in both clusters.
# Each cluster should use its own intermediate CA (ca-cert.pem, ca-key.pem)
# signed by the same root (root-cert.pem); run each command from the
# directory that holds that cluster's files.
kubectl --context=cluster-1 create namespace istio-system
kubectl --context=cluster-1 create secret generic cacerts -n istio-system \
  --from-file=root-cert.pem \
  --from-file=ca-cert.pem \
  --from-file=ca-key.pem \
  --from-file=cert-chain.pem

kubectl --context=cluster-2 create namespace istio-system
kubectl --context=cluster-2 create secret generic cacerts -n istio-system \
  --from-file=root-cert.pem \
  --from-file=ca-cert.pem \
  --from-file=ca-key.pem \
  --from-file=cert-chain.pem

# Step 2: Install Istio on both clusters with a shared mesh ID
cat <<EOF > cluster-1.yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      meshID: mesh1
      multiCluster:
        clusterName: cluster-1
      network: network-1
EOF
istioctl install --context=cluster-1 -f cluster-1.yaml -y
# Repeat for cluster-2 with clusterName: cluster-2

# Step 3: Enable endpoint discovery in both directions
istioctl create-remote-secret --context=cluster-2 --name=cluster-2 \
  | kubectl apply --context=cluster-1 -f -
istioctl create-remote-secret --context=cluster-1 --name=cluster-1 \
  | kubectl apply --context=cluster-2 -f -

Troubleshooting

# Comprehensive proxy diagnostics for one pod
istioctl proxy-status                           # All proxies' sync state
istioctl proxy-status payments-api-xyz.payments # One pod
istioctl proxy-config cluster payments-api-xyz.payments
istioctl proxy-config listener payments-api-xyz.payments
istioctl proxy-config route payments-api-xyz.payments

# Analyze the entire mesh for misconfiguration
istioctl analyze --all-namespaces

# Show the bootstrap configuration Envoy started with
istioctl proxy-config bootstrap payments-api-xyz.payments

# Tail Envoy access logs from a sidecar
kubectl logs -n payments payments-api-xyz -c istio-proxy -f

# Wait until a VirtualService change has been distributed to the proxies
istioctl experimental wait --for=distribution virtualservice payments-api.payments

# List the AuthorizationPolicies that apply to a pod
istioctl experimental authz check payments-api-xyz.payments

# Describe the services, routing rules, and mTLS settings that affect a pod
istioctl experimental describe pod payments-api-xyz -n payments

Common pitfalls:

  • 503s or connection resets immediately after enabling STRICT mTLS: A client without a sidecar is sending plaintext, which a STRICT destination rejects. Either inject a sidecar into the client or use PERMISSIVE during migration.
  • VirtualService not taking effect: Missing DestinationRule for subset routing, or VirtualService applied in wrong namespace.
  • Sidecar startup race: App reads from network before Envoy is ready. Use holdApplicationUntilProxyStarts: true in the proxy config.
  • AuthorizationPolicy "ALLOW" not allowing: Multiple ALLOW policies are OR'd, but a single DENY beats any ALLOW. Check for default-deny policies in the namespace.
  • High control plane CPU: By default every proxy receives configuration for every service in the mesh, so each change triggers a mesh-wide push. Use Sidecar resources to scope what each proxy receives.