Overview & History
Istio is a service mesh — a dedicated infrastructure layer that handles service-to-service communication transparently. It moves cross-cutting networking concerns (TLS, retries, circuit breaking, traffic splitting, authentication, authorisation, telemetry) out of every microservice and into a shared, configurable mesh.
Created by Google, IBM, and Lyft in 2017, Istio pairs a control plane (originally several components, consolidated into the single istiod binary in Istio 1.5) with Envoy proxies as the data plane. It joined the CNCF as an incubating project in 2022 and graduated in July 2023, one of the fastest paths to graduation in CNCF history. Today it powers production meshes at eBay, IBM, Salesforce, T-Mobile, and Atlassian, among many others.
Architecture
Istio has two layers: a data plane of proxies that sits in the request path, and a control plane that configures those proxies dynamically.
flowchart TD
Operator["Operator / Platform Team"] --> CRDs["Istio CRDs<br/>VirtualService · DestinationRule<br/>AuthorizationPolicy · PeerAuthentication"]
CRDs --> Istiod["istiod<br/>(Control Plane)"]
Istiod -.xDS over gRPC.-> Proxy1["Envoy Sidecar<br/>(Service A pod)"]
Istiod -.xDS over gRPC.-> Proxy2["Envoy Sidecar<br/>(Service B pod)"]
Istiod -.xDS over gRPC.-> Gateway["istio-ingressgateway<br/>Edge Envoy"]
Client["External Client"] --> Gateway
Gateway --> Proxy1
Proxy1 -.mTLS.-> Proxy2
Proxy1 --> AppA["App A<br/>(localhost)"]
Proxy2 --> AppB["App B<br/>(localhost)"]
style Operator fill:#e8f4f4,stroke:#3B9797,color:#132440
style CRDs fill:#e8f4f4,stroke:#3B9797,color:#132440
style Istiod fill:#f0f4f8,stroke:#16476A,color:#132440
style Proxy1 fill:#f0f4f8,stroke:#16476A,color:#132440
style Proxy2 fill:#f0f4f8,stroke:#16476A,color:#132440
style Gateway fill:#f0f4f8,stroke:#16476A,color:#132440
style Client fill:#fff5f5,stroke:#BF092F,color:#132440
style AppA fill:#132440,stroke:#132440,color:#ffffff
style AppB fill:#132440,stroke:#132440,color:#ffffff
Sidecar vs Ambient Mode
For its first six years, Istio used the sidecar pattern exclusively: every pod gets an Envoy injected as an additional container that intercepts all traffic via iptables. Effective, but resource-heavy. Because every sidecar carries its own CPU and memory request, large meshes can pay an infrastructure tax on the order of 20-30% just to run the proxies.
Istio 1.18 (2023) introduced ambient mode, a sidecar-less architecture that splits the mesh into two layers: a per-node ztunnel (zero-trust tunnel) handling L4 and mTLS, and an optional waypoint proxy, typically one per namespace, handling L7 features (HTTP routing, AuthorizationPolicy on path/method). Ambient reached Beta in Istio 1.22 and GA in Istio 1.24 (both 2024) and is now a common recommendation for new deployments.
| Aspect | Sidecar Mode | Ambient Mode |
|---|---|---|
| Proxy granularity | One Envoy per pod | One ztunnel per node + optional waypoint per namespace |
| Resource overhead | ~100m CPU, ~128Mi RAM per pod | ~100m CPU, ~128Mi RAM per node (shared) |
| Pod restarts on mesh upgrade | Yes (sidecar version change) | No (only ztunnel restarts) |
| L7 features | Always available | Opt-in via waypoint per namespace |
| Maturity | In production since Istio 1.0 (2018) | GA since 1.24 (2024) |
| Best fit | Existing meshes; per-pod policy isolation | New meshes; large fleets; FinOps-conscious |
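To make the ambient column concrete, here is a minimal sketch of opting a namespace into ambient mode and giving it a waypoint for L7 features. It assumes the payments namespace used in the examples that follow and an illustrative waypoint name, waypoint. The Gateway shape follows what istioctl waypoint generate is expected to emit, so treat it as a starting point and compare it against the output of your own Istio version.

```yaml
# Assumption: namespace "payments" and waypoint name "waypoint" are illustrative.
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    istio.io/dataplane-mode: ambient   # ztunnel handles L4 + mTLS for these pods
    istio.io/use-waypoint: waypoint    # send this namespace's traffic through the waypoint
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: waypoint
  namespace: payments
  labels:
    istio.io/waypoint-for: service     # process traffic addressed to services
spec:
  gatewayClassName: istio-waypoint
  listeners:
  - name: mesh
    port: 15008                        # HBONE tunnel port
    protocol: HBONE
```

Without the waypoint, the namespace still gets mTLS and L4 policy from ztunnel; the waypoint only matters once HTTP-level routing or authorization is needed.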
Installation
# Download istioctl
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.22.3 sh -
cd istio-1.22.3
export PATH=$PWD/bin:$PATH
# Install with the default profile (sidecar mode)
istioctl install --set profile=default -y
# Or install in ambient mode
istioctl install --set profile=ambient --skip-confirmation
# Enable sidecar injection on a namespace (sidecar mode)
kubectl label namespace payments istio-injection=enabled
# Enable ambient mode on a namespace (ambient mode)
kubectl label namespace payments istio.io/dataplane-mode=ambient
# Verify the installation
istioctl verify-install
kubectl get pods -n istio-system
# Install observability addons (Kiali, Prometheus, Grafana, Jaeger)
kubectl apply -f samples/addons
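The --set flags above are convenient for a first install, but a declarative file is easier to review and keep in version control. The following is a minimal sketch of an IstioOperator file that could be passed to istioctl install -f; the file name istio-install.yaml and the two meshConfig settings are illustrative choices, not requirements.

```yaml
# istio-install.yaml (hypothetical file name)
# Apply with: istioctl install -f istio-install.yaml -y
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: default                  # same profile as the --set example (sidecar mode)
  meshConfig:
    accessLogFile: /dev/stdout      # illustrative: write Envoy access logs to stdout
    defaultConfig:
      # illustrative: start the application container only after the sidecar is ready
      holdApplicationUntilProxyStarts: true
```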
Traffic Management
Traffic management is configured by combining three CRDs: Gateway (which ports the mesh listens on), VirtualService (routing rules — host, path, header, weight), and DestinationRule (per-backend policies — load balancing, circuit breaking, subsets).
# traffic-canary-payments.yaml
# Splits 90% of payments-api traffic to v1, 10% to v2 (canary)
# Header-based override sends test traffic exclusively to v2
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: payments-api
  namespace: payments
spec:
  host: payments-api.payments.svc.cluster.local
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        http2MaxRequests: 1000
        maxRequestsPerConnection: 10
    outlierDetection:              # Circuit breaking
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
---
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: payments-api
  namespace: payments
spec:
  hosts:
  - payments-api.payments.svc.cluster.local
  http:
  # Header-based override for test traffic
  - match:
    - headers:
        x-test-cohort:
          exact: canary
    route:
    - destination:
        host: payments-api
        subset: v2
  # Default 90/10 split
  - route:
    - destination:
        host: payments-api
        subset: v1
      weight: 90
    - destination:
        host: payments-api
        subset: v2
      weight: 10
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: 5xx,reset,connect-failure
    timeout: 10s
    # Note: Istio does not enable timeouts or retries on a route while a
    # client-side fault is configured; drop this block outside chaos tests.
    fault:
      delay:                       # Chaos engineering: 0.1% of requests get +5s delay
        percentage:
          value: 0.1
        fixedDelay: 5s
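The example above covers DestinationRule and VirtualService but not the third CRD, Gateway. As a minimal sketch, the native Istio Gateway below opens port 443 on the default ingress gateway. It assumes the hostname api.payments.corp.com and a TLS secret named payments-tls (both reused from the Gateway API example later in this document) and the default istio: ingressgateway pod label.

```yaml
# Assumptions: hostname, secret name, and selector label are illustrative.
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: payments-gateway
  namespace: payments
spec:
  selector:
    istio: ingressgateway          # default label on the istio-ingressgateway pods
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE                 # terminate TLS at the gateway
      credentialName: payments-tls # Kubernetes TLS secret readable by the gateway
    hosts:
    - api.payments.corp.com
```

A VirtualService only receives traffic from this Gateway if it lists payments-gateway under spec.gateways and includes the external hostname in spec.hosts.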
mTLS & Identity
Every pod in the mesh receives a SPIFFE identity (spiffe://cluster.local/ns/{namespace}/sa/{serviceaccount}) and a short-lived X.509 certificate. Service-to-service communication is automatically encrypted and mutually authenticated.
# peer-authentication-strict.yaml
# Mesh-wide STRICT mTLS: refuses any plaintext traffic
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system          # Mesh-wide when in root namespace
spec:
  mtls:
    mode: STRICT
---
# Namespace-level PERMISSIVE allows mTLS-aware and plaintext during migration
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: legacy-permissive
  namespace: legacy-system
spec:
  mtls:
    mode: PERMISSIVE
STRICT mTLS breaks every call from clients outside the mesh, whether they run in the same namespace or a different one. Always stage as PERMISSIVE mesh-wide first, onboard namespaces one at a time, then flip to STRICT once Kiali shows zero plaintext traffic for 7+ days.
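The staged rollout described here can be sketched as two policies: a mesh-wide PERMISSIVE default plus a STRICT override for each namespace that has finished onboarding. The payments namespace is used as the illustrative "already onboarded" namespace; the more specific namespace-level policy takes precedence over the mesh-wide one.

```yaml
# Stage 1: mesh-wide PERMISSIVE (accepts both mTLS and plaintext)
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system          # root namespace, so this applies mesh-wide
spec:
  mtls:
    mode: PERMISSIVE
---
# Stage 2: STRICT for one onboarded namespace (illustrative: payments)
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: payments
spec:
  mtls:
    mode: STRICT
```

Once every namespace carries a STRICT override and no plaintext traffic remains, the mesh-wide policy can be switched to STRICT and the per-namespace overrides removed.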
Authorization Policies
AuthorizationPolicy is Istio's identity-aware ACL. It evaluates after mTLS authentication and can match on source identity, destination service, HTTP method/path, header, and JWT claims.
# authz-payments-strict.yaml
# Default deny everything in the payments namespace
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: payments
spec: {}                           # Empty spec = deny all
---
# Allow only the orders SA to call POST /v1/charge
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-orders-to-charge
  namespace: payments
spec:
  selector:
    matchLabels:
      app: payments-api
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - "cluster.local/ns/orders/sa/orders-svc"
    to:
    - operation:
        methods: ["POST"]
        paths: ["/v1/charge"]
    when:
    - key: request.headers[x-tenant]
      values: ["acme", "globex"]
---
# Validate JWTs issued by auth.corp.com on requests to payments-api
apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: jwt-auth
  namespace: payments
spec:
  selector:
    matchLabels:
      app: payments-api
  jwtRules:
  - issuer: "https://auth.corp.com"
    jwksUri: "https://auth.corp.com/.well-known/jwks.json"
    audiences: ["payments-api"]
---
# Allow read-only (GET) requests that carry a valid JWT with the payments:read scope
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-jwt-readonly
  namespace: payments
spec:
  selector:
    matchLabels:
      app: payments-api
  action: ALLOW
  rules:
  - from:
    - source:
        requestPrincipals: ["https://auth.corp.com/*"]
    to:
    - operation:
        methods: ["GET"]
    when:
    - key: request.auth.claims[scope]
      values: ["payments:read"]
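The policies above are all ALLOW rules layered on a default deny. For contrast, here is a minimal sketch of an explicit DENY policy, which Istio evaluates before any ALLOW policy. The /admin/* path and the rule that only callers inside the payments namespace may reach it are illustrative assumptions, not part of the original setup.

```yaml
# Assumption: the /admin/* path prefix is hypothetical.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: deny-admin-from-outside
  namespace: payments
spec:
  selector:
    matchLabels:
      app: payments-api
  action: DENY                     # DENY is evaluated before ALLOW
  rules:
  - from:
    - source:
        notNamespaces: ["payments"]  # any caller outside the payments namespace
    to:
    - operation:
        paths: ["/admin/*"]
```

Because a matching DENY wins over every ALLOW, this kind of policy is a useful guard rail, but it is also the first thing to look for when an ALLOW rule appears to be ignored.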
Ingress & Gateway
Istio supports both its native Gateway CRD and the upstream Kubernetes Gateway API. New deployments should prefer the Gateway API.
# gateway-api-payments.yaml
# Kubernetes Gateway API (preferred for new deployments)
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: payments-gateway
  namespace: payments
spec:
  gatewayClassName: istio
  listeners:
  - name: https
    port: 443
    protocol: HTTPS
    tls:
      mode: Terminate
      certificateRefs:
      - name: payments-tls
    allowedRoutes:
      namespaces:
        from: Same
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: payments-api
  namespace: payments
spec:
  parentRefs:
  - name: payments-gateway
  hostnames:
  - api.payments.corp.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1
    backendRefs:
    - name: payments-api
      port: 8080
      weight: 100
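The weight field on backendRefs is what makes a canary possible with the Gateway API alone. As a minimal sketch, the route below splits traffic 90/10 between two Services. The Service names payments-api-v1 and payments-api-v2 are hypothetical; unlike DestinationRule subsets, this approach assumes each version is exposed through its own Kubernetes Service.

```yaml
# Assumption: payments-api-v1 and payments-api-v2 are hypothetical per-version Services.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: payments-api-canary
  namespace: payments
spec:
  parentRefs:
  - name: payments-gateway
  hostnames:
  - api.payments.corp.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1
    backendRefs:
    - name: payments-api-v1
      port: 8080
      weight: 90                   # stable version
    - name: payments-api-v2
      port: 8080
      weight: 10                   # canary
```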
Observability
Istio's "free observability" is its single most-cited adoption driver. Without writing any application code, you get:
- Metrics: request count, duration percentiles, request/response sizes, TCP bytes — exported to Prometheus with rich labels (source, destination, response code, mTLS status).
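- Distributed traces: the proxies generate spans and export them to Jaeger, Zipkin, or any OpenTelemetry collector. This is the one exception to "no application code": each service must copy the incoming B3/W3C trace headers onto its outbound requests, otherwise the spans cannot be joined into a single trace.
- Access logs: a log line for every request, in text or JSON format (toggleable, since they are expensive at scale).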
- Topology graphs: Kiali renders the live service graph from telemetry, including request rate, error rate, and mTLS state per edge.
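These signals are tuned with the Telemetry API. The sketch below is one possible mesh-wide default, assuming the built-in envoy access log provider and a tracing provider already configured in the mesh config. The error-only log filter and the 1% sampling rate are illustrative values chosen to keep cost down, not recommendations from the source.

```yaml
# Assumptions: filter expression and sampling rate are illustrative.
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system          # root namespace, so this applies mesh-wide
spec:
  accessLogging:
  - providers:
    - name: envoy                  # built-in Envoy access log provider
    filter:
      expression: "response.code >= 400"   # log only error responses
  tracing:
  - randomSamplingPercentage: 1.0  # sample 1% of requests
```

On Istio releases older than 1.22 the same resource is served as telemetry.istio.io/v1alpha1.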
From 3,000 Sidecars to Ambient
eBay migrated their Istio mesh from sidecar to ambient mode in 2024 across ~3,000 services. The motivation was not L7 features — those were already in place — but operational. Sidecar restarts during Istio upgrades cascaded across the fleet (an hour-long affair) and the per-pod resource cost was ~$1.2M/year. Post-migration, ztunnel runs as a single DaemonSet (~340 nodes), upgrades complete in 12 minutes with zero application restarts, and the resource footprint dropped 65%. The lesson: ambient is most valuable not for what it adds, but for what it removes.
Multi-Cluster
Istio's multi-cluster deployment models vary along two axes. The control plane model is either multi-primary (each cluster runs its own istiod, and the clusters share a root of trust) or primary-remote (one istiod controls multiple clusters). The network model is either single-network or multi-network, where clusters span network boundaries and connect through east-west gateways. For most enterprises, multi-primary with a shared trust domain is the right default.
# Multi-primary setup with a shared root CA
# Step 1: Install the shared CA material (cacerts secret) in both clusters
kubectl --context=cluster-1 create namespace istio-system
kubectl --context=cluster-1 create secret generic cacerts -n istio-system \
  --from-file=root-cert.pem \
  --from-file=ca-cert.pem \
  --from-file=ca-key.pem \
  --from-file=cert-chain.pem
kubectl --context=cluster-2 create namespace istio-system
kubectl --context=cluster-2 create secret generic cacerts -n istio-system \
  --from-file=root-cert.pem \
  --from-file=ca-cert.pem \
  --from-file=ca-key.pem \
  --from-file=cert-chain.pem
# Step 2: Install Istio on both clusters with a shared mesh ID
# (shown for cluster-1; repeat on cluster-2 with clusterName: cluster-2)
# -y skips the confirmation prompt, since stdin is taken by the heredoc
istioctl --context=cluster-1 install -y -f - << EOF
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      meshID: mesh1
      multiCluster:
        clusterName: cluster-1
      network: network-1
EOF
# Step 3: Enable endpoint discovery from cluster-1 to cluster-2
# (multi-primary also needs the reverse direction: swap the two contexts)
istioctl create-remote-secret --context=cluster-2 --name=cluster-2 \
  | kubectl apply --context=cluster-1 -f -
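Step 2 above shows the install for cluster-1 only. A minimal sketch of the cluster-2 counterpart follows, assuming both clusters sit on the same network (network-1) and that the file is saved under the hypothetical name cluster-2.yaml. In a multi-network setup the network value would differ per cluster and east-west gateways would be needed as well.

```yaml
# cluster-2.yaml (hypothetical file name)
# Apply with: istioctl --context=cluster-2 install -f cluster-2.yaml -y
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      meshID: mesh1                # must match cluster-1 so both join the same mesh
      multiCluster:
        clusterName: cluster-2     # unique per cluster
      network: network-1           # assumption: same network as cluster-1
```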
Troubleshooting
# Comprehensive proxy diagnostics for one pod
istioctl proxy-status # All proxies' sync state
istioctl proxy-status payments-api-xyz.payments # One pod
istioctl proxy-config cluster payments-api-xyz.payments
istioctl proxy-config listener payments-api-xyz.payments
istioctl proxy-config route payments-api-xyz.payments
# Analyze the entire mesh for misconfiguration
istioctl analyze --all-namespaces
# Show the Envoy bootstrap configuration a pod started with
istioctl proxy-config bootstrap payments-api-xyz.payments
# Tail Envoy access logs from a sidecar
kubectl logs -n payments payments-api-xyz -c istio-proxy -f
# Block until a config change has been distributed to all proxies
istioctl experimental wait --for=distribution virtualservice payments-api.payments
# List the AuthorizationPolicies applied to a pod
istioctl experimental authz check payments-api-xyz.payments
# Summarize the Istio config affecting a pod (services, routing rules, mTLS)
istioctl experimental describe pod payments-api-xyz -n payments
Common pitfalls:
- 503 immediately after enabling mTLS: One side of the call is not in the mesh. STRICT mode rejects plaintext from clients without a sidecar, and a destination without a sidecar cannot accept mTLS. Either inject the sidecar or use PERMISSIVE during migration.
- VirtualService not taking effect: Missing DestinationRule for subset routing, or VirtualService applied in the wrong namespace.
- Sidecar startup race: App reads from network before Envoy is ready. Use holdApplicationUntilProxyStarts: true in the proxy config.
- AuthorizationPolicy "ALLOW" not allowing: Multiple ALLOW policies are OR'd, but a single DENY beats any ALLOW. Check for default-deny policies in the namespace.
- High control plane CPU: Too many services and config objects causing config push storms. Use Sidecar resources to scope what each proxy receives.
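For the last pitfall, a minimal sketch of a namespace-wide Sidecar resource is shown below. It assumes the workloads in payments only call services in their own namespace and in istio-system; any other dependency (for example the orders namespace) would need to be added to the hosts list, or calls to it would no longer be routable by name.

```yaml
# Assumption: payments workloads only need their own namespace and istio-system.
apiVersion: networking.istio.io/v1
kind: Sidecar
metadata:
  name: default                    # no workloadSelector, so it covers the whole namespace
  namespace: payments
spec:
  egress:
  - hosts:
    - "./*"                        # services in the same namespace
    - "istio-system/*"             # control plane and shared gateways
```

Scoping like this shrinks the configuration istiod has to compute and push to each proxy, which is usually where the control plane CPU goes in large meshes.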