The Kubernetes Network Model
Kubernetes imposes a flat network model: every Pod gets its own unique IP address, and all Pods can communicate directly without NAT. This simplifies application networking considerably, because Pods behave like hosts on one flat network regardless of which node they are scheduled on. The model rests on three requirements:
- Every Pod gets its own IP address — no need to map ports between Pods.
- Pods can communicate with all other Pods without NAT — the IP a Pod sees for itself is the same IP others use to reach it.
- Agents (kubelet, kube-proxy) on a node can communicate with all Pods on that node — host-to-Pod networking works both directions.
Kubernetes does not implement networking itself — it delegates to CNI (Container Network Interface) plugins. The CNI plugin is responsible for assigning Pod IPs, configuring network interfaces, and establishing cross-node connectivity (via overlay networks, BGP, or cloud provider routing).
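As a rough illustration of the IP-assignment part of that job, the sketch below carves a cluster-wide range into per-node blocks and hands out Pod IPs from each block. The `10.244.0.0/16` range, the `/24` per-node size, and the `NodeIPAM` class are assumptions made for this example; they do not reflect any particular plugin's API.

```python
import ipaddress
from itertools import islice

CLUSTER_CIDR = ipaddress.ip_network("10.244.0.0/16")  # assumed cluster-wide Pod range


class NodeIPAM:
    """Hypothetical per-node allocator: hands out Pod IPs from one node's CIDR."""

    def __init__(self, pod_cidr):
        self.pod_cidr = pod_cidr
        hosts = iter(pod_cidr.hosts())  # usable addresses, network/broadcast excluded
        self.gateway = next(hosts)      # first usable address is reserved for the bridge
        self._free = hosts              # the rest are available to Pods

    def allocate(self):
        """Return the next unused Pod IP on this node."""
        return next(self._free)


# Split the cluster range into /24 blocks; skip 10.244.0.0/24 so that
# node-1 maps to 10.244.1.0/24, matching the numbering used in this article.
node_cidrs = list(islice(CLUSTER_CIDR.subnets(new_prefix=24), 1, 3))
nodes = {f"node-{i}": NodeIPAM(cidr) for i, cidr in enumerate(node_cidrs, start=1)}

pod_a = nodes["node-1"].allocate()
pod_b = nodes["node-1"].allocate()
pod_c = nodes["node-2"].allocate()
print(nodes["node-1"].gateway, pod_a, pod_b, pod_c)
# 10.244.1.1 10.244.1.2 10.244.1.3 10.244.2.2
```

Because each node owns a disjoint block, Pod IPs stay unique across the cluster without any coordination between nodes at allocation time.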
Pod-to-Pod Communication
Same Node
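When two Pods run on the same node, bridge-based CNI plugins connect them through a Linux bridge (typically cbr0 or cni0). Each Pod's network namespace attaches to the bridge through a veth pair: one end sits inside the Pod and appears as eth0, the other end is attached to the bridge on the host. Other plugins, such as Calico and Cilium, skip the bridge and route between the host-side veth interfaces directly.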
flowchart LR
subgraph Node1["Node 1"]
PA["Pod A\neth0\n10.244.1.5"] -->|veth pair| BR1["Bridge\ncni0\n10.244.1.1"]
PB["Pod B\neth0\n10.244.1.8"] -->|veth pair| BR1
end
subgraph Node2["Node 2"]
PC["Pod C\neth0\n10.244.2.3"] -->|veth pair| BR2["Bridge\ncni0\n10.244.2.1"]
end
BR1 -->|"Cross-node path\n(VXLAN / BGP /\ncloud routes)"| BR2
Cross-Node
Cross-node communication depends on the CNI plugin's strategy. Three common approaches:
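- Overlay (VXLAN/Geneve): Encapsulates Pod traffic in outer UDP packets between nodes. Works on any IP network that connects the nodes, but adds about 50 bytes of header overhead per packet for VXLAN over IPv4, which is why Pod interfaces then typically use an MTU of 1450 (Flannel VXLAN, Calico VXLAN mode).
- BGP Routing: Advertises each node's Pod CIDR to the other nodes, and optionally to physical routers, via BGP. No encapsulation overhead, but the underlying network must be able to carry routes for Pod IPs (Calico BGP mode).
- Cloud Provider Networking: Integrates Pod addressing with the cloud VPC, either by programming a VPC route per node Pod CIDR (GKE routes-based clusters) or by giving Pods addresses from the VPC itself (AWS VPC CNI assigns ENI secondary IPs, GKE VPC-native clusters use alias IP ranges). No overlay overhead, native performance.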
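The encapsulation cost of the overlay approach can be worked out by hand. A minimal sketch, assuming VXLAN over IPv4 with no VLAN tag; the function name `pod_mtu` is made up for this example.

```python
# Bytes that VXLAN over IPv4 places between the underlay IP MTU and the Pod's IP MTU
OUTER_IPV4 = 20       # outer IP header
OUTER_UDP = 8         # outer UDP header
VXLAN_HEADER = 8      # VXLAN header carrying the network identifier
INNER_ETHERNET = 14   # the Pod's Ethernet header travels inside the tunnel
VXLAN_OVERHEAD = OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET


def pod_mtu(underlay_mtu, overhead=VXLAN_OVERHEAD):
    """Largest Pod MTU whose encapsulated packets still fit the underlay MTU."""
    return underlay_mtu - overhead


print(VXLAN_OVERHEAD)  # 50
print(pod_mtu(1500))   # 1450
print(pod_mtu(9000))   # 8950
```

The 1450 result matches the `mtu 1450` reported for cni0 in the sample output in this section.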
# View Pod IPs and which node they're on
kubectl get pods -o wide
# NAME READY STATUS IP NODE
# nginx-abc 1/1 Running 10.244.1.5 node-1
# redis-xyz 1/1 Running 10.244.2.3 node-2
# View the CNI bridge on a node
ssh node-1 "ip link show type bridge"
# cni0: <BROADCAST,MULTICAST,UP> mtu 1450 state UP
# View veth pairs connecting Pods to the bridge
ssh node-1 "bridge link show"
# veth1234@if2: <BROADCAST,MULTICAST,UP> master cni0
# View Pod CIDR allocation per node
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
# node-1 10.244.1.0/24
# node-2 10.244.2.0/24
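Given that per-node allocation, the forwarding decision on a node reduces to a prefix lookup. The sketch below is a simplified model of that decision, not what any CNI plugin literally runs; the node names and CIDRs come from the sample output above, and `owning_node` and `next_hop` are hypothetical helpers.

```python
import ipaddress

# Pod CIDR per node, as in the sample output above
POD_CIDRS = {
    "node-1": ipaddress.ip_network("10.244.1.0/24"),
    "node-2": ipaddress.ip_network("10.244.2.0/24"),
}


def owning_node(pod_ip):
    """Return the node whose Pod CIDR contains pod_ip, or None if no CIDR matches."""
    addr = ipaddress.ip_address(pod_ip)
    for node, cidr in POD_CIDRS.items():
        if addr in cidr:
            return node
    return None


def next_hop(local_node, dst_pod_ip):
    """Describe where local_node sends a packet addressed to dst_pod_ip."""
    node = owning_node(dst_pod_ip)
    if node is None:
        return "default route (not a Pod IP in this cluster)"
    if node == local_node:
        return "local bridge"
    return f"cross-node path to {node}"


print(next_hop("node-1", "10.244.1.8"))  # local bridge
print(next_hop("node-1", "10.244.2.3"))  # cross-node path to node-2
```

Whether the cross-node path is a VXLAN tunnel, a BGP-learned route, or a cloud route is exactly the choice described under Cross-Node.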
Services
Pods are ephemeral — they come and go, and their IPs change. A Service provides a stable virtual IP (ClusterIP) and DNS name that load-balances across a set of Pods selected by label. Services are the primary abstraction for service discovery in Kubernetes.
flowchart LR
CI["ClusterIP\n(internal only)"] --> NP["NodePort\n(+ node IP:port)"]
NP --> LB["LoadBalancer\n(+ external IP)"]
style CI fill:#3B9797,color:#fff
style NP fill:#16476A,color:#fff
style LB fill:#BF092F,color:#fff
ClusterIP
The default Service type. Assigns a virtual IP reachable only within the cluster. kube-proxy programs iptables/IPVS rules to DNAT traffic to one of the backend Pod IPs.
# ClusterIP Service manifest
apiVersion: v1
kind: Service
metadata:
  name: my-api
  namespace: default
spec:
  type: ClusterIP
  selector:
    app: my-api           # Matches Pods with label app=my-api
  ports:
    - port: 80            # Service port (what clients connect to)
      targetPort: 8080    # Pod port (where the app listens)
      protocol: TCP
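Conceptually, this manifest asks for a translation from the Service's virtual address to a real Pod address. The sketch below models that rewrite in a few lines. The ClusterIP and Pod IPs are the example values used elsewhere in this article, and `dnat` is a hypothetical function that ignores connection tracking and session affinity.

```python
import random

# (ClusterIP, Service port) -> backends as (Pod IP, targetPort)
SERVICE_TABLE = {
    ("10.96.45.123", 80): [("10.244.1.5", 8080), ("10.244.2.3", 8080)],
}


def dnat(dst_ip, dst_port, rng=random):
    """Rewrite a Service destination to one backend; pass other traffic through."""
    backends = SERVICE_TABLE.get((dst_ip, dst_port))
    if not backends:
        return dst_ip, dst_port  # not a Service VIP, leave the packet untouched
    return rng.choice(backends)


print(dnat("10.96.45.123", 80))  # one of the two Pod endpoints, on port 8080
print(dnat("10.244.1.5", 8080))  # ('10.244.1.5', 8080): direct Pod traffic is unchanged
```

Note how `port: 80` only exists on the virtual side of the table; the application never sees it.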
NodePort
Exposes the Service on a static port (30000–32767 by default) on every node's IP. External traffic that reaches <NodeIP>:<NodePort> is forwarded to a backend Pod, which may run on a different node. Useful for development; in production, prefer LoadBalancer or Ingress.
LoadBalancer
On cloud providers, creates an external load balancer (AWS ELB/NLB, GCP LB, Azure LB) that routes traffic to NodePorts. The LB gets a public IP/DNS. This is the simplest way to expose a service to the internet — but each Service gets its own LB (expensive at scale; prefer Ingress for HTTP).
ExternalName
Maps a Service to an external DNS name (CNAME record). No proxying — just DNS resolution. Useful for abstracting external dependencies (e.g., a managed database) behind a Kubernetes-native DNS name.
| Service Type | Scope | Port Range | Use Case |
|---|---|---|---|
| ClusterIP | Internal only | Any | Inter-service communication within cluster |
| NodePort | Internal + Node IP | 30000–32767 | Development, on-prem without LB |
| LoadBalancer | External (public IP) | Any | Exposing single service to internet |
| ExternalName | DNS alias | N/A | Abstracting external dependencies |
| Headless (clusterIP: None) | Internal (no VIP) | Any | StatefulSets, direct Pod discovery via DNS |
# List Services and their ClusterIPs
kubectl get svc
# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
# kubernetes ClusterIP 10.96.0.1 <none> 443/TCP
# my-api ClusterIP 10.96.45.123 <none> 80/TCP
# Describe a Service — shows endpoints (backend Pod IPs)
kubectl describe svc my-api
# Endpoints: 10.244.1.5:8080, 10.244.2.3:8080
# View Endpoints resource directly
kubectl get endpoints my-api
# NAME ENDPOINTS AGE
# my-api 10.244.1.5:8080,10.244.2.3:8080 5m
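The endpoint list shown above is derived from the Service's label selector. A rough model of that selection follows; the Pod names, the extra labels, and the `select_endpoints` helper are invented for illustration, and readiness is reduced to a single boolean.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    ip: str
    labels: dict = field(default_factory=dict)
    ready: bool = True


def select_endpoints(pods, selector, target_port):
    """Return ip:port for every ready Pod whose labels include all selector pairs."""
    return [
        f"{pod.ip}:{target_port}"
        for pod in pods
        if pod.ready and all(pod.labels.get(k) == v for k, v in selector.items())
    ]


pods = [
    Pod("my-api-1", "10.244.1.5", {"app": "my-api"}),
    Pod("my-api-2", "10.244.2.3", {"app": "my-api", "tier": "web"}),
    Pod("redis-1", "10.244.2.9", {"app": "redis"}),
    Pod("my-api-3", "10.244.1.7", {"app": "my-api"}, ready=False),
]

print(select_endpoints(pods, {"app": "my-api"}, 8080))
# ['10.244.1.5:8080', '10.244.2.3:8080']
```

Extra labels on a Pod do not prevent a match, while a Pod that is not ready is left out of the list until it recovers.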
kube-proxy & IPVS
kube-proxy runs on every node and implements the Service abstraction by programming packet-forwarding rules. It watches the API server for Service/Endpoint changes and updates rules in real time.
Two modes:
- iptables mode (default): Creates DNAT rules that randomly select a backend Pod. Rules are evaluated sequentially, so matching cost grows linearly, O(n), with the number of Services and endpoints, and performance struggles above ~5,000 Services.
- IPVS mode: Uses Linux IPVS (IP Virtual Server) kernel module for L4 load balancing. O(1) lookup via hash tables, supports multiple algorithms (round-robin, least connections, source hashing). Recommended for large clusters.
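To make the scheduling algorithms named above concrete, here is a toy model of three of them. It is a simplification and not the kernel's implementation: the backend addresses are examples, and the CRC32 hash used for source hashing is an arbitrary stand-in.

```python
import itertools
import zlib

BACKENDS = ["10.244.1.5:8080", "10.244.2.3:8080", "10.244.2.4:8080"]


def round_robin(backends):
    """rr: cycle through the backends in order."""
    return itertools.cycle(backends)


def least_connections(active):
    """lc: pick the backend with the fewest active connections."""
    return min(active, key=active.get)


def source_hash(client_ip, backends):
    """sh: the same client IP always maps to the same backend."""
    return backends[zlib.crc32(client_ip.encode()) % len(backends)]


rr = round_robin(BACKENDS)
print([next(rr) for _ in range(4)])  # wraps around to the first backend
print(least_connections({"10.244.1.5:8080": 12, "10.244.2.3:8080": 3, "10.244.2.4:8080": 7}))
print(source_hash("10.244.1.9", BACKENDS) == source_hash("10.244.1.9", BACKENDS))  # True
```

Each function does a constant amount of work per decision apart from the `min` scan, which is the property that lets a hash-table-backed design stay fast as the number of Services grows.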
# View iptables rules created by kube-proxy for a Service (run on a node; output abridged)
sudo iptables -t nat -S KUBE-SERVICES | grep my-api
# -A KUBE-SERVICES -d 10.96.45.123/32 -p tcp --dport 80 -j KUBE-SVC-XXXXXX
# Follow the chain to see backend selection
sudo iptables -t nat -S KUBE-SVC-XXXXXX
# -A KUBE-SVC-XXXXXX -m statistic --mode random --probability 0.5 -j KUBE-SEP-AAAAAA
# -A KUBE-SVC-XXXXXX -j KUBE-SEP-BBBBBB
# Each KUBE-SEP chain DNATs to a Pod IP
sudo iptables -t nat -S KUBE-SEP-AAAAAA
# -A KUBE-SEP-AAAAAA -p tcp -j DNAT --to-destination 10.244.1.5:8080
# Check kube-proxy mode
kubectl -n kube-system get cm kube-proxy -o yaml | grep mode
# mode: "iptables" (or "ipvs")
# IPVS mode: view virtual servers
sudo ipvsadm -Ln | head -20
# TCP 10.96.45.123:80 rr
# -> 10.244.1.5:8080 Masq 1
# -> 10.244.2.3:8080 Masq 1
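The `--probability 0.5` value in the KUBE-SVC chain above is not arbitrary. Rules are evaluated in order, so for the overall choice to be uniform across n endpoints, rule i (counting from zero) has to match with probability 1/(n - i), and the last rule matches unconditionally. A short sketch of that arithmetic, with function names chosen for this example:

```python
from fractions import Fraction


def rule_probabilities(n):
    """Per-rule match probability for n endpoints, in chain order."""
    return [Fraction(1, n - i) for i in range(n)]


def effective_share(n):
    """Overall chance of reaching each endpoint, given earlier rules did not match."""
    shares, remaining = [], Fraction(1)
    for p in rule_probabilities(n):
        shares.append(remaining * p)
        remaining *= 1 - p
    return shares


print([str(p) for p in rule_probabilities(2)])  # ['1/2', '1']
print([str(p) for p in rule_probabilities(3)])  # ['1/3', '1/2', '1']
print([str(s) for s in effective_share(3)])     # ['1/3', '1/3', '1/3']
```

With two endpoints this gives 0.5 for the first rule and an unconditional jump for the second, which is the pattern in the chain listing.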
Ingress & Ingress Controllers
An Ingress is a Kubernetes resource that defines HTTP/HTTPS routing rules — mapping hostnames and paths to backend Services. Unlike LoadBalancer Services (one LB per service), a single Ingress can route to many Services, making it cost-effective for HTTP workloads.
Ingress resources are declarative: they do nothing without an Ingress Controller (ingress-nginx, Traefik, HAProxy, AWS Load Balancer Controller, etc.) watching the API and implementing the rules.
# Ingress manifest: route by host and path
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    # Caution: with ingress-nginx, a fixed rewrite-target of "/" rewrites every
    # matched request to "/". Drop it if backends expect the original path.
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /users
            pathType: Prefix
            backend:
              service:
                name: users-svc
                port:
                  number: 80
          - path: /orders
            pathType: Prefix
            backend:
              service:
                name: orders-svc
                port:
                  number: 80
  tls:
    - hosts:
        - api.example.com
      secretName: api-tls-cert
# List Ingress resources
kubectl get ingress
# NAME CLASS HOSTS ADDRESS PORTS AGE
# app-ingress nginx api.example.com 34.56.78.90 80, 443 10m
# View Ingress controller pods
kubectl -n ingress-nginx get pods
# ingress-nginx-controller-xxxx 1/1 Running
# Check the Ingress controller's configuration
kubectl -n ingress-nginx exec deploy/ingress-nginx-controller -- cat /etc/nginx/nginx.conf | grep -A 5 "api.example.com"
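How a controller picks a backend for the host and path rules in this section can be approximated in a few lines. The sketch follows the documented `Prefix` semantics (matching on whole path elements, with the longest matching path winning); it ignores TLS, rewrites, and other path types, and `RULES`, `prefix_matches`, and `route` are hypothetical names.

```python
# host -> list of (path prefix, backend Service), mirroring the example Ingress
RULES = {
    "api.example.com": [
        ("/users", "users-svc"),
        ("/orders", "orders-svc"),
    ],
}


def prefix_matches(rule_path, request_path):
    """Prefix pathType: compare element by element, so /users does not match /usersearch."""
    rule = [p for p in rule_path.split("/") if p]
    req = [p for p in request_path.split("/") if p]
    return req[: len(rule)] == rule


def route(host, path):
    """Return the backend Service for host and path, preferring the longest matching rule."""
    candidates = [
        (rule_path, svc)
        for rule_path, svc in RULES.get(host, [])
        if prefix_matches(rule_path, path)
    ]
    if not candidates:
        return None  # a real controller would fall back to its default backend
    return max(candidates, key=lambda c: len(c[0]))[1]


print(route("api.example.com", "/users/42"))    # users-svc
print(route("api.example.com", "/usersearch"))  # None
print(route("other.example.com", "/users"))     # None
```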
NetworkPolicies
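A NetworkPolicy acts as a firewall rule for Pods. It selects Pods by label and defines the ingress and egress traffic allowed for them. Pods that no policy selects accept all traffic; once a policy selects a Pod, only traffic that some policy explicitly allows gets through in the covered direction. Enforcement is up to the CNI plugin: Calico, Cilium, and Weave Net implement it, but Flannel on its own does NOT.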
# NetworkPolicy: deny all ingress to Pods in namespace "production"
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress
  namespace: production
spec:
  podSelector: {}         # Selects ALL Pods in this namespace
  policyTypes:
    - Ingress             # Only affects ingress (egress unchanged)
  ingress: []             # Empty = no ingress allowed
# NetworkPolicy: allow traffic from frontend to backend on port 8080
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend            # Apply to Pods with app=backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend   # Allow from Pods with app=frontend
      ports:
        - protocol: TCP
          port: 8080
# List NetworkPolicies in a namespace
kubectl get networkpolicies -n production
# NAME POD-SELECTOR AGE
# deny-all-ingress <none> 5m
# allow-frontend-to-backend app=backend 3m
# Test connectivity: exec into a frontend Pod and curl the backend
kubectl exec -n production deploy/frontend -- curl -sS --max-time 3 http://backend:8080/health
# {"status":"ok"}
# Test that other Pods are blocked (-S keeps curl's error message visible despite -s)
kubectl exec -n production deploy/monitoring -- curl -sS --max-time 3 http://backend:8080/health
# curl: (28) Connection timed out (BLOCKED by NetworkPolicy)
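The two results above follow from how policies combine: policies are additive, and a destination Pod becomes isolated for ingress as soon as at least one policy selects it. The sketch below models only that logic for Pod selectors and a single port within one namespace; the tuple-based policy format and the `ingress_allowed` function are invented for this example.

```python
def matches(selector, labels):
    """An empty selector matches every Pod; otherwise all pairs must be present."""
    return all(labels.get(k) == v for k, v in selector.items())


# Simplified policies: (podSelector, [(from selector, allowed port), ...])
POLICIES = [
    ({}, []),                                              # deny-all-ingress
    ({"app": "backend"}, [({"app": "frontend"}, 8080)]),   # allow-frontend-to-backend
]


def ingress_allowed(src_labels, dst_labels, port, policies=POLICIES):
    """Decide whether src may reach dst on port under the given policies."""
    selecting = [rules for sel, rules in policies if matches(sel, dst_labels)]
    if not selecting:
        return True  # no policy selects the destination, so it is not isolated
    return any(
        matches(from_sel, src_labels) and port == allowed_port
        for rules in selecting
        for from_sel, allowed_port in rules
    )


print(ingress_allowed({"app": "frontend"}, {"app": "backend"}, 8080))    # True
print(ingress_allowed({"app": "monitoring"}, {"app": "backend"}, 8080))  # False
print(ingress_allowed({"app": "frontend"}, {"app": "backend"}, 9090))    # False
```

The deny-all policy contributes no rules, so it never allows anything by itself; the allow policy adds exactly one permitted path on top of it.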
CoreDNS & Service Discovery
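CoreDNS is the cluster DNS server and runs as a Deployment in kube-system. With the default DNS policy, every Pod's /etc/resolv.conf points to the ClusterIP of the CoreDNS Service. Service discovery works through DNS: a Service is reachable by its short name from within the same namespace and by its fully qualified name from anywhere in the cluster.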
DNS record format: <service>.<namespace>.svc.cluster.local
# View CoreDNS pods and service
kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl -n kube-system get svc kube-dns
# NAME TYPE CLUSTER-IP PORT(S)
# kube-dns ClusterIP 10.96.0.10 53/UDP,53/TCP
# Resolve a Service name from inside a Pod
kubectl exec -it deploy/debug -- nslookup my-api
# Name: my-api.default.svc.cluster.local
# Address: 10.96.45.123
# Resolve a Service in another namespace
kubectl exec -it deploy/debug -- nslookup redis.cache.svc.cluster.local
# Address: 10.96.78.90
# View a Pod's DNS configuration
kubectl exec deploy/debug -- cat /etc/resolv.conf
# nameserver 10.96.0.10
# search default.svc.cluster.local svc.cluster.local cluster.local
# options ndots:5
# Headless Service — returns individual Pod IPs (no ClusterIP)
kubectl exec deploy/debug -- nslookup my-statefulset-headless
# Name: my-statefulset-headless.default.svc.cluster.local
# Address: 10.244.1.5
# Address: 10.244.2.3
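The `search` and `ndots` settings in a Pod's resolv.conf explain why short names such as `my-api` resolve at all. The sketch below lists the names a typical stub resolver would try, in order; it assumes glibc-like behaviour and the search list of a Pod in the default namespace, and `candidate_queries` is a hypothetical helper.

```python
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]


def candidate_queries(name, search=SEARCH, ndots=5):
    """Return the fully qualified names a resolver would try, in order."""
    if name.endswith("."):
        return [name]                 # trailing dot: already absolute
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        return [absolute] + expanded  # enough dots: try the name as-is first
    return expanded + [absolute]      # too few dots: walk the search list first


for query in candidate_queries("my-api"):
    print(query)
# my-api.default.svc.cluster.local.
# my-api.svc.cluster.local.
# my-api.cluster.local.
# my-api.

print(candidate_queries("redis.cache")[1])  # redis.cache.svc.cluster.local.
```

One consequence is that an external name such as api.example.com has fewer than five dots, so it is tried against every search domain before being queried as-is; a trailing dot avoids those extra lookups.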
CNI Plugin Comparison
| CNI Plugin | Network Mode | NetworkPolicy | Key Features |
|---|---|---|---|
| Calico | BGP, VXLAN, IPIP | Yes (full) | Widely deployed, eBPF dataplane option, WireGuard encryption |
| Cilium | eBPF (no iptables) | Yes (L3-L7) | eBPF-native, L7 policies, Hubble observability, service mesh |
| Flannel | VXLAN, host-gw | No | Simplest setup, no NetworkPolicy (pair with Calico for policies) |
| Weave Net | VXLAN, sleeve | Yes (basic) | Encryption built-in, mesh topology, multicast support |
| AWS VPC CNI | Native VPC addressing (ENI IPs) | Yes (built in since v1.14, or via Calico) | Pods get real VPC IPs, no overlay overhead, security groups per Pod |
Service Mesh — When Kubernetes Networking Isn't Enough
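Kubernetes Services provide L4 load balancing and basic discovery. A service mesh (Istio, Linkerd, Cilium Service Mesh) adds L7 capabilities, traditionally through a sidecar proxy injected into every Pod (Envoy in Istio, linkerd2-proxy in Linkerd); Cilium and Istio's ambient mode offer sidecar-free alternatives. Typical capabilities: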
- mTLS — automatic mutual TLS between all services (zero-trust networking without app changes)
- Traffic management — canary releases, circuit breaking, retries, timeouts at the mesh level
- Observability — distributed tracing, golden metrics (latency, traffic, errors, saturation) per service
- Policy — L7 authorization (allow GET /api/users but deny DELETE /api/users)
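The L7 policy item in this list can be pictured as a rule table that the proxy evaluates for each request. The sketch below is a toy evaluator and not the policy API of Istio, Linkerd, or Cilium; the rule format and the `authorize` function are invented for illustration.

```python
# (HTTP method, path prefix, decision); the first matching rule wins
RULES = [
    ("GET", "/api/users", "allow"),
    ("DELETE", "/api/users", "deny"),
]


def authorize(method, path, rules=RULES, default="deny"):
    """Return 'allow' or 'deny' for an HTTP request based on method and path prefix."""
    for rule_method, prefix, decision in rules:
        if method == rule_method and path.startswith(prefix):
            return decision
    return default


print(authorize("GET", "/api/users"))       # allow
print(authorize("DELETE", "/api/users/7"))  # deny
print(authorize("POST", "/api/orders"))     # deny (no rule matches, default applies)
```

An L4 mechanism such as a NetworkPolicy cannot express this distinction, because both requests travel over the same TCP port.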
The trade-off is added latency (roughly 1-2 ms per hop), memory overhead (around 50 MB per sidecar), and operational complexity; actual figures vary by mesh and workload. A mesh tends to pay off once you run many microservices (ten or more is a common rule of thumb) and need consistent security and observability without modifying application code.
Exercises
# Exercise 1: Inspect your cluster's Service networking
kubectl get svc --all-namespaces -o wide
kubectl get endpoints --all-namespaces | head -20
# Exercise 2: Trace DNS resolution inside a Pod
kubectl run debug --image=busybox:1.36 --rm -it --restart=Never -- nslookup kubernetes.default
# Exercise 3: View kube-proxy mode
kubectl -n kube-system get cm kube-proxy -o jsonpath='{.data.config\.conf}' | grep mode
# Exercise 4: Check Pod CIDR allocations
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" → "}{.spec.podCIDR}{"\n"}{end}'
# Exercise 5: List NetworkPolicies and their selectors
kubectl get networkpolicies --all-namespaces -o wide
Conclusion & Next Steps
Kubernetes networking is built on a simple contract: every Pod gets a routable IP, and CNI plugins handle the implementation. Services provide stable endpoints and load balancing via kube-proxy (iptables/IPVS). Ingress consolidates HTTP routing behind a single entry point. NetworkPolicies enforce microsegmentation: once a policy selects a Pod, traffic that no policy explicitly allows is denied. CoreDNS makes everything discoverable by name. Understanding these layers, and which component is responsible for each, is the key to debugging connectivity issues in production clusters.