Cluster Mesh Concepts
Cilium Cluster Mesh extends Cilium's networking across multiple Kubernetes clusters, enabling transparent pod-to-pod connectivity as if all pods were running in a single flat network. It operates at the eBPF datapath level — no additional proxies or tunnels between clusters are needed beyond standard IP connectivity.
What Cluster Mesh Provides
- Pod-to-Pod Connectivity — Pods in cluster A can reach pods in cluster B by IP address without NAT or application-level gateways
- Shared Identity — Security identities from Cilium's identity-based policy model are replicated across clusters, so network policies work seamlessly
- Global Services — A Kubernetes Service can load-balance across backends in multiple clusters
- Service Affinity — Control whether traffic prefers local backends or distributes globally
Requirements
- Non-overlapping PodCIDRs — Each cluster must use a unique pod network range (e.g., cluster1: 10.1.0.0/16, cluster2: 10.2.0.0/16)
- IP Connectivity — Nodes across clusters must be able to reach each other (direct routing, VPN, or peered VPCs)
- Same Cilium CA — Clusters share a certificate authority so mutual TLS between agents is trusted
- Unique Cluster IDs — Each cluster must have a distinct cluster-id (1–255) and a distinct cluster name
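The non-overlap requirement can be checked before any clusters are connected. The following is a minimal POSIX shell sketch, not part of Cilium: the function names are hypothetical, only IPv4 is handled, and the two CIDRs are the example values from the list above.

```shell
#!/bin/sh
# Hypothetical preflight helpers (not part of Cilium); IPv4 only.

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1                      # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print the first and last address of a CIDR as two integers.
cidr_range() {
  bits=${1#*/}
  size=$(( 1 << (32 - $bits) ))
  first=$(( $(ip_to_int "${1%/*}") / $size * $size ))
  echo "$first $(( $first + $size - 1 ))"
}

# Exit status 0 when the two CIDRs share at least one address.
cidrs_overlap() {
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

if cidrs_overlap 10.1.0.0/16 10.2.0.0/16; then
  echo "PodCIDRs overlap: pick different ranges"
else
  echo "PodCIDRs are disjoint"
fi
```

Running the same check against every pair of clusters in the mesh catches an overlap before it turns into ambiguous routing.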
Architecture
When Cluster Mesh is enabled, each cluster runs a clustermesh-apiserver component backed by an etcd instance. This etcd stores the cluster's Cilium identities, endpoints, and service information. Remote clusters connect to this etcd to synchronize state:
# Architecture overview:
#
#   Cluster 1                            Cluster 2
# ┌───────────────────────┐          ┌───────────────────────┐
# │ cilium-agent          │          │ cilium-agent          │
# │ clustermesh-apiserver │◄────────►│ clustermesh-apiserver │
# │ etcd (identities)     │          │ etcd (identities)     │
# └───────────────────────┘          └───────────────────────┘
#
# Each agent watches both local and remote etcd instances
# to build a unified view of identities and services.
Enable Cluster Mesh
The Cilium CLI provides straightforward commands to enable Cluster Mesh and connect clusters. Ensure both clusters already have Cilium installed with matching versions and non-overlapping CIDRs.
# Enable Cluster Mesh on cluster 1
cilium clustermesh enable --context cluster1
# Enable Cluster Mesh on cluster 2
cilium clustermesh enable --context cluster2
# Connect the two clusters
cilium clustermesh connect --context cluster1 --destination-context cluster2
# Verify connectivity
cilium clustermesh status --context cluster1
The cilium clustermesh enable command deploys the clustermesh-apiserver and exposes it (via LoadBalancer or NodePort). The connect command exchanges CA certificates and configures each cluster's agents to watch the remote cluster's etcd.
Verifying the Connection
# Check that all nodes have connected to the remote cluster
cilium clustermesh status --context cluster1 --wait
# Expected output shows:
# ✅ Cluster Mesh: OK
# ✅ Remote clusters: 1 connected
# - cluster2: ready, 3 nodes
If connectivity fails, verify that the clustermesh-apiserver Service has an external IP and that firewall rules allow port 2379 (etcd) between clusters.
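Both checks can be scripted. The sketch below assumes that Cilium runs in the kube-system namespace (its default), that the Service is exposed as type LoadBalancer, and that nc (netcat) is installed; the function names are hypothetical. The KUBECTL variable exists only so the command can be previewed or stubbed.

```shell
#!/bin/sh
# Assumptions: kube-system namespace, LoadBalancer exposure, nc available.
KUBECTL=${KUBECTL:-kubectl}

# Print the external IP or hostname of a cluster's clustermesh-apiserver Service.
# $1 = kubectl context of the cluster to inspect
clustermesh_address() {
  $KUBECTL --context "$1" -n kube-system get service clustermesh-apiserver \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}{.status.loadBalancer.ingress[0].hostname}'
}

# Exit status 0 when a TCP connection to host $1, port $2 opens within 3 seconds.
port_open() {
  nc -z -w 3 "$1" "$2"
}

# Example, run from a node in cluster1:
#   addr=$(clustermesh_address cluster2)
#   port_open "$addr" 2379 && echo "etcd port reachable" || echo "etcd port blocked"
```

An empty address means the Service has no external IP yet, which points at the load balancer rather than at a firewall.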
Cross-Cluster Services
Once Cluster Mesh is established, you can create global services — Kubernetes Services whose endpoints span multiple clusters. A pod in cluster1 calling the Service DNS name will be load-balanced across backends in both clusters.
Global Service Annotation
apiVersion: v1
kind: Service
metadata:
  name: api-server
  annotations:
    service.cilium.io/global: "true"
    service.cilium.io/shared: "true"
spec:
  selector:
    app: api-server
  ports:
    - port: 80
      targetPort: 8080
Deploy this Service definition in both clusters, using the same name and namespace in each. The service.cilium.io/global: "true" annotation tells Cilium to merge endpoints from all connected clusters. The service.cilium.io/shared: "true" annotation means this cluster's backends are shared with remote clusters; it defaults to "true" for global services, so it is shown here only for clarity.
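Applying one manifest to several clusters is easy to script. This is a minimal sketch: apply_to_clusters is a hypothetical helper, api-server-service.yaml is an assumed file name for the manifest above, and cluster1 and cluster2 are the kubectl contexts used earlier.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example with "echo") to preview the commands.
KUBECTL=${KUBECTL:-kubectl}

# Apply one manifest to every listed kubectl context; stop at the first failure.
# $1 = manifest file, remaining arguments = kubectl contexts
apply_to_clusters() {
  manifest=$1
  shift
  for ctx in "$@"; do
    $KUBECTL --context "$ctx" apply -f "$manifest" || return 1
  done
}

# Example:
#   apply_to_clusters api-server-service.yaml cluster1 cluster2
```

Setting KUBECTL=echo before the call prints the kubectl invocations instead of running them.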
Service Affinity
By default, global services distribute traffic across all backends equally. Use the affinity annotation to prefer local backends:
apiVersion: v1
kind: Service
metadata:
  name: api-server
  annotations:
    service.cilium.io/global: "true"
    service.cilium.io/shared: "true"
    service.cilium.io/affinity: "local"
spec:
  selector:
    app: api-server
  ports:
    - port: 80
      targetPort: 8080
With service.cilium.io/affinity: "local", traffic stays within the local cluster when backends are healthy. If all local backends are down, Cilium automatically fails over to remote cluster backends — providing high availability without manual intervention.
In a multi-region deployment with service.cilium.io/affinity: "local", each region handles its own traffic for low latency, while the other region acts as a failover target.
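The failover rule can be written down as a small decision function. This is a simplified model for illustration, not Cilium's implementation: select_backends is a hypothetical name, and the "remote" and "none" values are assumptions about the annotation's other settings that this section does not cover.

```shell
#!/bin/sh
# Simplified model of affinity-based backend selection (illustration only).
#   $1 = affinity ("local", "remote", or "none")
#   $2 = number of healthy local backends
#   $3 = number of healthy remote backends
select_backends() {
  case $1 in
    local)
      if [ "$2" -gt 0 ]; then echo "local"; else echo "remote"; fi ;;
    remote)
      if [ "$3" -gt 0 ]; then echo "remote"; else echo "local"; fi ;;
    *)
      echo "all" ;;
  esac
}

select_backends local 3 2    # healthy local backends exist
select_backends local 0 2    # local backends are down, so fail over
select_backends none 3 2     # no affinity, so use every backend
```

The three calls print local, remote, and all, in that order.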
BGP Control Plane
Cilium's BGP control plane enables nodes to peer with external BGP routers and advertise routes for Service LoadBalancer IPs, PodCIDRs, or both. This eliminates the need for MetalLB or cloud-specific load balancer integrations: the Cilium agent advertises the routes itself through a built-in BGP speaker.
What BGP Advertises
- LoadBalancer Service IPs — External IPs assigned to Services of type LoadBalancer
- PodCIDRs — The pod network range allocated to each node, enabling direct pod routing
- ClusterIPs — Optionally advertise ClusterIP ranges for external access
Enable BGP
Enable the BGP control plane during Cilium installation or upgrade:
# Helm values for BGP
bgpControlPlane:
  enabled: true
# Or via Cilium CLI
cilium install --set bgpControlPlane.enabled=true
# Or upgrade an existing installation
cilium upgrade --set bgpControlPlane.enabled=true
Enabling the BGP control plane starts a BGP speaker on each selected node. The speaker uses GoBGP under the hood and is configured entirely through Kubernetes CRDs — no configuration files on the nodes are needed.
CiliumBGPPeeringPolicy
The CiliumBGPPeeringPolicy CRD defines which nodes peer with which routers, what ASN to use, and what routes to advertise. Each policy targets a set of nodes via nodeSelector and configures one or more virtual routers.
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: rack-bgp
spec:
  nodeSelector:
    matchLabels:
      rack: rack-01
  virtualRouters:
    - localASN: 65001
      exportPodCIDR: true
      neighbors:
        - peerAddress: "10.0.0.1/32"
          peerASN: 65000
          connectRetryTimeSeconds: 120
          holdTimeSeconds: 90
          keepAliveTimeSeconds: 30
      serviceSelector:
        matchExpressions:
          - key: somekey
            operator: NotIn
            values: ["never-advertise"]
Field Reference
| Field | Description |
|---|---|
| nodeSelector | Selects which nodes this peering policy applies to (e.g., nodes in a specific rack or availability zone) |
| localASN | The BGP Autonomous System Number for Cilium's speaker on the selected nodes |
| exportPodCIDR | When true, advertises the node's PodCIDR to BGP peers, which enables direct pod routing from external networks |
| neighbors | List of BGP peers (routers) to establish sessions with, including address, ASN, and timer configuration |
| peerAddress | IP address of the BGP peer router in CIDR notation (/32 for a single host) |
| peerASN | The ASN of the remote BGP router |
| holdTimeSeconds | Maximum time (in seconds) without a keepalive before the session is declared down |
| serviceSelector | Label selector that controls which LoadBalancer Services have their IPs advertised. When it is omitted, no Services are advertised; the NotIn expression in the example above is the usual way to match all Services |
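The NotIn expression in the rack-bgp example selects every Service except those labeled somekey=never-advertise, including Services that have no somekey label at all. The sketch below models that Kubernetes label-selector rule in POSIX shell. It is an illustration only: not_in_matches is a hypothetical helper, and it treats an empty value as an absent label.

```shell
#!/bin/sh
# Model of a Kubernetes NotIn label requirement (illustration only).
#   $1    = value of the label on the Service, or "" when the label is absent
#   $2... = values listed in the NotIn expression
# Exit status 0 means the selector matches the Service.
not_in_matches() {
  value=$1
  shift
  if [ -z "$value" ]; then
    return 0                      # absent key: NotIn always matches
  fi
  for excluded in "$@"; do
    if [ "$value" = "$excluded" ]; then
      return 1                    # value is on the exclusion list
    fi
  done
  return 0
}

for label in "" "frontend" "never-advertise"; do
  if not_in_matches "$label" "never-advertise"; then
    echo "somekey='$label': selected"
  else
    echo "somekey='$label': not selected"
  fi
done
```

Only the last of the three label values is rejected, which is why a NotIn expression with a dummy value acts as a match-all selector.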
Multiple Peers Example
In production, nodes typically peer with two top-of-rack (ToR) switches for redundancy:
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: dual-tor-bgp
spec:
  nodeSelector:
    matchLabels:
      kubernetes.io/os: linux
  virtualRouters:
    - localASN: 65010
      exportPodCIDR: true
      neighbors:
        - peerAddress: "10.0.0.1/32"
          peerASN: 65000
          holdTimeSeconds: 90
          keepAliveTimeSeconds: 30
        - peerAddress: "10.0.0.2/32"
          peerASN: 65000
          holdTimeSeconds: 90
          keepAliveTimeSeconds: 30
      serviceSelector:
        matchLabels:
          advertise: "external"
With this configuration, only Services labeled advertise: external have their LoadBalancer IPs announced to the ToR switches.
L2 Announcements
For environments without BGP routers — such as bare-metal home labs, development clusters, or flat L2 networks — Cilium provides L2 Announcements. This feature uses ARP (IPv4) or NDP (IPv6) to announce LoadBalancer IPs on the local network segment, similar to what MetalLB does in L2 mode.
CiliumL2AnnouncementPolicy
apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: l2-policy
spec:
  interfaces:
    - ^eth[0-9]+
  externalIPs: true
  loadBalancerIPs: true
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""
Field Reference
- interfaces — Regex patterns matching which network interfaces should respond to ARP requests for virtual IPs
- externalIPs — Whether to announce Service externalIPs via L2
- loadBalancerIPs — Whether to announce LoadBalancer IPs via L2
- nodeSelector — Limits which nodes participate in L2 announcements (leader election happens among these nodes)
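Before applying a policy, it helps to confirm which interfaces on a node the patterns would select. The sketch below is a hypothetical helper that filters interface names with grep -E. Cilium evaluates these patterns with its own regular-expression engine, so treat this as an approximation that holds for simple patterns like the ones in this section.

```shell
#!/bin/sh
# Print the interface names read from stdin that match any given pattern.
# matching_interfaces is a hypothetical helper, not a Cilium command.
matching_interfaces() {
  while IFS= read -r name; do
    for pattern in "$@"; do
      if printf '%s\n' "$name" | grep -Eq -- "$pattern"; then
        printf '%s\n' "$name"
        break                     # print each name once
      fi
    done
  done
}

# Sample interface names; on a Linux node, `ls /sys/class/net` lists the real ones.
printf 'lo\neth0\neno1\nwlan0\n' | matching_interfaces '^eth[0-9]+' '^eno[0-9]+'
```

With the sample input, only eth0 and eno1 are printed.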
CiliumLoadBalancerIPPool
L2 announcements require an IP pool from which Cilium allocates LoadBalancer IPs. This replaces MetalLB's address pool configuration:
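apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: main-pool
spec:
  blocks:
    - start: "192.168.1.200"
      stop: "192.168.1.215"
This pool covers 192.168.1.200 through 192.168.1.215 (16 addresses). The range is written with start and stop rather than as 192.168.1.200/28, because a /28 block must begin on a multiple of 16 in the last octet (192.168.1.192 or 192.168.1.208, for example). When you create a Service of type LoadBalancer, Cilium assigns an IP from this pool and one node begins responding to ARP requests for that IP.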
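When a pool block is written as a cidr, it is worth computing the exact address range it covers before reserving it on the network. The following is a minimal POSIX shell sketch with hypothetical function names; it handles IPv4 only and uses 192.168.1.240/28 as the sample block.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Cilium); IPv4 only.

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1                      # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Convert an integer back to dotted-quad notation.
int_to_ip() {
  echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print the first address, last address, and size of a CIDR block.
cidr_bounds() {
  bits=${1#*/}
  size=$(( 1 << (32 - $bits) ))
  first=$(( $(ip_to_int "${1%/*}") / $size * $size ))
  echo "$(int_to_ip "$first") $(int_to_ip "$(( $first + $size - 1 ))") $size"
}

cidr_bounds 192.168.1.240/28     # prints: 192.168.1.240 192.168.1.255 16
```

The first address is rounded down to the block boundary, so a cidr whose address is not aligned to its prefix length covers a different range than its written address suggests.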
Complete L2 Setup Example
# 1. Install Cilium with L2 announcements enabled
#    (LB IPAM needs no flag; it activates when the first pool is created)
cilium install \
  --set l2announcements.enabled=true \
  --set externalIPs.enabled=true \
  --set kubeProxyReplacement=true
# 2. Create the IP pool
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: home-lab-pool
spec:
  blocks:
    - cidr: "192.168.1.240/28"
---
# 3. Create the L2 announcement policy
apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: default-l2
spec:
  interfaces:
    - ^eth[0-9]+
    - ^eno[0-9]+
    - ^enp[0-9]+s[0-9]+
  externalIPs: true
  loadBalancerIPs: true
---
# 4. Create a LoadBalancer Service; Cilium assigns an IP from the pool
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
    - port: 80
      targetPort: 80
# 5. Verify the assigned IP
kubectl get svc nginx-lb
# NAME       TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
# nginx-lb   LoadBalancer   10.96.123.45   192.168.1.240   80:31234/TCP   10s
# 6. Test from another machine on the same L2 network
curl http://192.168.1.240
Exercises
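- Connect two clusters with Cluster Mesh, then run cilium connectivity test --context cluster1 --multi-cluster cluster2 to validate.
- Peer Cilium with an FRR instance over BGP, then verify with vtysh -c "show ip bgp" on the FRR instance that routes appear when you create a LoadBalancer Service.
- Enable L2 announcements, create a LoadBalancer Service, and use arping to confirm ARP responses.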
Key Takeaways
- Cluster Mesh enables pod-to-pod and service-to-service communication across multiple Kubernetes clusters with shared identities and policies
- Global services use annotations (service.cilium.io/global) to merge endpoints across cluster boundaries with optional local affinity
- BGP control plane advertises LoadBalancer IPs and PodCIDRs to external routers — eliminating the need for MetalLB in routed environments
- CiliumBGPPeeringPolicy configures BGP sessions declaratively via CRDs — no node-level configuration files needed
- L2 announcements provide LoadBalancer IP assignment for bare-metal and flat networks without requiring BGP infrastructure
- CiliumLoadBalancerIPPool replaces MetalLB address pools, integrating IP allocation directly into Cilium