Table of Contents

  1. Cluster Mesh Concepts
  2. Enable Cluster Mesh
  3. Cross-Cluster Services
  4. BGP Control Plane
  5. CiliumBGPPeeringPolicy
  6. L2 Announcements
  7. Exercises
  8. Key Takeaways

Cilium Track Part 3: Cluster Mesh & BGP

June 6, 2026 Wasil Zafar 36 min read

Connect multiple Kubernetes clusters with Cilium Cluster Mesh for transparent pod-to-pod communication, shared services, and global load balancing. Use BGP to advertise LoadBalancer IPs and PodCIDRs to your network infrastructure.

Cluster Mesh Concepts

Cilium Cluster Mesh extends Cilium's networking across multiple Kubernetes clusters, enabling transparent pod-to-pod connectivity as if all pods were running in a single flat network. It operates at the eBPF datapath level — no additional proxies or tunnels between clusters are needed beyond standard IP connectivity.

What Cluster Mesh Provides

  • Pod-to-Pod Connectivity — Pods in cluster A can reach pods in cluster B by IP address without NAT or application-level gateways
  • Shared Identity — Security identities from Cilium's identity-based policy model are replicated across clusters, so network policies work seamlessly
  • Global Services — A Kubernetes Service can load-balance across backends in multiple clusters
  • Service Affinity — Control whether traffic prefers local backends or distributes globally

Requirements

Critical requirement: PodCIDRs and ClusterIPs must not overlap between clusters. Plan your IP address space carefully before deploying multi-cluster environments.
  • Non-overlapping PodCIDRs — Each cluster must use a unique pod network range (e.g., cluster1: 10.1.0.0/16, cluster2: 10.2.0.0/16)
  • IP Connectivity — Nodes across clusters must be able to reach each other (direct routing, VPN, or peered VPCs)
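  • Same Cilium CA — Clusters share a certificate authority so that each agent and the remote clustermesh-apiservers it connects to trust each other's TLS certificates
  • Unique Cluster Names and IDs — Each cluster needs a distinct cluster.name and a distinct cluster.id (1–255)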
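Overlapping PodCIDRs are easy to miss when clusters are planned by different teams, so it can help to check them before installing anything. The Bash sketch below is a hypothetical planning helper (not a Cilium command): it converts each IPv4 CIDR to an integer range and reports whether two ranges share any address. It assumes dotted-quad input without leading zeros.

```shell
#!/usr/bin/env bash
# Hypothetical planning helper (not part of the Cilium CLI):
# check whether two IPv4 CIDRs overlap, using only Bash arithmetic.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo "$(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))"
}

# Print "first last" (as integers) for a CIDR such as 10.1.0.0/16.
cidr_bounds() {
  local ip="${1%/*}" len="${1#*/}"
  local addr size mask first
  addr=$(ip_to_int "$ip")
  size=$(( 1 << (32 - len) ))
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  first=$(( addr & mask ))
  echo "$first $(( first + size - 1 ))"
}

# Exit status 0 when the two CIDRs share at least one address.
cidrs_overlap() {
  local a1 a2 b1 b2
  read -r a1 a2 <<< "$(cidr_bounds "$1")"
  read -r b1 b2 <<< "$(cidr_bounds "$2")"
  (( a1 <= b2 && b1 <= a2 ))
}

# Usage with the example PodCIDRs from the list above:
if cidrs_overlap 10.1.0.0/16 10.2.0.0/16; then
  echo "overlap: pick different PodCIDRs"
else
  echo "ok: PodCIDRs are disjoint"
fi
```

With more than two clusters, run the check for every pair of planned ranges.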

Architecture

When Cluster Mesh is enabled, each cluster runs a clustermesh-apiserver component backed by an etcd instance. This etcd stores the cluster's Cilium identities, endpoints, and service information. Agents in remote clusters connect to this etcd to synchronize state:

# Architecture overview:
# Cluster 1                          Cluster 2
# ┌───────────────────────┐          ┌───────────────────────┐
# │ cilium-agent          │          │ cilium-agent          │
# │ clustermesh-apiserver │◄────────►│ clustermesh-apiserver │
# │ etcd (identities)     │          │ etcd (identities)     │
# └───────────────────────┘          └───────────────────────┘
#
# Each cilium-agent reads its own cluster's state locally and connects to
# every remote clustermesh-apiserver (etcd) to build a unified view of
# identities, endpoints, and services.

Enable Cluster Mesh

The Cilium CLI provides straightforward commands to enable Cluster Mesh and connect clusters. Ensure both clusters already have Cilium installed with matching versions, non-overlapping PodCIDRs, and a unique cluster name and ID in each cluster.
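The cluster name and ID are set at install time through the cluster.name and cluster.id Helm values. The sketch below is a hypothetical helper (the function name is illustrative, not part of the Cilium CLI) that validates the ID range and prints the matching --set flags, so each cluster's install command differs only in its arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Helm overrides a Cluster Mesh member needs.
# cluster.name and cluster.id are Cilium Helm values; this function is illustrative.
mesh_install_flags() {
  local name="$1" id="$2"
  if [[ -z "$name" ]]; then
    echo "cluster name is required" >&2
    return 1
  fi
  # IDs must be unique across the mesh and fall in 1-255 (no leading zeros).
  if ! [[ "$id" =~ ^[1-9][0-9]{0,2}$ ]] || (( id > 255 )); then
    echo "cluster id must be an integer in 1-255, got '$id'" >&2
    return 1
  fi
  printf -- '--set cluster.name=%s --set cluster.id=%s\n' "$name" "$id"
}

# Usage (the unquoted substitution is intentionally word-split into flags):
#   cilium install --context cluster1 $(mesh_install_flags cluster1 1)
#   cilium install --context cluster2 $(mesh_install_flags cluster2 2)
```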

# Enable Cluster Mesh on cluster 1
cilium clustermesh enable --context cluster1

# Enable Cluster Mesh on cluster 2
cilium clustermesh enable --context cluster2

# Connect the two clusters
cilium clustermesh connect --context cluster1 --destination-context cluster2

# Verify connectivity
cilium clustermesh status --context cluster1

The cilium clustermesh enable command deploys the clustermesh-apiserver and exposes it (via LoadBalancer or NodePort). The connect command exchanges CA certificates and configures each cluster's agents to watch the remote cluster's etcd.

Verifying the Connection

# Check that all nodes have connected to the remote cluster
cilium clustermesh status --context cluster1 --wait

# Output resembles the following (exact wording varies by CLI version):
# ✅ Cluster Mesh:       OK
# ✅ Remote clusters:    1 connected
#    - cluster2: ready, 3 nodes

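If connectivity fails, verify that the clustermesh-apiserver Service in kube-system has an external IP and that firewall rules allow its etcd port between clusters: 2379 when it is exposed through a LoadBalancer, or the NodePort (32379 by default) when it is exposed as a NodePort.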
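A quick way to separate "firewall blocks the port" from "agent misconfigured" is a raw TCP connection test from a node in the other cluster. The sketch below is a hypothetical helper that relies on Bash's /dev/tcp pseudo-device and the coreutils timeout command; the kubectl query in the usage comment assumes a LoadBalancer Service that reports an IP (not a hostname).

```shell
#!/usr/bin/env bash
# Hypothetical troubleshooting helper: can this host open a TCP connection to host:port?
# Returns 0 on success, non-zero if refused, filtered (timed out), or unresolvable.
tcp_reachable() {
  local host="$1" port="$2" secs="${3:-3}"
  timeout "$secs" bash -c ": > /dev/tcp/$host/$port" 2>/dev/null
}

# Usage from a node in cluster2, checking cluster1's clustermesh-apiserver:
#   ip=$(kubectl --context cluster1 -n kube-system get svc clustermesh-apiserver \
#          -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
#   if tcp_reachable "$ip" 2379; then echo "etcd port open"; else echo "blocked or down"; fi
```

A success here with a still-failing mesh points at TLS or configuration rather than the network path.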

Cross-Cluster Services

Once Cluster Mesh is established, you can create global services — Kubernetes Services whose endpoints span multiple clusters. A pod in cluster1 calling the Service DNS name will be load-balanced across backends in both clusters.

Global Service Annotation

apiVersion: v1
kind: Service
metadata:
  name: api-server
  annotations:
    service.cilium.io/global: "true"
    service.cilium.io/shared: "true"
spec:
  selector:
    app: api-server
  ports:
    - port: 80
      targetPort: 8080

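Deploy this Service definition, with the same name and namespace, in both clusters. The service.cilium.io/global: "true" annotation tells Cilium to merge endpoints from all connected clusters. The service.cilium.io/shared: "true" annotation means this cluster's backends are shared with remote clusters; this is already the default for global services, and setting it to "false" lets a cluster consume remote backends without exporting its own.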
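Because the merge only works when both clusters carry a matching Service, one option is to render the manifest once and pipe it to every cluster. The Bash function below is a hypothetical helper; it assumes, as in the example above, that the pod selector label app equals the Service name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: render a single global Service manifest so every
# cluster receives identical YAML (name, namespace, annotations, ports).
# Assumes the backend pods are labeled app=<service name>.
render_global_service() {
  local name="$1" namespace="$2" port="$3" target_port="$4"
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${name}
  namespace: ${namespace}
  annotations:
    service.cilium.io/global: "true"
    service.cilium.io/shared: "true"
spec:
  selector:
    app: ${name}
  ports:
    - port: ${port}
      targetPort: ${target_port}
EOF
}

# Usage: apply the same manifest to both clusters (kubeconfig contexts assumed).
#   for ctx in cluster1 cluster2; do
#     render_global_service api-server default 80 8080 | kubectl --context "$ctx" apply -f -
#   done
```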

Service Affinity

By default, global services distribute traffic across all backends equally. Use the affinity annotation to prefer local backends:

apiVersion: v1
kind: Service
metadata:
  name: api-server
  annotations:
    service.cilium.io/global: "true"
    service.cilium.io/shared: "true"
    service.cilium.io/affinity: "local"
spec:
  selector:
    app: api-server
  ports:
    - port: 80
      targetPort: 8080

With service.cilium.io/affinity: "local", traffic stays within the local cluster when backends are healthy. If all local backends are down, Cilium automatically fails over to remote cluster backends — providing high availability without manual intervention.

Use case: Deploy the same microservice in two regions. With affinity: "local", each region handles its own traffic for low latency, while the other region acts as a failover target.

BGP Control Plane

Cilium's BGP control plane enables nodes to peer with external BGP routers and advertise routes for Service LoadBalancer IPs, PodCIDRs, or both. This eliminates the need for MetalLB or cloud-specific load balancer integrations: the BGP speaker runs inside the Cilium agent, while eBPF continues to handle the actual packet forwarding.

What BGP Advertises

  • LoadBalancer Service IPs — External IPs assigned to Services of type LoadBalancer
  • PodCIDRs — The pod network range allocated to each node, enabling direct pod routing
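  • Service ClusterIPs — Optionally advertise individual Service ClusterIPs for external access (availability depends on the Cilium version and BGP configuration resources in use)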

Enable BGP

Enable the BGP control plane during Cilium installation or upgrade:

# Helm values for BGP (values.yaml)
bgpControlPlane:
  enabled: true

# Or via Cilium CLI at install time
cilium install --set bgpControlPlane.enabled=true

# Or upgrade an existing installation
cilium upgrade --set bgpControlPlane.enabled=true

Enabling the BGP control plane starts a BGP speaker on each selected node. The speaker uses GoBGP under the hood and is configured entirely through Kubernetes CRDs — no configuration files on the nodes are needed.

CiliumBGPPeeringPolicy

The CiliumBGPPeeringPolicy CRD defines which nodes peer with which routers, what ASN to use, and what routes to advertise. Each policy targets a set of nodes via nodeSelector and configures one or more virtual routers.

apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: rack-bgp
spec:
  nodeSelector:
    matchLabels:
      rack: rack-01
  virtualRouters:
    - localASN: 65001
      exportPodCIDR: true
      neighbors:
        - peerAddress: "10.0.0.1/32"
          peerASN: 65000
          connectRetryTimeSeconds: 120
          holdTimeSeconds: 90
          keepAliveTimeSeconds: 30
      serviceSelector:
        matchExpressions:
          - key: somekey
            operator: NotIn
            values: ["never-advertise"]

Field Reference

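  • nodeSelector: Selects which nodes this peering policy applies to (e.g., nodes in a specific rack or availability zone)
  • virtualRouters: List of BGP speaker configurations to run on each selected node, each with its own ASN, neighbors, and advertisement settings
  • localASN: The BGP Autonomous System Number for Cilium's speaker on the selected nodes
  • exportPodCIDR: When true, advertises the node's PodCIDR to BGP peers, enabling direct pod routing from external networks
  • neighbors: List of BGP peers (routers) to establish sessions with, including address, ASN, and timer configuration
  • peerAddress: IP address of the BGP peer router in CIDR notation (/32 for a single IPv4 host)
  • peerASN: The ASN of the remote BGP router
  • connectRetryTimeSeconds: Seconds to wait before retrying a failed connection to the peer
  • holdTimeSeconds: Maximum time (in seconds) without receiving a keepalive or update message before the session is declared down
  • keepAliveTimeSeconds: Interval (in seconds) between keepalive messages sent to the peer
  • serviceSelector: Label selector that controls which LoadBalancer Services have their IPs advertised. If it is omitted, no Services are advertised; to advertise every Service, match a key/value that never exists with operator NotIn, as the rack-bgp example does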
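The two timer fields interact. RFC 4271 requires a non-zero hold time to be at least 3 seconds and suggests sending keepalives no more often than one third of the hold time, which is why both policies pair 90 with 30. The Bash check below is a hypothetical pre-flight helper based on that RFC guidance; Cilium's own validation rules may differ.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for neighbor timers, following RFC 4271 guidance:
#   - a non-zero hold time must be at least 3 seconds
#   - the keepalive interval should be at most one third of the hold time
# (A hold time of 0 disables keepalives, so the second rule is skipped then.)
check_bgp_timers() {
  local hold="$1" keepalive="$2"
  if (( hold != 0 && hold < 3 )); then
    echo "holdTimeSeconds=$hold: must be 0 or >= 3" >&2
    return 1
  fi
  if (( hold != 0 && keepalive * 3 > hold )); then
    echo "keepAliveTimeSeconds=$keepalive exceeds holdTimeSeconds/3 ($hold/3)" >&2
    return 1
  fi
  echo "ok: hold=${hold}s keepalive=${keepalive}s"
}

# The values used in the policies in this section pass: 30 * 3 = 90.
check_bgp_timers 90 30
```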

Multiple Peers Example

In production, nodes typically peer with two top-of-rack (ToR) switches for redundancy:

apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: dual-tor-bgp
spec:
  nodeSelector:
    matchLabels:
      kubernetes.io/os: linux
  virtualRouters:
    - localASN: 65010
      exportPodCIDR: true
      neighbors:
        - peerAddress: "10.0.0.1/32"
          peerASN: 65000
          holdTimeSeconds: 90
          keepAliveTimeSeconds: 30
        - peerAddress: "10.0.0.2/32"
          peerASN: 65000
          holdTimeSeconds: 90
          keepAliveTimeSeconds: 30
      serviceSelector:
        matchLabels:
          advertise: "external"

With this configuration, only Services labeled advertise: external have their LoadBalancer IPs announced to the ToR switches.

L2 Announcements

For environments without BGP routers — such as bare-metal home labs, development clusters, or flat L2 networks — Cilium provides L2 Announcements. This feature uses ARP (IPv4) or NDP (IPv6) to announce LoadBalancer IPs on the local network segment, similar to what MetalLB does in L2 mode.

CiliumL2AnnouncementPolicy

apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: l2-policy
spec:
  interfaces:
    - ^eth[0-9]+
  externalIPs: true
  loadBalancerIPs: true
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""

Field Reference

  • interfaces — Regex patterns matching which network interfaces should respond to ARP requests for virtual IPs
  • externalIPs — Whether to announce Service externalIPs via L2
  • loadBalancerIPs — Whether to announce LoadBalancer IPs via L2
  • nodeSelector — Limits which nodes participate in L2 announcements (leader election happens among these nodes)
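Because interfaces is a list of regular expressions, a pattern that is too narrow silently leaves a node unable to answer ARP. The Bash sketch below is a hypothetical helper that previews which interface names a set of patterns selects. It assumes simple patterns such as ^eth[0-9]+ behave the same in Bash's extended regular expressions as in the Go regexp engine Cilium uses; complex patterns may not.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list which interface names match any of the given regexes.
# Usage: matching_interfaces PATTERN... -- NAME...
matching_interfaces() {
  local -a patterns=()
  while [[ $# -gt 0 && "$1" != "--" ]]; do
    patterns+=("$1")
    shift
  done
  shift  # drop the "--" separator
  local name pat
  for name in "$@"; do
    for pat in "${patterns[@]}"; do
      if [[ "$name" =~ $pat ]]; then
        echo "$name"
        break
      fi
    done
  done
}

# Usage with the three patterns from the complete L2 setup example:
matching_interfaces '^eth[0-9]+' '^eno[0-9]+' '^enp[0-9]+s[0-9]+' -- \
  eth0 eno1 enp3s0 wlan0 docker0 cilium_host

# On a real node, feed it the kernel's interface list instead:
#   matching_interfaces '^eth[0-9]+' -- $(ls /sys/class/net)
```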

CiliumLoadBalancerIPPool

L2 announcements require an IP pool from which Cilium allocates LoadBalancer IPs. This replaces MetalLB's address pool configuration:

apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: main-pool
spec:
  blocks:
    - cidr: "192.168.1.192/28"

This allocates IPs from 192.168.1.192 to 192.168.1.207 (16 addresses). When you create a Service of type LoadBalancer, Cilium assigns an IP from this pool and one node begins responding to ARP requests for that IP.
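A /28 block always starts on a multiple of 16 in its last octet, so an address such as 192.168.1.200/28 is not a valid block start: its network address is 192.168.1.192. The Bash sketch below is a hypothetical helper that prints the first address, last address, and size of an IPv4 block and warns when host bits are set. It assumes dotted-quad input without leading zeros.

```shell
#!/usr/bin/env bash
# Hypothetical helper: describe the address range covered by an IPv4 CIDR block.

ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo "$(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))"
}

int_to_ip() {
  local n="$1"
  echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# Prints "first last size"; warns on stderr if the CIDR has host bits set.
cidr_range() {
  local ip="${1%/*}" len="${1#*/}"
  local addr size mask first
  addr=$(ip_to_int "$ip")
  size=$(( 1 << (32 - len) ))
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  first=$(( addr & mask ))
  if (( first != addr )); then
    echo "warning: $1 has host bits set; network address is $(int_to_ip "$first")/$len" >&2
  fi
  echo "$(int_to_ip "$first") $(int_to_ip $(( first + size - 1 ))) $size"
}

cidr_range 192.168.1.240/28   # aligned block used in the complete L2 setup
cidr_range 192.168.1.200/28   # misaligned: warns and reports the .192/28 block
```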

Complete L2 Setup Example

# 1. Install Cilium with L2 announcements enabled. LB IPAM is on by default,
#    and L2 announcements depend on kube-proxy replacement.
cilium install \
  --set l2announcements.enabled=true \
  --set externalIPs.enabled=true \
  --set kubeProxyReplacement=true

# 2. Create the IP pool
kubectl apply -f - <<'EOF'
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: home-lab-pool
spec:
  blocks:
    - cidr: "192.168.1.240/28"
EOF

# 3. Create the L2 announcement policy
kubectl apply -f - <<'EOF'
apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: default-l2
spec:
  interfaces:
    - ^eth[0-9]+
    - ^eno[0-9]+
    - ^enp[0-9]+s[0-9]+
  externalIPs: true
  loadBalancerIPs: true
EOF

# 4. Run nginx backends (their pods get the label app=nginx) and expose them
#    with a LoadBalancer Service; Cilium assigns an IP from the pool
kubectl create deployment nginx --image=nginx
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
    - port: 80
      targetPort: 80
EOF

# 5. Verify the assigned IP
kubectl get svc nginx-lb
# NAME       TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)        AGE
# nginx-lb   LoadBalancer   10.96.123.45    192.168.1.240   80:31234/TCP   10s

# 6. Test from another machine on the same L2 network
curl http://192.168.1.240

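
LB IPAM may take a moment to populate the external IP, so scripted tests of this setup tend to race the assignment. The sketch below is a hypothetical retry helper (the function name is illustrative) that reruns a command until it prints something; the usage comment combines it with a standard kubectl jsonpath query.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rerun a command until it prints non-empty output.
# Usage: wait_for_output ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for_output() {
  local attempts="$1" delay="$2"
  shift 2
  local i out
  for (( i = 1; i <= attempts; i++ )); do
    out="$("$@" 2>/dev/null)"
    if [[ -n "$out" ]]; then
      echo "$out"
      return 0
    fi
    # Only sleep if another attempt will follow.
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  echo "no output after $attempts attempts: $*" >&2
  return 1
}

# Usage: wait up to roughly 60s for nginx-lb's IP, then test it.
#   ip=$(wait_for_output 30 2 kubectl get svc nginx-lb \
#          -o jsonpath='{.status.loadBalancer.ingress[0].ip}') && curl "http://$ip"
```
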
L2 vs BGP: L2 announcements work only within a single broadcast domain (same subnet). For multi-subnet or multi-site deployments, use BGP instead. L2 also has a single-node failover model (for each Service, one elected leader node answers ARP/NDP requests and another node takes over if it fails), while BGP can spread traffic across multiple nodes with ECMP.

Exercises

Exercise 1: Set up Cluster Mesh between two Kind clusters with Cilium (give each Kind cluster a distinct podSubnet and disable its default CNI). Create a global Service deployed in both clusters and verify that a pod in cluster1 can reach backends running in cluster2. Use cilium connectivity test --context cluster1 --multi-cluster cluster2 to validate.
Exercise 2: Create a CiliumBGPPeeringPolicy that advertises LoadBalancer IPs to a FRRouting (FRR) container running as a pod or external VM. Verify with vtysh -c "show ip bgp" on the FRR instance that routes appear when you create a LoadBalancer Service.
Exercise 3: Configure L2 announcements with a CiliumLoadBalancerIPPool on a Kind or bare-metal cluster. Create a LoadBalancer Service and verify the external IP is reachable from another machine on the same network. Use arping to confirm ARP responses.

Key Takeaways

  • Cluster Mesh enables pod-to-pod and service-to-service communication across multiple Kubernetes clusters with shared identities and policies
  • Global services use annotations (service.cilium.io/global) to merge endpoints across cluster boundaries with optional local affinity
  • BGP control plane advertises LoadBalancer IPs and PodCIDRs to external routers — eliminating the need for MetalLB in routed environments
  • CiliumBGPPeeringPolicy configures BGP sessions declaratively via CRDs — no node-level configuration files needed
  • L2 announcements make LoadBalancer and external IPs reachable on bare-metal and flat networks via ARP/NDP, without requiring BGP infrastructure
  • CiliumLoadBalancerIPPool replaces MetalLB address pools, integrating LoadBalancer IP allocation directly into Cilium