
Routing Protocols — BGP, OSPF & IS-IS

May 15, 2026 Wasil Zafar 24 min read

"BGP is the duct tape holding the internet together." — Routing protocols are the pure embodiment of the control plane: they compute the paths that data planes use for forwarding. BGP decides how traffic flows between organizations, OSPF optimizes paths within them, and convergence speed determines how quickly the data plane adapts to failures.

Table of Contents

  1. BGP — The Internet's Control Plane
  2. BGP Path Selection Algorithm
  3. Route Reflectors & Communities
  4. OSPF — Intra-Domain Link-State
  5. OSPF Areas & SPF Algorithm
  6. IS-IS — ISP Backbone Protocol
  7. From Protocols to Forwarding
  8. Convergence & Route Stability
  9. BGP Security — Hijacking & RPKI
  10. Modern BGP in Data Centers

BGP — The Internet's Control Plane

BGP (Border Gateway Protocol) is THE control plane protocol of the internet. It's the mechanism by which ~75,000 autonomous systems (AS) — each operated by a different organization — agree on how to reach every network prefix on the planet. Without BGP, the internet would be a disconnected collection of isolated networks.

Scale of BGP: The global BGP routing table contains ~950,000 IPv4 prefixes and ~200,000 IPv6 prefixes (as of 2026). Every router in the default-free zone (routers that carry the full table rather than a default route) must evaluate, store, and propagate this information. BGP is a path-vector protocol that makes routing decisions based on AS path length, policy, and attributes, not simple metrics like hop count.

Key BGP characteristics as a control plane protocol:

  • Inter-AS routing — Operates between autonomous systems (organizations), not within them
  • Policy-driven — Routing decisions based on business relationships (customer, peer, provider), not just shortest path
  • Path-vector — Carries the full AS path to each destination, enabling loop detection and policy filtering
  • TCP-based — Runs over TCP port 179, ensuring reliable delivery of routing updates
  • Incremental updates — After the initial full-table exchange when a session comes up, only sends changes (withdrawals/announcements), not the full table

BGP Path Selection Algorithm

When BGP receives multiple paths to the same prefix, it applies a strict decision process to select the single best path that gets installed in the RIB:

BGP Path Selection Flow
flowchart TD
    A[Multiple Paths\nto Same Prefix] --> B{Highest\nLocal Preference?}
    B -->|Tie| C{Shortest\nAS Path?}
    B -->|Winner| Z[Install in RIB]
    C -->|Tie| D{Lowest\nOrigin Type?}
    C -->|Winner| Z
    D -->|Tie| E{Lowest\nMED?}
    D -->|Winner| Z
    E -->|Tie| F{eBGP over\niBGP?}
    E -->|Winner| Z
    F -->|Tie| G{Lowest\nIGP Metric\nto Next-Hop?}
    F -->|Winner| Z
    G -->|Tie| H{Oldest\nRoute?}
    G -->|Winner| Z
    H -->|Tie| I{Lowest\nRouter ID?}
    H -->|Winner| Z
    I --> Z
                            
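The hierarchy matters: in the standard decision process, Local Preference (set by the local operator's policy) trumps everything. (Cisco IOS checks its proprietary, router-local Weight attribute even earlier, and also prefers locally originated routes before comparing AS paths.) This is how operators implement business decisions: "prefer customer routes over peer routes over provider routes." AS path length is only compared after Local Preference, meaning policy always wins over shortest path. The flowchart is simplified: by default, MED is compared only between paths from the same neighboring AS, and the oldest-route tiebreaker applies only to eBGP paths.
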
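! BGP neighbor configuration (Cisco IOS)
! Establishing BGP peering sessions (pure control plane activity)
! IOS treats lines starting with "!" as comments; "#" is not a comment character

router bgp 65001
 ! 65001 is this router's own AS number
 bgp router-id 1.1.1.1
 
 ! eBGP neighbor (external, different AS), directly connected
 neighbor 203.0.113.1 remote-as 65002
 neighbor 203.0.113.1 description "Upstream ISP - Transit Provider"
 neighbor 203.0.113.1 password SECRET123
 
 ! iBGP neighbor (internal, same AS), peered loopback-to-loopback
 neighbor 10.0.0.2 remote-as 65001
 neighbor 10.0.0.2 description "iBGP peer - Core Router 2"
 neighbor 10.0.0.2 update-source Loopback0
 
 ! Address family configuration
 address-family ipv4 unicast
  ! Advertise our networks (each needs an exact-match route in the RIB)
  network 198.51.100.0 mask 255.255.255.0
  network 198.51.101.0 mask 255.255.255.0
  
  ! Rewrite the next hop so the iBGP peer can reach eBGP-learned routes,
  ! and send communities so downstream policy can see our tags
  neighbor 10.0.0.2 next-hop-self
  neighbor 10.0.0.2 send-community
  
  ! Apply route-map policy to neighbor
  neighbor 203.0.113.1 route-map UPSTREAM-IN in
  neighbor 203.0.113.1 route-map UPSTREAM-OUT out
 exit-address-family

! Anchor routes so the network statements above have exact RIB matches
ip route 198.51.100.0 255.255.255.0 Null0
ip route 198.51.101.0 255.255.255.0 Null0

! Prefix list referenced by UPSTREAM-OUT
ip prefix-list OUR-PREFIXES seq 10 permit 198.51.100.0/24
ip prefix-list OUR-PREFIXES seq 20 permit 198.51.101.0/24

! Route-map for policy enforcement
route-map UPSTREAM-IN permit 10
 ! Provider routes get local preference below the default of 100,
 ! so customer and peer routes win
 set local-preference 80
 ! Tag with community for downstream policy (65001:300 = transit provider)
 set community 65001:300 additive

route-map UPSTREAM-OUT permit 10
 ! Only advertise our own prefixes upstream (everything else hits the implicit deny)
 match ip address prefix-list OUR-PREFIXES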
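To make the ordering in the flowchart concrete, here is a minimal Python sketch of pairwise best-path comparison. It is an illustration under simplifying assumptions, not any vendor's implementation: the `Path` fields are hypothetical, Cisco's Weight and locally-originated checks are omitted, MED is compared only between paths from the same neighboring AS, and the oldest-path tiebreaker is applied only to eBGP paths.

```python
import ipaddress
from dataclasses import dataclass

# Lower rank wins: IGP < EGP < INCOMPLETE
ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Path:
    neighbor_as: int      # AS the path was learned from
    local_pref: int
    as_path: list[int]
    origin: str           # "igp", "egp" or "incomplete"
    med: int
    ebgp: bool            # True if learned over an eBGP session
    igp_metric: int       # IGP cost to the BGP next hop
    received_at: float    # arrival time; smaller means older
    router_id: str        # router ID of the advertising peer


def better(a: Path, b: Path) -> bool:
    """True if path a beats path b, checking attributes in flowchart order."""
    if a.local_pref != b.local_pref:
        return a.local_pref > b.local_pref            # highest local preference
    if len(a.as_path) != len(b.as_path):
        return len(a.as_path) < len(b.as_path)        # shortest AS path
    if a.origin != b.origin:
        return ORIGIN_RANK[a.origin] < ORIGIN_RANK[b.origin]
    if a.neighbor_as == b.neighbor_as and a.med != b.med:
        return a.med < b.med                          # lowest MED (same neighbor AS only)
    if a.ebgp != b.ebgp:
        return a.ebgp                                 # eBGP over iBGP
    if a.igp_metric != b.igp_metric:
        return a.igp_metric < b.igp_metric            # closest exit point
    if a.ebgp and b.ebgp and a.received_at != b.received_at:
        return a.received_at < b.received_at          # oldest eBGP path
    # Final tiebreaker: numerically lowest router ID
    return ipaddress.ip_address(a.router_id) < ipaddress.ip_address(b.router_id)


def best_path(paths: list[Path]) -> Path:
    best = paths[0]
    for candidate in paths[1:]:
        if better(candidate, best):
            best = candidate
    return best


# Policy beats path length: a longer customer path wins over a shorter provider path
provider = Path(65002, 80, [65002, 64500], "igp", 0, True, 0, 1.0, "203.0.113.1")
customer = Path(64510, 120, [64510, 64511, 64500], "igp", 0, True, 0, 2.0, "192.0.2.1")
print(best_path([provider, customer]).neighbor_as)  # 64510
```

Because MED is only compared for some pairs, this kind of pairwise comparison can depend on the order in which paths arrive; real implementations offer options such as deterministic MED to avoid that.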

Route Reflectors & Communities

Route Reflectors

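In iBGP, a router does not re-advertise routes learned from one iBGP peer to another iBGP peer (the AS path does not change inside an AS, so it cannot be used for loop detection). As a result, every router must peer with every other router (full mesh) to ensure all routers see all routes. With N routers, that's N×(N-1)/2 sessions; for 100 routers, 4,950 sessions. Route reflectors solve this by relaxing the rule: a reflector re-advertises routes between its clients, so each client only needs sessions to the reflectors. They act as centralized route distribution points, a mini control plane within the control plane.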
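The session arithmetic can be checked with a few lines of Python. The route reflector count assumes, hypothetically, two reflectors, with every client peering with both for redundancy and the reflectors peering with each other.

```python
def full_mesh_sessions(routers: int) -> int:
    """iBGP sessions needed when every router peers with every other router."""
    return routers * (routers - 1) // 2


def route_reflector_sessions(routers: int, reflectors: int) -> int:
    """Sessions when clients peer only with the reflectors.

    Assumes every client peers with every reflector (for redundancy)
    and the reflectors are fully meshed among themselves.
    """
    clients = routers - reflectors
    return clients * reflectors + full_mesh_sessions(reflectors)


print(full_mesh_sessions(100))           # 4950
print(route_reflector_sessions(100, 2))  # 197
```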

BGP Communities

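Communities are 32-bit tags (conventionally written ASN:value) attached to routes that enable policy signaling within and between ASes. They're the "metadata" of the control plane. The first three values below are an example of an operator-defined convention; NO_EXPORT and NO_ADVERTISE are well-known communities standardized in RFC 1997: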

  • 65001:100 — "Learned from customer" (prefer highly)
  • 65001:200 — "Learned from peer" (normal preference)
  • 65001:300 — "Learned from transit provider" (least preferred)
  • NO_EXPORT — "Don't advertise outside this AS"
  • NO_ADVERTISE — "Don't advertise to any peer"
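A small Python sketch of how these tags could be encoded and acted on. The 32-bit packing and the NO_EXPORT/NO_ADVERTISE values follow RFC 1997; the local-preference numbers attached to the 65001:x convention are hypothetical values chosen for illustration.

```python
# Well-known communities from RFC 1997
NO_EXPORT = 0xFFFFFF01
NO_ADVERTISE = 0xFFFFFF02


def encode(community: str) -> int:
    """Pack an 'ASN:value' string into the 32-bit on-the-wire value."""
    asn, value = (int(part) for part in community.split(":"))
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("standard communities hold two 16-bit fields")
    return (asn << 16) | value


def decode(community: int) -> str:
    return f"{community >> 16}:{community & 0xFFFF}"


# Hypothetical local-preference values for the operator convention above
LOCAL_PREF = {encode("65001:100"): 120, encode("65001:200"): 100, encode("65001:300"): 80}


def local_pref_for(communities: set[int], default: int = 100) -> int:
    matches = [LOCAL_PREF[c] for c in communities if c in LOCAL_PREF]
    return max(matches, default=default)


def may_advertise(communities: set[int], to_external_peer: bool) -> bool:
    """Apply NO_ADVERTISE and NO_EXPORT when deciding whether to send a route."""
    if NO_ADVERTISE in communities:
        return False
    return not (NO_EXPORT in communities and to_external_peer)


print(decode(NO_EXPORT))                                   # 65535:65281
print(local_pref_for({encode("65001:300")}))               # 80
print(may_advertise({NO_EXPORT}, to_external_peer=True))   # False
```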

OSPF — Intra-Domain Link-State

While BGP handles inter-AS routing, OSPF (Open Shortest Path First) handles routing within a single organization. It's a link-state protocol: every router maintains a complete topology map of the network (of its own area, once the network is divided into areas) and independently computes shortest paths using Dijkstra's SPF algorithm.

Protocol Comparison
BGP vs OSPF — Different Control Plane Roles

BGP is a policy protocol: it selects paths based on business relationships and operator preferences. It converges slowly (seconds to minutes) but handles 950K+ prefixes globally. OSPF is an optimization protocol: it finds the lowest-cost path, where cost is the sum of configured link costs. It converges fast (sub-second with tuning) but operates within a single administrative domain. Together, they populate the RIB that becomes the FIB for the data plane.


OSPF Areas & SPF Algorithm

OSPF Area Hierarchy
flowchart TB
    subgraph AREA0["Area 0 (Backbone)"]
        ABR1[ABR 1\nArea Border Router]
        ABR2[ABR 2\nArea Border Router]
        BBR1[Backbone\nRouter 1]
        BBR2[Backbone\nRouter 2]
        ABR1 --- BBR1
        BBR1 --- BBR2
        BBR2 --- ABR2
        ABR1 --- ABR2
    end
    subgraph AREA1["Area 1 (Engineering)"]
        R1[Router 1]
        R2[Router 2]
        R3[Router 3]
        R1 --- R2
        R2 --- R3
    end
    subgraph AREA2["Area 2 (Data Center)"]
        R4[Router 4]
        R5[Router 5]
        R6[Router 6]
        R4 --- R5
        R5 --- R6
    end
    ABR1 --- R1
    ABR2 --- R4
                            

OSPF divides large networks into areas to limit the scope of the link-state database and SPF calculations:

  • Area 0 (Backbone) — All areas must connect to Area 0. Inter-area traffic transits through it
  • Regular areas — Contain a complete LSDB of their own topology, receive summarized routes from other areas
  • ABRs (Area Border Routers) — Sit between areas, summarize routes, and advertise them into other areas as inter-area (Type 3 summary) LSAs
  • DR/BDR (Designated Router/Backup Designated Router) — Elected on multi-access segments to reduce flooding overhead

! OSPF area configuration
! Each area maintains its own link-state database

router ospf 1
 router-id 1.1.1.1
 
 ! Interfaces in Area 0 (backbone)
 network 10.0.0.0 0.0.0.255 area 0
 network 10.0.1.0 0.0.0.255 area 0
 
 ! Interfaces in Area 1
 network 10.1.0.0 0.0.255.255 area 1
 
 ! Summarize Area 1 routes at the ABR (one inter-area route instead of many)
 area 1 range 10.1.0.0 255.255.0.0
 
 ! Fast convergence tuning (all values in milliseconds)
 ! Initial SPF delay: 50ms
 ! Min hold between SPF runs: 200ms
 ! Max hold: 5000ms
 timers throttle spf 50 200 5000

! Per-interface settings live in interface configuration mode, not under "router ospf"
interface GigabitEthernet0/0
 ! Sub-second failure detection with BFD: 50ms packets, down after 3 misses (~150ms)
 bfd interval 50 min_rx 50 multiplier 3
 ip ospf bfd
 ! Fast hellos as a fallback: dead interval = 1 second (4 x 250ms hellos)
 ip ospf dead-interval minimal hello-multiplier 4
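Under simplifying assumptions, the effect of `area 1 range` at the ABR can be sketched in Python: component prefixes inside the configured range are replaced by a single summary advertised into Area 0, while anything outside the range would still be advertised on its own. The function name and prefixes are hypothetical, and real OSPF also derives a cost for the summary, which is omitted here.

```python
import ipaddress


def abr_summaries(area_prefixes, area_range):
    """Prefixes an ABR would advertise into the backbone for one area (simplified).

    Components inside the configured range are replaced by the range itself
    (advertised only if at least one component exists); anything outside
    the range is advertised as-is.
    """
    rng = ipaddress.ip_network(area_range)
    nets = [ipaddress.ip_network(p) for p in area_prefixes]
    inside = [n for n in nets if n.subnet_of(rng)]
    outside = [n for n in nets if not n.subnet_of(rng)]
    return ([rng] if inside else []) + outside


area1 = ["10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]
print(abr_summaries(area1, "10.1.0.0/16"))  # [IPv4Network('10.1.0.0/16')]
```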

SPF Algorithm (Dijkstra's)

Each OSPF router runs SPF independently on its link-state database to compute a shortest-path tree rooted at itself. The output is a set of (destination, next-hop, cost) tuples that are installed in the RIB. When a link fails:

  1. Adjacent router detects failure (dead timer expires or BFD triggers)
  2. Router floods an updated LSA (Link-State Advertisement) into the area
  3. All routers in the area receive the LSA and update their LSDB
  4. All routers independently re-run SPF on the updated topology
  5. New best routes are installed in RIB → pushed to FIB → data plane adapts
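The SPF step can be sketched with Dijkstra's algorithm over a hypothetical four-router topology. This is a simplified model: links are symmetric, each destination keeps a single next hop (no ECMP), and the output mirrors the (destination, next-hop, cost) tuples described above.

```python
import heapq


def add_link(topo, a, b, cost):
    """Add a bidirectional link with the same cost in both directions."""
    topo.setdefault(a, {})[b] = cost
    topo.setdefault(b, {})[a] = cost


def spf(topo, root):
    """Dijkstra SPF from root. Returns {destination: (next_hop, cost)}."""
    dist = {root: 0}
    next_hop = {}
    heap = [(0, root, None)]        # (cost so far, node, first hop used)
    done = set()
    while heap:
        cost, node, hop = heapq.heappop(heap)
        if node in done:
            continue                # stale heap entry
        done.add(node)
        next_hop[node] = hop
        for neighbor, link_cost in topo.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                # Leaving the root, the neighbor itself is the next hop;
                # further out, inherit the parent's next hop.
                first = neighbor if node == root else hop
                heapq.heappush(heap, (new_cost, neighbor, first))
    return {d: (next_hop[d], dist[d]) for d in done if d != root}


topo = {}
add_link(topo, "A", "B", 1)
add_link(topo, "B", "C", 1)
add_link(topo, "A", "C", 5)
add_link(topo, "C", "D", 1)
print(sorted(spf(topo, "A").items()))
# [('B', ('B', 1)), ('C', ('B', 2)), ('D', ('B', 3))]

# Link B-C fails: the updated LSA is flooded and every router re-runs SPF
del topo["B"]["C"], topo["C"]["B"]
print(sorted(spf(topo, "A").items()))
# [('B', ('B', 1)), ('C', ('C', 5)), ('D', ('C', 6))]
```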

IS-IS — ISP Backbone Protocol

IS-IS (Intermediate System to Intermediate System) is functionally similar to OSPF — both are link-state protocols that use SPF. IS-IS is preferred by many large ISPs because:

  • Protocol-agnostic — Runs directly on Layer 2 (not IP), so it works even when IP is misconfigured
  • TLV extensibility — Type-Length-Value encoding makes it easy to add new features without protocol redesign
  • Proven at massive scale — Backbone networks with thousands of routers
  • Simpler area design — Level 1 (intra-area) and Level 2 (inter-area) with fewer restrictions than OSPF
  • Multi-topology support — Can run different topologies for IPv4 and IPv6 simultaneously
Industry usage: OSPF dominates enterprise networks. IS-IS dominates ISP/carrier backbones. The choice is often cultural/historical rather than purely technical. Both produce the same output: a shortest-path tree used to populate the RIB → FIB.
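The TLV point from the list above can be illustrated with a minimal parser. Only type 137 (Dynamic Hostname, RFC 5301) is decoded; type 250 stands in for any TLV this parser does not recognize, and the byte string is a hypothetical fragment, not a complete IS-IS PDU.

```python
HOSTNAME_TLV = 137  # Dynamic Hostname TLV (RFC 5301)


def parse_tlvs(data: bytes):
    """Walk 1-byte type / 1-byte length / value records.

    Unknown types are skipped using their length field, which is what lets
    routers that predate a new TLV keep working.
    """
    known, skipped = {}, []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated TLV header")
        tlv_type, length = data[offset], data[offset + 1]
        value = data[offset + 2 : offset + 2 + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        if tlv_type == HOSTNAME_TLV:
            known["hostname"] = value.decode("ascii")
        else:
            skipped.append(tlv_type)
        offset += 2 + length
    return known, skipped


# A hostname TLV followed by a TLV type this parser does not recognize
pdu_body = bytes([137, 5]) + b"core1" + bytes([250, 3, 0x01, 0x02, 0x03])
print(parse_tlvs(pdu_body))  # ({'hostname': 'core1'}, [250])
```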

From Protocols to Forwarding

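Multiple routing protocols may provide routes to the same destination. For routes to the exact same prefix, the router selects the best using administrative distance, a per-protocol trustworthiness ranking in which the lowest value wins. The values below are Cisco defaults; other vendors use different numbers: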

Protocol  | Admin Distance | Role
Connected | 0              | Directly attached networks
Static    | 1              | Manually configured routes
eBGP      | 20             | External BGP (inter-AS)
OSPF      | 110            | Internal link-state
IS-IS     | 115            | Internal link-state
iBGP      | 200            | Internal BGP (same AS)

After administrative distance selects the protocol, the winning route enters the RIB. From there, it's programmed into the FIB — the hardware forwarding table used by the data plane for every packet decision.
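The two steps, choosing a protocol per prefix by administrative distance and then forwarding by longest-prefix match, can be sketched in Python. The distances are the Cisco defaults from the table; the prefixes and next hops are hypothetical, and "programming" the FIB is modeled as simply reusing the RIB dictionary.

```python
import ipaddress

# Cisco default administrative distances (lower is more trusted)
ADMIN_DISTANCE = {"connected": 0, "static": 1, "ebgp": 20,
                  "ospf": 110, "isis": 115, "ibgp": 200}


def build_rib(candidates):
    """candidates: (prefix, protocol, next_hop) tuples.

    For each exact prefix, keep only the route whose protocol has the
    lowest administrative distance.
    """
    rib = {}
    for prefix, protocol, next_hop in candidates:
        net = ipaddress.ip_network(prefix)
        current = rib.get(net)
        if current is None or ADMIN_DISTANCE[protocol] < ADMIN_DISTANCE[current[0]]:
            rib[net] = (protocol, next_hop)
    return rib


def fib_lookup(fib, destination):
    """Longest-prefix match: the most specific covering prefix wins."""
    addr = ipaddress.ip_address(destination)
    covering = [net for net in fib if addr in net]
    if not covering:
        return None
    return fib[max(covering, key=lambda net: net.prefixlen)]


candidates = [
    ("10.1.0.0/16", "ospf", "10.0.0.2"),
    ("10.1.0.0/16", "ibgp", "10.0.0.9"),     # same prefix: loses to OSPF (200 > 110)
    ("10.1.5.0/24", "static", "10.0.0.3"),   # different prefix: kept alongside
    ("0.0.0.0/0", "ebgp", "203.0.113.1"),
]
fib = build_rib(candidates)
print(fib_lookup(fib, "10.1.5.7"))   # ('static', '10.0.0.3')
print(fib_lookup(fib, "10.1.9.1"))   # ('ospf', '10.0.0.2')
print(fib_lookup(fib, "8.8.8.8"))    # ('ebgp', '203.0.113.1')
```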

Convergence & Route Stability

Convergence Timeline
flowchart LR
    A[Link Failure\nOccurs] -->|"Detection\n(BFD: 50ms\nHellos: 30-40s)"| B[Failure\nDetected]
    B -->|"Flooding\n(10-100ms per hop)"| C[All Routers\nNotified]
    C -->|"SPF Calculation\n(1-50ms)"| D[New Routes\nComputed]
    D -->|"RIB Update\n(1-10ms)"| E[RIB\nUpdated]
    E -->|"FIB Programming\n(10-100ms)"| F[Data Plane\nConverged]
    
    style A fill:#BF092F,color:#fff
    style F fill:#3B9797,color:#fff
                            
Convergence = control plane speed affecting data plane behavior. During convergence, the data plane is forwarding packets based on stale information — routes that pointed to a now-failed link. Packets are being blackholed or looped until the control plane finishes recomputing and reprogramming the FIB. Sub-second convergence (BFD + fast SPF + fast FIB programming) is critical for carrier-grade networks.
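A back-of-the-envelope convergence budget using the ranges from the timeline above. Assumptions (hypothetical): flooding crosses three hops, all stages run strictly one after another, and the affected flow carries 100,000 packets per second.

```python
# (best_ms, worst_ms) per stage, from the convergence timeline above
DETECTION = {"bfd": (50, 50), "hello": (30_000, 40_000)}
FLOOD_PER_HOP = (10, 100)
LOCAL_STAGES = {"spf": (1, 50), "rib": (1, 10), "fib": (10, 100)}


def convergence_window(detection: str, hops: int) -> tuple[int, int]:
    """Best/worst-case outage in ms, treating every stage as strictly sequential."""
    best = DETECTION[detection][0] + FLOOD_PER_HOP[0] * hops
    worst = DETECTION[detection][1] + FLOOD_PER_HOP[1] * hops
    best += sum(lo for lo, _ in LOCAL_STAGES.values())
    worst += sum(hi for _, hi in LOCAL_STAGES.values())
    return best, worst


def packets_blackholed(rate_pps: int, outage_ms: int) -> int:
    return rate_pps * outage_ms // 1000


for method in ("bfd", "hello"):
    best, worst = convergence_window(method, hops=3)
    print(method, best, worst, packets_blackholed(100_000, worst))
# bfd 92 510 51000
# hello 30042 40460 4046000
```

The comparison shows why detection dominates: with default hello timers, the remaining stages are a rounding error next to the tens of seconds spent noticing the failure.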

Route Flapping and Dampening

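When a route oscillates between reachable and unreachable (flapping), for example because a link keeps going up and down, each state change triggers BGP withdrawals and re-announcements that propagate across the internet. Route flap dampening (RFC 2439) suppresses unstable routes: each flap adds a fixed penalty to the prefix, the penalty decays exponentially over time (halving every half-life), and once it crosses a suppress threshold the route is withheld until the penalty decays below a reuse threshold, subject to a maximum suppression time.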
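A simplified Python model of the penalty mechanism. The defaults mirror commonly cited Cisco IOS values (half-life 15 minutes, reuse 750, suppress 2000, maximum suppression 60 minutes, 1000 per flap); treat them as assumptions, and note that the maximum suppression time is modeled simply as a cap on the waiting time.

```python
import math


class DampenedRoute:
    """Per-prefix flap penalty with exponential decay (simplified sketch)."""

    def __init__(self, half_life=15.0, reuse=750, suppress=2000,
                 max_suppress=60.0, flap_penalty=1000):
        self.half_life = half_life          # minutes
        self.reuse = reuse
        self.suppress = suppress
        self.max_suppress = max_suppress    # minutes
        self.flap_penalty = flap_penalty
        self.penalty = 0.0
        self.last_update = 0.0              # minutes
        self.suppressed = False

    def _decay(self, now: float) -> None:
        # Penalty halves every half_life minutes
        self.penalty *= 2 ** (-(now - self.last_update) / self.half_life)
        self.last_update = now

    def flap(self, now: float) -> bool:
        """Record a withdraw/re-announce at time `now`; return suppression state."""
        self._decay(now)
        self.penalty += self.flap_penalty
        if self.penalty > self.suppress:
            self.suppressed = True
        return self.suppressed

    def minutes_until_reuse(self) -> float:
        """Time after the last flap until the penalty decays below the reuse limit."""
        if self.penalty <= self.reuse:
            return 0.0
        wait = self.half_life * math.log2(self.penalty / self.reuse)
        return min(wait, self.max_suppress)


route = DampenedRoute()
for minute in (0, 1, 2):
    print(minute, route.flap(minute))   # suppressed only after the third flap
print(round(route.minutes_until_reuse(), 1))  # 29.0
```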

BGP Security — Hijacking & RPKI

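BGP was designed in an era of trust. Any AS can announce any prefix: BGP has no built-in way to verify that an AS is authorized to originate a prefix (session passwords authenticate the neighbor, not the routes it sends). This makes BGP hijacking a control plane attack with devastating data plane consequences: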

Security
BGP Hijacking — A Control Plane Attack

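In a BGP hijack, an attacker announces someone else's IP prefix from their own AS. Because BGP has no native origin validation, other routers may accept the malicious announcement and route traffic to the attacker. Announcing a more-specific prefix is especially effective, because longest-prefix match makes it win over the legitimate, less-specific route. Notable incidents: Pakistan Telecom accidentally hijacked YouTube's prefix (2008) while trying to block YouTube domestically, causing a global outage. China Telecom has been accused of routing US traffic through China. The fix, RPKI (Resource Public Key Infrastructure), lets prefix holders publish signed Route Origin Authorizations (ROAs) stating which AS may originate a prefix and up to what prefix length; routers performing Route Origin Validation can then reject unauthorized announcements. RPKI origin validation checks only the origin AS, not the rest of the AS path.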
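Route Origin Validation (RFC 6811) classifies each announcement against the published ROAs. A minimal Python sketch, with a hypothetical ROA and documentation prefixes, shows why a more-specific announcement from the wrong AS is rejected.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ROA:
    prefix: str       # prefix the holder authorizes
    max_length: int   # longest prefix length the origin may announce
    origin_asn: int   # AS authorized to originate it


def origin_validation(route_prefix: str, origin_asn: int, roas: list[ROA]) -> str:
    """Classify an announcement as 'valid', 'invalid' or 'not-found'."""
    route = ipaddress.ip_network(route_prefix)
    covering = []
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        if roa_net.version == route.version and route.subnet_of(roa_net):
            covering.append(roa)
    if not covering:
        return "not-found"            # no ROA covers this address space
    for roa in covering:
        if roa.origin_asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    return "invalid"                  # covered, but wrong origin or too specific


roas = [ROA("198.51.100.0/24", 24, 65001)]
print(origin_validation("198.51.100.0/24", 65001, roas))  # valid
print(origin_validation("198.51.100.0/25", 64666, roas))  # invalid (more-specific hijack)
print(origin_validation("203.0.113.0/24", 65002, roas))   # not-found
```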


Modern BGP in Data Centers

BGP has evolved far beyond its original internet routing role. Many large-scale data centers use eBGP as the only routing protocol in Clos (leaf-spine) fabrics, a design documented in RFC 7938:

BGP in Data Center Clos Fabric
flowchart TB
    subgraph SPINE["Spine Layer (eBGP)"]
        S1[Spine 1\nAS 65100]
        S2[Spine 2\nAS 65100]
        S3[Spine 3\nAS 65100]
    end
    subgraph LEAF["Leaf Layer (eBGP)"]
        L1[Leaf 1\nAS 65001]
        L2[Leaf 2\nAS 65002]
        L3[Leaf 3\nAS 65003]
        L4[Leaf 4\nAS 65004]
    end
    subgraph SERVERS["Servers"]
        SV1[Servers]
        SV2[Servers]
        SV3[Servers]
        SV4[Servers]
    end
    L1 --- S1
    L1 --- S2
    L1 --- S3
    L2 --- S1
    L2 --- S2
    L2 --- S3
    L3 --- S1
    L3 --- S2
    L3 --- S3
    L4 --- S1
    L4 --- S2
    L4 --- S3
    SV1 --- L1
    SV2 --- L2
    SV3 --- L3
    SV4 --- L4
                            

Why BGP in the data center (instead of OSPF):

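  • Every link is eBGP — Each leaf gets its own AS number and the spines typically share one, making every link an inter-AS link; the shared spine ASN lets AS-path loop detection reject leaf-spine-leaf-spine detours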
  • ECMP load balancing — Multiple equal-cost paths through different spines
  • No SPF storms — Link failures don't trigger network-wide recalculation
  • Policy at every hop — Fine-grained traffic engineering capabilities
  • Proven at massive scale — Microsoft and Facebook (Meta) have publicly described eBGP-only Clos fabrics
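The role of AS numbering in loop prevention can be sketched in a few lines of Python. It compares a shared spine ASN with per-spine ASNs (numbers follow the diagram and are illustrative) and applies only the standard eBGP rule of dropping paths that contain the receiver's own ASN.

```python
SPINES = ("S1", "S2", "S3")
LEAF_ASN = {"L1": 65001, "L2": 65002, "L3": 65003, "L4": 65004}


def accepts(receiver_asn: int, as_path: list[int]) -> bool:
    """eBGP loop prevention: drop any path that already contains our own ASN."""
    return receiver_asn not in as_path


def valley_path_accepted(spine_asn: dict[str, int]) -> bool:
    """L2's prefix goes L2 -> S1 -> L3, then L3 re-advertises it up to S2.

    Returns True if S2 would accept that leaf-spine-leaf "valley" path.
    """
    path_at_l3 = [spine_asn["S1"], LEAF_ASN["L2"]]
    path_sent_to_s2 = [LEAF_ASN["L3"]] + path_at_l3
    return accepts(spine_asn["S2"], path_sent_to_s2)


def ecmp_next_hops(spine_asn: dict[str, int], src: str = "L1", dst: str = "L4") -> list[str]:
    """Spines offering the shortest acceptable AS path from src to dst's prefix."""
    offers = {s: [spine_asn[s], LEAF_ASN[dst]] for s in SPINES}
    usable = {s: p for s, p in offers.items() if accepts(LEAF_ASN[src], p)}
    shortest = min(len(p) for p in usable.values())
    return sorted(s for s, p in usable.items() if len(p) == shortest)


shared = {s: 65100 for s in SPINES}                  # one ASN for all spines
distinct = {"S1": 65100, "S2": 65200, "S3": 65300}   # per-spine ASNs
print(valley_path_accepted(shared), valley_path_accepted(distinct))  # False True
print(ecmp_next_hops(shared))                                         # ['S1', 'S2', 'S3']
```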

BGP for Kubernetes

BGP is increasingly used to advertise Kubernetes service IPs and pod CIDRs to the physical network:

# MetalLB BGP configuration for Kubernetes
# Advertises LoadBalancer service IPs via BGP to the network fabric
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: leaf-switch-peer
  namespace: metallb-system
spec:
  myASN: 65010          # Kubernetes cluster's AS
  peerASN: 65001        # Leaf switch's AS
  peerAddress: 10.0.0.1 # Leaf switch IP
  holdTime: 90s
  keepaliveTime: 30s
---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: production-pool
  namespace: metallb-system
spec:
  addresses:
  - 198.51.100.0/24     # IPs to assign to LoadBalancer services
---
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: production-advertisement
  namespace: metallb-system
spec:
  ipAddressPools:
  - production-pool
  communities:
  - 65001:100           # Tag as "internal service"

# Verify BGP sessions from a Kubernetes perspective
# Calico uses BGP to distribute pod network routes

# Check BGP peering status (Calico)
calicoctl node status

# Sample output:
# IPv4 BGP status
# +--------------+-------------------+-------+----------+-------------+
# | PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
# +--------------+-------------------+-------+----------+-------------+
# | 10.0.0.1     | node-to-node mesh | up    | 08:15:00 | Established |
# | 10.0.0.2     | node-to-node mesh | up    | 08:15:01 | Established |
# | 10.0.0.3     | node-to-node mesh | up    | 08:15:01 | Established |
# +--------------+-------------------+-------+----------+-------------+

# Check BGP summary on the physical leaf switch
show bgp summary

# Sample output:
# Neighbor        AS    MsgRcvd  MsgSent  Up/Down  State/PfxRcd
# 10.0.0.10    65010       1205     1198  2d03h    48
# 10.0.0.11    65010       1180     1175  2d03h    48
# 10.0.0.12    65010       1195     1190  2d03h    48
# Total number of neighbors: 3, Prefixes received: 144

Summary: Routing protocols are the purest control plane components: they exist solely to compute forwarding state for the data plane. BGP handles inter-domain policy routing, OSPF/IS-IS handle intra-domain shortest-path optimization. Their convergence speed directly determines how quickly the data plane recovers from failures. Modern usage extends BGP into data centers (Clos fabrics) and Kubernetes (MetalLB, Calico), demonstrating the protocol's remarkable adaptability as a general-purpose control plane mechanism.