BGP and the Internet's Routing Fabric

BGP

BGP (Border Gateway Protocol) is how the internet's roughly 78,000 autonomous systems tell each other which networks they can reach. It is a path-vector protocol: each route carries the full list of autonomous systems it has crossed — the AS-path — and decisions are driven by policy, not by shortest hop count or lowest cost. BGP is the one protocol that makes a single global internet out of tens of thousands of independently-run networks.

It is also the protocol whose mistakes make the news. Because any AS can announce reachability for a prefix, a single bad announcement — fat-fingered or malicious — can pull traffic for someone else's networks across the planet and into a black hole. BGP runs the internet and breaks the internet with the same mechanism: it trusts what its neighbors announce, and that trust is exactly what gets abused in route leaks and hijacks.

Figure: An AS-path across autonomous systems to the destination prefix: AS65001 (origin), AS174 (transit), AS3356 (transit), destination prefix.

Autonomous Systems and Peering

An autonomous system is a network under one administrative authority — an ISP, a cloud provider, a large enterprise — identified by a globally unique AS number (ASN). The internet is the mesh of these ASes exchanging routes with each other. Two ASes connect in one of two business shapes: transit, where one pays another to reach the rest of the internet, and peering, where two networks exchange their own and their customers' routes for mutual benefit, usually settlement-free.

That money relationship is the policy. A network advertises customer routes to everyone (it wants traffic to paying customers) but advertises routes learned from peers and transit providers only to its own customers (it will not pay to carry a peer's traffic). These valley-free routing rules, not any shortest-path metric, decide where most internet traffic actually flows, which is why two geographically close hosts can take a wildly indirect path between them.
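
The export rule can be sketched as a small lookup on business relationships. This is a simplified illustration, not any router's actual policy language: the `Rel` enum and `should_export` function are hypothetical names, and the sketch assumes the common convention that peer-learned routes are treated like transit-learned ones (sent only to customers).

```python
from enum import Enum


class Rel(Enum):
    """Business relationship with a neighbor AS (hypothetical labels)."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"  # the transit provider we pay


def should_export(learned_from: Rel, send_to: Rel) -> bool:
    """Valley-free export rule.

    Customer routes are advertised to everyone; routes learned from a
    peer or a provider are advertised only to customers.
    """
    if learned_from is Rel.CUSTOMER:
        return True
    return send_to is Rel.CUSTOMER


# Print the full export matrix for inspection.
for src in Rel:
    for dst in Rel:
        verdict = "export" if should_export(src, dst) else "filter"
        print(f"learned from {src.value:<8} -> send to {dst.value:<8}: {verdict}")
```

Only the customer row exports in every direction, which is the economic asymmetry the paragraph above describes.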

Path-Vector and the AS-Path

Every BGP route carries its AS-path: the ordered list of AS numbers the announcement traversed to reach you. This solves loop prevention cleanly — a router rejects any route whose AS-path already contains its own ASN, because that means the route looped back. No counting-to-infinity, no hold-down timers; the path is right there in the advertisement to inspect.
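
As a minimal sketch, the loop check is a membership test on the received AS-path. The function name `accept_route` and the local ASN 65010 are illustrative assumptions; real implementations apply this check as part of processing received UPDATE messages.

```python
def accept_route(as_path: list[int], local_asn: int) -> bool:
    """Reject a route whose AS-path already contains our own ASN."""
    return local_asn not in as_path


received_path = [64500, 174, 15133]

# A router in AS 65010 (not in the path) accepts the route.
print(accept_route(received_path, 65010))  # True

# A router in AS 174 sees its own ASN in the path and rejects the route.
print(accept_route(received_path, 174))    # False
```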

The AS-path is also a selection input, but it is far from the only one. BGP's decision process consults local preference (an operator's explicit policy knob) before it ever looks at AS-path length, so a longer path can absolutely win if policy prefers it. This is the heart of why "BGP picks the policy-preferred route, not the fastest one": AS-path length is compared only among routes that tie on local preference, so it always sits beneath the lever operators set deliberately.

# a BGP route as seen on a router: prefix, next-hop, and the AS-path
show ip bgp 93.184.216.0/24
# Network          Next Hop      Path
# 93.184.216.0/24  203.0.113.1   64500 174 15133 i
#                                ^your-peer ^transit ^origin-AS
# a router seeing its own ASN already in this path rejects the route as a loop
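
The ordering of local preference ahead of AS-path length can be sketched as a two-key comparison. This is a deliberately reduced model: the real decision process has more steps (origin type, MED, and others), and the `Route` class, `best_route` function, and the local-preference values are assumptions for illustration. The default of 100 mirrors a common vendor default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple[int, ...]
    local_pref: int = 100  # assumed default; operators override it by policy


def best_route(candidates: list[Route]) -> Route:
    """Pick the route with the highest local preference;
    break ties with the shortest AS-path."""
    return max(candidates, key=lambda r: (r.local_pref, -len(r.as_path)))


short_via_transit = Route("93.184.216.0/24", (174, 15133), local_pref=100)
long_via_customer = Route(
    "93.184.216.0/24", (64500, 64501, 64502, 15133), local_pref=200
)

best = best_route([short_via_transit, long_via_customer])
print(best.as_path)  # the four-hop path wins because policy prefers it
```

With equal local preference on both routes, the same function falls back to the shorter path.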

eBGP versus iBGP

eBGP (external BGP) runs between different autonomous systems — the sessions that actually stitch the internet together. When a route crosses an eBGP session, the sending AS prepends its own ASN to the AS-path, which is what makes the path grow and loop prevention work. eBGP peers are typically directly connected, and an eBGP-learned route carries a low administrative distance (20 by default on Cisco) because it represents real inter-domain reachability.

iBGP (internal BGP) runs within a single AS, carrying externally-learned routes between that AS's own routers without changing the AS-path. iBGP has a notorious constraint: a route learned from one iBGP peer is not re-advertised to another, to prevent loops inside the AS — which forces either a full mesh of iBGP sessions or a route reflector to relay them. Forgetting this rule is a classic way to end up with routers inside one AS that simply never learn a prefix.
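
The difference between the two session types can be sketched in a few lines. The function names are hypothetical, and the route reflector is modeled only as "allowed to relay", ignoring the client and non-client distinction that real reflectors make.

```python
def send_over_ebgp(as_path: tuple[int, ...], local_asn: int) -> tuple[int, ...]:
    """Crossing an eBGP session: the sender prepends its own ASN."""
    return (local_asn, *as_path)


def send_over_ibgp(as_path: tuple[int, ...]) -> tuple[int, ...]:
    """Crossing an iBGP session: the AS-path is left unchanged."""
    return tuple(as_path)


def may_relay_to_ibgp_peer(learned_via: str, is_route_reflector: bool = False) -> bool:
    """iBGP rule: a route learned over iBGP is not passed to another
    iBGP peer unless this router acts as a route reflector."""
    if learned_via == "ebgp":
        return True
    if learned_via == "ibgp":
        return is_route_reflector
    raise ValueError(f"unknown session type: {learned_via!r}")


path = (174, 15133)
print(send_over_ebgp(path, 64500))   # (64500, 174, 15133)
print(send_over_ibgp(path))          # (174, 15133)
print(may_relay_to_ibgp_peer("ibgp"))                           # False
print(may_relay_to_ibgp_peer("ibgp", is_route_reflector=True))  # True
```

The `False` result is the case that leaves routers without a prefix when an AS has neither a full mesh nor a reflector.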

Route Leaks and Hijacks

A route hijack is announcing a prefix you do not own; a route leak is re-advertising routes in a direction policy forbids (say, passing transit routes between two peers). Either can redirect huge volumes of traffic to the wrong AS — sometimes briefly capturing it, sometimes black-holing it entirely. Because BGP trusts announcements by default, the damage is global before anyone notices.

The deployed mitigation is RPKI (Resource Public Key Infrastructure): cryptographically signed records (ROAs) that state which AS is authorized to originate a given prefix. A router doing origin validation can drop an announcement whose origin AS does not match the signed authorization, defeating the simplest hijacks. RPKI validates the origin, not the whole path, so it is a partial defense — but it is the one large networks actually deploy.
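
Origin validation can be sketched as a comparison of an announcement against a list of ROAs. The three outcomes (valid, invalid, not-found) follow the usual origin-validation model. The `Roa` class, the `validate_origin` function, and the sample ROA built from documentation prefixes and ASNs are assumptions for illustration, not real registry data.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Roa:
    prefix: str       # prefix the ROA covers
    max_length: int   # longest prefix length the origin may announce
    origin_asn: int   # AS authorized to originate it


def validate_origin(route_prefix: str, origin_asn: int, roas: list[Roa]) -> str:
    """Return 'valid', 'invalid', or 'not-found' for an announcement."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # Skip ROAs of the other IP version or that do not cover the route.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if roa.origin_asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by some ROA but matched by none: treat as a likely hijack.
    return "invalid" if covered else "not-found"


roas = [Roa("192.0.2.0/24", 24, 64500)]  # hypothetical ROA

print(validate_origin("192.0.2.0/24", 64500, roas))     # valid
print(validate_origin("192.0.2.0/24", 64511, roas))     # invalid: wrong origin AS
print(validate_origin("192.0.2.0/25", 64500, roas))     # invalid: more specific than max_length
print(validate_origin("198.51.100.0/24", 64500, roas))  # not-found: no covering ROA
```

Nothing in this check looks at the rest of the AS-path, which is why origin validation alone cannot catch a forged path that ends in the legitimate origin.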

BGP vs OSPF

BGP is path-vector, internet-scale, and policy-driven: it carries the full AS-path, scales to the global table of over a million prefixes, and chooses routes by business policy rather than a shortest-path metric. Run it between autonomous systems — anywhere a route crosses an administrative boundary.

OSPF is link-state, intra-organization, and metric-driven: every router shares one map and runs Dijkstra for the shortest cost path. You do not run OSPF between companies — it has no notion of policy or trust boundaries, would try to flood another company's topology into yours, and does not scale to the internet's table.

Common Mistakes
  • Announcing prefixes you do not own, through a leak or a fat-fingered redistribution. The bad route propagates globally in seconds and pulls other networks' traffic toward you, black-holing it until the announcement is withdrawn.
  • Running no prefix filtering on a peer or customer session. You accept whatever they announce — including routes they should never originate — and pass the damage along, turning your AS into an amplifier for their mistake.
  • Assuming BGP picks the fastest or shortest path. It picks the policy-preferred one: local preference and business relationships outrank AS-path length, so traffic routinely takes a longer path than geography would suggest.
  • Letting a prefix flap repeatedly, triggering route-flap dampening. Peers suppress the unstable prefix for a growing penalty window, so your network stays unreachable from parts of the internet long after the underlying link stabilizes.
  • Building iBGP without a full mesh or a route reflector. Because iBGP does not re-advertise a route from one internal peer to another, some routers in your AS silently never learn the prefix and black-hole traffic to it.

Best Practices
  • Filter prefixes strictly on every eBGP session — accept only what a peer or customer is authorized to announce, and announce only what you own — so one neighbor's mistake cannot leak through your AS.
  • Deploy RPKI origin validation and drop or de-prefer invalid routes, defeating the simplest hijacks where an unauthorized AS originates a prefix it does not hold.
  • Use local preference and AS-path policy deliberately to engineer traffic, rather than expecting BGP's defaults to choose the path you want — the protocol optimizes for policy, not latency.
  • Scale iBGP with route reflectors instead of a manual full mesh once an AS grows past a handful of routers, so every internal router reliably learns every external prefix.
  • Stabilize flapping sessions with BFD and hold timers before route-flap dampening penalizes the prefix, since a dampened prefix stays unreachable far longer than the original fault lasted.
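
The dampening behavior mentioned in both lists can be sketched as an exponentially decaying penalty. The constants below (1000 per flap, suppress at 2000, reuse at 750, 15-minute half-life) are commonly cited defaults and are treated here as assumptions; actual values depend on the vendor and the operator's configuration.

```python
import math

# Assumed parameters; real deployments tune these.
PENALTY_PER_FLAP = 1000.0
SUPPRESS_LIMIT = 2000.0
REUSE_LIMIT = 750.0
HALF_LIFE_MIN = 15.0


def decay(penalty: float, minutes: float) -> float:
    """Penalty halves every HALF_LIFE_MIN minutes of stability."""
    return penalty * 0.5 ** (minutes / HALF_LIFE_MIN)


def minutes_until_reuse(penalty: float) -> float:
    """Minutes of stability needed before the penalty decays to the reuse limit."""
    if penalty <= REUSE_LIMIT:
        return 0.0
    return HALF_LIFE_MIN * math.log2(penalty / REUSE_LIMIT)


# Three flaps, one minute apart.
penalty = 0.0
for _ in range(3):
    penalty = decay(penalty, 1.0) + PENALTY_PER_FLAP

print(f"penalty after 3 flaps: {penalty:.0f}")
print(f"suppressed: {penalty >= SUPPRESS_LIMIT}")
print(f"minutes of stability before reuse: {minutes_until_reuse(penalty):.1f}")
```

Under these assumed values, a few minutes of flapping costs on the order of half an hour of suppression, which is the asymmetry the bullets warn about.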

Comparable concepts
  • OSPF / IS-IS (interior counterparts)
  • RPKI (origin validation)

Knowledge Check

How does BGP prevent routing loops between autonomous systems?

  • A router rejects any route whose AS-path already contains its own AS number
  • Each router decrements the packet TTL field until any looping route eventually expires
  • Hold-down timers freeze a prefix until every router agrees it is gone
  • Every router runs Dijkstra on a shared map and discards looping paths

Why is it accurate to say BGP picks the policy-preferred route rather than the fastest one?

  • Local preference and business policy outrank AS-path length in the decision process
  • BGP measures the round-trip latency of each available path and then deliberately picks the slowest
  • The shortest AS-path always wins, and the shortest path is rarely the fastest
  • A central registry assigns each AS a fixed preferred path for every prefix

What does RPKI defend against, and what is its limitation?

  • It validates the origin AS of a prefix, stopping origin hijacks but not whole-path forgery
  • It encrypts every BGP session end to end so route announcements cannot be read by anyone in transit
  • It cryptographically verifies every AS in the path, blocking all leaks
  • It suppresses flapping prefixes by signing their stability over time
