What is Border Gateway Protocol (BGP)?

Photo by JJ Ying on Unsplash

The Border Gateway Protocol (BGP) is the routing protocol that ties the internet's networks together, often referred to as the "internet's mailman" for its role in directing data traffic. Imagine the internet as a vast network of roads, and BGP as the GPS that picks a route for your data packets to travel. BGP is critical for the internet's functionality because it determines the paths data takes across the many interconnected networks between a source and its destination. The route it picks is not always the shortest or fastest one: BGP chooses paths based on the routing policies of the networks involved as well as on path length.

Image source: Cloudflare

BGP was first introduced in 1989 and has gone through several versions, with BGP-4 being the most widely used version on the internet today.

The Border Gateway Protocol (BGP) is an application-layer protocol (Layer 7 of the OSI (Open Systems Interconnection) model), although it is sometimes mistakenly placed at Layer 4 (the Transport Layer). BGP uses TCP (Transmission Control Protocol) as its transport protocol to ensure reliable delivery of its messages between peers. Specifically, BGP routers establish connections to each other on TCP port 179.

While BGP itself runs on top of the Transport Layer, its primary purpose is to exchange routing information, which pertains to Layer 3 (the Network Layer) of the OSI model. BGP is therefore an example of a protocol that operates at one layer but serves the functions of another: the routes it exchanges decide how routers forward IP packets.
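
To make the transport detail more concrete, here is a minimal sketch of the fixed 19-byte header that starts every BGP-4 message sent over the TCP connection, following the layout in RFC 4271. The function names (build_keepalive, parse_header) are illustrative assumptions and do not come from any BGP library.

```python
import struct

BGP_PORT = 179            # TCP port BGP speakers connect to
MARKER = b"\xff" * 16     # 16-byte marker field, all ones in BGP-4
HEADER_LEN = 19           # marker (16) + length (2) + type (1)

# Message type codes defined in RFC 4271
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def build_keepalive() -> bytes:
    """Build a KEEPALIVE message, which is a bare header with no body."""
    return MARKER + struct.pack("!HB", HEADER_LEN, 4)


def parse_header(data: bytes):
    """Return (total message length, message type name) from raw bytes."""
    if len(data) < HEADER_LEN or data[:16] != MARKER:
        raise ValueError("not a valid BGP header")
    length, type_code = struct.unpack("!HB", data[16:HEADER_LEN])
    return length, MESSAGE_TYPES.get(type_code, "UNKNOWN")


message = build_keepalive()
print(len(message), parse_header(message))
```

A real BGP speaker would first open a TCP connection to the peer on port 179 and exchange OPEN messages before sending keepalives; the sketch only covers the message framing.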

Autonomous Systems (AS)

BGP exchanges information between different networks, called autonomous systems (AS), about which networks they can reach and how to get there.

Image source: Cloudflare

In essence, an Autonomous System (AS) is a collection of connected IP networks and routers, under the control of one or more network operators, that presents a single and clearly defined routing policy to the rest of the internet. Each AS makes its internal routing decisions independently of other autonomous systems.

Each Autonomous System is assigned an Autonomous System Number (ASN), a globally unique identifier that BGP uses to identify the AS when it exchanges routing information with other autonomous systems. ASNs were originally 16-bit numbers and have since been extended to 32 bits. BGP records the ASNs that a route announcement has passed through, which lets routers detect loops and compare candidate paths to a destination.
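
As a small illustration of how ASNs are handled in practice, the sketch below checks whether an ASN falls in one of the ranges reserved for private use (RFC 6996) and formats 32-bit ASNs in the "asdot" notation some tools display. The helper names are assumptions made for this example, and other special-purpose ranges are not modeled.

```python
# Private-use ASN ranges from RFC 6996 (16-bit and 32-bit)
PRIVATE_RANGES = ((64512, 65534), (4200000000, 4294967294))


def is_private(asn: int) -> bool:
    """True if the ASN is reserved for private use inside an organization."""
    return any(low <= asn <= high for low, high in PRIVATE_RANGES)


def to_asdot(asn: int) -> str:
    """Format an ASN in asdot notation: plain below 65536, 'high.low' above."""
    if not 0 <= asn <= 0xFFFFFFFF:
        raise ValueError("ASN must fit in 32 bits")
    if asn < 65536:
        return str(asn)
    high, low = divmod(asn, 65536)
    return f"{high}.{low}"


for asn in (64512, 65546):
    print(asn, is_private(asn), to_asdot(asn))
```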

Autonomous Systems are typically managed by internet service providers (ISPs), large organizations, universities, government agencies, or any entity that controls a substantial amount of IP addresses and has a significant presence on the internet. The concept of an AS is fundamental to the scalability of the internet, allowing for efficient organization, management, and routing of inter-network communications.

The management and policies of an AS are defined by the network administrators of the organization that owns the AS. These policies determine how the AS interacts with other autonomous systems to exchange data, including how it announces its presence to other networks and how it selects the best routes for sending and receiving data from those networks. The policies can be based on various factors, including technical requirements, business considerations, and agreements with other networks.

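The assignment and management of ASNs are overseen by five Regional Internet Registries (RIRs) across the globe: AFRINIC (Africa), APNIC (Asia-Pacific), ARIN (the United States, Canada, and parts of the Caribbean), LACNIC (Latin America and the Caribbean), and RIPE NCC (Europe, the Middle East, and Central Asia).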

These RIRs are responsible for distributing IP addresses and ASNs within their designated regions, ensuring that each AS has a unique identifier and facilitating the global coordination necessary for internet routing.

How BGP works

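BGP is a path-vector routing protocol. This means routers share the paths they can take to reach specific destinations, expressed as the sequence of autonomous systems each route crosses (the AS path), along with attributes such as the next hop, local preference, and multi-exit discriminator (MED). BGP routers then apply their routing policy to these attributes to choose the best path for data to travel. BGP does not measure real-time conditions such as network congestion.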
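
The idea of choosing among advertised paths can be sketched as follows. This is a heavily simplified model, not the full BGP decision process: it only rejects routes whose AS path already contains the local ASN (loop prevention), then prefers higher local preference, shorter AS path, and lower MED, in that order. Real implementations apply more tie-breakers and only compare MED between routes from the same neighboring AS. The Route class, the documentation prefix, and the ASNs are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: Tuple[int, ...]   # ASNs the announcement crossed, nearest first
    local_pref: int = 100      # higher is preferred
    med: int = 0               # lower is preferred


def accept(route: Route, local_asn: int) -> bool:
    """Loop prevention: drop routes whose AS path already contains our ASN."""
    return local_asn not in route.as_path


def best_path(routes: List[Route], local_asn: int) -> Optional[Route]:
    """Pick one route using a simplified subset of BGP's tie-breakers."""
    candidates = [r for r in routes if accept(r, local_asn)]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (-r.local_pref, len(r.as_path), r.med))


routes = [
    Route("203.0.113.0/24", (64500, 64510, 64511)),
    Route("203.0.113.0/24", (64501, 64511)),
    Route("203.0.113.0/24", (64496, 64511)),  # loops back through us, rejected
]
best = best_path(routes, local_asn=64496)
print(best.as_path if best else None)
```

Raising local_pref on the longer route would make it win despite the extra hop, which is how operators express policy over raw path length.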

Types of BGP:

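  • Exterior BGP (eBGP): Used for communication between different autonomous systems. This is how routes are exchanged across the internet.

  • Interior BGP (iBGP): Used for sharing BGP routing information between routers within a single autonomous system, typically to distribute routes learned from external peers. It is widely deployed inside larger networks.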
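
The difference between the two session types can be sketched in a few lines: a session is internal when both routers share an ASN, and a router prepends its own ASN to the AS path only when advertising a route to an external peer. The function names and ASNs here are illustrative assumptions.

```python
from typing import Tuple


def session_type(local_asn: int, peer_asn: int) -> str:
    """A session is iBGP when both ends are in the same AS, eBGP otherwise."""
    return "iBGP" if local_asn == peer_asn else "eBGP"


def advertise(as_path: Tuple[int, ...], local_asn: int, peer_asn: int) -> Tuple[int, ...]:
    """Return the AS path as the peer would receive it."""
    if session_type(local_asn, peer_asn) == "eBGP":
        return (local_asn,) + as_path   # prepend our ASN when leaving the AS
    return as_path                      # unchanged inside the AS


print(session_type(64500, 64501), advertise((64511,), 64500, 64501))
print(session_type(64500, 64500), advertise((64511,), 64500, 64500))
```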

Image source: kwtrain.com

Facebook outage due to BGP misconfiguration

The Facebook outage that occurred on October 4, 2021, is a significant example of how BGP (Border Gateway Protocol) can impact the accessibility of global internet services. This incident resulted in Facebook, Instagram, WhatsApp, and other services owned by Facebook Inc. (now Meta Platforms, Inc.) becoming inaccessible to users worldwide for several hours. The root cause of this outage was related to BGP and DNS (Domain Name System) configurations.

Image source: Wikipedia

How it Happened

  1. BGP Withdrawal: The outage was initiated by a change in the configuration of BGP routes that Facebook advertised to the rest of the internet. Specifically, Facebook inadvertently withdrew the BGP routes that internet routers use to find the company's servers. Without these routes, Facebook's servers became unreachable, essentially making them invisible on the internet.

  2. DNS Impact: Alongside the BGP issue, the DNS servers that translate human-friendly domain names like 'facebook.com' into IP addresses that computers use to communicate were also unreachable. This was because the BGP routes to these DNS servers were withdrawn as well. Even though the servers were up and running, the internet at large could not find them because the "map" provided by BGP had effectively erased them from the internet.
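
The effect of a withdrawal can be illustrated with a toy routing table. In this sketch, a hypothetical DNS server at 192.0.2.53 is reachable while its prefix is announced and becomes unreachable once the prefix is withdrawn, even though nothing about the server itself has changed. The RoutingTable class, the documentation prefix, and the "peer-a" next hop are assumptions for the example and are not Facebook's actual addresses or tooling.

```python
import ipaddress


class RoutingTable:
    """Toy routing table keyed by IPv4 prefix."""

    def __init__(self) -> None:
        self.routes = {}

    def announce(self, prefix: str, next_hop: str) -> None:
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix: str) -> None:
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address: str):
        """Return the next hop for the longest matching prefix, or None."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


table = RoutingTable()
table.announce("192.0.2.0/24", "peer-a")
print(table.lookup("192.0.2.53"))   # route exists, server is reachable

table.withdraw("192.0.2.0/24")
print(table.lookup("192.0.2.53"))   # no route, server is invisible
```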

The Role of BGP

BGP plays a critical role in the functioning of the internet by directing traffic through the various networks that make up the global internet. Each autonomous system (AS) on the internet uses BGP to advertise the IP address ranges (prefixes) it can reach and the paths for reaching them. When BGP updates are correctly propagated, traffic flows smoothly. However, if BGP routes are withdrawn or incorrectly advertised, it can lead to outages or misrouted traffic.
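
One way an incorrect advertisement misroutes traffic is through longest-prefix matching: routers forward to the most specific prefix that covers a destination, so a mistaken announcement of a smaller range takes precedence over the legitimate one. The sketch below shows this under made-up assumptions (documentation prefixes and ASNs, and a hypothetical origin_for helper).

```python
import ipaddress


def origin_for(routes: dict, address: str):
    """Return the origin ASN of the most specific prefix covering address."""
    addr = ipaddress.ip_address(address)
    best = None  # (prefix length, origin ASN) of the best match so far
    for prefix, origin_asn in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, origin_asn)
    return None if best is None else best[1]


routes = {"203.0.113.0/24": 64500}           # legitimate announcement
print(origin_for(routes, "203.0.113.10"))

routes["203.0.113.0/25"] = 64511             # mistaken, more specific announcement
print(origin_for(routes, "203.0.113.10"))    # traffic now heads to the wrong AS
```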

The Aftermath and Response

The Facebook outage underscored the importance of careful management of BGP configurations and the potential cascading effects that errors can have on global internet services. To recover, Facebook had to send engineers to its data centers in person to restore the network, in part because the outage also took down internal tools the company would normally use to fix such a problem. This incident highlighted the fragility of internet infrastructure and the critical role that BGP plays in maintaining the connectivity and resilience of online services.

The Facebook outage serves as a case study in network management. It shows the need for careful change management and how large the impact on businesses and users can be when things go wrong at the network level, especially where BGP and DNS configurations are concerned.

References:

  1. Cloudflare blog: What is BGP?

  2. Cloudflare blog: What is an autonomous system?

  3. YouTube Computerphile: BGP

  4. YouTube: BGP Overview