4.5: Routing in the Internet

Having studied Internet addressing and the IP protocol, we now turn our attention to the Internet's routing protocols; their job is to determine the path taken by a datagram between source and destination. We'll see that the Internet's routing protocols embody many of the principles we learned earlier in this chapter. The link state and distance vector approaches studied in Section 4.2, and the notion of an autonomous system (AS) considered in Section 4.3 are all central to how routing is done in today's Internet. 

Informally, we know that the Internet is a loose confederation of interconnected "networks" of local, regional, national, and international ISPs. We'll now need to make this understanding a bit more precise, given that we've seen that the notion of "network" has a very precise meaning in terms of IP addressing. Recall from Section 4.3 that a collection of routers that are under the same administrative and technical control, and that all run the same routing protocol amongst themselves, is known as an autonomous system (AS). Each AS, in turn, is typically comprised of multiple networks (where we use the term "network" in the precise, addressing sense of Section 4.4). The most important distinction to be made among Internet routing protocols is whether they are used to route datagrams within a single AS, or among multiple ASs. We consider the first class of protocols, known as intra-autonomous system routing protocols, in Section 4.5.1. We consider the second class of protocols, inter-autonomous system routing protocols, in Section 4.5.2. 

4.5.1: Intra-Autonomous-System Routing in the Internet

An intra-AS routing protocol is used to configure and maintain the routing tables within an autonomous system (AS). Intra-AS routing protocols are also known as interior gateway protocols. Historically, three routing protocols have been used extensively for routing within an autonomous system in the Internet: RIP (the Routing Information Protocol), OSPF (Open Shortest Path First), and EIGRP (Cisco's propriety Enhanced Interior Gateway Routing Protocol). 

RIP: Routing Information Protocol

The Routing Information Protocol (RIP) was one of the earliest intra-AS Internet routing protocols and is still in widespread use today. It traces its origins and its name to the Xerox Network Systems (XNS) architecture. The widespread deployment of RIP was due in great part to its inclusion in 1982 of the Berkeley Software Distribution (BSD) version of UNIX supporting TCP/IP. RIP version 1 is defined in RFC 1058, with a backwards compatible version 2 defined in RFC 1723. 

RIP is a distance vector protocol that operates in a manner very close to the idealized protocol we examined in Section 4.2.3. The version of RIP specified in RFC 1058 uses hop count as a cost metric; that is, each link has a cost of 1. The maximum cost of a path is limited to 15, thus limiting the use of RIP to autonomous systems that are less than 15 hops in diameter. Recall that in distance vector protocols, neighboring routers exchange routing information with each other. In RIP, routing tables are exchanged between neighbors approximately every 30 seconds using a so-called RIP response message. The response message sent by a router or host contains the sender's routing table entries for up to 25 destination networks within the AS. Response messages are also known as RIP advertisements.

Let's take a look at a simple example of how RIP advertisements work. Consider the portion of an AS shown in Figure 4.27. In this figure, lines connecting the routers denote networks. Only selected routers (A,B,C, and D) and networks (w,x,y,z) are labeled for visual convenience. Dotted lines indicate that the AS continues on, and thus this autonomous system has many more routers and links than are shown in the Figure 4.27. 

Figure 4.27
Figure 4.27: A portion of an autonomous system

Now suppose that the routing table for router D is as shown in Figure 4.28. Note that the routing table has three columns. The first column is for the destination network, the second column indicates the identity of the next router along the shortest path to the destination network, and the third column indicates the number of hops (that is, the number of networks that have to be traversed, including the destination network) to get to the destination network along the shortest path. For this example, the table indicates that to send a datagram from router D to destination network w, the datagram should be first sent to neighboring router A; the table also indicates that destination network w is two hops away along the shortest path. Similarly, the table indicates that network z is seven hops away via router B. In principle, a routing table will have one row for each network in the AS, although RIP version 2 allows routes to networks to be aggregated using the route aggregation techniques similar to those we examined in Section 4.4.1. The routing table will also have at least one row for routing to networks that are outside of the AS. The table in Figure 4.28, and the subsequent tables to come, are thus only partially complete. 

Destination network Next Router Number of Hops to Destination
W A 2
Y B 2
Z B 7
X -- 1
.... .... ....

Figure 4.28: Routing table in router D before receiving advertisement from router A

Now suppose that 30 seconds later, router D receives from router A the advertisement shown in Figure 4.29. Note that this advertisement is nothing other but the routing table in router A! This routing table says, in particular, that network z is only 4 hops away from router A. Router D, upon receiving this advertisement, merges the advertisement (Figure 4.29) with the "old" routing table (Figure 4.28). In particular, router D learns that there is now a path through router A to network z that is shorter than the path through router B

Destination network Next Router Number of Hops to Destination
Z C 4
W -- 1
X -- 1
.... .... ....

Figure 4.29: Advertisement from router A

Thus, router D updates its routing table to account for the "shorter" shortest path, as shown in Figure 4.30. How is it, you might ask, that the shortest path to network z has become shorter? Possibly, the decentralized distance vector algorithm was still in the process of converging (see Section 4.2), or perhaps new links and/or routers were added to the AS, thus changing the shortest paths in the network. 

Destination network Next Router Number of Hops to Destination
W A 2
Y B 2
Z A 5
.... .... ....

Figure 4.30: Routing table in router D after receiving advertisement from router A

Let's next consider a few of the implementation aspects of RIP. Recall that RIP routers exchange advertisements approximately every 30 seconds. If a router does not hear from its neighbor at least once every 180 seconds, that neighbor is considered to be no longer reachable; that is, either the neighbor has died or the connecting link has gone down. When this happens, RIP modifies its local routing table and then propagates this information by sending advertisements to its neighboring routers (the ones that are still reachable). A router can also request information about its neighbor's cost to a given destination using RIP's request message. Routers send RIP request and response messages to each other over UDP using port number 520. The UDP packet is carried between routers in a standard IP packet. The fact that RIP uses a transport-layer protocol (UDP) on top of a network-layer protocol (IP) to implement network-layer functionality (a routing algorithm) may seem rather convoluted (it is!). Looking a little deeper at how RIP is implemented will clear this up. 

Figure 4.31 sketches how RIP is typically implemented in a UNIX system, for example, a UNIX workstation serving as a router. A process called routed (pronounced "route dee") executes the RIP protocol, that is, maintains the routing table and exchanges messages with routed processes running in neighboring routers. Because RIP is implemented as an application-layer process (albeit a very special one that is able to manipulate the routing tables within the UNIX kernel), it can send and receive messages over a standard socket and use a standard transport protocol. Thus, RIP is an application-layer protocol (see Chapter 2) running over UDP. 

Figure 4.31
Figure 4.31: Implementation of RIP as the routed daemon

Finally, let us take a quick look at a RIP routing table. The RIP routing table in Figure 4.32 is taken from a UNIX router giroflee.eurecom.fr. If you give a netstat -rn command on a UNIX system, you can view the routing table for that host or router. Performing a netstat command on giroflee.eurecom.fr yields the routing table in Figure 4.32. 

Destination   Gateway         Flags  Ref   Use     Interface

------------  --------------  -----  ----  ------  --------

127.0.0.1     127.0.0.1       UH     0     26492   1o0

192.168.2.    192.168.2.5     U      2     13      fa0

193.55.114.   193.55.114.6    U      3     58503   le0

192.168.3.    192.168.3.5     U      2     25      qaa0

224.0.0.0     193.55.114.6    U      3     0       le0

default       193.55.114.129  UG     0     143454
Figure 4.32: RIP routing table from giroflee.eurecom.fr

The router giroflee is connected to three networks. The second, third, and fourth rows in the table tell us that these three networks are attached to giroflee via giroflee's network interfaces fa0, le0 and qaa0. These giroflee interfaces have IP addresses 192.168.2.5, 193.55.114.6, and 192.168.3.5, respectively. To transmit a packet to any host belonging to one of these three networks, giroflee will simply send the outgoing IP datagram over the appropriate interface. Of particular interest to us is the default route. Any IP datagram that is not destined for one of the networks explicitly listed in the routing table will be forwarded to the router with IP address 193.55.114.129; this router is reached by sending the datagram over the default network interface. The first entry in the routing table is the so-called loopback interface. When IP sends a datagram to the loopback interface, the packet is simply returned back to IP; this is useful for debugging purposes. The address 224.0.0.0 is a special multicast (Class D) IP address. We will examine IP multicast in Section 4.8. 

OSPF: Open Shortest Path First

Like RIP, Open Shortest Path First (OSPF) routing is used for intra-AS routing. The "Open" in OSPF indicates that the routing protocol specification is publicly available (for example, as opposed to Cisco's EIGRP protocol). The most recent version of OSPF, version 2, is defined in RFC 2178--a public document. 

OSPF was conceived as the successor to RIP and as such has a number of advanced features. At its heart, however, OSPF is a link-state protocol that uses flooding of link-state information and a Dijkstra least-cost-path algorithm. With OSPF, a router constructs a complete topological map (that is, a directed graph) of the entire autonomous system. The router then locally runs Dijkstra's shortest-path algorithm to determine a shortest-path tree to all networks with itself as the root node. The router's routing table is then obtained from this shortest-path tree. Individual link costs are configured by the network administrator. 

Let us now contrast and compare the advertisements sent by RIP and OSPF. With OSPF, a router periodically broadcasts routing information to all other routers in the autonomous system, not just to its neighboring routers. For an excellent treatment of link state broadcast algorithms, see [Perlman 1999]. This routing information sent by a router has one entry for each of the router's neighbors; the entry gives the distance (that is, link state) from the router to the neighbor. On the other hand, a RIP advertisement sent by a router contains information about all the networks in the autonomous system, although this information is only sent to its neighboring routers. In a sense, the advertising techniques of RIP and OSPF are duals of each other. 

Some of the advances embodied in OSPF include the following: 

  • Security. All exchanges between OSPF routers (for example, link-state updates) are authenticated. This means that only trusted routers can participate in the OSPF protocol within a domain, thus preventing malicious intruders (or networking students taking their newfound knowledge out for a joyride) from injecting incorrect information into router tables. 
  • Multiple same-cost paths. When multiple paths to a destination have the same cost, OSPF allows multiple paths to be used (that is, a single path need not be chosen for carrying all traffic when multiple equal-cost paths exist). 
  • Different cost metrics for different TOS traffic. OSPF allows each link to have different costs for different TOS (type of service) IP packets. For example, a high-bandwidth satellite link might be configured to have a low cost (and hence be attractive) for non-time-critical traffic, but a very high cost metric for delay-sensitive traffic. In essence, OSPF sees different network topologies for different classes of traffic and hence can compute different routes for each type of traffic. 
  • Integrated support for unicast and multicast routing. Multicast OSPF [RFC 1584] provides simple extensions to OSPF to provide for multicast routing (a topic we cover in more depth in Section 4.8). MOSPF uses the existing OSPF link database and adds a new type of link-state advertisement to the existing OSPF link-state broadcast mechanism. 
  • Support for hierarchy within a single routing domain. Perhaps the most significant advance in OSPF is the ability to hierarchically structure an autonomous system. Section 4.3 has already looked at the many advantages of hierarchical routing structures. We cover the implementation of OSPF hierarchical routing in the remainder of this section.
As OSPF autonomous system can be configured into "areas." Each area runs its own OSPF link-state routing algorithm, with each router in an area broadcasting its link state to all other routers in that area. The internal details of an area thus remain invisible to all routers outside the area. Intra-area routing involves only those routers within the same area. 

Within each area, one or more area border routers are responsible for routing packets outside the area. Exactly one OSPF area in the AS is configured to be the backbone area. The primary role of the backbone area is to route traffic between the other areas in the AS. The backbone always contains all area border routers in the AS and may contain nonborder routers as well. Inter-area routing within the AS requires that the packet be first routed to an area border router (intra-area routing), then routed through the backbone to the area border router that is in the destination area, and then routed to the final destination. 

A diagram of a hierarchically structured OSPF network is shown in Figure 4.33. We can identify four types of OSPF routers in Figure 4.33: 

Figure 4.33
Figure 4.33: Hierarchically structured OSPF AS with four areas

  • Internal routers. These routers are in nonbackbone areas and only perform intra-AS routing. 
  • Area border routers. These routers belong to both an area and the backbone. 
  • Backbone routers (nonborder routers). These routers perform routing within the backbone but themselves are not area border routers. Within a nonbackbone area, internal routers learn of the existence of routes to other areas from information (essentially a link-state advertisement, but advertising the cost of a route to another area, rather than a link cost) broadcast within the area by its backbone routers. 
  • Boundary routers. A boundary router exchanges routing information with routers belonging to other autonomous systems. This router might, for example, use BGP to perform inter-AS routing. It is through such a boundary router that other routers learn about paths to external networks.
EIGRP: Enhanced Internal Gateway Routing Protocol

The Enhanced Interior Gateway Routing Protocol (EIGRP) [Cisco IGRP 1997] is a proprietary routing algorithm developed by Cisco Systems, Inc., as a successor for RIP. EIGRP is a distance vector protocol. Several cost metrics (including delay, bandwidth, reliability, and load) can be used in making routing decisions, with the weight given to each of the metrics being determined by the network administrator. This ability to use administrator-defined costs in making route selections is an important difference from RIP; we will see shortly that so-called policy-based inter-AS Internet routing protocols such as BGP also allow administratively defined routing decisions to be made. Other important differences from RIP include the use of a reliable transport protocol to communicate routing information, the use of update messages that are sent only when routing table costs change (rather than periodically), route aggregation, and the use of a distributed diffusing update routing algorithm [Garcia-Luna 1993] to quickly compute loop-free routing paths. 

4.5.2: Inter-Autonomous System Routing

The Border Gateway Protocol version 4, specified in RFC 1771 (see also RFC 1772; RFC 1773), is the de facto standard interdomain routing protocol in today's Internet. It is commonly referred to as BGP4 or simply as BGP. As an inter-autonomous system routing protocol, it provides for routing between autonomous systems (that is, administrative domains). 

While BGP has the same general flavor as the distance vector protocol that we studied in Section 4.2, it is more appropriately characterized as a path vector protocol. This is because BGP in a router does not propagate cost information (for example, number of hops to a destination), but instead propagates path information, such as the sequence of ASs on a route to a destination AS. We will examine the path information in detail shortly. We note though that although this information includes the names of the ASs on a route to the destination, it does not contain cost information. Nor does BGP specify how a specific route to a particular destination should be chosen among the routes that have been advertised. That decision is a policy decision that is left up to the domain's administrator. Each domain can thus choose its routes according to whatever criteria it chooses (and need not even inform its neighbors of its policy!)--allowing a significant degree of autonomy in route selection. In essence, BGP provides the mechanisms to distribute path information among the interconnected autonomous systems, but leaves the policy for making the actual route selections up to the network administrator. 

Let's begin with a grossly simplified description of how BGP works. This will help us see the forest through the trees. As discussed in Section 4.3, as far as BGP is concerned, the whole Internet is a graph of ASs, and each AS is identified by an AS number. At any instant of time, a given AS X may, or may not, know of a path of ASs that lead to a given destination AS Z. As an example, suppose X has a path XY1Y2Y3Z from itself to Z listed in its BGP table. This means that X knows that it can send datagrams to Z through the ASs X, Y1, Y2, and Y3, Z. When X sends updates to its BGP neighbors (that is, the neighbors in the graph), X actually sends the entire path information, XY1Y2Y3Z, to its neighbors (as well as other paths to other ASs). If, for example, W is a neighbor of X, and W receives an advertisement that includes the path XY1Y2Y3Z, then W can list a new entry WXY1Y2Y3Z in its BGP table. However, we should keep in mind that W may decide to not create this new entry for one of several reasons. For example, W would not create this entry if W is equal to (say) Y2, thereby creating an undesirable loop in the routing; or if W already has a path to Z in its tables, and this existing path is preferable (with respect to the metric used by BGP at W) to WXY1Y2Y3Z; or, finally, if W has a policy decision to not forward datagrams through (say) Y2

In BGP jargon, the immediate neighbors in the graph of ASs are called peers. BGP information is propagated through the network by exchanges of BGP mes -sages between peers. The BGP protocol defines the four types of messages: OPEN, UPDATE, NOTIFICATION, and KEEPALIVE. 

  • OPEN. BGP peers communicate using the TCP protocol and port number 179. TCP thus provides for reliable and congestion-controlled message exchange between peers. In contrast, recall that we earlier saw that two RIP peers, for example, the routed's in Figure 4.31 communicate via unreliable UDP. When a BGP gateway wants to first establish contact with a BGP peer (for example, after the gateway itself or a connecting link has just been booted), an OPEN message is sent to the peer. The OPEN message allows a BGP gateway to identify and authenticate itself, and provide timer information. If the OPEN is acceptable to the peer, it will send back a KEEPALIVE message. 
  • UPDATE. A BGP gateway uses the UPDATE message to advertise a path to a given destination (for example, XY1Y2Y3Z) to the BGP peer. The UPDATE message can also be used to withdraw routes that had previously been advertised (that is, to tell a peer that a route that it had previously advertised is no longer a valid route). 
  • KEEPALIVE. This BGP message is used to let a peer know that the sender is alive but that the sender doesn't have other information to send. It also serves as an acknowledgment to a received OPEN message. 
  • NOTIFICATION. This BGP message is used to inform a peer that an error has been detected (for example, in a previously transmitted BGP message) or that the sender is about to close the BGP session.
Recall from our discussion above that BGP provides mechanisms for distributing path information but does not mandate policies for selecting a route from those available. Within this framework, it is thus possible for an AS such as Hatfield.net to implement a policy such as "traffic from my AS should not cross the AS McCoy.net," since it knows the identities of all AS's on the path. (The Hatfields and the McCoys are two famous feuding families in the U.S.) But what about a policy that would prevent the McCoys from sending traffic through the Hatfield's network? The only means for an AS to control the traffic it passes through its AS (known as "transit" traffic--traffic that neither originates in, nor is destined for, the network, but instead is "just passing through") is by controlling the paths that it advertises. For example, if the McCoys are immediate neighbors of the Hatfields, the Hatfields could simply not advertise any routes to the McCoys that contain the Hatfield network. But restricting transit traffic by controlling an AS's route advertisement can only be partially effective. For example, if the Joneses are between the Hatfields and the McCoys, and the Hatfields advertise routes to the Joneses that pass through the Hatfields, then the Hatfields cannot prevent (using BGP mechanisms) the Joneses from advertising these routes to the McCoys. 

Very often an AS will have multiple gateway routers that provide connections to other ASs. Even though BGP is an inter-AS protocol, it can still be used inside an AS as a pipe to exchange BGP updates among gateway routers belonging to the same AS. BGP connections inside an AS are called Internal BGP (IBGP), whereas BGP connections between ASs are called External BGP (EBGP).

As noted above, BGP is becoming the de facto standard for inter-AS routing for the public Internet. BGP is used, for example, at the major network access points (NAPs) where major Internet carriers connect to each other and exchange traffic. To see the contents of today's (less than four hours out of date) BGP routing table (large!) at one of the major NAPs in the U.S. (which include Chicago and San Francisco), see [IPMA 2000]. 

This completes our brief introduction of BGP. Although BGP is complex, it plays a central role in the Internet. We encourage readers to see the references [Stewart 1999; Labovitz 1997; Halabi 1997; Huitema 1995] to learn more about BGP. 
Principles in Practice

Why are there Different Inter-AS and Intra-AS Routing Protocols?

Having now studied the details of specific inter-AS and intra-AS routing protocols deployed in today's Internet, let's conclude by considering perhaps the most fundamental question we could ask about these protocols in the first place (hopefully, you have been wondering this all along, and have not lost the forest for the trees!):

Why are different inter-AS and intra-AS routing protocols used?

The answer to this question gets at the heart of the differences between the goals of routing within an AS and among ASs:

  • Policy. Among ASs, policy issues dominate. It may well be important that traffic originating in a given AS specifically not be able to pass through another specific AS. Similarly, a given AS may well want to control what transit traffic it carries between other ASs. We have seen that BGP specifically carries path attributes and provides for controlled distribution of routing information so that such policy-based routing decisions can be made. Within an AS, everything is nominally under the same administrative control, and thus policy issues play a much less important role in choosing routes within the AS.
  • Scale. The ability of a routing algorithm and its data structures to scale to handle routing to/among large numbers of networks is a critical issue in inter-AS routing. Within an AS, scalability is less of a concern. For one thing, if a single administrative domain becomes too large, it is always possible to divide it into two ASs and perform inter-AS routing between the two new ASs. (Recall that OSPF allows such a hierarchy to be built by splitting an AS into "areas.")
  • Performance. Because inter-AS routing is so policy-oriented, the quality (for example, performance) of the routes used is often of secondary concern (that is, a longer or more costly route that satisfies certain policy criteria may well be taken over a route that is shorter but does not meet that criteria). Indeed, we saw that among ASs, there is not even the notion of preference or costs associated with routes. Within a single AS, however, such policy concerns can be ignored, allowing routing to focus more on the level of performance realized on a route.

© 2000-2001 by Addison Wesley Longman
A division of Pearson Education