Computer Networking: A Top-Down Approach Featuring the Internet Chapter 5 -- 4.6: What's Inside a Router?

4.6: What's Inside a Router?

Our study of the network layer so far has focused on network-layer service models, the routing algorithms that control the routes taken by packets through the network, and the protocols that embody these routing algorithms. These topics, however, are only part (albeit important ones) of what goes on in the network layer. We have yet to consider the switching function of a router--the actual transfer of datagrams from a router's incoming links to the appropriate outgoing links. Studying just the control and service aspects of the network layer is like studying a company and considering only its management (which controls the company but typically performs very little of the actual "grunt" work that makes a company run!) and its public relations ("Our product will provide this wonderful service to you!"). To fully appreciate what really goes on within a company, one needs to consider the workers. In the network layer, the real work (that is, the reason the network layer exists in the first place) is the forwarding of datagrams. A key component in this forwarding process is the transfer of a datagram from a router's incoming link to an outgoing link. In this section, we study how this is accomplished. Our coverage here is necessarily brief, as an entire course would be needed to cover router design in depth. Consequently, we'll make a special effort in this section to provide pointers to material that covers this topic in more depth.

A high-level view of a generic router architecture is shown in Figure 4.34. Four components of a router can be identified:

Figure 4.34: Router architecture

Input ports. The input port performs several functions. It performs the physical layer functionality (the leftmost box of the input port and the rightmost box of the output port in Figure 4.34) of terminating an incoming physical link to a router. It performs the data-link layer functionality (shown in the middle boxes in the input and output ports) needed to interoperate with the data link layer functionality (see Chapter 5) on the other side of the incoming link. It also performs a lookup and forwarding function (the rightmost box of the input port and the leftmost box of the output port) so that a packet forwarded into the switching fabric of the router emerges at the appropriate output port. Control packets (for example, packets carrying routing protocol information for RIP, OSPF or BGP) are forwarded from the input port to the routing processor. In practice, multiple ports are often gathered together on a single line card within a router.
Switching fabric. The switching fabric connects the router's input ports to its output ports. This switching fabric is completely contained within the router--a network inside of a network router!
Output ports. An output port stores the packets that have been forwarded to it through the switching fabric, and then transmits the packets on the outgoing link. The output port thus performs the reverse data link and physical layer functionality as the input port.
Routing processor. The routing processor executes the routing protocols (for example, the protocols we studied in Section 4.5), maintains the routing tables, and performs network management functions (see Chapter 8), within the router. Since we cover these topics elsewhere in this book, we defer discussion of these topics to elsewhere.

In the following, we'll take a look at input ports, the switching fabric, and output ports in more detail. [Turner 1988; Giacopelli 1990; McKeown 1997a; Partridge 1998] provide a discussion of some specific router architectures. [McKeown 1997b] provides a particularly readable overview of modern router architectures, using the Cisco 12000 router as an example.

4.6.1: Input Ports

A more detailed view of input port functionality is given in Figure 4.35. As discussed above, the input port's line termination function and data link processing implement the physical and data link layers associated with an individual input link to the router. The lookup/forwarding function of the input port is central to the switching function of the router. In many routers, it is here that the router determines the output port to which an arriving packet will be forwarded via the switching fabric. The choice of the output port is made using the information contained in the routing table. Although the routing table is computed by the routing processor, a shadow copy of the routing table is typically stored at each input port and updated, as needed, by the routing processor. With local copies of the routing table, the switching decision can be made locally, at each input port, without invoking the centralized routing processor. Such decen tralized switching avoids creating a forwarding bottle neck at a single point within the router.

Figure 4.35: Input port processing

In routers with limited processing capabilities at the input port, the input port may simply forward the packet to the centralized routing processor, which will then perform the routing table lookup and forward the packet to the appropriate output port. This is the approach taken when a workstation or server serves as a router; here, the routing processor is really just the workstation's CPU and the input port is really just a network interface card (for example, an Ethernet card).

Given the existence of a routing table, the routing table lookup is concep tually simple--we just search through the routing table, looking for a destination entry that best matches the destination network address of the packet, or a default route if the destination entry is missing. (Recall from our discussion in Section 4.4.1 that the best match is the routing table entry with the longest network prefix that matches the packet's des tination address.) In practice, however, life is not so simple. Perhaps the most important complicating factor is that backbone routers must operate at high speeds, being capable of performing millions of lookups per second. Indeed, it is desirable for the input port processing to be able to proceed at line speed, that is, that a lookup can be done in less than the amount of time needed to receive a packet at the input port. In this case, input processing of a received packet can be completed before the next receive operation is complete. To get an idea of the performance requirements for a lookup, consider that a so-called OC48 link runs at 2.5 Gbps. With 256-byte-long packets, this implies a lookup speed of approximately a million lookups per second.

Given the need to operate at today's high-link speeds, a linear search through a large routing table is impossible. A more reasonable technique is to store the routing table entries in a tree data structure. Each level in the tree can be thought of as corresponding to a bit in the destination address. To look up an address, one simply starts at the root node of the tree. If the first address bit is a zero, then the left subtree will contain the routing table entry for a destination address; otherwise it will be in the right subtree. The appropriate subtree is then traversed using the remaining address bits--if the next address bit is a zero, the left subtree of the initial subtree is chosen; otherwise, the right subtree of the initial subtree is chosen. In this manner, one can look up the routing table entry in N steps, where N is the number of bits in the address. (The reader will note that this is essentially a binary search through an address space of size 2^N.) Refinements of this approach are discussed in [Doeringer 1996]. An improvement over binary search techniques is described in [Srinivasan 1999].

But even with N = 32 (for example, a 32-bit IP address) steps, the lookup speed via binary search is not fast enough for today's backbone routing requirements. For example, assuming a memory access at each step, less than a million address lookups/sec could be performed with 40 ns memory access times. Several techniques have thus been explored to increase lookup speeds. Content addressable memories (CAMs) allow a 32-bit IP address to be presented to the CAM, which then returns the content of the routing table entry for that address in essentially constant time. The Cisco 8500 series router [Cisco 8500 1999] has a 64K CAM for each input port. Another technique for speeding lookup is to keep recently accessed routing table entries in a cache [Feldmeier 1988]. Here, the potential concern is the size of the cache. Measurements in [Thomson 1997] suggest that even for an OC-3 speed link, approximately 256,000 source-destination pairs might be seen in one minute in a backbone router. Most recently, even faster data structures, which allow routing table entries to be located in log(N) steps [Waldvogel 1997], or which compress routing tables in novel ways [Brodnik 1997], have been proposed. A hardware-based approach to lookup that is optimized for the common case that the address being looked up has 24 or fewer significant bits is discussed in [Gupta 1998].

Once the output port for a packet has been determined via the lookup, the packet can be forwarded into the switching fabric. However, as we'll see below, a packet may be temporarily blocked from entering the switching fabric (due to the fact that packets from other input ports are currently using the fabric). A blocked packet must thus be queued at the input port and then scheduled to cross the switching fabric at a later point in time. We'll take a closer look at the blocking, queuing, and scheduling of packets (at both input ports and output ports) within a router in Section 4.6.4.

Cisco Systems: Dominating the Network Core
As of this writing (January 2000), Cisco Systems employs 23,000 people and has a market capitalization of $360 billion. Cisco currently dominates the Internet router market, and in recent years has quickly moved into the Internet telephony market, where it will compete head-to-head with the telephone equipment companies, such as Lucent, Alcatel, Nothern Telecom, and Siemens. How did this gorilla of a networking company come to be? It all started in 1984 (only 16 years ago) in the living room of a house in Silicon Valley apartment.
Len Bosak and his wife Sandy Lerner were working at Stanford University when they had the idea to build and sell Internet routers to research and academic institutions. Sandy Lerner came up with the name "Cisco" (an abbreviation for San Francisco), and she also designed the company's bridge logo. Corporate headquarters was originally in their living room, and they originally financed the project with credit cards and moonlighting consulting jobs. At the end of 1986, Cisco's revenues reached $250,000 a month--not bad for a business financed on credit cards and without any venture capital. At the end of 1987, Cisco finally succeeded in attracting venture capital--$2 million dollars from Sequoia Capital in exchange for one-third of the company. Over the next few years, Cisco continued to grow and grab more and more market share. At the same time, relations between Bosak/Lerner and Cisco management strained. Cisco went public in 1990, but in the same year Cisco fired Lerner, and Bosak resigned.

4.6.2: Switching Fabrics

The switching fabric is at the very heart of a router. It is through this switching fabric that the packets are actually moved from an input port to an output port. Switching can be accomplished in a number of ways, as indicated in Figure 4.36.

Figure 4.36: Three switching techniques

Switching via memory. The simplest, earliest routers were often traditional computers, with switching between input and output port, being done under direct control of the CPU (routing processor). Input and output ports functioned as traditional I/O devices in a traditional operating system. An input port with an arriving packet first signaled the routing processor via an interrupt. The packet was then copied from the input port into processor memory. The routing processor then extracted the destination address from the header, looked up the appropriate output port in the routing table, and copied the packet to the output port's buffers. Note that if the memory bandwidth is such that B packets/sec can be written into, or read from, memory, then the overall switch throughput (the total rate at which packets are transferred from input ports to output ports) must be less than B/2.

Many modern routers also switch via memory. A major difference from early routers, however, is that the lookup of the destination address and the storing (switching) of the packet into the appropriate memory location is performed by processors on the input line cards. In some ways, routers that switch via memory look very much like shared memory multiprocessors, with the processors on a line card storing packets into the memory of the appropriate output port. Cisco's Catalyst 8500 series switches [Cisco 8500 1999] and Bay Networks Accelar 1200 Series routers switch packets via a shared memory.

Switching via a bus. In this approach, the input ports transfer a packet directly to the output port over a shared bus, without intervention by the routing processor (Note that when switching via memory, the packet must also cross the system bus going to/from memory). Although the routing processor is not involved in the bus transfer, since the bus is shared, only one packet at a time can be transferred over the bus. A packet arriving at an input port and finding the bus busy with the transfer of another packet is blocked from passing through the switching fabric and is queued at the input port. Because every packet must cross the single bus, the switching bandwidth of the router is limited to the bus speed.

Given that bus bandwidths of over a gigabit per second are possible in today's technology, switching via a bus is often sufficient for routers that operate in access and enterprise networks (for example, local area and corporate networks). Bus-based switching has been adopted in a number of current router products, including the Cisco 1900 [Cisco Switches 1999], which switches packets over a 1 Gbps Packet Exchange Bus. 3Com's CoreBuilder 5000 systems [Kapoor 1997] interconnects ports that reside on different switch modules over its PacketChannel data bus, with a bandwidth of 2 Gbps.

Switching via an interconnection network. One way to overcome the bandwidth limitation of a single, shared bus is to use a more sophisticated interconnection network, such as those that have been used in the past to interconnect processors in a multiprocessor computer architecture. A crossbar switch is an interconnection network consisting of 2N busses that connect N input ports to N output ports, as shown in Figure 4.36. A packet arriving at an input port travels along the horizontal bus attached to the input port until it intersects with the vertical bus leading to the desired output port. If the vertical bus leading to the output port is free, the packet is transferred to the output port. If the vertical bus is being used to transfer a packet from another input port to this same output port, the arriving packet is blocked and must be queued at the input port.

Delta and Omega switching fabrics have also been proposed as an interconnection network between input and output ports. See [Tobagi 1990] for a survey of switch architectures. Cisco 12000 Family switches [Cisco 12000 1998] use an interconnection network, providing up to 60 Gbps through the switching fabric. One current trend in interconnection network design [Keshav 1998] is to fragment a variable length IP datagram into fixed-length cells, and then tag and switch the fixed-length cells through the interconnection network. The cells are then reassembled into the original datagram at the output port. The fixed-length cell and internal tag can considerably simplify and speed up the switching of the packet through the interconnection network.

4.6.3: Output Ports

Output port processing, shown in Figure 4.37, takes the datagrams that have been stored in the output port's memory and transmits them over the outgoing link. The data-link protocol processing and line termination are the send-side link- and physical-layer functionality that interact with the input port on the other end of the outgoing link, as discussed above in Section 4.6.1. The queuing and buffer management functionality are needed when the switch fabric delivers packets to the output port at a rate that exceeds the output link rate; we'll cover output port queuing below.

Figure 4.37: Output port processing

4.6.4: Where Does Queuing Occur?

If we look at the input and output port functionality and the configurations shown in Figure 4.36, it is evident that packet queues can form at both the input ports and the output ports. It is important to consider these queues in a bit more detail, since as these queues grow large, the router's buffer space will eventually be exhausted and packet loss will occur. Recall that in our earlier discussions, we said rather vaguely that packets were lost "within the network" or "dropped at a router." It is here, at these queues within a router, where such packets are dropped and lost. The actual location of packet loss (either at the input port queues or the output port queues) will depend on the traffic load, the relative speed of the switching fabric and the line speed, as discussed below.

Suppose that the input line speeds and output line speeds are all identical, and that there are n input ports and n output ports. If the switching fabric speed is at least n times as fast as the input line speed, then no queuing can occur at the input ports. This is because even in the worst case that all n input lines are receiving packets, the switch will be able to transfer n packets from input port to output port in the time it takes each of the n input ports to (simultaneously) receive a single packet. But what can happen at the output ports? Let us suppose still that the switching fabric is at least n times as fast as the line speeds. In the worst case, the packets arriving at each of the n input ports will be destined to the same output port. In this case, in the time it takes to receive (or send) a single packet, n packets will arrive at this output port. Since the output port can only transmit a single packet in a unit of time (the packet transmission time), the n arriving packets will have to queue (wait) for transmission over the outgoing link. n more packets can then possibly arrive in the time it takes to transmit just one of the n packets that had previously been queued. And so on. Eventually, the number of queued packets can grow large enough to exhaust the memory space at the output port, in which case packets are dropped.

Output port queuing is illustrated in Figure 4.38. At time t, a packet has arrived at each of the incoming input ports, each destined for the uppermost outgoing port. Assuming identical line speeds and a switch operating at three times the line speed, one time unit later (that is, in the time needed to receive or send a packet), all three original packets have been transferred to the outgoing port and are queued awaiting transmission. In the next time unit, one of these three packets will have been transmitted over the outgoing link. In our example, two new packets have arrived at the incoming side of the switch; one of these packets is destined for this uppermost output port.

Figure 4.38: Ourput port queuing

A consequence of output port queuing is that a packet scheduler at the output port must choose one packet among those queued for transmission. This selection might be done on a simple basis such as first-come-first-served (FCFS) scheduling, or a more sophisticated scheduling discipline such as weighted fair queuing (WFQ), which shares the outgoing link "fairly" among the different end-to-end connections that have packets queued for transmission. Packet scheduling plays a crucial role in providing quality-of-service guarantees. We will cover this topic extensively in Section 6.6. A discussion of output port packet scheduling disciplines used in today's routers is [Cisco Queue 1995].

If the switch fabric is not fast enough (relative to the input line speeds) to transfer all arriving packets through the fabric without delay, then packet queuing will also occur at the input ports, as packets must join input port queues to wait their turn to be transferred through the switching fabric to the output port. To illustrate an important consequence of this queuing, consider a crossbar switching fabric and suppose that (1) all link speeds are identical, (2) that one packet can be transferred from any one input port to a given output port in the same amount of time it takes for a packet to be received on an input link, and (3) packets are moved from a given input queue to their desired output queue in a FCFS manner. Multiple packets can be transferred in parallel, as long as their output ports are different. However, if two packets at the front of two input queues are destined to the same output queue, then one of the packets will be blocked and must wait at the input queue--the switching fabric can only transfer one packet to a given output port at a time.

Figure 4.39 shows an example where two packets (shaded black) at the front of their input queues are destined for the same upper-right output port. Suppose that the switch fabric chooses to transfer the packet from the front of the upper-left queue. In this case, the black packet in the lower-left queue must wait. But not only must this black packet wait, but so too must the white packet that is queued behind that packet in the lower-left queue, even though there is no contention for the middle-right output port (the destination for the white packet). This phenomenon is known as head-of-the-line (HOL) blocking in an input-queued switch--a queued packet in an input queue must wait for transfer through the fabric (even though its output port is free) because it is blocked by another packet at the head of the line. [Karol 1987] shows that due to HOL blocking, the input queue will grow to unbounded length (informally, this is equivalent to saying that significant packet loss will occur) under certain assumptions as soon as the packet arrival rate on the input links reaches only 58 percent of their capacity. A number of solutions to HOL blocking are discussed in [McKeown 1997b].

Figure 4.39: HOL blocking at an input queued switch