2.2: The World Wide Web: HTTP

Until the 1990s the Internet was used primarily by researchers, academics, and university students to log in to remote hosts, to transfer files between local and remote hosts, to receive and send news, and to receive and send electronic mail. Although these applications were (and continue to be) extremely useful, the Internet was essentially unknown outside the academic and research communities. Then, in the early 1990s, the Internet's killer application arrived on the scene--the World Wide Web [Berners-Lee 1994]. The Web is the Internet application that caught the general public's eye. It is dramatically changing how people interact inside and outside their work environments. It has spawned thousands of start-up companies. It has elevated the Internet from just one of many data networks (including online networks such as Prodigy, America Online, and CompuServe, national data networks such as Minitel/Transpac in France, and private X.25 and frame relay networks) to essentially the one and only data network. 

History is sprinkled with the arrival of electronic communication technologies that have had major societal impacts. The first such technology was the telephone, invented in the 1870s. The telephone allowed two persons to orally communicate in real-time without being in the same physical location. It had a major impact on society--both good and bad. The next electronic communication technology was broadcast radio/television, which arrived in the 1920s and 1930s. Broadcast radio/television allowed people to receive vast quantities of audio and video information. It also had a major impact on society--both good and bad. The third major communication technology that has changed the way people live and work is the Web. Perhaps what appeals the most to users about the Web is that it operates on demand. Users receive what they want, when they want it. This is unlike broadcast radio and television, which force users to "tune in" when the content provider makes the content available. In addition to being on demand, the Web has many other wonderful features that people love and cherish. It is enormously easy for any individual to make any information available over the Web; everyone can become a publisher at extremely low cost. Hyperlinks and search engines help us navigate through an ocean of Web sites. Graphics and animated graphics stimulate our senses. Forms, Java applets, and ActiveX components, as well as many other devices, enable us to interact with pages and sites. And more and more, the Web provides a menu interface to vast quantities of audio and video material stored on the Internet, audio and video that can be accessed on demand. 


The Browser War

In April 1994, Marc Andreessen, a computer scientist who had earlier led the development of the Mosaic browser at the University of Illinois at Urbana-Champaign, and Jim Clark, founder of Silicon Graphics and a former Stanford professor, founded Netscape Communications Corporation. Netscape recruited much of the original Mosaic team from Illinois and released the beta version of Navigator 1.0 in October 1994. In the years to come, Netscape would make significant enhancements to its browsers and develop Web servers, commerce servers, mail servers, news servers, proxy servers, mail readers, and many other application-layer software products. It was certainly one of the most innovative and successful Internet companies of the mid 1990s. In January 1995, Jim Barksdale became the CEO of Netscape, and in August 1995 Netscape went public with much fanfare.

Microsoft, originally slow to get into the Internet business, introduced its own browser, Internet Explorer 1.0, in August 1995. Internet Explorer 1.0 was clunky and slow, but Microsoft invested heavily in its development, and by late 1997 Microsoft and Netscape were running head-to-head in the browser war. On June 11, 1997, Netscape released version 4.0 of its browser, and on September 30, Microsoft released its own version 4.0. There was no consensus at that time as to which of the two browsers was better, and Microsoft, with its monopoly on the Windows operating system, continued to gain more market share. In 1997, Netscape made several major mistakes, such as failing to recognize the importance of its Web site as a potential portal site, and launching a major development effort for an all-Java browser (when Java was not quite ripe for the task) [Cusumano 1998]. Throughout 1998, Netscape continued to lose market share for its browser and other products, and in late 1998 it was acquired by America Online. Marc Andreessen and most of the original Netscape people have since left. 

2.2.1: Overview of HTTP

The Hypertext Transfer Protocol (HTTP), the Web's application-layer protocol, is at the heart of the Web. HTTP is implemented in two programs: a client program and a server program. The client program and server program, executing on different end systems, talk to each other by exchanging HTTP messages. HTTP defines the structure of these messages and how the client and server exchange the messages. Before explaining HTTP in detail, it is useful to review some Web terminology. 

A Web page (also called a document) consists of objects. An object is simply a file--such as an HTML file, a JPEG image, a GIF image, a Java applet, an audio clip, and so on--that is addressable by a single URL. Most Web pages consist of a base HTML file and several referenced objects. For example, if a Web page contains HTML text and five JPEG images, then the Web page has six objects: the base HTML file plus the five images. The base HTML file references the other objects in the page with the objects' URLs. Each URL has two components: the host name of the server that houses the object and the object's path name. For example, the URL 

www.someSchool.edu/someDepartment/picture.gif

has www.someSchool.edu for a host name and /someDepartment/picture.gif for a path name. A browser is a user agent for the Web; it displays the requested Web page and provides numerous navigational and configuration features. Web browsers also implement the client side of HTTP. Thus, in the context of the Web, we will interchangeably use the words "browser" and "client." Popular Web browsers include Netscape Communicator and Microsoft Internet Explorer. A Web server houses Web objects, each addressable by a URL. Web servers also implement the server side of HTTP. Popular Web servers include Apache, Microsoft Internet Information Server, and the Netscape Enterprise Server. (Netcraft provides a nice survey of Web server penetration [Netcraft 2000].) 

HTTP defines how Web clients (that is, browsers) request Web pages from servers (that is, Web servers) and how servers transfer Web pages to clients. We discuss the interaction between client and server in detail below, but the general idea is illustrated in Figure 2.6. When a user requests a Web page (for example, clicks on a hyperlink), the browser sends HTTP request messages for the objects in the page to the server. The server receives the requests and responds with HTTP response messages that contain the objects. Through 1997 essentially all browsers and Web servers implemented version HTTP/1.0, which is defined in RFC 1945. Beginning in 1998, some Web servers and browsers began to implement version HTTP/1.1, which is defined in RFC 2616. HTTP/1.1 is backward compatible with HTTP/1.0; a Web server running 1.1 can "talk" with a browser running 1.0, and a browser running 1.1 can "talk" with a server running 1.0. 

Figure 2.6
Figure 2.6: HTTP request-response behavior

Both HTTP/1.0 and HTTP/1.1 use TCP as their underlying transport protocol (rather than running on top of UDP). The HTTP client first initiates a TCP connection with the server. Once the connection is established, the browser and the server processes access TCP through their socket interfaces. As described in Section 2.1, on the client side the socket interface is the "door" between the client process and the TCP connection; on the server side it is the "door" between the server process and the TCP connection. The client sends HTTP request messages into its socket interface and receives HTTP response messages from its socket interface. Similarly, the HTTP server receives request messages from its socket interface and sends response messages into the socket interface. Once the client sends a message into its socket interface, the message is "out of the client's hands" and is "in the hands of TCP." Recall from Section 2.1 that TCP provides a reliable data transfer service to HTTP. This implies that each HTTP request message emitted by a client process eventually arrives intact at the server; similarly, each HTTP response message emitted by the server process eventually arrives intact at the client. Here we see one of the great advantages of a layered architecture--HTTP need not worry about lost data, or the details of how TCP recovers from loss or reordering of data within the network. That is the job of TCP and the protocols in the lower layers of the protocol stack. 
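To make the socket "doors" concrete, here is a small Python sketch (not from the text): a toy server thread and a client exchange one HTTP-style request and response over a TCP connection on the loopback interface. The server and its one-line HTML reply are stand-ins for a real Web server; port 0 asks the operating system for any free port.

```python
import socket
import threading

def serve_once(listener):
    conn, _ = listener.accept()          # server-side "door": the connection socket
    request = conn.recv(4096)            # the request message arrives intact (TCP is reliable)
    reply = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>"
    conn.sendall(reply)                  # response message goes back through the socket
    conn.close()

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
threading.Thread(target=serve_once, args=(listener,), daemon=True).start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))      # TCP connection established
client.sendall(b"GET /index.html HTTP/1.0\r\n\r\n")  # now "in the hands of TCP"
response = b""
while True:                              # read until the server closes the connection
    chunk = client.recv(4096)
    if not chunk:
        break
    response += chunk
client.close()
listener.close()
print(response.decode().splitlines()[0])  # prints: HTTP/1.0 200 OK
```

Note that the client code neither retransmits nor reorders anything; reliable delivery is TCP's job, exactly as the layered architecture intends.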

TCP also employs a congestion control mechanism that we shall discuss in detail in Chapter 3. We mention here only that this mechanism forces each new TCP connection to initially transmit data at a relatively slow rate, but then allows each connection to ramp up to a relatively high rate when the network is uncongested. The initial slow-transmission phase is referred to as slow start. 

It is important to note that the server sends requested files to clients without storing any state information about the client. If a particular client asks for the same object twice in a period of a few seconds, the server does not respond by saying that it just served the object to the client; instead, the server resends the object, as it has completely forgotten what it did earlier. Because an HTTP server maintains no information about the clients, HTTP is said to be a stateless protocol. 

2.2.2: Nonpersistent and Persistent Connections

HTTP can use both nonpersistent connections and persistent connections. HTTP/1.0 uses nonpersistent connections, whereas persistent connections are the default mode for HTTP/1.1. 

Nonpersistent Connections

Let us walk through the steps of transferring a Web page from server to client for the case of nonpersistent connections. Suppose the page consists of a base HTML file and 10 JPEG images, and that all 11 of these objects reside on the same server. Suppose the URL for the base HTML file is 

www.someSchool.edu/someDepartment/home.index. 

Here is what happens:

  1. The HTTP client initiates a TCP connection to the server www.someSchool.edu. Port 80, the default port for HTTP, is the port at which the HTTP server listens for clients that want to retrieve documents using HTTP. 
  2. The HTTP client sends an HTTP request message to the server via the socket associated with the TCP connection that was established in step 1. The request message includes the path name /someDepartment/home.index. (We will discuss the HTTP messages in some detail below.) 
  3. The HTTP server receives the request message via the socket associated with the connection that was established in step 1, retrieves the object /someDepartment/home.index from its storage (RAM or disk), encapsulates the object in an HTTP response message, and sends the response message to the client via the socket. 
  4. The HTTP server tells TCP to close the TCP connection. (But TCP doesn't actually terminate the connection until the client has received the response message intact.) 
  5. The HTTP client receives the response message. The TCP connection terminates. The message indicates that the encapsulated object is an HTML file. The client extracts the file from the response message, parses the HTML file, and finds references to the 10 JPEG objects. 
  6. The first four steps are then repeated for each of the referenced JPEG objects.
As the browser receives the Web page, it displays the page to the user. Two different browsers may interpret (that is, display to the user) a Web page in somewhat different ways. HTTP has nothing to do with how a Web page is interpreted by a client. The HTTP specifications ([RFC 1945] and [RFC 2616]) only define the communication protocol between the client HTTP program and the server HTTP program. 

The steps above use nonpersistent connections because each TCP connection is closed after the server sends the object--the connection does not persist for other objects. Note that each TCP connection transports exactly one request message and one response message. Thus, in this example, when a user requests the Web page, 11 TCP connections are generated. 
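The one-connection-per-object behavior can be sketched in Python against a throwaway local server (the local server and object paths stand in for www.someSchool.edu and its 11 objects; this is an illustration, not a browser implementation):

```python
import http.client
import http.server
import threading

class ToyHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<html>base</html>" if self.path.endswith("home.index") else b"(jpeg bytes)"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):
        pass  # keep the demo quiet

server = http.server.HTTPServer(("127.0.0.1", 0), ToyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

paths = ["/someDepartment/home.index"] + [f"/someDepartment/pic{i}.jpg" for i in range(10)]
connections_used = 0
for path in paths:
    conn = http.client.HTTPConnection("127.0.0.1", port)  # a brand new TCP connection...
    conn.request("GET", path)
    conn.getresponse().read()
    conn.close()                                          # ...closed after one request-response
    connections_used += 1
server.shutdown()
print(connections_used)  # 11
```

Each pass through the loop pays the full TCP connection-establishment cost, which is precisely the overhead that persistent connections (discussed below) avoid.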

In the steps described above, we were intentionally vague about whether the client obtains the 10 JPEGs over 10 serial TCP connections, or whether some of the JPEGs are obtained over parallel TCP connections. Indeed, users can configure modern browsers to control the degree of parallelism. In their default modes, most browsers open 5 to 10 parallel TCP connections, and each of these connections handles one request-response transaction. If the user prefers, the maximum number of parallel connections can be set to 1, in which case the 10 connections are established serially. As we shall see in the next chapter, the use of parallel connections shortens the response time. 

Before continuing, let's do a back-of-the-envelope calculation to estimate the amount of time from a client requesting the base HTML file until the file is received by the client. To this end we define the round-trip time (RTT), which is the time it takes for a small packet to travel from client to server and then back to the client. The RTT includes packet-propagation delays, packet-queuing delays in intermediate routers and switches, and packet-processing delays. (These delays were discussed in Section 1.6.) Now consider what happens when a user clicks on a hyperlink. This causes the browser to initiate a TCP connection between the browser and the Web server; this involves a "three-way handshake"--the client sends a small TCP message to the server, the server acknowledges and responds with a small message, and, finally, the client acknowledges back to the server. One RTT elapses after the first two parts of the three-way handshake. After completing the first two parts of the handshake, the client sends the HTTP request message into the TCP connection, and TCP "piggybacks" the last acknowledgment (the third part of the three-way handshake) onto the request message. Once the request message arrives at the server, the server sends the HTML file into the TCP connection. This HTTP request/response eats up another RTT. Thus, roughly, the total response time is two RTTs plus the transmission time at the server of the HTML file. 
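Plugging illustrative numbers into this estimate (the RTT, file size, and link rate below are assumed values, not figures from the text):

```python
rtt = 0.1                  # assumed round-trip time: 100 ms
file_size_bits = 100_000   # assumed base HTML file: 100 kbits
link_rate_bps = 1_000_000  # assumed transmission rate: 1 Mbps

transmission_time = file_size_bits / link_rate_bps  # 0.1 s to push the file onto the link
response_time = 2 * rtt + transmission_time         # handshake RTT + request/response RTT + transmission
print(round(response_time, 3))  # 0.3
```

With these numbers the two RTTs account for two-thirds of the 0.3-second response time, which is why the RTT overhead of nonpersistent connections matters so much for small objects.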

Persistent Connections

Nonpersistent connections have some shortcomings. First, a brand new connection must be established and maintained for each requested object. For each of these connections, TCP buffers must be allocated and TCP variables must be kept in both the client and server. This can place a serious burden on the Web server, which may be serving requests from hundreds of different clients simultaneously. Second, as we just described, each object suffers two RTTs--one RTT to establish the TCP connection and one RTT to request and receive an object. Finally, each object suffers from TCP slow start because every TCP connection begins with a TCP slow-start phase. The impact of RTT and slow-start delays can be partially mitigated, however, by the use of parallel TCP connections. 

With persistent connections, the server leaves the TCP connection open after sending a response. Subsequent requests and responses between the same client and server can be sent over the same connection. In particular, an entire Web page (in the example above, the base HTML file and the 10 images) can be sent over a single persistent TCP connection; moreover, multiple Web pages residing on the same server can be sent over a single persistent TCP connection. Typically, the HTTP server closes a connection when it isn't used for a certain time (the timeout interval), which is often configurable. There are two versions of persistent connections: without pipelining and with pipelining. For the version without pipelining, the client issues a new request only when the previous response has been received. In this case, each of the referenced objects (the 10 images in the example above) experiences one RTT in order to request and receive the object. Although this is an improvement over nonpersistent's two RTTs, the RTT delay can be further reduced with pipelining. Another disadvantage of no pipelining is that after the server sends an object over the persistent TCP connection, the connection hangs--does nothing--while it waits for another request to arrive. This hanging wastes server resources. 

The default mode of HTTP/1.1 uses persistent connections with pipelining. In this case, the HTTP client issues a request as soon as it encounters a reference. Thus the HTTP client can make back-to-back requests for the referenced objects. When the server receives the requests, it can send the objects back to back. If all the requests are sent back to back and all the responses are sent back to back, then only one RTT is expended for all the referenced objects (rather than one RTT per referenced object when pipelining isn't used). Furthermore, the pipelined TCP connection hangs for a smaller fraction of time. In addition to reducing RTT delays, persistent connections (with or without pipelining) have a smaller slow-start delay than nonpersistent connections. The reason is that after sending the first object, the persistent server does not have to send the next object at the initial slow rate since it continues to use the same TCP connection. Instead, the server can pick up at the rate where the first object left off. We shall quantitatively compare the performance of nonpersistent and persistent connections in the homework problems of Chapter 3. The interested reader is also encouraged to see [Heidemann 1997; Nielsen 1997]. 
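A rough timing model makes the comparison concrete for the earlier example of a base file plus 10 images. The numbers are illustrative and transmission times are ignored, so this is a sketch of the RTT accounting only, not a precise performance model:

```python
rtt = 0.1       # assumed round-trip time (seconds)
n_images = 10   # referenced objects on the page

# Nonpersistent, serial: every object costs a handshake RTT plus a request/response RTT.
nonpersistent = (1 + n_images) * 2 * rtt             # roughly 2.2 s

# Persistent, no pipelining: one handshake, then one RTT per object.
persistent_no_pipelining = 2 * rtt + n_images * rtt  # roughly 1.2 s

# Persistent, pipelined: one handshake, one RTT for the base file, and
# roughly one more RTT shared by all the back-to-back image requests.
persistent_pipelined = 2 * rtt + rtt                 # roughly 0.3 s
```

The model already shows the ordering argued in the text: pipelining collapses the per-image RTTs into approximately a single shared RTT.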

2.2.3: HTTP Message Format

The HTTP specifications ([RFC 1945] for HTTP/1.0 and [RFC 2616] for HTTP/1.1) define the HTTP message formats. There are two types of HTTP messages, request messages and response messages, both of which are discussed below. 

HTTP Request Message

Below we provide a typical HTTP request message: 

GET /somedir/page.html HTTP/1.1

Host: www.someschool.edu

Connection: close

User-agent: Mozilla/4.0

Accept-language: fr

(extra carriage return, line feed)

We can learn a lot by taking a good look at this simple request message. First of all, we see that the message is written in ordinary ASCII text, so that an ordinary computer-literate person can read it. Second, we see that the message consists of five lines, each followed by a carriage return and a line feed. The last line is followed by an additional carriage return and line feed. Although this particular request message has five lines, a request message can have many more lines or as few as one line. The first line of an HTTP request message is called the request line; the subsequent lines are called the header lines. The request line has three fields: the method field, the URL field, and the HTTP version field. The method field can take on several different values, including GET, POST, and HEAD. The great majority of HTTP request messages use the GET method. The GET method is used when the browser requests an object, with the requested object identified in the URL field. In this example, the browser is requesting the object /somedir/page.html. The version field is self-explanatory; in this example, the browser implements version HTTP/1.1. 
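Assembling the example message in Python makes the CRLF structure and the three fields of the request line explicit (a sketch; the message text is exactly the example above):

```python
request = (
    "GET /somedir/page.html HTTP/1.1\r\n"   # request line
    "Host: www.someschool.edu\r\n"          # header lines...
    "Connection: close\r\n"
    "User-agent: Mozilla/4.0\r\n"
    "Accept-language: fr\r\n"
    "\r\n"                                  # extra carriage return, line feed
)
request_line = request.split("\r\n")[0]
method, url, version = request_line.split(" ")
print(method, url, version)  # GET /somedir/page.html HTTP/1.1
```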

Now let's look at the header lines in the example. The header line Host: www.someschool.edu specifies the host on which the object resides. By including the Connection: close header line, the browser is telling the server that it doesn't want to use persistent connections; it wants the server to close the connection after sending the requested object. Although the browser that generated this request message implements HTTP/1.1, it doesn't want to bother with persistent connections. The User-agent: header line specifies the user agent, that is, the browser type that is making the request to the server. Here the user agent is Mozilla/4.0, a Netscape browser. This header line is useful because the server can actually send different versions of the same object to different types of user agents. (Each of the versions is addressed by the same URL.) Finally, the Accept-language: header indicates that the user prefers to receive a French version of the object, if such an object exists on the server; otherwise, the server should send its default version. The Accept-language: header is just one of many content negotiation headers available in HTTP. 

Having looked at an example, let us now look at the general format for a request message, as shown in Figure 2.7. 

Figure 2.7
Figure 2.7: General format of a request message

We see that the general format of a request message closely follows our earlier example. You may have noticed, however, that after the header lines (and the additional carriage return and line feed) there is an "entity body." The entity body is not used with the GET method, but is used with the POST method. The HTTP client uses the POST method when the user fills out a form--for example, when a user gives search words to a search engine such as AltaVista. With a POST message, the user is still requesting a Web page from the server, but the specific contents of the Web page depend on what the user entered into the form fields. If the value of the method field is POST, then the entity body contains what the user entered into the form fields. The HEAD method is similar to the GET method. When a server receives a request with the HEAD method, it responds with an HTTP message but it leaves out the requested object. The HEAD method is often used by HTTP server developers for debugging. 
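A sketch of a POST message shows the entity body carrying the form input. The host, path, and form field below are made up for illustration; note that the Content-Length: header tells the server how many bytes of entity body follow the blank line:

```python
form_input = "searchwords=networking"      # hypothetical form contents
post_request = (
    "POST /cgi-bin/search HTTP/1.1\r\n"    # hypothetical search path
    "Host: www.someSearchEngine.com\r\n"   # hypothetical host
    f"Content-Length: {len(form_input)}\r\n"
    "\r\n"
    + form_input                           # the entity body carries the form input
)
entity_body = post_request.split("\r\n\r\n", 1)[1]
```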

HTTP Response Message

Below we provide a typical HTTP response message. This response message could be the response to the example request message just discussed. 

HTTP/1.1 200 OK

Connection: close

Date: Thu, 06 Aug 1998 12:00:15 GMT

Server: Apache/1.3.0 (Unix)

Last-Modified: Mon, 22 Jun 1998 09:23:24 GMT

Content-Length: 6821

Content-Type: text/html



(data data data data data . . .)



Let's take a careful look at this response message. It has three sections: an initial status line, six header lines, and then the entity body. The entity body is the meat of the message--it contains the requested object itself (represented by data data data data data . . .). The status line has three fields: the protocol version field, a status code, and a corresponding status message. In this example, the status line indicates that the server is using HTTP/1.1 and that everything is OK (that is, the server has found, and is sending, the requested object). 

Now let's look at the header lines. The server uses the Connection: close header line to tell the client that it is going to close the TCP connection after sending the message. The Date: header line indicates the time and date when the HTTP response was created and sent by the server. Note that this is not the time when the object was created or last modified; it is the time when the server retrieves the object from its file system, inserts the object into the response message, and sends the response message. The Server: header line indicates that the message was generated by an Apache Web server; it is analogous to the User-agent: header line in the HTTP request message. The Last-Modified: header line indicates the time and date when the object was created or last modified. The Last-Modified: header, which we will cover in more detail, is critical for object caching, both in the local client and in network cache servers (also known as proxy servers). The Content-Length: header line indicates the number of bytes in the object being sent. The Content-Type: header line indicates that the object in the entity body is HTML text. (The object type is officially indicated by the Content-Type: header and not by the file extension.) 
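The structure of the example response can be pulled apart mechanically. The following Python sketch (the message text is exactly the example above) splits the status line, the header lines, and the entity body:

```python
response = (
    "HTTP/1.1 200 OK\r\n"
    "Connection: close\r\n"
    "Date: Thu, 06 Aug 1998 12:00:15 GMT\r\n"
    "Server: Apache/1.3.0 (Unix)\r\n"
    "Last-Modified: Mon, 22 Jun 1998 09:23:24 GMT\r\n"
    "Content-Length: 6821\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"                                  # blank line separates headers from the body
    "(data data data data data ...)"
)
head, _, entity_body = response.partition("\r\n\r\n")
status_line, *header_lines = head.split("\r\n")
version, status_code, phrase = status_line.split(" ", 2)
headers = dict(line.split(": ", 1) for line in header_lines)
print(status_code, headers["Content-Type"])  # 200 text/html
```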

Note that if the server receives an HTTP/1.0 request, it will not use persistent connections, even if it is an HTTP/1.1 server. Instead, the HTTP/1.1 server will close the TCP connection after sending the object. This is necessary because an HTTP/1.0 client expects the server to close the connection. 

Having looked at an example, let us now examine the general format of a response message, which is shown in Figure 2.8. This general format of the response message matches the previous example of a response message. 

Figure 2.8
Figure 2.8: General format of a response message

Let's say a few additional words about status codes and their phrases. The status code and associated phrase indicate the result of the request. Some common status codes and associated phrases include: 

  • 200 OK: Request succeeded and the information is returned in the response. 
  • 301 Moved Permanently: Requested object has been permanently moved; new URL is specified in Location: header of the response message. The client software will automatically retrieve the new URL. 
  • 400 Bad Request: A generic error code indicating that the request could not be understood by the server. 
  • 404 Not Found: The requested document does not exist on this server. 
  • 505 HTTP Version Not Supported: The requested HTTP protocol version is not supported by the server.
How would you like to see a real HTTP response message? This is very easy to do! First Telnet into your favorite Web server. Then type in a one-line request message for some object that is housed on the server. For example, if you can log on to a Unix machine, type: 
telnet www.eurecom.fr 80

GET /~ross/index.html HTTP/1.0
(Hit the carriage return twice after typing the second line.) This opens a TCP connection to port 80 of the host www.eurecom.fr and then sends the HTTP GET command. You should see a response message that includes the base HTML file of Professor Ross's homepage. If you'd rather just see the HTTP message lines and not receive the object itself, replace GET with HEAD. Finally, replace /~ross/index.html with /~ross/banana.html and see what kind of response message you get. 

In this section we discussed a number of header lines that can be used within HTTP request and response messages. The HTTP specification (especially HTTP/1.1) defines many, many more header lines that can be inserted by browsers, Web servers, and network cache servers. We have only covered a small number of the totality of header lines. We will cover a few more below and another small number when we discuss network Web caching at the end of this chapter. A readable and comprehensive discussion of HTTP headers and status codes is given in [Luotonen 1998]. An excellent introduction to the technical issues surrounding the Web is [Yeager 1996]. 

How does a browser decide which header lines to include in a request message? How does a Web server decide which header lines to include in a response message? A browser will generate header lines as a function of the browser type and version (for example, an HTTP/1.0 browser will not generate any 1.1 header lines), the user configuration of the browser (for example, preferred language), and whether the browser currently has a cached, but possibly out-of-date, version of the object. Web servers behave similarly: There are different products, versions, and configurations, all of which influence which header lines are included in response messages. 

2.2.4: User-Server Interaction: Authentication and Cookies

We mentioned above that an HTTP server is stateless. This simplifies server design, and has permitted engineers to develop very high-performing Web servers. However, it is often desirable for a Web site to identify users, either because the server wishes to restrict user access or because it wants to serve content as a function of the user identity. HTTP provides two mechanisms to help a server identify a user: authentication and cookies. 

Authentication

Many sites require users to provide a username and a password in order to access the documents housed on the server. This requirement is referred to as authentication. HTTP provides special status codes and headers to help sites perform authentication. Let's walk through an example to get a feel for how these special status codes and headers work. Suppose a client requests an object from a server, and the server requires user authorization. 

The client first sends an ordinary request message with no special header lines. The server then responds with an empty entity body and with a 401 Authorization Required status code. In this response message the server includes the WWW-Authenticate: header, which specifies the details about how to perform authentication. (Typically, it indicates that the user needs to provide a username and a password.) 

The client receives the response message and prompts the user for a username and password. The client resends the request message, but this time includes an Authorization: header line, which includes the username and password. 

After obtaining the first object, the client continues to send the username and password in subsequent requests for objects on the server. (This typically continues until the client closes the browser. However, while the browser remains open, the username and password are cached, so the user is not prompted anew for each requested object!) In this manner, the site can identify the user for every request. 
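The retried request can be sketched as follows. In HTTP's basic authentication scheme, the Authorization: header carries the username and password base64-coded; the credentials and object path here are made up for illustration:

```python
import base64

credentials = base64.b64encode(b"alice:opensesame").decode()  # made-up username:password
retry_request = (
    "GET /protected/page.html HTTP/1.1\r\n"    # hypothetical protected object
    "Host: www.someSchool.edu\r\n"
    f"Authorization: Basic {credentials}\r\n"  # username and password, base64-coded
    "\r\n"
)
```

Note that base64 is an encoding, not encryption; anyone who can observe the request can recover the password, which is one reason this form of authentication is weak.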

We will see in Chapter 7 that HTTP performs a rather weak form of authentication, one that would not be difficult to break. We will study more secure and robust authentication schemes later in Chapter 7. 

Cookies

Cookies are an alternative mechanism that sites can use to keep track of users. They are defined in RFC 2109. Some Web sites use cookies and others don't. Let's walk through an example. Suppose a client contacts a Web site for the first time, and this site uses cookies. The server's response will include a Set-cookie: header. Often this header line contains an identification number generated by the Web server. For example, the header line might be: 

Set-cookie: 1678453

When the HTTP client receives the response message, it sees the Set-cookie: header and identification number. It then appends a line to a special cookie file that is stored in the client machine. This line typically includes the host name of the server and the user's associated identification number. In subsequent requests to the same server, say one week later, the client includes a Cookie: request header, and this header line specifies the identification number for that server. In the current example, the request message includes the header line: 

Cookie: 1678453

In this manner, the server does not know the username of the user, but the server does know that this user is the same user that made a specific request one week ago. 
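The client-side bookkeeping can be sketched with a dictionary standing in for the browser's cookie file (the host name and object path are hypothetical; the identification number is the one from the example):

```python
cookie_file = {}  # host name -> identification number (stands in for the cookie file)

# First visit: the response carries a Set-cookie: header; the client records it.
set_cookie_value = "1678453"
cookie_file["www.someSite.com"] = set_cookie_value   # hypothetical host

# One week later: a request to the same host replays the stored number.
host = "www.someSite.com"
later_request = (
    "GET /index.html HTTP/1.1\r\n"
    f"Host: {host}\r\n"
    f"Cookie: {cookie_file[host]}\r\n"
    "\r\n"
)
```

Because the file is keyed by host name, a request to a different server (or from a different machine, as noted below for nomadic users) carries no cookie and looks like a brand new user.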

Web servers use cookies for many different purposes: 

  • If a server requires authentication but doesn't want to hassle a user with a username and password prompt every time the user visits the site, it can set a cookie. 
  • If a server wants to remember a user's preferences so that it can provide targeted advertising during subsequent visits, it can set a cookie. 
  • If a user is shopping at a site (for example, buying several CDs), the server can use cookies to keep track of the items that the user is purchasing, that is, to create a virtual shopping cart.
We mention, however, that cookies pose problems for a nomadic user who accesses the same site from different machines. The site will treat the user as a different user for each different machine used. We conclude by pointing the reader to the page Persistent Client State HTTP Cookies [Netscape Cookie 1999], which provides an in-depth but readable introduction to cookies. We also recommend Cookies Central [Cookie Central 2000], which includes extensive information on the cookie controversy. 

2.2.5: The Conditional GET

By storing previously retrieved objects, Web caching can reduce object-retrieval delays and diminish the amount of Web traffic sent over the Internet. Web caches can reside in a client or in an intermediate network cache server. We will discuss network caching at the end of this chapter. In this subsection, we restrict our attention to client caching. 

Although Web caching can reduce user-perceived response times, it introduces a new problem--a copy of an object residing in the cache may be stale. In other words, the object housed in the Web server may have been modified since the copy was cached at the client. Fortunately, HTTP has a mechanism that allows the client to employ caching while still ensuring that all objects passed to the browser are up to date. This mechanism is called the conditional GET. An HTTP request message is a so-called conditional GET message if (1) the request message uses the GET method and (2) the request message includes an If-Modified-Since: header line. 

To illustrate how the conditional GET operates, let's walk through an example. First, a browser requests an uncached object from some Web server: 

GET /fruit/kiwi.gif HTTP/1.0
User-agent: Mozilla/4.0

Second, the Web server sends a response message with the object to the client: 
HTTP/1.0 200 OK
Date: Wed, 12 Aug 1998 15:39:29
Server: Apache/1.3.0 (Unix)
Last-Modified: Mon, 22 Jun 1998 09:23:24
Content-Type: image/gif

(data data data data data ...)

The client displays the object to the user but also saves the object in its local cache. Importantly, the client also caches the last-modified date along with the object. Third, one week later, the user requests the same object and the object is still in the cache. Since this object may have been modified at the Web server in the past week, the browser performs an up-to-date check by issuing a conditional GET. Specifically, the browser sends 
GET /fruit/kiwi.gif HTTP/1.0
User-agent: Mozilla/4.0
If-modified-since: Mon, 22 Jun 1998 09:23:24

Note that the value of the If-modified-since: header line is exactly equal to the value of the Last-Modified: header line that was sent by the server one week ago. This conditional GET is telling the server to send the object only if the object has been modified since the specified date. Suppose the object has not been modified since 22 Jun 1998 09:23:24. Then, fourth, the Web server sends a response message to the client: 
HTTP/1.0 304 Not Modified
Date: Wed, 19 Aug 1998 15:39:29
Server: Apache/1.3.0 (Unix)

(empty entity body) 

We see that in response to the conditional GET, the Web server still sends a response message, but does not include the requested object in the response message. Including the requested object would only waste bandwidth and increase user-perceived response time, particularly if the object is large. Note that this last response message has in the status line 304 Not Modified, which tells the client that it can go ahead and use its cached copy of the object. 
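The server-side decision behind this exchange can be sketched as follows. This is an illustrative simplification: timestamps are Python datetime objects and headers are dictionary entries, whereas a real server parses and emits the HTTP date format shown in the messages above.

```python
from datetime import datetime

def respond_to_get(last_modified, request_headers, body):
    """Build a response for a (possibly conditional) GET request."""
    since = request_headers.get("If-modified-since")
    if since is not None and last_modified <= since:
        # The client's cached copy is still fresh: send the status line
        # and headers only, with an empty entity body.
        return {"Status": "304 Not Modified"}
    # Unconditional GET, or the object has changed: send the full object.
    return {"Status": "200 OK", "Last-Modified": last_modified, "Body": body}
```

With the dates from the example, a request carrying If-modified-since: Mon, 22 Jun 1998 09:23:24 for an object last modified at exactly that time yields the 304 response with no body, while a first-time (unconditional) request yields 200 OK with the object.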

2.2.6: Web Caches

A Web cache--also called a proxy server--is a network entity that satisfies HTTP requests on behalf of a client. The Web cache has its own disk storage and keeps in this storage copies of recently requested objects. As shown in Figure 2.9, users configure their browsers so that all of their HTTP requests are first directed to the Web cache. (This is a straightforward procedure with Microsoft and Netscape browsers.) Once a browser is configured, each browser request for an object is first directed to the Web cache. 

Figure 2.9
Figure 2.9: Clients requesting objects through a Web cache

As an example, suppose a browser is requesting the object http://www.someschool.edu/campus.gif

  • The browser establishes a TCP connection to the Web cache and sends an HTTP request for the object to the Web cache. 
  • The Web cache checks to see if it has a copy of the object stored locally. If it does, the Web cache forwards the object within an HTTP response message to the client browser. 
  • If the Web cache does not have the object, the Web cache opens a TCP connection to the origin server, that is, to www.someschool.edu. The Web cache then sends an HTTP request for the object into the TCP connection. After receiving this request, the origin server sends the object within an HTTP response to the Web cache. 
  • When the Web cache receives the object, it stores a copy in its local storage and forwards a copy, within an HTTP response message, to the client browser (over the existing TCP connection between the client browser and the Web cache).

Note that a cache is both a server and a client at the same time. When it receives requests from and sends responses to a browser, it is a server. When it sends requests to and receives responses from an origin server it is a client. 
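The four steps above can be sketched as a small class. The fetch_from_origin callable is a stand-in (an assumption for illustration) for opening a TCP connection to the origin server and issuing the HTTP request; in the sketch, the cache's "disk storage" is simply a dictionary.

```python
class WebCache:
    """A cache is a server toward the browser and a client toward the origin."""

    def __init__(self, fetch_from_origin):
        self.store = {}                 # url -> object (disk storage in practice)
        self.fetch = fetch_from_origin  # stand-in for contacting the origin server

    def handle_request(self, url):
        if url in self.store:
            # Hit: forward the stored copy to the browser immediately.
            return self.store[url]
        # Miss: act as a client, retrieve the object from the origin server,
        # keep a copy locally, and forward a copy to the browser.
        obj = self.fetch(url)
        self.store[url] = obj
        return obj
```

A second request for the same object is then answered from local storage without contacting the origin server at all, which is where the response-time and access-link savings discussed next come from.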

So why bother with a Web cache? What advantages does it have? Web caches are enjoying wide-scale deployment in the Internet for at least three reasons. First, a Web cache can substantially reduce the response time for a client request, particularly if the bottleneck bandwidth between the client and the origin server is much less than the bottleneck bandwidth between the client and the cache. If there is a high-speed connection between the client and the cache, as there often is, and if the cache has the requested object, then the cache will be able to rapidly deliver the object to the client. Second, as we will soon illustrate with an example, Web caches can substantially reduce traffic on an institution's access link to the Internet. By reducing traffic, the institution (for example, a company or a university) does not have to upgrade bandwidth as quickly, thereby reducing costs. Furthermore, Web caches can substantially reduce Web traffic in the Internet as a whole, thereby improving performance for all applications. In 1998, over 75 percent of Internet traffic was Web traffic, so a significant reduction in Web traffic can translate into a significant improvement in Internet performance [Claffy 1998]. Third, an Internet dense with Web caches--at institutional, regional, and national levels, for example--provides an infrastructure for rapid distribution of content, even for content providers who run their sites on low-speed servers behind low-speed access links. If such a "resource-poor" content provider suddenly has popular content to distribute, this popular content will quickly be copied into the Internet caches, and high user demand will be satisfied. 

To gain a deeper understanding of the benefits of caches, let us consider an example in the context of Figure 2.10. This figure shows two networks--the institutional network and the rest of the public Internet. The institutional network is a high-speed LAN. A router in the institutional network and a router in the Internet are connected by a 1.5 Mbps link. The origin servers are attached to the Internet, but located all over the globe. Suppose that the average object size is 100 Kbits and that the average request rate from the institution's browsers to the origin servers is 15 requests per second. Also suppose that the amount of time it takes from when the router on the Internet side of the access link in Figure 2.10 forwards an HTTP request (within an IP datagram) until it receives the IP datagram (typically, many IP datagrams) containing the corresponding response is two seconds on average. Informally, we refer to this last delay as the "Internet delay." 

Figure 2.10
Figure 2.10: Bottleneck between an institutional network and the Internet

The total response time--that is, the time from the browser's request of an object until its receipt of the object--is the sum of the LAN delay, the access delay (that is, the delay between the two routers), and the Internet delay. Let us now do a very crude calculation to estimate this delay. The traffic intensity on the LAN (see Section 1.6) is 

(15 requests/sec) * (100 Kbits/request) / (10 Mbps) = 0.15 

whereas the traffic intensity on the access link (from the Internet router to institution router) is 

(15 requests/sec) * (100 Kbits/request) / (1.5 Mbps) = 1 

A traffic intensity of 0.15 on a LAN typically results in, at most, tens of milliseconds of delay; hence, we can neglect the LAN delay. However, as discussed in Section 1.6, as the traffic intensity approaches 1 (as is the case of the access link in Figure 2.10), the delay on a link becomes very large and grows without bound. Thus, the average response time to satisfy requests is going to be on the order of minutes, if not more, which is unacceptable for the institution's users. Clearly something must be done. 
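The two traffic-intensity figures above can be reproduced with a few lines of Python; the request rate, object size, and link rates are exactly the values assumed in the example.

```python
def traffic_intensity(requests_per_sec, object_bits, link_bps):
    """Traffic intensity = (arrival rate of bits) / (link transmission rate)."""
    return requests_per_sec * object_bits / link_bps

# 15 requests/sec, 100 Kbit objects, over the 10 Mbps LAN and 1.5 Mbps access link
lan_intensity = traffic_intensity(15, 100e3, 10e6)      # -> 0.15
access_intensity = traffic_intensity(15, 100e3, 1.5e6)  # -> 1.0
```

The calculation makes the contrast concrete: the LAN runs at 15 percent of capacity, while the access link is driven to an intensity of 1, the regime in which queueing delay grows without bound.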

One possible solution is to increase the access rate from 1.5 Mbps to, say, 10 Mbps. This will lower the traffic intensity on the access link to 0.15, which translates to negligible delays between the two routers. In this case, the total response time will roughly be 2 seconds, that is, the Internet delay. But this solution also means that the institution must upgrade its access link from 1.5 Mbps to 10 Mbps, which can be very costly. 

Now consider the alternative solution of not upgrading the access link but instead installing a Web cache in the institutional network. This solution is illustrated in Figure 2.11. 

Figure 2.11
Figure 2.11: Adding a cache to the institutional network

Hit rates--the fraction of requests that are satisfied by a cache--typically range from 0.2 to 0.7 in practice. For illustrative purposes, let us suppose that the cache provides a hit rate of 0.4 for this institution. Because the clients and the cache are connected to the same high-speed LAN, 40 percent of the requests will be satisfied almost immediately, say, within 10 milliseconds, by the cache. Nevertheless, the remaining 60 percent of the requests still need to be satisfied by the origin servers. But with only 60 percent of the requested objects passing through the access link, the traffic intensity on the access link is reduced from 1.0 to 0.6. Typically, a traffic intensity less than 0.8 corresponds to a small delay, say, tens of milliseconds, on a 1.5 Mbps link. This delay is negligible compared with the 2-second Internet delay. Given these considerations, the average delay therefore is 

0.4 * (0.010 seconds) + 0.6 * (2.01 seconds) 

which is just slightly greater than 1.2 seconds. Thus, this second solution provides an even lower response time than the first solution, and it doesn't require the institution to upgrade its link to the Internet. The institution does, of course, have to purchase and install a Web cache. But this cost is low--many caches use public-domain software that runs on inexpensive servers and PCs. 
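The weighted average above generalizes directly, with the hit rate as a parameter; the 0.4 hit rate, 10-millisecond cache delay, and 2.01-second miss delay are the values assumed in the example.

```python
def avg_response_time(hit_rate, cache_delay_s, miss_delay_s):
    """Expected delay: hits are served by the cache, misses by the origin."""
    return hit_rate * cache_delay_s + (1 - hit_rate) * miss_delay_s

# 0.4 * 0.010 + 0.6 * 2.01 = 1.21 seconds
delay = avg_response_time(0.4, 0.010, 2.01)
```

Trying other hit rates in the 0.2 to 0.7 range quoted above shows how sensitive the average delay is to the cache's effectiveness: at a 0.7 hit rate the same formula gives roughly 0.6 seconds.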

Cooperative Caching

Multiple Web caches, located at different places in the Internet, can cooperate and improve overall performance. For example, an institutional cache can be configured to send its HTTP requests to a cache in a backbone ISP at the national level. In this case, when the institutional cache does not have the requested object in its storage, it forwards the HTTP request to the national cache. The national cache then retrieves the object from its own storage or, if the object is not in storage, from the origin server. The national cache then sends the object (within an HTTP response message) to the institutional cache, which in turn forwards the object to the requesting browser. Whenever an object passes through a cache (institutional or national), the cache keeps a copy in its local storage. The advantage of passing through a higher-level cache, such as a national cache, is that it has a larger user population and therefore higher hit rates. 
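The two-level lookup just described can be sketched as follows. The "upstream" callable at each level is an assumption for illustration: the institutional cache's upstream is the national cache, and the national cache's upstream is the origin server. As in the text, each cache keeps a copy of every object that passes through it.

```python
class Cache:
    """A cache that forwards misses to the next level up the hierarchy."""

    def __init__(self, upstream):
        self.store = {}            # url -> object
        self.upstream = upstream   # next level: another cache, or the origin

    def get(self, url):
        if url not in self.store:
            # Miss: forward the request upward, and keep a local copy
            # of the object on its way back down.
            self.store[url] = self.upstream(url)
        return self.store[url]

def origin_server(url):
    # Stand-in for retrieving the object from the origin server.
    return "object for " + url

national = Cache(origin_server)
institutional = Cache(national.get)
```

After one browser behind the institutional cache requests an object, both caches hold a copy, so a later request from any institution served by the national cache is a hit at the national level; this larger shared user population is exactly why higher-level caches see higher hit rates.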

An example of a cooperative caching system is the NLANR caching system, which consists of a number of backbone caches in the United States providing service to institutional and regional caches from all over the globe [NLANR 1999]. The NLANR caching hierarchy is shown in Figure 2.12 [Huffaker 1998]. The caches obtain objects from each other using a combination of HTTP and ICP (Internet Caching Protocol). ICP is an application-layer protocol that allows one cache to quickly ask another cache if it has a given document [RFC 2186]; a cache can then use HTTP to retrieve the object from the other cache. ICP is used extensively in many cooperative caching systems and is fully supported by Squid, a popular public-domain software package for Web caching [Squid 2000]. If you are interested in learning more about ICP, you are encouraged to see [Luotonen 1998], [Ross 1998], and the ICP RFC [RFC 2186]. 

Figure 2.12
Figure 2.12: The NLANR caching hierarchy (Courtesy of [Huffaker and Jung 1998])

An alternative form of cooperative caching involves clusters of caches, often co-located on the same LAN. A single cache is often replaced with a cluster of caches when the single cache is not sufficient to handle the traffic or provide sufficient storage capacity. Although cache clustering is a natural way to scale as traffic increases, clusters introduce a new problem: When a browser wants to request a particular object, to which cache in the cache cluster should it send the request? This problem can be elegantly solved using hash routing. (If you are not familiar with hash functions, you can read about them in Chapter 7.) In the simplest form of hash routing, the browser hashes the URL, and depending on the result of the hash, the browser directs its request message to one of the caches in the cluster. By having all the browsers use the same hash function, an object will never be present in more than one cache in the cluster, and if the object is indeed in the cache cluster, the browser will always direct its request to the correct cache. Hash routing is the essence of the Cache Array Routing Protocol (CARP). If you are interested in learning more about hash routing or CARP, see [Valloppillil 1997; Luotonen 1998; Ross 1997, 1998]. 
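The simplest form of hash routing described above can be sketched in a few lines. Note that this is only the simple scheme from the text, not CARP itself (CARP combines hashes of the cache names and the URL in a more elaborate way); the choice of MD5 as the common hash function is an assumption for illustration.

```python
import hashlib

def choose_cache(url, caches):
    """Map a URL deterministically onto one cache in the cluster.

    All browsers use the same hash function, so a given object is always
    requested from (and therefore stored at) the same cluster cache.
    """
    digest = hashlib.md5(url.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(caches)
    return caches[index]
```

Because every browser computes the same index for the same URL, no object is duplicated across the cluster, and a request for a cached object always lands on the cache that holds it.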

Web caching is a rich and complex subject, and it has enjoyed extensive research and product development in recent years. Furthermore, caches are now being built to handle streaming audio and video. Caches will likely play an important role as the Internet begins to provide an infrastructure for the large-scale, on-demand distribution of music, television shows, and movies. 

© 2000-2001 by Addison Wesley Longman
A division of Pearson Education