| Along with
the Web, electronic mail is one of the most popular Internet applications.
Just like ordinary "snail mail," e-mail is asynchronous--people send and
read messages when it is convenient for them, without having to coordinate
with other peoples' schedules. In contrast with snail mail, electronic
mail is fast, easy to distribute, and inexpensive. Moreover, modern electronic
mail messages can include hyperlinks, HTML formatted text, images, sound,
and even video. In this section we will examine the application-layer protocols
that are at the heart of Internet electronic mail. But before we jump into
an in-depth discussion of these protocols, let's take a bird's eye view
of the Internet mail system and its key components.
Figure 2.15
presents a high-level view of the Internet mail system. We see from this
diagram that it has three major components: user agents, mail servers,
and the Simple Mail Transfer Protocol (SMTP). We now describe each of these
components in the context of a sender, Alice, sending an e-mail message
to a recipient, Bob. User agents allow users to read, reply to, forward,
save, and compose messages. (User agents for electronic mail are sometimes
called mail readers, although we will generally avoid this term
in this book.) When Alice is finished composing her message, her user agent
sends the message to her mail server, where the message is placed in the
mail server's outgoing message queue. When Bob wants to read a message,
his user agent obtains the message from his mailbox in his mail server.
In the late 1990s, GUI (graphical user interface) user agents became popular,
allowing users to view and compose multimedia messages. Currently, Eudora,
Microsoft's Outlook, and Netscape's Messenger are among the popular GUI
user agents for e-mail. There are also many text-based e-mail user interfaces
in the public domain, including mail, pine, and elm.
Figure 2.15:
A bird's eye view of the Internet e-mail system
Mail servers
form the core of the e-mail infrastructure. Each recipient, such as Bob,
has a mailbox located in one of the mail servers. Bob's mailbox manages
and maintains the messages that have been sent to him. A typical message
starts its journey in the sender's user agent, travels to the sender's
mail server, and then travels to the recipient's mail server, where it
is deposited in the recipient's mailbox. When Bob wants to access the messages
in his mailbox, the mail server containing the mailbox authenticates Bob
(with user names and passwords). Alice's mail server must also deal with
failures in Bob's mail server. If Alice's server cannot deliver mail to
Bob's server, Alice's server holds the message in a message queue and attempts
to transfer the message later. Reattempts are often done every 30 minutes
or so; if there is no success after several days, the server removes the
message and notifies the sender (Alice) with an e-mail message.
The Simple Mail
Transfer Protocol (SMTP) is the principal application-layer protocol for
Internet electronic mail. It uses the reliable data transfer service of
TCP to transfer mail from the sender's mail server to the recipient's mail
server. As with most application-layer protocols, SMTP has two sides: a
client side, which executes on the sender's mail server, and a server side,
which executes on the recipient's mail server. Both the client and server
sides of SMTP run on every mail server. When a mail server sends mail (to
other mail servers), it acts as an SMTP client. When a mail server receives
mail (from other mail servers) it acts as an SMTP server.
Hotmail
In December 1995, Sabeer Bhatia and Jack Smith visited
the Internet venture capitalist Draper Fisher Jurvetson and proposed developing
a free Web-based e-mail system. The idea was to give a free e-mail account
to anyone who wanted one, and to make the accounts accessible from the
Web. With Web-based e-mail, anyone with access to the Web--say, from a
school or community library--could read and send e-mails. Furthermore,
web-based e-mail would offer great mobility to its subscribers. In exchange
for 15 percent of the company, Draper Fisher Jurvetson financed Bhatia
and Smith, who formed a company called Hotmail. With three full-time people
and 12 to 14 part-time people who worked for stock options, they were able
to develop and launch the service in July 1996. Within a month after launch
they had 100,000 subscribers. The number of subscribers continued to grow
rapidly, with all of their subscribers being exposed to advertising banners
while reading their e-mail. In December 1997, less than 18 months after
launching the service, Hotmail had over 12 million subscribers and was
acquired by Microsoft, reportedly for $400 million dollars.
The success of Hotmail is often attributed to its "first-mover
advantage" and to the inherent "viral marketing" of e-mail. Hotmail had
a first-mover advantage because it was the first company to offer Web-based
e-mail. Other companies, of course, copied Hotmail's idea, but Hotmail
had a six-month lead on them. The coveted first-mover advantage is obtained
by having an original idea, and then developing quickly and secretly. A
service or a product is said to have viral marketing if it markets itself.
E-mail is a classic example of a service with viral marketing--the sender
sends a message to one or more recipients, and then all the recipients
become aware of the service. Hotmail demonstrated that the combination
of first-mover advantage and viral marketing can produce a killer application.
Perhaps some of the students reading this book will be among the new entrepreneurs
who conceive and develop first-mover Internet services with inherent viral
marketing. |
2.4.1: SMTP
SMTP, defined in
RFC 821, is at the heart of Internet electronic mail. As mentioned above,
SMTP transfers messages from senders' mail servers to the recipients' mail
servers. SMTP is much older than HTTP. (The SMTP RFC dates back to 1982,
and SMTP was around long before that.) Although SMTP has numerous wonderful
qualities, as evidenced by its ubiquity in the Internet, it is nevertheless
a legacy technology that possesses certain "archaic" characteristics. For
example, it restricts the body (not just the headers) of all mail messages
to be in simple seven-bit ASCII. This restriction made sense in the early
1980s when transmission capacity was scarce and no one was e-mailing large
attachments or large image, audio, or video files. But today, in the multimedia
era, the seven-bit ASCII restriction is a bit of a pain--it requires binary
multimedia data to be encoded to ASCII before being sent over SMTP; and
it requires the corresponding ASCII message to be decoded back to binary
after SMTP transport. Recall from Section 2.3 that HTTP does not require
multimedia data to be ASCII encoded before transfer.
To illustrate
the basic operation of SMTP, let's walk through a common scenario. Suppose
Alice wants to send Bob a simple ASCII message:
-
Alice invokes her
user agent for e-mail, provides Bob's e-mail address (for example, bob@someschool.edu),
composes a message, and instructs the user agent to send the message.
-
Alice's user agent
sends the message to her mail server, where it is placed in a message queue.
-
The client side
of SMTP, running on Alice's mail server, sees the message in the message
queue. It opens a TCP connection to an SMTP server, running on Bob's mail
server.
-
After some initial
SMTP handshaking, the SMTP client sends Alice's message into the TCP connection.
-
At Bob's mail server
host, the server side of SMTP receives the message. Bob's mail server then
places the message in Bob's mailbox.
-
Bob invokes his
user agent to read the message at his convenience.
The scenario is
summarized in Figure 2.16.
Figure 2.16:
Alice's mail server transfers Alice's message to Bob's mail server
It is important
to observe that SMTP does not normally use intermediate mail servers for
sending mail, even when the two mail servers are located at opposite ends
of the world. If Alice's server is in Hong Kong and Bob's server is in
Mobile, Alabama, the TCP "connection" is a direct connection between the
Hong Kong and Mobile servers. In particular, if Bob's mail server is down,
the message remains in Alice's mail server and waits for a new attempt--the
message does not get placed in some intermediate mail server.
Let's now take
a closer look at how SMTP transfers a message from a sending mail server
to a receiving mail server. We will see that the SMTP protocol has many
similarities with protocols that are used for face-to-face human interaction.
First, the client SMTP (running on the sending mail server host) has TCP
establish a connection on port 25 to the server SMTP (running on the receiving
mail server host). If the server is down, the client tries again later.
Once this connection is established, the server and client perform some
application-layer handshaking. Just as humans often introduce themselves
before transferring information from one to another, SMTP clients and servers
introduce themselves before transferring information. During this SMTP
handshaking phase, the SMTP client indicates the e-mail address of the
sender (the person who generated the message) and the e-mail address of
the recipient. Once the SMTP client and server have introduced themselves
to each other, the client sends the message. SMTP can count on the reliable
data transfer service of TCP to get the message to the server without errors.
The client then repeats this process over the same TCP connection if it
has other messages to send to the server; otherwise, it instructs TCP to
close the connection.
Let us take
a look at an example transcript between client (C) and server (S). The
host name of the client is crepes.fr and the host name of the
server is hamburger.edu. The ASCII text lines prefaced with C:
are exactly the lines the client sends into its TCP socket, and the ASCII
text lines prefaced with S: are exactly the lines the server
sends into its TCP socket. The following transcript begins as soon as the
TCP connection is established:
S: 220 hamburger.edu
C: HELO crepes.fr
S: 250 Hello crepes.fr, pleased to meet you
C: MAIL FROM: <alice@crepes.fr>
S: 250 alice@crepes.fr... Sender ok
C: RCPT TO: <bob@hamburger.edu>
S: 250 bob@hamburger.edu ... Recipient ok
C: DATA
S: 354 Enter mail, end with "." on a line by itself
C: Do you like ketchup?
C: How about pickles?
C: .
S: 250 Message accepted for delivery
C: QUIT
S: 221 hamburger.edu closing connection
In the above example,
the client sends a message ("Do you like ketchup? How about pickles?")
from mail server crepes.fr to mail server hamburger.edu.
The client issued five commands: HELO (an abbreviation for HELLO),
MAIL FROM, RCPT TO, DATA, and QUIT.
These commands are self explanatory. The server issues replies to each
command, with each reply having a reply code and some (optional) English-language
explanation. We mention here that SMTP uses persistent connections: If
the sending mail server has several messages to send to the same receiving
mail server, it can send all of the messages over the same TCP connection.
For each message, the client begins the process with a new HELO crepes.fr
and only issues QUIT after all messages have been sent.
It is highly
recommended that you use Telnet to carry out a direct dialogue with an
SMTP server. To do this, issue telnet serverName 25 where serverName
is the name of the remote mail server. When you do this, you are simply
establishing a TCP connection between your local host and the mail server.
After typing this line, you should immediately receive the 220
reply from the server. Then issue the SMTP commands HELO, MAIL
FROM, RCPT TO, DATA, and QUIT at the appropriate
times. If you Telnet into your friend's SMTP server, you should be able
to send mail to your friend in this manner (that is, without using your
mail user agent).
2.4.2: Comparison
with HTTP
Let us now briefly
compare SMTP to HTTP. Both protocols are used to transfer files from one
host to another; HTTP transfers files (or objects) from Web server to Web
user agent (that is, the browser); SMTP transfers files (that is, e-mail
messages) from one mail server to another mail server. When transferring
the files, both persistent HTTP and SMTP use persistent connections. Thus,
the two protocols have common characteristics. However, there are important
differences. First, HTTP is principally a pull protocol--someone loads
information on a Web server and users use HTTP to pull the information
from the server at their convenience. In particular, the TCP connection
is initiated by the machine that wants to receive the file. On the other
hand, SMTP is primarily a push protocol--the sending mail server pushes
the file to the receiving mail server. In particular, the TCP connection
is initiated by the machine that wants to send the file.
A second important
difference, which we alluded to earlier, is that SMTP requires each message,
including the body of each message, to be in seven-bit ASCII format. Furthermore,
the SMTP RFC requires the body of every message to end with a line consisting
of only a period--that is, in ASCII jargon, the body of each message ends
with "CRLF.CRLF," where CR and LF stand for carriage return and
line feed, respectively. In this manner, while the SMTP server is receiving
a series of messages from an SMTP client, the server can delineate the
messages by searching for "CRLF.CRLF" in the byte stream. Now
suppose that the body of one of the messages is not ASCII text but instead
binary data (for example, a JPEG image). It is possible that this binary
data might accidentally have the bit pattern associated with ASCII representation
of "CRLF.CRLF" in the middle of the bit stream. This would cause
the SMTP server to incorrectly conclude that the message has terminated.
To get around this and related problems, binary data is first encoded to
ASCII in such a way that certain ASCII characters (including " . ") are
not used. Returning to our comparison with HTTP, we note that neither nonpersistent
nor persistent HTTP has to bother with the ASCII conversion. For nonpersistent
HTTP, each TCP connection transfers exactly one object; when the server
closes the connection, the client knows it has received one entire response
message. For persistent HTTP, each response message includes a Content-length:
header line, enabling the client to delineate the end of each message.
A third important
difference concerns how a document consisting of text and images (along
with possibly other media types) is handled. As we learned in Section 2.3,
HTTP encapsulates each object in its own HTTP response message. Internet
mail, as we shall discuss in greater detail below, places all of the message's
objects into one message.
2.4.3: Mail Message
Formats and MIME
When Alice sends
an ordinary snail-mail letter to Bob, she puts the letter into an envelope,
on which there are all kinds of peripheral information such as Bob's address,
Alice's return address, and the date (supplied by the postal service).
Similarly, when an e-mail message is sent from one person to another, a
header containing peripheral information precedes the body of the message
itself. This peripheral information is contained in a series of header
lines, which are defined in RFC 822. The header lines and the body of the
message are separated by a blank line (that is, by CRLF). RFC
822 specifies the exact format for mail header lines as well as their semantic
interpretations. As with HTTP, each header line contains readable text,
consisting of a keyword followed by a colon followed by a value. Some of
the keywords are required and others are optional. Every header must have
a From: header line and a To: header line; a header may
include a Subject: header line as well as other optional header
lines. It is important to note that these header lines are different
from the SMTP commands we studied in Section 2.4.1 (even though they contain
some common words such as "from" and "to"). The commands in that section
were part of the SMTP handshaking protocol; the header lines examined in
this section are part of the mail message itself.
A typical message
header looks like this:
From: alice@crepes.fr
To: bob@hamburger.edu
Subject: Searching for the meaning of life.
After the message
header, a blank line follows, then the message body (in ASCII) follows.
The message terminates with a line containing only a period, as discussed
above. You should use Telnet to send to a mail server a message that contains
some header lines, including the Subject: header line. To do this,
issue telnet serverName 25.
The MIME
Extension for Non-ASCII Data
While the message
headers described in RFC 822 are satisfactory for sending ordinary ASCII
text, they are not sufficiently rich enough for multimedia messages (for
example, messages with images, audio, and video) or for carrying non-ASCII
text formats (for example, characters used by languages other than English).
To send content different from ASCII text, the sending user agent must
include additional headers in the message. These extra headers are defined
in RFC 2045 and RFC 2046, the MIME (Multipurpose Internet Mail Extensions)
extension to RFC 822.
Two key MIME
headers for supporting multimedia are the Content-Type: header
and the Content-Transfer-Encoding: header. The Content-Type:
header allows the receiving user agent to take an appropriate action
on the message. For example, by indicating that the message body contains
a JPEG image, the receiving user agent can direct the message body to a
JPEG decompression routine. To understand the need of the Content-Transfer-Encoding:
header, recall that non-ASCII text messages must be encoded to an ASCII
format that isn't going to confuse SMTP. The Content-Transfer-Encoding:
header alerts the receiving user agent that the message body has been ASCII-encoded
and to the type of encoding used. Thus, when a user agent receives a message
with these two headers, it first uses the value of the Content-Transfer-Encoding:
header to convert the message body to its original non-ASCII form, and
then uses the Content-Type: header to determine what actions it
should take on the message body.
Let's take a
look at a concrete example. Suppose Alice wants to send Bob a JPEG image.
To do this, Alice invokes her user agent for e-mail, specifies Bob's e-mail
address, specifies the subject of the message, and inserts the JPEG image
into the message body of the message. (Depending on the user agent Alice
uses, she might insert the image into the message as an "attachment.")
When Alice finishes composing her message, she clicks on "Send." Alice's
user agent then generates a MIME message, which might look something like
this:
From: alice@crepes.fr
To: bob@hamburger.edu
Subject: Picture of yummy crepe.
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Type: image/jpeg
(base64 encoded data .....
.........................
......base64 encoded data)
We observe from
the above MIME message that Alice's user agent encoded the JPEG image using
base64 encoding. This is one of several encoding techniques standardized
in the MIME [RFC
2045] for conversion to an acceptable seven-bit ASCII format. Another
popular encoding technique is quoted-printable content-transfer-encoding,
which is typically used to convert an ordinary ASCII message to ASCII text
void of undesirable character strings (for example, a line with a single
period).
When Bob reads
his mail with his user agent, his user agent operates on this same MIME
message. When Bob's user agent observes the Content-Transfer- Encoding:
base64 header line, it proceeds to decode the base64-encoded message
body. The message also includes a Content-Type: image/jpeg header
line; this indicates to Bob's user agent that the message body should be
JPEG decompressed. Finally, the message includes the MIME-Version:
header, which, of course, indicates the MIME version that is being used.
Note that the message otherwise follows the standard RFC 822/SMTP format.
In particular, after the message header there is a blank line and then
the message body; and after the message body, there is a line with a single
period.
Let's now take
a closer look at the Content-Type: header. According to the MIME
specification [RFC
2046], this header has the following format:
Content-Type:
type/subtype; parameters
where the "parameters"
(along with the semicolon) is optional. Paraphrasing [RFC
2046], the Content-Type field is used to specify the nature
of the data in the body of a MIME entity, by giving media type and subtype
names. After the type and subtype names, the remainder of the header field
consists of a set of parameters. In general, the top-level type is used
to declare the general type of data, whereas the subtype specifies a specific
format for that type of data. The parameters are modifiers of the subtype
and as such do not fundamentally affect the nature of the content. The
set of meaningful parameters depends on the type and subtype. Most parameters
are associated with a single specific subtype. MIME has been carefully
designed to be extensible, and it is expected that the set of media type/subtype
pairs and their associated parameters will grow significantly over time.
In order to ensure that the set of such types/subtypes is developed in
an orderly, well-specified, and public manner, MIME sets up a registration
process that uses the Internet Assigned Numbers Authority (IANA) as a central
registry for MIME's various areas of extensibility. The registration process
for these areas is described in RFC 2048.
Currently there
are seven top-level types defined. For each type, there is a list of associated
subtypes, and the lists of subtypes are growing every year. We describe
five of these types below:
-
text: The text
type is used to indicate to the receiving user agent that the message body
contains textual information. One extremely common type/subtype pair is
text/plain. The subtype plain indicates plain text containing
no formatting commands or directives. Plain text is to be displayed as
is; no special software is required to get the full meaning of the text,
aside from support for the indicated character set. If you take a glance
at the MIME headers in some of the messages in your mailbox, you will almost
certainly see content type header lines with text/plain; charset=us-ascii
or text/plain; charset="ISO-8859-1". The parameters indicate
the character set used to generate the message. Another type/subtype pair
that is gaining popularity is text/html. The html subtype
indicates to the mail reader that it should interpret the embedded HTML
tags that are included in the message. This allows the receiving user agent
to display the message as a Web page, which might include a variety of
fonts, hyperlinks, applets, and so on.
-
image: The image
type is used to indicate to the receiving user agent that the message body
is an image. Two popular type/subtype pairs are image/gif and
image/jpeg. When the receiving user agent encounters image/gif,
it knows that it should decode the GIF image and then display it.
-
audio: The audio
type requires an audio output device (such as a speaker or a telephone)
to render the contents. Some of the standardized subtypes include basic
(basic eight-bit
-law encoded) and 32kadpcm
(a 32 Kbps format defined in RFC 1911).
-
video: The video
type includes mpeg, and quicktime for subtypes.
-
application: The
application type is for data that does not fit in any of the other categories.
It is often used for data that must be processed by an application before
it is viewable or usable by a user. For example, when a user attaches a
Microsoft Word document to an e-mail message, the sending user agent typically
uses application/msword for the type/subtype pair. When the receiving
user agent observes the content type application/msword, it launches
the Microsoft Word application and passes the body of the MIME message
to the application. A particularly important subtype for the application
type is octet-stream, which is used to indicate that the body
contains arbitrary binary data. Upon receiving this type, a mail reader
will prompt the user, providing the option to save the message to disk
for later processing.
There is one MIME
type that is particularly important and requires special discussion, namely,
the multipart type. Just as a Web page can contain many objects (such as
text, images, applets), so too can an e-mail message. Recall that the Web
sends each of the objects within independent HTTP response messages. Internet
e-mail, on the other hand, places all the objects (or "parts") in the same
message. In particular, when a multimedia message contains more than one
object (such as multiple images or some ASCII text and some images) the
message typically has Content-type: multipart/mixed. This content
type header line indicates to the receiving user agent that the message
contains multiple objects. With all the objects in the same message, the
receiving user agent needs a means to determine (1) where each object begins
and ends, (2) how each non-ASCII object was transfer-encoded, and (3) the
content type of each message. This is done by placing boundary characters
between each object and preceding each object in the message with Content-type:
and Content-Transfer-Encoding: header lines.
To obtain a
better understanding of multipart/mixed, let's look at an example.
Suppose that Alice wants to send a message to Bob consisting of some ASCII
text, followed by a JPEG image, followed by more ASCII text. Using her
user agent, Alice types some text, attaches a JPEG image, and then types
some more text. Her user agent then generates a message something like
this:
From: alice@crepes.fr
To: bob@hamburger.edu
Subject: Picture of yummy crepe with commentary
MIME-Version: 1.0
Content-Type: multipart/mixed; Boundary=StartOfNextPart
--StartOfNextPart
Dear Bob,
Please find a picture of an absolutely scrumptious crepe.
--StartOfNextPart
Content-Transfer-Encoding: base64
Content-Type: image/jpeg
base64 encoded data .....
.........................
......base64 encoded data
--StartOfNextPart
Let me know if you would like the recipe.
Examining the above
message, we note that the Content-Type: line in the header indicates
how the various parts in the message are separated. The separation always
begins with two dashes and ends with CRLF.
The Received
Message
As we have discussed,
an e-mail message consists of many components. The core of the message
is the message body, which is the actual data being sent from sender to
receiver. For a multipart message, the message body itself consists of
many parts, with each part preceded by one or more lines of identifying
information. Preceding the message body is a blank line and then a number
of header lines. These header lines include RFC 822 header lines such as
From:, To:, and Subject: header lines. The header
lines also include MIME header lines such as Content-type: and
Content-transfer-encoding: header lines. But we would be remiss
if we didn't mention another class of header lines that are inserted by
the SMTP receiving server. Indeed, the receiving server, upon receiving
a message with RFC 822 and MIME header lines, appends a Received:
header line to the top of the message; this header line specifies the name
of the SMTP server that sent the message ("from"), the name of the SMTP
server that received the message ("by") and the time at which the receiving
server received the message. Thus, the message seen by the destination
user takes the following form:
Received: from crepes.fr by hamburger.edu; 12 Oct 98 15:27:39 GMT
From: alice@crepes.fr
To: bob@hamburger.edu
Subject: Picture of yummy crepe.
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Type: image/jpeg
base64 encoded data .......
........................................
.......base64 encoded data
Almost everyone
who has used electronic mail has seen the Received: header line
(along with the other header lines) preceding e-mail messages. (This line
is often directly seen on the screen or when the message is sent to a printer.)
You may have noticed that a single message sometimes has multiple Received:
header lines and a more complex Return-Path: header line. This
is because a message may be forwarded to more than one SMTP server in the
path between sender and recipient. For example, if Bob has instructed his
e-mail server hamburger.edu to forward all his messages to sushi.jp,
then the message read by Bob's user agent would begin with something like:
Received: from hamburger.edu by sushi.jp; 12 Oct 98 15:30:01 GMT
Received: from crepes.fr by hamburger.edu; 12 Oct 98 15:27:39 GMT
These header lines
provide the receiving user agent a trace of the SMTP servers visited as
well as timestamps of when the visits occurred. You can learn more about
the syntax of these header lines in the SMTP RFC, which is one of the more
readable of the many RFCs.
2.4.4: Mail Access
Protocols
Once SMTP delivers
the message from Alice's mail server to Bob's mail server, the message
is placed in Bob's mailbox. Throughout this discussion we have tacitly
assumed that Bob reads his mail by logging onto the server host and then
executes a mail reader directly on that host. Up until the early 1990s
this was the standard way of doing things. But today a typical user reads
mail with a user agent that executes on his or her local PC (or Mac), whether
that PC be an office PC, a home PC, or a portable PC. By executing the
user agent on a local PC, users enjoy a rich set of features, including
the ability to view multimedia messages and attachments.
Given that Bob
(the recipient) executes his user agent on his local PC, it is natural
to consider placing a mail server on his local PC as well. There is a problem
with this approach, however. Recall that a mail server manages mailboxes
and runs the client and server sides of SMTP. If Bob's mail server were
to reside on his local PC, then Bob's PC would have to remain constantly
on, and connected to the Internet, in order to receive new mail, which
can arrive at any time. This is impractical for the great majority of Internet
users. Instead, a typical user runs a user agent on the local PC but accesses
a mailbox from a shared mail server--a mail server that is always connected
to the Internet, and that is shared with other users. The mail server is
typically maintained by the user's ISP (e.g., university or company).
With user agents
running on users' local PCs and mail servers hosted by ISPs or institutional
networks, a protocol is needed to allow the user agent and the mail server
to communicate. Let us first consider how a message that originates at
Alice's local PC makes its way to Bob's SMTP mail server. This task could
simply be done by having Alice's user agent communicate directly with Bob's
mail server in the language of SMTP. Alice's user agent would initiate
a TCP connection to Bob's mail server, issue the SMTP handshaking commands,
upload the message with the DATA command, and then close the connection.
This approach, although perfectly feasible, is not commonly employed, primarily
because it doesn't offer Alice's user agent any recourse to a crashed destination
mail server. Instead, Alice's user agent initiates a SMTP dialogue to upload
Alice's message to her own mail server (rather than with the recipient's
mail server). Alice's mail server then establishes a new SMTP session with
Bob's mail server and relays the message to Bob's mail server. If
Bob's mail server is down, then Alice's mail server holds the message and
tries again later. The SMTP RFC defines how the SMTP commands can be used
to relay a message across multiple SMTP servers.
But there is
still one missing piece to the puzzle! How does a recipient like Bob, running
a user agent on his local PC, obtain his messages, which are sitting on
a mail server within Bob's ISP? The puzzle is completed by introducing
a special mail access protocol that transfers messages from Bob's mail
server to his local PC. There are currently two popular mail access protocols:
POP3 (Post Office Protocol-- Version 3) and IMAP (Internet Mail Access
Protocol). We shall discuss both of these protocols below. Note that Bob's
user agent can't use SMTP to obtain the messages because obtaining the
messages is a pull operation whereas SMTP is a push protocol. Figure 2.17
provides a summary of the protocols that are used for Internet mail: SMTP
is used to transfer mail from the sender's mail server to the recipient's
mail server; SMTP is also used to transfer mail from the sender's user
agent to the sender's mail server. POP3 or IMAP are used to transfer mail
from the recipient's mail server to the recipient's user agent.
Figure 2.17:
E-mail protocols and their communicating entities
POP3
POP3, defined
in RFC 1939, is an extremely simple mail access protocol. Because the protocol
is so simple, its functionality is rather limited. POP3 begins when the
user agent (the client) opens a TCP connection to the mail server (the
server) on port 110. With the TCP connection established, POP3 progresses
through three phases: authorization, transaction, and update. During the
first phase, authorization, the user agent sends a user name and a password
to authenticate the user downloading the mail. During the second phase,
transaction, the user agent retrieves messages. During the transaction
phase, the user agent can also mark messages for deletion, remove deletion
marks, and obtain mail statistics. The third phase, update, occurs after
the client has issued the quit command, ending the POP3 session;
at this time, the mail server deletes the messages that were marked for
deletion.
In a POP3 transaction,
the user agent issues commands, and the server responds to each command
with a reply. There are two possible responses: +OK (sometimes
followed by server-to-client data), whereby the server is saying that the
previous command was fine; and -ERR, whereby the server is saying
that something was wrong with the previous command.
The authorization
phase has two principle commands: user <user name> and pass
<password>. To illustrate these two commands, we suggest
that you Telnet directly into a POP3 server, using port 110, and issue
these commands. Suppose that mailServer is the name of your mail
server. You will see something like:
telnet mailServer 110
+OK POP3 server ready
user alice
+OK
pass hungry
+OK user successfully logged on
If you misspell
a command, the POP3 server will reply with an -ERR message.
Now let's take
a look at the transaction phase. A user agent using POP3 can often be configured
(by the user) to "download and delete" or to "download and keep."
The sequence of commands issued by a POP3 user agent depend on which of
these two modes the user agent is operating in. In the download-and-delete
mode, the user agent will issue the list, retr, and dele
commands. As an example, suppose the user has two messages in his or her
mailbox. In the dialogue below, C: (standing for client),
is the user agent and S: (standing for server) is the mail
server. The transaction will look something like:
C: list
S: 1 498
S: 2 912
S: .
C: retr 1
S: (blah blah ...
S: .................
S: ..........blah)
S: .
C: dele 1
C: retr 2
S: (blah blah ...
S: .................
S: ..........blah)
S: .
C: dele 2
C: quit
S: +OK POP3 server signing off
The user agent
first asks the mail server to list the size of each of the stored messages.
The user agent then retrieves and deletes each message from the server.
Note that after the authorization phase, the user agent employed only four
commands: list, retr, dele, and quit.
The syntax for these commands is defined in RFC 1939. After processing
the quit command, the POP3 server enters the update phase and removes messages
1 and 2 from the mailbox.
A problem with
this download-and-delete mode is that the recipient, Bob, may be nomadic
and want to access his mail from multiple machines, including the office
PC, the home PC, and a portable computer. The download-and-delete mode
scatters Bob's mail over all of his machines; in particular, if Bob first
reads a message on a home PC, he will not be able to reread the message
on his portable later in the evening. In the download-and-keep mode, the
user agent leaves the messages on the mail server after downloading them.
In this case, Bob can reread messages from different machines; he can access
a message from work, and then access it again later in the week from home.
During a POP3
session between a user agent and the mail server, the POP3 server maintains
some state information; in particular, it keeps track of which messages
have been marked deleted. However, the POP3 server does not carry state
information across POP3 sessions. For example, no message is marked for
deletion at the beginning of each session. This lack of state information
across sessions greatly simplifies the implementation of a POP3 server.
IMAP
Once Bob has
downloaded his messages to the local machine using POP3, he can create
mail folders and move the downloaded messages into the folders. Bob can
then delete messages, move messages across folders, and search for messages
(by sender name or subject). But this paradigm--folders and messages in
the local machine--poses a problem for the nomadic user, who would prefer
to maintain a folder hierarchy on a remote server that can be accessed
from any computer. This is not possible with POP3.
To solve this
and other problems, the Internet Mail Access Protocol (IMAP), defined in
RFC 2060, was invented. Like POP3, IMAP is a mail access protocol. It has
many more features than POP3, but it is also significantly more complex.
(And thus the client and server side implementations are significantly
more complex.) IMAP is designed to allow users to manipulate remote mailboxes
as if they were local. In particular, IMAP enables Bob to create and maintain
multiple message folders at the mail server. Bob can put messages in folders
and move messages from one folder to another. IMAP also provides commands
that allow Bob to search remote folders for messages matching specific
criteria. One reason why an IMAP implementation is much more complicated
than a POP3 implementation is that the IMAP server must maintain a folder
hierarchy for each of its users. This state information persists across
a particular user's successive accesses to the IMAP server. Recall that
a POP3 server, by contrast, does not maintain anything about a particular
user once the user quits the POP3 session.
Another important
feature of IMAP is that it has commands that permit a user agent to obtain
components of messages. For example, a user agent can obtain just the message
header of a message or just one part of a multipart MIME message. This
feature is useful when there is a low-bandwidth connection between the
user agent and its mail server, for example, a wireless or slow-speed modem
connection. With a low-bandwidth connection, the user may not want to download
all the messages in its mailbox, particularly avoiding long messages that
might contain, for example, an audio or video clip.
An IMAP session
consists of the establishment of a connection between the client (that
is, the user agent) and the IMAP server, an initial greeting from the server,
and client/server interactions. The client/server interactions are similar
to, but richer than, those of POP3. They consist of a client command, server
data, and a server completion result response. The IMAP server is always
in one of four states. In the nonauthenticated state, the initial
state when the connection begins, the user must supply a user name and
password before most commands will be permitted. In the authenticated
state, the user must select a folder before sending commands that affect
messages. In the selected state, the user can issue commands that
affect messages (retrieve, move, delete, retrieve a part in a multipart
message, and so on). Finally, the logout state is when the session
is being terminated. The IMAP commands are organized by the state in which
the command is permitted. You can read all about IMAP at the official IMAP
site [IMAP
1999].
HTTP
More and more
users today are using browser-based e-mail services such as Hotmail or
Yahoo! Mail. With these servers, the user agent is an ordinary Web browser
and the user communicates with its mailbox on its mail server via HTTP.
When a recipient, such as Bob, wants to access the messages in his mailbox,
the messages are sent from Bob's mail server to Bob's browser using the
HTTP protocol rather than the POP3 or IMAP protocol. When a sender with
an account on an HTTP-based e-mail server, such as Alice, wants to send
a message, the message is sent from her browser to her mail server over
HTTP rather than over SMTP. The mail server, however, still sends messages
to, and receives messages from, other mail servers using SMTP. This solution
to mail access is enormously convenient for the user on the go. The user
need only to be able to access a browser in order to send and receive messages.
The browser can be in an Internet cafe, in a friend's house, in a hotel
room with a Web TV, and so on. As with IMAP, users can organize their messages
in a hierarchy of folders on the remote server. In fact, Web-based e-mail
is so convenient that it may replace POP3 and IMAP access in the upcoming
years. Its principle disadvantage is that it can be slow, as the server
is typically far from the client and interaction with the server is done
through CGI scripts.
2.4.5: Continuous-media
E-mail
Continuous-media
(CM) e-mail is e-mail that includes audio or video. CM e-mail is appealing
for asynchronous communication among friends and family. For example, a
young child who cannot type would prefer to send an audio message to his
or her grandparents. Furthermore, CM e-mail can be desirable in many corporate
contexts, as an office worker may be able to record a CM message more quickly
than typing a text message. (English can be spoken at a rate of 180 words
per minute, whereas the average office worker types at a much slower rate.)
Continuous-media e-mail resembles ordinary telephone voice-mail messaging
in some respects. However, continuous-media e-mail is much more powerful.
Not only does it provide the user with a graphical interface to the user's
mailbox, but it also allows the user to annotate and reply to CM messages
and to forward CM messages to a large number of recipients.
CM e-mail differs
from traditional text mail in many ways. CM e-mail can have much larger
messages, more stringent end-to-end delay requirements, and greater sensitivity
to recipients with highly heterogeneous Internet access rates and local
storage capabilities. Unfortunately, the current e-mail infrastructure
has several inadequacies that obstruct the widespread adoption of CM e-mail
[Turner
1999]. First, many existing mail servers do not have the capacity to
store large CM objects; recipient mail servers typically reject such messages,
which makes sending CM messages to such recipients impossible. Second,
the existing mail paradigm of transporting entire messages to the
recipient's mail server before recipient rendering can lead to excessive
waste of bandwidth and storage. Indeed, stored CM is often not rendered
in its entirety [Padhye
1999], so that bandwidth and recipient storage is wasted by receiving
data that is never rendered. (For example, one can imagine listening to
the first 15 seconds of a long audio e-mail from a rather long-winded colleague
and then deciding to delete the remaining 20 minutes of the message without
listening to it.) Third, current mail access protocols (POP3, IMAP, and
HTTP) are inappropriate for streaming CM to recipients. (Streaming CM is
discussed in detail in Chapter 6.) In particular, the current mail access
protocols do not provide functionality that allows a user to pause/resume
a message or to reposition within a message; furthermore, streaming over
TCP often leads to poor reception (see Chapter 6). These inadequacies will
hopefully be addressed in the upcoming years. Possible solutions are discussed
in the literature [Gay
1997; Hess
1998; Schurmann
1996; and Turner
1999]. |