8.2 URLs and the HTTP Conversation
This section covers URLs and the HTTP conversation between the
user's web browser and your Tomcat server. Understanding this
material helps in diagnosing certain types of errors. At the end of
the section, we show several tools for carrying on the HTTP
conversation by hand; these let you pretend to be a web browser and
see exactly how Tomcat responds.
8.2.1 HTTP Requests
Every request is addressed to a URL. A
URL, or Uniform
Resource Locator, is the standard form of web address, and
it is understood by all web programs (including your web browser). A
URL consists of a protocol, a host name, an optional port number, a
slash, and an optional resource path.
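As a rough illustration of these parts, the following Python sketch splits a URL using the standard library's urllib.parse module. The first URL is a made-up example chosen to show every component; it does not refer to a real deployment.

```python
from urllib.parse import urlsplit

# Hypothetical URL, chosen only to show every component.
url = "http://localhost:8080/examples/index.html"
parts = urlsplit(url)

print(parts.scheme)    # the protocol
print(parts.hostname)  # the host name
print(parts.port)      # the optional port number (None when omitted)
print(parts.path)      # the resource path, including the leading slash

# A URL with no port and no path: both optional parts come back empty.
bare = urlsplit("http://www.oreilly.com")
print(bare.port, repr(bare.path))
```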
The first portion of the URL, the protocol, is generally the
Hypertext Transfer Protocol (HTTP). While there
are several available protocols, HTTP is the network protocol that
the web browser and web server most often use to communicate. An
HTTP request consists of a request line, usually followed by some
additional header lines. The request line consists of three parts:
the request type (usually GET or
POST), the path and name of the object being
requested (often an HTML file or an image file, but this can also be
a Servlet or JSP, an audio or video file, or almost anything else),
and the highest version of HTTP that the browser
is prepared to speak (usually 1.0 or 1.1). If the URL does not
include a path, the browser must send a /,
which translates to a request for the site's default
page. A simple request might look like this:
GET / HTTP/1.0
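To make the three parts concrete, here is a minimal Python sketch that splits a request line into its request type, path, and version. The function name parse_request_line is our own invention for illustration; it is not part of Tomcat or of any library.

```python
def parse_request_line(line):
    """Split an HTTP request line into (method, path, version)."""
    method, path, version = line.strip().split(" ")
    if not version.startswith("HTTP/"):
        raise ValueError("not an HTTP version: " + version)
    # Keep only the number after the "HTTP/" prefix.
    return method, path, version[len("HTTP/"):]

method, path, version = parse_request_line("GET / HTTP/1.0")
print(method, path, version)
```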
Since the Web was invented on Unix, the Unix filename conventions are
normally used; hence the use of forward slashes for directory
separators.
Several
headers will usually follow this request line. These headers are in
the same format as email headers, that is, a keyword, a colon and
space, and a value. The headers must then be followed by a blank
line. If the request is a POST instead of a
GET, the request parameters and their values
follow this empty, or null, line.
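The layout just described (request line, headers, blank line, then any POST parameters) can be sketched in a few lines of Python. The helper name build_request, the servlet path, and the form data are all hypothetical, chosen only to show where each piece goes.

```python
def build_request(method, path, headers, body=""):
    """Assemble a raw HTTP/1.0 request as text."""
    lines = [f"{method} {path} HTTP/1.0"]
    for name, value in headers:
        lines.append(f"{name}: {value}")   # keyword, colon and space, value
    # The blank line ends the headers; any POST parameters follow it.
    return "\r\n".join(lines) + "\r\n\r\n" + body

form = "name=Ian&city=Toronto"             # made-up form data
request = build_request(
    "POST", "/servlet/Register",           # hypothetical servlet path
    [("Host", "localhost"),
     ("Content-Type", "application/x-www-form-urlencoded"),
     ("Content-Length", str(len(form)))],
    form)
print(request)
```

A GET request built the same way simply ends after the blank line.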
One of the most important request headers is
User-Agent, which tells the server what kind of
browser you are using. This is used to generate statistics about how
many people use Mozilla/Netscape versus Internet Explorer, and it is
also used to customize response pages to handle bugs in (or
differences between) browsers. You can learn a lot about your clients
by watching this header; the BrowserHawk product from http://www.cyscape.com makes heavy use of
this particular header and displays quite a bit of useful information
about web browsers.
8.2.2 Response Codes and Headers
The response
line is also in three parts: the HTTP protocol version that the
server is using (not necessarily the one named in the client
request; in Examples 8-3 through 8-5, Tomcat answers HTTP/1.0
requests with HTTP/1.1), a numeric status code, and a brief message.
The status code
is a three-digit number indicating success, failure, or any one of
several other conditions. Codes beginning with
"2" mean success. Code 200 is the
most common success indicator and means that the requested file is
being served. Codes beginning with
"3" indicate a redirection rather than an error; one
of the most common is 302, a temporary redirection. Redirections
were invented to allow server maintainers to provide a new location
for a file that has been moved. However, if you
don't give a filename, or if you type a URL with no
trailing slash (such as http://www.oreilly.com or http://www.oreilly.com/catalog), you will also get
a redirection from most servers, depending on the
server's configuration. The server redirects the
client to the directory requested, and then to a default file within
that directory (if present). The redirection is necessary for
relative links to work; it is otherwise harmless but causes a brief
delay because the browser has to turn around and request the page
from the new location.
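The first digit of the status code is what sorts a response into success, redirection, or failure. As a small sketch, the Python function below maps a code to its class; the name status_class is our own, and the two error classes it lists are the client and server errors of the 4xx and 5xx ranges.

```python
def status_class(code):
    """Map a three-digit HTTP status code to its general meaning."""
    classes = {
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    # Integer division by 100 leaves just the first digit.
    return classes.get(code // 100, "other")

for code in (200, 302, 404, 500):
    print(code, status_class(code))
```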
Tomcat 4.0 and 4.1 will also send a redirection if you request a URL
ending in /; it will redirect your browser to the
default page for the relevant directory, meaning that
users' bookmarks will refer to the default page
instead of just the directory. This is Bug 11470 in the Tomcat bug
database and will probably not be fixed until Tomcat 5.0.
There are also error codes: status codes beginning with a
"4" indicate client errors, and
codes beginning with a "5"
indicate server errors. The most common error codes are good old 404,
sent when a requested file is not found, and 500, the
"catch-all" server error code.
Moving on
from response codes, an important response header is
Content-Type, which specifies the MIME type of the
response. text/html is the most common; see your
$CATALINA_HOME/conf/web.xml file for information
on others. This header's value tells the browser how
to interpret the response data, indicating whether the response is
text, an image file, an audio clip, or any other particular data
format. The browser uses this header to determine whether it
can display the response itself or needs to launch a
helper application.
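To see what a client does with this header, the Python sketch below guesses a MIME type from a filename and then takes a Content-Type value apart. It relies on Python's own built-in MIME table as a stand-in; Tomcat's mappings come from its web.xml file and may differ.

```python
import mimetypes
from email.message import Message

# Guess a MIME type from a filename, roughly as a server might.
# This uses Python's table, not Tomcat's web.xml mappings.
print(mimetypes.guess_type("index.html")[0])

# Split a Content-Type header value into its type and charset parameter.
header = Message()
header["Content-Type"] = "text/html; charset=iso-8859-1"
print(header.get_content_type())      # the MIME type alone
print(header.get_content_maintype())  # the broad category before the slash
print(header.get_content_charset())   # the character set, if one was given
```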
If redirection occurs, there is another important response header:
Location. This header contains the full URL that the browser should
request next; that is, the new location, not
the originally requested one. There are also several other headers
for cookies, locales, and more.
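A client's handling of a redirection can be sketched as follows. The function name next_url is hypothetical, and the headers are assumed to be available as a plain dictionary keyed by the exact header name; a real client would match header names without regard to case.

```python
from urllib.parse import urljoin

def next_url(requested_url, status, headers):
    """Return the URL to fetch next, or None if no redirection occurred."""
    if 300 <= status < 400 and "Location" in headers:
        # urljoin also copes with a Location value that is not a full URL.
        return urljoin(requested_url, headers["Location"])
    return None

print(next_url("http://localhost:8080/", 302,
               {"Location": "http://localhost:8080/index.html"}))
```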
8.2.3 Interacting with HTTP
Since
we are dealing with a purely textual request and response phase (at
least where HTML is involved), it is possible to play the client's
part of the conversation by hand using a Telnet client. Unix systems
provide a command-line Telnet client that is ideal for this purpose,
and the Cygwin package includes a command-line Telnet client for
Windows. You can also use the netcat
(nc) program
to make these requests noninteractively.
Example 8-3, Example 8-4,
and Example 8-5 show several simple HTTP
interactions with a Tomcat server. Example 8-3 and Example 8-4 show
HTTP requests being made with a
Telnet client, while Example 8-5 demonstrates the
use of netcat.
In these examples, lines beginning with # are
comment lines; lines beginning with $ are commands
that we typed to start programs.
Note that the title tag for a 302 (redirection)
response contains the text "Tomcat Error
Report", which is a little misleading: a redirection is not
an error. However, in normal use the browser
doesn't display this text, so the message is
harmless.
Example 8-3. A redirection on Tomcat using telnet
$ telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.0

HTTP/1.1 302 Moved Temporarily
Content-Type: text/html
Date: Sat, 20 Oct 2001 15:21:35 GMT
Location: http://localhost:8080/index.html
Server: Apache Tomcat/4.0 (HTTP/1.1 Connector)
Connection: close

<html>
<head>
<title>Tomcat Error Report</title>
</head>
<body bgcolor="white">
<br><br>
<h1>HTTP Status 302 - Moved Temporarily</h1>
The requested resource (Moved Temporarily) has moved temporarily to a new location.
</body>
</html>
Connection closed by foreign host.
Example 8-4 shows a request for the
index.html file.
Example 8-4. Requesting index.html on Tomcat using telnet
$ telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET /index.html HTTP/1.0

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 2836
Date: Sat, 20 Oct 2001 15:33:00 GMT
Server: Apache Tomcat/4.0 (HTTP/1.1 Connector)
Last-Modified: Fri, 12 Oct 2001 22:36:50 GMT
ETag: "2836-1002926210000"

<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="GENERATOR" CONTENT="The vi editor from Unix">
<META NAME="Author" CONTENT="Ian Darwin">
<TITLE>Ian Darwin's Webserver On The Road</TITLE>
<LINK REL="stylesheet" TYPE="text/css" HREF="/stylesheet.css" TITLE="Style">
</HEAD>
<BODY BGCOLOR="#c0d0e0">
<H1>Ian Darwin's Webserver On The Road</H1>
# Rest of the HTML not shown here...
</BODY></HTML>
Notice the 200 OK status message, the
Content-Length header, the
Last-Modified header, and the
Server header. Each has valuable information.
Content-Length is used when the server knows the
exact size of the file that it is sending in response to a request;
Last-Modified lets the client know the last time
that the requested file was modified; and Server
indicates what server software is responding to the request.
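As an illustration of reading these headers in a program, the Python sketch below parses header lines copied from Example 8-4 with the standard library's email parser, which works here because HTTP headers share the email header format. The choice of parser is ours; it is only one of several ways to do this.

```python
from email.parser import Parser
from email.utils import parsedate_to_datetime

# Header lines copied from Example 8-4 (status line omitted).
raw_headers = (
    "Content-Type: text/html\r\n"
    "Content-Length: 2836\r\n"
    "Date: Sat, 20 Oct 2001 15:33:00 GMT\r\n"
    "Server: Apache Tomcat/4.0 (HTTP/1.1 Connector)\r\n"
    "Last-Modified: Fri, 12 Oct 2001 22:36:50 GMT\r\n"
    "\r\n"
)
headers = Parser().parsestr(raw_headers, headersonly=True)

size = int(headers["Content-Length"])                       # body size in bytes
modified = parsedate_to_datetime(headers["Last-Modified"])  # file's last change
served = parsedate_to_datetime(headers["Date"])             # time of the response

print(size)
print(headers["Server"])
print((served - modified).days)   # whole days between last change and response
```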
nc is a general-purpose program for connecting
to sockets. It is similar to a Telnet client, but it is easier to
script. Example 8-5 shows nc
connecting to Tomcat.
Example 8-5. Using nc to talk to Tomcat
$ (echo GET / HTTP/1.0; echo "") | nc localhost 80
HTTP/1.1 302 Moved Temporarily
Content-Type: text/html
Date: Sat, 20 Oct 2001 15:21:47 GMT
Location: http://localhost:8080/index.html
Server: Apache Tomcat/4.0 (HTTP/1.1 Connector)
Connection: close

<html>
<head>
<title>Tomcat Error Report</title>
</head>
<body bgcolor="white">
<br><br>
<h1>HTTP Status 302 - Moved Temporarily</h1>
The requested resource (Moved Temporarily) has moved temporarily to a new location.
</body>
</html>
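The same exchange can be scripted in other languages too. The Python sketch below is one possible equivalent of the nc command: fetch_raw sends a bare HTTP/1.0 request over a socket, and split_response separates the status line, headers, and body. Both function names are our own, and the parsing assumes the server ends lines with a carriage return and line feed, as HTTP specifies.

```python
import socket

def split_response(raw):
    """Split raw response bytes into (status line, header dict, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")   # blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return lines[0], headers, body

def fetch_raw(host, port, path="/"):
    """Send a bare HTTP/1.0 GET and return everything the server sends back."""
    request = f"GET {path} HTTP/1.0\r\n\r\n"
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)

# Usage, assuming a server is listening on localhost port 80 as in Example 8-5:
#   status, headers, body = split_response(fetch_raw("localhost", 80))
#   print(status, headers.get("Location"))
```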
You've now seen the basics of interacting with the
server from a browser's point of view. Of course,
the web browser concept was invented by Tim Berners-Lee to avoid
users having to perform this kind of interaction, but as an
administrator you should know what happens under the hood to better
understand both the web browser and web server, and to be able to
diagnose HTTP request and response problems.