Tuesday, October 27, 2009

8.2 URLs and the HTTP Conversation








 

 

















This section looks at URLs and at the HTTP conversation between the
user's web browser and your Tomcat server. Understanding this material
helps in diagnosing certain types of errors. At the end of the section
we show several tools for watching the HTTP conversation; they let you
pretend to be a web browser and see exactly how Tomcat responds.







8.2.1 HTTP Requests





The target of any request is, of course, a URL. A URL, or Uniform
Resource Locator, is the standard form of web address, and it is
understood by all web programs (including your web browser). A URL
consists of a protocol, a host name, an optional port number, a slash,
and an optional resource path.
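To make the pieces of a URL concrete, here is a minimal Python sketch that splits one apart with the standard library's urllib.parse module. The URL is a made-up example (an assumption, not taken from the text), and port 8080 appears only because it is a common Tomcat setting. The request_path helper is hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical URL chosen purely for illustration.
url = "http://localhost:8080/examples/index.html"
parts = urlsplit(url)

print(parts.scheme)    # the protocol
print(parts.hostname)  # the host name
print(parts.port)      # the optional port number (None when omitted)
print(parts.path)      # the resource path, including the leading slash


def request_path(url):
    """Return the path a client would request; an empty path becomes "/"."""
    return urlsplit(url).path or "/"


print(request_path("http://www.oreilly.com"))
```

The last call illustrates the case of a URL with no resource path at all, where the client falls back to requesting the slash on its own.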





The first portion of the URL, the protocol, is generally the Hypertext
Transfer Protocol (HTTP). While there are several available protocols,
HTTP is the network protocol that the web browser and web server most
often use to communicate. An HTTP request consists of a request line,
usually followed by some additional header lines. The request line has
three parts: the request type (usually GET or POST), the path and name
of the object being requested (often an HTML file or an image file, but
this can also be a Servlet or JSP, an audio or video file, or almost
anything else), and the highest version of the HTTP protocol that the
browser is prepared to speak (usually 1.0 or 1.1). If the URL does not
include a filename, the browser must send a /, which translates to a
request for the site's default page. A simple request might look like
this:





GET / HTTP/1.0
















Since the Web was invented on Unix, the Unix filename conventions are

normally used; hence the use of forward slashes for directory

separators.







Several

headers will usually follow this request line. These headers are in

the same format as email headers, that is, a keyword, a colon and

space, and a value. The headers must then be followed by a blank

line. If the request is a POST instead of a

GET, the request parameters and their values

follow this empty, or null, line.
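The layout just described (request line, headers, blank line, then any POST parameters) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the build_request helper, the /login path, the header values, and the form fields are all hypothetical and do not come from Tomcat or from any library.

```python
def build_request(method, path, headers, body="", version="HTTP/1.0"):
    """Assemble a raw HTTP request: request line, headers, blank line, body."""
    lines = [f"{method} {path} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # HTTP lines end in CRLF; the empty string produces the blank line
    # that separates the headers from the (possibly empty) body.
    return "\r\n".join(lines + ["", body])


# A GET request has nothing after the blank line.
get = build_request("GET", "/", {"User-Agent": "example-client/0.1"})

# A POST request carries its parameters after the blank line.
form = "user=ian&lang=en"  # hypothetical form fields
post = build_request(
    "POST",
    "/login",  # hypothetical path
    {
        "Content-Type": "application/x-www-form-urlencoded",
        "Content-Length": str(len(form)),
    },
    body=form,
)

print(get)
print(post)
```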





One of the most important request headers is

User-Agent, which tells the server what kind of

browser you are using. This is used to generate statistics about how

many people use Mozilla/Netscape versus Internet Explorer, and it is

also used to customize response pages to handle bugs in (or

differences between) browsers. You can learn a lot about your clients

by watching this header; the BrowserHawk product from http://www.cyscape.com makes heavy use of

this particular header and displays quite a bit of useful information

about web browsers.









8.2.2 Response Codes and Headers





The response line is also in three parts: the HTTP protocol version
(the version the server is speaking, which may differ from the one in
the client request, as the examples later in this section show), a
numeric status code, and a brief message. The status code is a
three-digit number indicating success, failure, or any one of several
other conditions. Codes beginning with "2" mean success. Code 200 is
the most common success indicator and means that the requested file is
being served. Codes beginning with "3" are not errors; they tell the
client that further action is needed. One of the most common is 302,
which means a redirection. Redirections were invented to allow server
maintainers to provide a new location for a file that has been moved.
However, if you don't give a filename, or if you type a URL with no
trailing slash (such as http://www.oreilly.com or
http://www.oreilly.com/catalog), you will also get a redirection from
most servers, depending on the server's configuration. The server
redirects the client to the directory requested, and then to a default
file within that directory (if present). The redirection is necessary
for relative links to work; it is otherwise harmless but causes a brief
delay because the browser has to turn around and request the page from
the new location.

















Tomcat 4.0 and 4.1 will also send a redirection if you request a URL

ending in /; it will redirect your browser to the

default page for the relevant directory, meaning that

users' bookmarks will refer to the default page

instead of just the directory. This is Bug 11470 in the Tomcat bug

database and will probably not be fixed until Tomcat 5.0.







There are also error codes: status codes beginning with "4" indicate
client errors, and codes beginning with "5" indicate server errors. The
most common error codes are good old 404, sent when a requested file is
not found, and 500, the "catch-all" server error code.

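Since the first digit of the status code carries the meaning, a monitoring script can sort responses into the classes described in this section with simple integer division. The sketch below is illustrative only; the status_class helper and its labels are hypothetical names, not part of any library.

```python
def status_class(code):
    """Map a three-digit HTTP status code to the class named by its first digit."""
    classes = {
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    # Integer division by 100 isolates the leading digit of the code.
    return classes.get(code // 100, "other")


for code in (200, 302, 404, 500):
    print(code, status_class(code))
```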




Moving on

from response codes, an important response header is

Content-Type, which specifies the MIME type of the

response. text/html is the most common; see your

$CATALINA_HOME/conf/web.xml file for information

on others. This header's value tells the browser how

to interpret the response data, indicating whether the response is

text, an image file, an audio clip, or any other particular data

format. The browser will use this header in determining whether it

can display the response, or whether it needs to launch another

helper application.
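As a rough illustration of how a client might read this header, the following Python sketch uses the standard library's email.message module, which works here because HTTP headers share the email header format. The header value is taken from the kind of page shown later in this section; treating it this way is an assumption made for the example, not a description of what any particular browser does.

```python
from email.message import Message

# Build a header set containing only the Content-Type header.
msg = Message()
msg["Content-Type"] = "text/html; charset=iso-8859-1"

media_type = msg.get_content_type()    # the MIME type without parameters
charset = msg.get_content_charset()    # the charset parameter, if present

print(media_type)
print(charset)

# A client might branch on the major type to decide how to handle the data.
if msg.get_content_maintype() == "text":
    print("display as text")
else:
    print("hand off to a helper application")
```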





If redirection occurs, there is another important response header:

Location. This header contains the full URL of the

location fielding the request. This location is the new location, not

the originally requested one. There are also several other headers

for cookies, locales, and more.
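A client that follows redirections has to combine the status code and the Location header to decide what to request next. The sketch below shows one way that decision could look in Python; the redirect_target helper is hypothetical, and resolving a relative Location against the original URL is an extra convenience assumed here, beyond the full URL described above.

```python
from urllib.parse import urljoin


def redirect_target(request_url, status, headers):
    """Return the URL to request next, or None if the response is not a redirect."""
    location = headers.get("Location")
    if 300 <= status < 400 and location:
        # urljoin leaves an absolute Location untouched and resolves a
        # relative one against the URL that was originally requested.
        return urljoin(request_url, location)
    return None


# A 302 response like the ones Tomcat sends for the default page.
print(redirect_target(
    "http://localhost:8080/",
    302,
    {"Location": "http://localhost:8080/index.html"},
))

# A normal 200 response has nowhere else to go.
print(redirect_target("http://localhost:8080/index.html", 200, {}))
```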









8.2.3 Interacting with HTTP





Since

we are dealing with a purely textual request and response phase (at

least where HTML is involved), it is possible to listen in on

client-server communication using a Telnet client. Unix systems

provide a command-line Telnet client that is ideal for this purpose,

and the Cygwin package includes a command-line Telnet client for

Windows. You can also use the netcat

(nc) program[2]

to view these requests noninteractively.



[2] netcat doesn't come with

Solaris 8, but you can get it from the SunFreeware site. Go to

http://www.sunfreeware.com, get

the nc package, and install it. For Windows, the

nc program comes with Cygwin.





Example 8-3, Example 8-4, and Example 8-5 show several simple HTTP
interactions with various web servers. In each case, the default page
is requested. Examples 8-3 and 8-4 show Tomcat HTTP requests being made
with a Telnet client, while Example 8-5 demonstrates the use of netcat.

















In these examples, lines beginning with # are

comment lines; lines beginning with $ are commands

that we typed to start programs.







Note that the title tag for a 302 (redirection)

response contains the text "Tomcat Error

Report", which is a little misleading: this is not

an error, but a warning. However, in normal use the browser

doesn't display this text, so the message is

harmless.







Example 8-3. A redirection on Tomcat using telnet

$ telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.0

HTTP/1.1 302 Moved Temporarily
Content-Type: text/html
Date: Sat, 20 Oct 2001 15:21:35 GMT
Location: http://localhost:8080/index.html
Server: Apache Tomcat/4.0 (HTTP/1.1 Connector)
Connection: close

<html>
<head>
<title>Tomcat Error Report</title>
</head>
<body bgcolor="white">
<br><br>
<h1>HTTP Status 302 - Moved Temporarily</h1>
The requested resource (Moved Temporarily) has moved temporarily to a new location.
</body>
</html>
Connection closed by foreign host.






Example 8-4 shows a request for the

index.html file.







Example 8-4. Requesting index.html on Tomcat using telnet

$ telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET /index.html HTTP/1.0

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 2836
Date: Sat, 20 Oct 2001 15:33:00 GMT
Server: Apache Tomcat/4.0 (HTTP/1.1 Connector)
Last-Modified: Fri, 12 Oct 2001 22:36:50 GMT
ETag: "2836-1002926210000"

<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="GENERATOR" CONTENT="The vi editor from Unix">
<META NAME="Author" CONTENT="Ian Darwin">
<TITLE>Ian Darwin's Webserver On The Road</TITLE>
<LINK REL="stylesheet" TYPE="text/css" HREF="/stylesheet.css" TITLE="Style">
</HEAD>
<BODY BGCOLOR="#c0d0e0">
<H1>Ian Darwin's Webserver On The Road</H1>
# Rest of the HTML not shown here...
</BODY></HTML>






Notice the 200 OK status message, the

Content-Length header, the

Last-Modified header, and the

Server header. Each has valuable information.

Content-Length is used when the server knows the

exact size of the file that it is sending in response to a request;

Last-Modified lets the client know the last time

that the requested file was modified; and Server

indicates what server software is responding to the request.





nc is a general-purpose program for connecting

to sockets. It is similar to a Telnet client, but it is easier to

script. Example 8-5 shows nc

connecting to Tomcat.







Example 8-5. Using nc to talk to Tomcat

$ (echo GET / HTTP/1.0; echo "") | nc localhost 80
HTTP/1.1 302 Moved Temporarily
Content-Type: text/html
Date: Sat, 20 Oct 2001 15:21:47 GMT
Location: http://localhost:8080/index.html
Server: Apache Tomcat/4.0 (HTTP/1.1 Connector)
Connection: close

<html>
<head>
<title>Tomcat Error Report</title>
</head>
<body bgcolor="white">
<br><br>
<h1>HTTP Status 302 - Moved Temporarily</h1>
The requested resource (Moved Temporarily) has moved temporarily to a new location.
</body>
</html>
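If neither telnet nor nc is at hand, roughly the same conversation can be scripted with Python's standard socket module. The following is a minimal sketch, not a robust HTTP client: fetch_raw and split_response are hypothetical helper names, the sample bytes are abridged from the response in Example 8-5, and the parser assumes well-formed headers of the "Name: value" form.

```python
import socket


def fetch_raw(host, port, path="/"):
    """Send a bare HTTP/1.0 GET and return everything the server sends back."""
    request = f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request)
        chunks = []
        # With HTTP/1.0 the server closes the connection when it is done,
        # so keep reading until recv() returns an empty result.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def split_response(raw):
    """Split a raw response into (status line, header dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return status_line, headers, body


# Canned bytes abridged from Example 8-5, so the parser can be tried offline.
sample = (
    b"HTTP/1.1 302 Moved Temporarily\r\n"
    b"Content-Type: text/html\r\n"
    b"Location: http://localhost:8080/index.html\r\n"
    b"Connection: close\r\n"
    b"\r\n"
    b"<html></html>"
)
status_line, headers, body = split_response(sample)
print(status_line)
print(headers["Location"])
```

Against a running server, calling split_response(fetch_raw("localhost", 80)) should yield the same three pieces for the live response.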






You've now seen the basics of interacting with the

server from a browser's point of view. Of course,

the web browser concept was invented by Tim Berners-Lee to avoid

users having to perform this kind of interaction, but as an

administrator you should know what happens under the hood to better

understand both the web browser and web server, and to be able to

diagnose HTTP request and response problems.


















     

     

