Saturday, November 7, 2009

Apache Web Server
















Apache is a multiprocess web server that is supplied with the Solaris 9 distribution. It runs on the majority of the world's web servers, serving both HTTP (insecure) and HTTPS (secure) content. Apache also performs a number of other tasks, including the following:




  • Providing a Common Gateway Interface (CGI) that gives clients access to server-side processes and applications. CGI applications can be written in C, C++, Perl, Bourne shell, or the language of your choice.




  • Supporting the hosting of multiple sites on a single server, where each site is associated with a unique fully qualified domain name. Thus, a single Solaris system in an ISP environment can host multiple web sites, such as www.java-support.com, www.paulwatters.com, and so forth using a single instance of Apache.




  • Securing the transmission of credit card details and other sensitive data by supporting the Secure Sockets Layer (SSL). This allows for key-based encryption of the HTTP protocol (called HTTPS), with key sizes of up to 128 bits.




  • Acting as a full-featured proxy/cache server, which provides an extra level of protection for clients behind a firewall and also keeps a copy of the most commonly retrieved documents from the WWW.




  • Producing customized access, agent, and error logs, which can be used for marketing and reporting purposes.




The main Apache configuration file is httpd.conf, which contains three sections:




  • The global environment section, which sets key server information, such as the root directory for the Apache installation, and several process management settings, such as the number of concurrent requests permitted per server process.




  • The main server configuration section, which sets runtime parameters for the server, including the port on which the server listens, the server name, the root directory for the HTML documents and images that comprise the site, and the server authorization configuration (if required).




  • The virtual hosts configuration section, which configures the Apache server to run servers for multiple domains. Many of the configuration options that are set for the main server can also be customized for each of the virtual servers.




We will now examine the configuration options in each of these sections in detail.




Global Environment Configuration


The following options are commonly set in the global environment configuration section:


ServerType standalone
ServerRoot "/opt/apache1.3"
PidFile /opt/apache1.3/logs/httpd.pid
ScoreBoardFile /opt/apache1.3/logs/apache_status
Timeout 300
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15
MaxRequestsPerChild 0
LoadModule auth_module modules/mod_auth.so

The server configuration shown here does not run as a service of the Internet superdaemon (inetd); rather, Apache runs as a stand-alone daemon. This gives Apache more flexibility in its configuration, as well as better performance than running through inetd. Since a stand-alone Apache can service multiple requests from a client over a single persistent connection (using the KeepAlive facility), no production system should ever use the inetd mode.


The ServerRoot for the Apache installation is set to /opt/apache1.3 in this installation. All of the key files required by Apache are located below this directory root, such as the lock file, the scoreboard file, and the file that records the PID of the current Apache process.


Each client connection is subject to a timeout. In this configuration, the timeout is set to 300 seconds (5 minutes), which is the period of inactivity after which a client is deemed to have timed out. Persistent connections are enabled (KeepAlive On): up to 100 requests may be served over a single connection, and the server waits up to 15 seconds for the next request before closing it. There is no limit to the number of requests that each child process may serve (MaxRequestsPerChild 0).
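The directives above are simple name/value pairs. As a rough illustration only (this is not how Apache itself reads its configuration), the following Python sketch loads such lines into a dictionary and interprets the values just discussed. The function name parse_directives and the sample text are assumptions made for the example.

```python
def parse_directives(text):
    """Parse simple 'Name value' lines into a dict.

    Blank lines, comments, and <Section> lines are skipped. A directive
    that appears more than once (such as LoadModule) keeps only its last
    value, which is good enough for this sketch.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("<"):
            continue
        parts = line.split(None, 1)
        value = parts[1].strip().strip('"') if len(parts) == 2 else ""
        settings[parts[0]] = value
    return settings


SAMPLE = """\
Timeout 300
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15
MaxRequestsPerChild 0
"""

settings = parse_directives(SAMPLE)
timeout_minutes = int(settings["Timeout"]) / 60
# A value of 0 means that a child process is never retired.
unlimited_children = int(settings["MaxRequestsPerChild"]) == 0
print(timeout_minutes, settings["KeepAlive"], unlimited_children)
```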






Main Server Configuration


The options that are commonly set in the main server configuration are as follows:


Port 80
ServerAdmin paul@paulwatters.com
ServerName www.paulwatters.com
DocumentRoot "/opt/apache1.3/htdocs"
<Directory />
Options FollowSymLinks
AllowOverride None
</Directory>
<Directory "/opt/apache1.3/htdocs">
Options Indexes FollowSymLinks MultiViews
AllowOverride None
Order allow,deny
Allow from all
</Directory>
UserDir "/opt/apache1.3/users/"
DirectoryIndex index.html
AccessFileName .htaccess
<Files .htaccess>
Order allow,deny
Deny from all
</Files>

The parameters in this section determine the main runtime characteristics of the Apache server. The first parameter is the port on which the Apache server listens. If the server is being executed by an unprivileged user, this port must be 1024 or higher. However, if a privileged user like root is executing the process, any port not already allocated to another service may be used (you can check the services database, /etc/services, for ports allocated to specific services). By default, port 80 is used.
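The port rules can be checked mechanically. The following Python sketch, offered as an illustration rather than a definitive tool, parses text in the /etc/services format and reports whether a candidate port needs root privileges and whether it is already allocated. The helper names and the three sample entries are assumptions; on a real system you would read /etc/services itself.

```python
def parse_services(text):
    """Map TCP port number -> service name for /etc/services-style text."""
    ports = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue
        port, proto = fields[1].split("/", 1)
        if proto == "tcp" and port.isdigit():
            ports.setdefault(int(port), fields[0])
    return ports


def needs_root(port):
    """Ports below 1024 can only be bound by a privileged user."""
    return port < 1024


# Hypothetical sample in /etc/services format.
SAMPLE_SERVICES = """\
# name  port/protocol  aliases
smtp    25/tcp    mail
http    80/tcp    www
pop3    110/tcp
"""

services = parse_services(SAMPLE_SERVICES)
for port in (80, 8080):
    print(port, needs_root(port), services.get(port, "unassigned"))
```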


Next, some details about the server are entered, including the hostname of the system, which is displayed in all URLs, and a contact e-mail address for the server administrator. This address is usually displayed on all error and CGI misconfiguration pages. The root directory for all HTML and other content for the web site must also be supplied, so that the server can construct and interpret both absolute and relative URLs. In this case, the htdocs subdirectory underneath the main Apache directory is used. Because DirectoryIndex is set to index.html, the file index.html in this directory is the default page displayed when no specific page is specified in the URL.


Several options can be specified for the htdocs directory, including whether to follow symbolic links to directories that do not reside underneath the htdocs subdirectory (the FollowSymLinks option). This is useful when you have files on CD-ROMs and other file systems that do not need to be copied onto a hard drive, but simply need to be served through the WWW.


Apache has a simple user authentication system available, which is similar to the Solaris password database (/etc/passwd) in that it makes use of encrypted passwords, but it does not make use of the Solaris password database. This means that a separate list of users and passwords must be maintained. Thus, when a password-protected page is requested by a user, a username and matching password must be entered using a dialog box.






Tip 

Any directory that appears underneath the main htdocs directory can be password protected using this mechanism.
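To see what the browser sends after the dialog box is completed, assume the protected directory uses HTTP Basic authentication, the scheme that mod_auth implements. The Python sketch below builds and decodes the resulting Authorization header. The user name and password are hypothetical, and the helper names are made up for this example.

```python
import base64


def basic_auth_header(username, password):
    """Return the Authorization header value for HTTP Basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def decode_basic_auth(header):
    """Recover (username, password) from a Basic Authorization header value."""
    scheme, _, token = header.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


header = basic_auth_header("alice", "s3cret")  # hypothetical credentials
print("Authorization:", header)
print(decode_basic_auth(header))
```

Note that Basic authentication only encodes the credentials and does not encrypt them, so anyone who can observe the request can decode the password. This is one reason to combine password-protected directories with SSL.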



Next, the various MIME types that can be processed by the server are defined, in a separate file called mime.types. Let’s look at some examples of the MIME types defined for the server:


application/mac-binhex40        hqx
application/msword              doc
application/x-csh               csh

We can see the file types defined here for many popular applications, including encoded binary files (Macintosh BinHex, application/mac-binhex40, with the extension hqx), word processing documents (Microsoft Word, application/msword, with the extension doc), and C shell scripts (application/x-csh, with the extension csh).
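The mapping from file extension to MIME type can be sketched in a few lines of Python. This is an illustration of the file format, not Apache's own lookup code. The function names are assumptions, and the text/plain fallback is an assumed default that a real server would take from its configuration.

```python
def parse_mime_types(text):
    """Build an extension -> MIME type mapping from mime.types-style text."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        mime_type, *extensions = line.split()
        for ext in extensions:
            mapping[ext.lower()] = mime_type
    return mapping


SAMPLE = """\
application/mac-binhex40        hqx
application/msword doc
application/x-csh csh
"""

types = parse_mime_types(SAMPLE)


def content_type_for(filename, default="text/plain"):
    """Look up the MIME type for a filename by its extension."""
    _, dot, ext = filename.rpartition(".")
    if not dot:
        return default  # no extension at all
    return types.get(ext.lower(), default)


print(content_type_for("report.doc"))
```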


The next section deals with log file formats, as shown here:


HostnameLookups Off
ErrorLog /opt/apache1.3/logs/error.log
LogLevel warn
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%{Referer}i -> %U" referrer
LogFormat "%{User-agent}i" agent
CustomLog /opt/apache1.3/logs/access.log common
# Enable this line instead of the previous one to log in the combined format
# CustomLog /opt/apache1.3/logs/access.log combined

The first directive switches off hostname lookups on clients before logging their activity. Since performing a reverse DNS lookup on every client making a connection is a CPU- and bandwidth-intensive task, many sites prefer to switch it off. However, if you need to gather marketing statistics on where your clients are connecting from (for example, by geographical region or by second-level domain type), you may need to switch hostname lookups on. In addition, an error log is specified as a separate entity from the access log. A typical set of access log entries looks like this:



192.64.32.12 - - [06/Jan/2002:20:55:36 +1000] "GET /cgi-bin/printenv HTTP/1.1" 200 1024
192.64.32.12 - - [06/Jan/2002:20:56:07 +1000] "GET /cgi-bin/Search.cgi?term=solaris&type=simple HTTP/1.1" 200 85527
192.64.32.12 - - [06/Jan/2002:20:58:44 +1000] "GET /index.html HTTP/1.1" 200 94151
192.64.32.12 - - [06/Jan/2002:20:59:58 +1000] "GET /pdf/secret.pdf HTTP/1.1" 403 29


The first example shows that the client 192.64.32.12 accessed the CGI application printenv on January 6th, 2002, at 8:55 P.M. The result code for the transaction was 200, which indicates a successful transfer. The printenv script comes standard with Apache, and displays the current environment variables being passed from the client. The output is very useful for debugging, and looks like this:



DOCUMENT_ROOT="/usr/local/apache-1.3.12/htdocs"
GATEWAY_INTERFACE="CGI/1.1"
HTTP_ACCEPT="image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/msword, application/vnd.ms-powerpoint, */*"
HTTP_ACCEPT_ENCODING="gzip, deflate"
HTTP_ACCEPT_LANGUAGE="en-au"
HTTP_CONNECTION="Keep-Alive"
HTTP_HOST="www"
HTTP_USER_AGENT="Mozilla/4.75 (X11; I; SunOS 5.9 i86pc; Nav)"
PATH="/usr/sbin:/usr/bin:/bin:/usr/ucb:/usr/local/bin:/usr/openwin/bin:/usr/dt/bin:/usr/ccs/bin"
QUERY_STRING=""
REMOTE_ADDR="209.67.50.55"
REMOTE_PORT="3399"
REQUEST_METHOD="GET"
REQUEST_URI="/cgi-bin/printenv"
SCRIPT_FILENAME="/usr/local/apache/cgi-bin/printenv"
SCRIPT_NAME="/cgi-bin/printenv"
SERVER_ADDR="209.67.50.203"
SERVER_ADMIN="paul@paulwatters.com"
SERVER_NAME="www.paulwatters.com"
SERVER_PORT="80"
SERVER_PROTOCOL="HTTP/1.1"
SERVER_SIGNATURE="Apache/1.3.12 Server at www.paulwatters.com Port 80\n"
SERVER_SOFTWARE="Apache/1.3.12 (Unix)"
TZ="Australia/NSW"


The second example from the log shows that the same client successfully executed the CGI program Search.cgi, passing two GET parameters: a search term of ‘solaris’ and a search type of ‘simple’. The size of the generated response page was 85,527 bytes. The third example shows a plain HTML page being successfully retrieved, with a response code of 200 and a file size of 94,151 bytes.
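The GET parameters in the second request are carried in the query string, the part of the URL after the question mark. As a small illustration, the following Python sketch splits that request URI into its path and parameters using the standard urllib.parse module. A CGI program would normally receive the same string in its QUERY_STRING environment variable.

```python
from urllib.parse import parse_qs, urlsplit

# Request URI taken from the second access log entry.
request_uri = "/cgi-bin/Search.cgi?term=solaris&type=simple"

parts = urlsplit(request_uri)
params = parse_qs(parts.query)  # each value is a list, since keys may repeat

print(parts.path)
print(params["term"][0], params["type"][0])
```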


The fourth example demonstrates one of the many HTTP error codes being returned, instead of the 200 success code. In this case, a request to retrieve the file /pdf/secret.pdf is denied with a 403 code being returned to the browser. This code would be returned if the file permissions set on the /pdf/secret.pdf file did not grant read access to the user executing Apache (for example, nobody).
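Because each entry follows the common log format, the fields can be extracted with a regular expression. The Python sketch below is one way to do this under that assumption, using two of the entries above with each entry written on a single line, as it is in the log file. The pattern and the parse_entry helper are illustrative and would need extending for the combined format.

```python
import re

# host ident user [time] "request" status size
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_entry(line):
    """Return a dict of fields for one common-format line, or None."""
    match = CLF.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A size of "-" means that no response body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


LINES = [
    '192.64.32.12 - - [06/Jan/2002:20:55:36 +1000] "GET /cgi-bin/printenv HTTP/1.1" 200 1024',
    '192.64.32.12 - - [06/Jan/2002:20:59:58 +1000] "GET /pdf/secret.pdf HTTP/1.1" 403 29',
]

for line in LINES:
    entry = parse_entry(line)
    print(entry["host"], entry["request"], entry["status"], entry["size"])
```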






Virtual Hosts Configuration


The following options are commonly set in the virtual hosts configuration section:



<VirtualHost www.cassowary.net>
ServerAdmin webmaster@paulwatters.com
DocumentRoot /opt/apache1.3/htdocs/www.cassowary.net
ServerName www.cassowary.net
ErrorLog /opt/apache1.3/logs/www.cassowary.net-error_log
CustomLog /opt/apache1.3/logs/www.cassowary.net-access_log common
</VirtualHost>


Here, we define a single virtual host (called www.cassowary.net), in addition to the default host for the Apache web server. Virtual host support allows administrators to keep separate error and access logs for each host, as well as a document root that is completely separate from that of the default server. This makes it very easy to maintain multiple virtual servers on a single physical machine.
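Conceptually, a virtual host maps a requested site name to its own document root, with the main server acting as the fallback. The Python sketch below illustrates that idea only, assuming name-based hosting in which the Host request header selects the site. It is not Apache's actual matching algorithm, and the document_root_for helper is made up for the example.

```python
# Document roots taken from the configuration shown in this section.
DEFAULT_ROOT = "/opt/apache1.3/htdocs"
VIRTUAL_HOSTS = {
    "www.cassowary.net": "/opt/apache1.3/htdocs/www.cassowary.net",
}


def document_root_for(host_header):
    """Pick a document root from the value of the Host request header."""
    # Drop an optional ":port" suffix and ignore case.
    host = host_header.split(":", 1)[0].strip().lower()
    return VIRTUAL_HOSTS.get(host, DEFAULT_ROOT)


print(document_root_for("www.cassowary.net"))
print(document_root_for("www.paulwatters.com"))
```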






Starting Apache


Apache is bundled with a control script (apachectl) that can be used to start, stop, and report on the status of the server. To obtain help on the apachectl script, the following command is used:



$ /opt/apache1.3/apachectl help
usage: /opt/apache1.3/apachectl (start|stop|restart|fullstatus|
status|graceful|configtest|help)

start - start httpd
stop - stop httpd
restart - restart httpd if running by sending a SIGHUP or start if
not running
fullstatus - dump a full status screen; requires lynx and mod_status
enabled
status - dump a short status screen; requires lynx and mod_status
enabled
graceful - do a graceful restart by sending a SIGUSR1 or start if
not running
configtest - do a configuration syntax test
help - this screen


To start Apache, you simply need to issue the following command from the same directory:


$ /opt/apache1.3/apachectl start

In order to stop the service, the following command may be used from the same directory:


$ /opt/apache1.3/apachectl stop

If you change the Apache configuration file and you need to restart the service so that the server is updated with the changes, you can simply use the following command from the same directory:


$ /opt/apache1.3/apachectl restart

Once Apache is running on port 80, clients can begin requesting HTML pages and other content. In recent times, Apache has grown to be more than a simple web server. For each client request, the server returns an HTTP status code, as shown in Table 37-1.







































































































Table 37-1: HTTP Response Codes

Code Type                 Code   Description
Successful Transmission   200    OK
                          201    Created
                          202    Accepted
                          203    Non-Authoritative Information
                          204    No Content
                          205    Reset Content
                          206    Partial Content
Client Errors             400    Bad Request
                          401    Unauthorized
                          402    Payment Required
                          403    Forbidden
                          404    Not Found
                          405    Method Not Allowed
                          406    Not Acceptable
                          407    Proxy Authentication Required
                          408    Request Timeout
                          409    Conflict
                          410    Gone
                          411    Length Required
                          412    Precondition Failed
                          413    Request Entity Too Large
                          414    Request-URI Too Long
                          415    Unsupported Media Type
                          416    Requested Range Not Satisfiable
                          417    Expectation Failed
Server Errors             500    Internal Server Error
                          501    Not Implemented
                          502    Bad Gateway
                          503    Service Unavailable
                          504    Gateway Timeout
                          505    HTTP Version Not Supported
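The first digit of a status code identifies its type, which makes the codes easy to group when analyzing logs. The following Python sketch shows one way to do so. The status_class helper is an assumption for illustration, and the 1xx and 3xx classes it names are part of HTTP but are not listed in Table 37-1.

```python
def status_class(code):
    """Return the type of an HTTP status code, based on its first digit."""
    classes = {
        1: "Informational",            # not listed in Table 37-1
        2: "Successful Transmission",
        3: "Redirection",              # not listed in Table 37-1
        4: "Client Errors",
        5: "Server Errors",
    }
    return classes.get(code // 100, "Unknown")


for code in (200, 403, 500):
    print(code, status_class(code))
```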













