Saturday, November 21, 2009

From How to Break Web Software: Functional and Security Testing of Web Applications and Web Services, by Mike Andrews and James A. Whittaker (Addison-Wesley Professional).

Introduction


Many server-side components are typical programs written in languages like C and C++. Although the issues of testing these types of components from the server side are covered in previous books (How to Break Software and How to Break Software Security), here we cover testing them from the Web client. For completeness, you may want to refer to those earlier books.


From the perspective of the Web client, the main concern is server-side programs that are susceptible to attack. It turns out that certain characteristics of server-side programs—namely the language they are written in—can make them prone to attacks. Thus, we call these language-based attacks and focus on three specific attacks that are common and cannot be overlooked during development: buffer overflows, canonicalization, and NULL strings.



Attack 14 Buffer Overflows


Perhaps the most notorious security attack against applications is the buffer overflow. Buffer overflows were first identified as a potential problem way back in the 1970s. Using a buffer overflow as the delivery mechanism for worms and malicious code has been a hot area for exploitation. CodeRed, Nimda, Slammer, and Blaster are some of the worms that resulted from server components overrunning memory buffers. The list is long, and their effects have been expensive.


Our interest is in ensuring that any components that are susceptible to overruns do not cause our own Web applications to be compromised. Thus, we need to understand what server-side components our Web applications depend upon and ensure that these components cannot be remotely exploited.


Buffer overflows occur when a function in a program fails to check the size of the input data that it is processing. If this input data is larger than the space allocated for it, it overflows into other memory locations on the execution stack. Thus, some memory locations that are intended for other purposes get overwritten (corrupted) with this input data. More often than not, that corrupted data causes the software to crash.


The most dangerous situation is for the input data to overflow into memory that will be used in choosing which instruction to execute next. When data overflows into a memory location called the return address, part of that data actually becomes an instruction to the computer. That's the magic of a buffer overflow: User input data actually causes the execution sequence of the machine to change, allowing an attacker to run arbitrary code on our Web server. We must prevent this situation.


There has been so much written about buffer overflows that including more than this brief introduction would be redundant, especially because most of the languages that modern Web applications are written in are more resilient to this kind of attack. If you are interested in the underlying details, the seminal paper is "Smashing the Stack for Fun and Profit" (http://www.insecure.org/stf/smashstack.txt). Also see "19 Deadly Sins of Software Security," by Michael Howard, David LeBlanc, and John Viega (McGraw-Hill, 2005). It lists buffer overflows as sin number one!





When to Apply This Attack


Not all Web applications are vulnerable to buffer overflow attacks. That's good news, because Web applications are deployed in one of the most risky places: network-facing code that accepts data from anonymous users.


The prevalence of buffer overflow attacks and research into preventing them has meant that in a lot of programming languages, protection measures are provided for developers to use. This means that the majority of Web applications, and in some respects parts of the operating systems that execute them, are immune to this attack. However, don't get a false sense of security. There are still plenty of legacy components and careless developers.


We begin this attack by looking at the filenames of programs that make up the Web application. If they are Java servlets (/servlet/ is often in the URL path, or the filename ends in .jsp), .NET programs (.aspx), or PHP (.php), to name but a few, you may as well move along now because such managed code is immune[1] to most buffer overflow attacks. These environments carefully check memory and array access, resizing buffers where needed. The prime target for buffer overflow testing is native code, ending in extensions such as .exe, .dll, or .cgi (although one can never be sure with this last extension).

[1] Immune is perhaps too strong a word here, making the reader assume that buffer overflows are impossible to achieve in Web application software, which is completely untrue. Buffer overflows are much harder to find and exploit in current Web software, but they do exist. However, the environment and languages that are used to write today's Web applications protect against attack. If a buffer overflow does exist, it generally will be in a vendor's code (that is, Microsoft's, Sun's, or PHP's) rather than your own. Therefore, the best protection measure is to ensure that you are running the most current versions of the Web server and application environment.


If in doubt, give this attack a try. It's better to be safe than sorry.
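The extension triage described above can be sketched in a few lines. This is our own illustrative helper, not a tool from the book; the extension lists are the ones named in the text and are not exhaustive.

```python
# A minimal triage sketch: sort discovered URLs into likely-managed
# (memory-safe runtime, low priority for overflow testing) versus
# native-code targets, based on the path and file extension.
from urllib.parse import urlparse

MANAGED = {".jsp", ".aspx", ".php"}   # managed code: resilient to overflows
NATIVE = {".exe", ".dll", ".cgi"}     # possible native code: prime targets

def overflow_priority(url):
    """Return 'high' for likely native code, 'low' for managed code,
    'unknown' otherwise."""
    path = urlparse(url).path.lower()
    ext = path[path.rfind("."):] if "." in path else ""
    if "/servlet/" in path or ext in MANAGED:
        return "low"
    if ext in NATIVE:
        return "high"
    return "unknown"
```

Remember the text's caveat about .cgi: the extension alone does not prove what language sits behind it, so "high" here only means "worth trying."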





How to Conduct This Attack


Of all the attacks we will present in this book, the buffer overflow attack is probably the easiest to conduct. The idea is to fill every input field with as much data as it will take, and then some! Look for parameters or form fields and fill them with lots of data. And when we say lots, we mean lots. It's not unusual for a buffer overflow to be uncovered only when more than 100,000 characters of data have been passed to it.[2]

[2] See http://www.securityfocus.com/archive/1/317142/2003-03-28/2003-04-03/0.


Some of the best places to look are where the developers have restricted the length or types of input that a user can enter, discussed in "Attack 4—Bypass Restrictions on Input Choices." Form fields with the MAXLENGTH attribute are a good hint because the developer knows he doesn't want too much data in that instance, but any field or parameter can be just as good a choice. Because a malicious user can easily remove the MAXLENGTH restriction, it is fair game for this attack, and client-side prevention is not enough.
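The "lots of data" idea can be sketched as a small payload generator. This is our own sketch, not the book's tooling; the field name and sizes are illustrative, and the point is to go far beyond any MAXLENGTH the page claims.

```python
# Build URL-encoded POST bodies with one field stuffed with 'A's, at
# progressively larger sizes -- some overflows only trigger past 100,000
# characters, so small payloads alone prove nothing.
from urllib.parse import urlencode

def overflow_payloads(field, sizes=(1_000, 10_000, 100_000, 500_000)):
    """Yield (size, URL-encoded POST body) pairs for the given field."""
    for n in sizes:
        yield n, urlencode({field: "A" * n})

# Each body can then be POSTed to the target with any HTTP client;
# watch for error pages or an unresponsive server afterward.
```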


How does one know when a buffer overflow attack has been successful? Well, the dangerous part of testing for this attack is that if successful, it often brings down the Web server. So be warned!


If the Web server is brought down, you'll likely get an error message on subsequent requests for pages saying that the server is unavailable. Overwriting the return address of a function with that of an unknown location produces an exception in the Web server which, if not handled, unceremoniously kills the server or program and, in extreme cases, the operating system, too. Therefore, you should schedule testing for buffer overflows at a time when you can tolerate disruption of service.


Proceed with extreme caution when applying this attack to a production system!


Another way of testing for buffer overflows is to generate code that won't crash the server but will produce a clear signal that the attack was successful, like a pop-up message box or a signal network packet sent to another machine. However, crafting exploit code is a highly technical skill, subject to various caveats and workarounds for different situations, and clearly out of scope for this book. If this interests you, however, The Shellcoder's Handbook by Jack Koziol et al. is a good place to start.


Identifying all the potential places to enter input and creating varying ranges of data can be quite an undertaking for anything more than a trivial Web application. Using automation for this kind of testing is a good approach, because the inputs are simple to generate, and identifying successful test cases is easy.


SPIKE Proxy is one such tool that you can use for automated buffer overflow testing of Web applications. You simply run the proxy, point the Web browser through it (as Paros does—see Appendix C, "Tools"), and walk over all the pages in the application that you want to test. Visiting the proxy page, SPIKE Proxy lists a number of tests to perform, as Figure 6-1 shows.



Figure 6-1. SPIKE Proxy's main page after walking a site.





SPIKE Proxy then replays the requests made previously, fuzzing each of the parameters it discovered according to the test selected—in our case, injecting various numbers of As.


Although SPIKE Proxy takes much of the time and tediousness out of repetitive testing, there are a few things to be aware of. First, in our experience, SPIKE Proxy generates a lot of false positives, which you can see in Figure 6-2. That's because SPIKE Proxy looks for limited keywords in the response from the server to identify successful tests. There are few good alternative ways of doing this, so all results have to be validated by hand. This leads to another drawback: SPIKE Proxy doesn't log the requests and responses for successful tests, which would let end users easily revalidate them by hand. The best way of working around this is to redirect the output of the proxy (the runme.bat file) to a file by using the tee utility (standard under *nix; you can easily find it for Windows) and entering the following:


c:\SPIKEProxy\runme.bat | tee -a logfile.txt



Figure 6-2. SPIKE Proxy during automated testing.





This code sends output to both the screen and the log file. This allows you to see the SPIKE Proxy operating and how far along in its tests it is. Be aware, though; the log file can end up being very large on big sites!


Despite these complaints, SPIKE Proxy is a really useful tool. Even with the false positives, it's easy to tell if a test has been successful, because the Web server just stops responding.





How to Protect Against This Attack


As mentioned earlier, many programming languages already have protection measures built in for this attack. In addition, so much has been written about buffer overflows, how programmers accidentally create them, and how to avoid them that many resources are available to help. We point you to Michael Howard, a guru on this topic, who has written many articles about buffer overflows in various shapes and sizes.[3]

[3] See http://blogs.msdn.com/michael_howard/ and the Code Secure columns on MSDN at http://msdn.microsoft.com/security/securecode/columns/default.aspx.


Buffer overflows occur when data being passed in is blindly copied into memory without checking its size. You can use two approaches to protect against this attack: knowing the size of the data and allocating enough space accordingly, or terminating input at a sensible size and ignoring whatever additional data the user is trying to force upon your application.


The simplest approach is to truncate all input at a reasonable length. It's fine to truncate on the client for error locality (let the user know that the input is too large and some of it is going to be thrown away), but do it again on the server before the input is passed to memory.
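The server-side truncation described above can be sketched in a few lines. This is our own illustration, not code from the book, and the per-field limits are made up for the example; pick limits that match your actual data.

```python
# Truncate every incoming form value to a sensible maximum length before
# it is passed anywhere else on the server. Client-side MAXLENGTH is only
# a usability aid; this server-side cut is the one that counts.
FIELD_LIMITS = {"username": 64, "email": 254, "comment": 4000}

def truncate_fields(form, limits=FIELD_LIMITS, default=1024):
    """Return a copy of the form data with every value cut to its limit."""
    return {k: v[: limits.get(k, default)] for k, v in form.items()}
```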



Attack 15 Canonicalization


By now, you've probably realized that most of the attacks we've discussed so far can be mitigated by carefully validating the input that is received. This is the nature of a lot of testing.


However, filtering out bad input is not as easy as it sounds. In the modern world of computing, you have to deal with more than just ASCII characters. Because of the different representations available, there are various ways of encoding data, and you need to consider all of these. This is an issue known as canonicalization.


Canonicalization means ensuring that all data is represented in a standard, common form. If we don't perform this step when comparing or using data, we may not be looking at the actual data that will eventually be processed, so any validating that we do may miss an attack.


The first example of canonicalization is simple encoding of characters in their HTTP/HTML equivalent. In some cases, you need to encode certain characters because they have extended meaning in some contexts. For example, if you have a space character in data that is being sent to the server (or is received by the browser), you have to encode it as + to avoid an illegal break in the CGI parameters where spaces are not allowed. When the server or browser receives the + character, it converts it back into a space. (An exercise for the interested reader: How are + characters represented?)
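Python's standard library makes the space-to-plus encoding easy to see, and also answers the aside: a literal + in the data must itself be percent-encoded (as %2B), or it would be decoded back into a space.

```python
# The browser/server form encoding described above, demonstrated with
# the standard library's quote_plus/unquote_plus pair.
from urllib.parse import quote_plus, unquote_plus

assert quote_plus("hello world") == "hello+world"   # space becomes '+'
assert unquote_plus("hello+world") == "hello world" # and back again
assert quote_plus("1+1") == "1%2B1"                 # a real '+' must be encoded
```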


This is a simple encoding that is carried out automatically inside the browser/server communication protocol. Unfortunately, we need to consider lots of other encoding issues, many of which occur at different levels of the communication protocol. Wherever there is a potential mismatch in the encoding and decoding of characters, an attacker could sneak data past your validation checks.


One common representation of characters that is used on the Web is UTF-8. It translates a 31-bit character set (like Unicode) into an 8-bit representation to reduce the number of bytes transferred between computers. Because the most common characters used are still the standard ASCII characters, UTF-8 encodes them in the simplest way possible: ASCII characters 0 to 127 encode as themselves, 0x00 to 0x7f. (The 0xNN format signifies a hexadecimal number.) All other characters require more space, so UTF-8 encodes them as multibyte sequences, covering the range 0x80 to 0x7fffffff. For a full and complete discussion of UTF-8 encoding, see http://en.wikipedia.org/wiki/UTF-8 and http://www.unicode.org/standard/standard.html.
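The variable width is easy to observe with any UTF-8 codec; here we use Python's as an illustration. Note that a strict modern decoder also rejects overlong sequences such as 0xC0 0xAF (an illegal two-byte spelling of /), which is exactly the kind of input the attacks below abuse on permissive decoders.

```python
# UTF-8's variable width: ASCII stays one byte, other characters take
# two or more, and overlong encodings are rejected by strict decoders.
assert "A".encode("utf-8") == b"\x41"      # ASCII: 1 byte, same value
assert len("é".encode("utf-8")) == 2       # Latin-1 range: 2 bytes
assert len("€".encode("utf-8")) == 3       # BMP symbol: 3 bytes

try:
    b"\xc0\xaf".decode("utf-8")            # overlong encoding of '/'
except UnicodeDecodeError:
    print("overlong sequence rejected")
```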


Now, this is where it gets interesting. When a Web server receives data, it has to decide how to decode it. With so many complicated representations, the server can sometimes get it wrong and not match what the programmer of the application was expecting to match against. In other words, the browser is working by one set of character representation rules, and the server is working on another.


The best example of this is the different representations that the / character can take, which is also one of the characters used in the directory traversal attack.


The standard representation of the / character in HTTP is the basic character itself—the / character doesn't have "extra" meaning in another context. However, it can also be represented as %2f, the URL encoding of its UTF-8 value. (In ASCII, the value is 0x2f, so no extra conversion is required.) Most characters in Unicode have multiple encodings, though, that could be used but shouldn't be because they are not the shortest possible. In our example, the overly long encoding would be %c0%af.


So now we have three potential encodings for a single character: /, %2f, and %c0%af. It was this illegal representation of / that was used against IIS 4 and 5 Web servers with the following request:


http://www.example.com/app/..%c0%af..%c0%af../winnt/system32/cmd.exe?/c+dir


With this request, the attacker was able to get the server's operating system to perform a dir command (cmd.exe /c dir), but it could have been a lot worse!


Microsoft attempted to patch this flaw by filtering for these characters. One would have thought that this would have been the end of the story, but resourceful attackers took a further step. Given an encoded character representation (%2f), you can encode it again. Let's pick the % character, encode it as %25, and then add it back into the encoded representation of /, giving %252f (see Figure 6-3).



Figure 6-3. Double URL encoding.





When the Web server picked up the request, it had to choose between interpretations. The sequence could be one of the higher (that is, non-ASCII) UTF-8 encoded characters, or the server could decode the %25, find that it is a % character, and then decode the resulting %2f as well. It should have been the former, but due to a shift-reduce parsing error in the decoding engine, the server preferred the latter, which was the opposite of what the application code assumed. Once again, the vulnerability manifested itself.
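The double decode is easy to reproduce with any standard URL decoder; this is a simulation of ours using Python's, not IIS code. The key observation is that a filter running after one decode never sees a / at all.

```python
# %252f survives one decode as %2f and only becomes '/' on a second,
# erroneous decode -- a filter run between the two decodes misses it.
from urllib.parse import unquote

once = unquote("..%252f..%252f")
assert once == "..%2f..%2f"        # first decode: still no '/' to filter
twice = unquote(once)
assert twice == "../../"           # second decode exposes the traversal
```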





When to Apply This Attack


This attack is secondary to the majority of others listed in this book. Whenever you try an attack and it fails, the application may indicate that it is catching the input and validating it (error messages are produced). It is worth trying to circumvent the validation routine(s) while attempting to encode parts of the input.





How to Conduct This Attack


In any attack, certain characters give it a signature that is different from normal data. For example, in cross-site scripting (XSS) attacks, it may be the < and > characters. In SQL injection, it could be the single quote or double dash (--), although looking for these as an indication of attack is asking for trouble. Picking these characters and encoding them may result in the Web server and application performing different decoding and allowing the attack through.


A cheat sheet of the most common characters used in attacks and their different representations is shown in Table 6-1, but when there is a lot of data to encode, it's easier to use a tool.


Table 6-1. Common Characters Used in Attacks

Character     Used In                                     URL Encoded[4]
< and >       XSS                                         %3c and %3e
:             XSS (adding javascript: to existing tags)   %3a
'             SQL injection                               %27
--            SQL injection                               %2D%2D
;             SQL and command injection                   %3B
../           Directory traversal                         %2E%2E%2F
`             Command injection                           %60
\0 (null)     NULL strings                                %00


[4] HTML encoding (in hex) uses the same code values as URL encoding, but it follows the format &#xNN;. For example, < would be represented as &#x3c;.
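When you need encoded variants of a whole attack string rather than one character, a small helper saves lookups. This is our own sketch, not tooling from the book; it produces the URL and HTML-hex forms tabulated above (hex digits in lowercase).

```python
# Generate the URL-encoded and HTML hex-entity forms of a string,
# matching the cheat sheet in Table 6-1.
def url_encode_all(s):
    """Percent-encode every byte, e.g. '<' -> '%3c'."""
    return "".join("%{:02x}".format(b) for b in s.encode("utf-8"))

def html_hex_encode_all(s):
    """HTML hex entities, e.g. '<' -> '&#x3c;'."""
    return "".join("&#x{:x};".format(ord(c)) for c in s)

assert url_encode_all("<script>") == "%3c%73%63%72%69%70%74%3e"
assert html_hex_encode_all("<") == "&#x3c;"
```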


Napkin (see Figure 6-4; available at http://www.0x90.org/releases/napkin/) is a simple encoder/decoder and hex display for whatever data you pass it. It currently handles various types of conversions: Base64, URL (UTF), ROT (rotate), MD5 checksums, and SHA checksums. It does not perform a double decode directly, but you can simply copy the output back into the input pane and run the encode/decode again.



Figure 6-4. Napkin—An encoding/decoding tool.










How to Protect Against This Attack


Exploits using this attack occur when the Web server treats input differently from the application. These kinds of errors are discovered and fixed frequently, so keeping a Web server patched and up to date is a good first step. After that, unfortunately, we come back to the same old recommendations of validating input. With canonicalization, though, we may have to go through an extra step depending on the encoding that has been done before it has reached the application.


For example, with PHP versions 4 and above, you can automatically prefix all single quotes, double quotes, backslashes, and NULL characters with a backslash to help mitigate attacks like SQL injection and NULL characters. (We'll get to this attack next.)


Therefore, to do any like-to-like comparisons, the programmer has to call stripslashes first. Similarly, in a comparison between data passed through the URL and strings on the server, the programmer may have to call urldecode. However, calling urldecode is dangerous on parameters because the Web server should have already decoded it by the time it reaches the application. Calling urldecode again opens the application to the double-decode attack discussed earlier. The key is understanding what the browser and the Web server encode and decode before it reaches the application and then treating it accordingly.
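Why the extra urldecode is dangerous can be shown concretely; here Python's unquote stands in for PHP's urldecode (a simulation of ours, not PHP internals). The server performs the one legitimate decode, validation sees nothing suspicious, and the application's redundant decode re-opens the double-encoding hole.

```python
# The server decodes once; the application must NOT decode again.
from urllib.parse import unquote

attacker_sends = "%253Cscript%253E"
after_server = unquote(attacker_sends)       # the one legitimate decode
assert after_server == "%3Cscript%3E"
assert "<" not in after_server               # validation here sees nothing bad
assert unquote(after_server) == "<script>"   # the app's extra decode exposes it
```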



Attack 16 NULL-String Attacks


It's rare for any program to exist in isolation. Pretty much all software relies on other software to help it perform its tasks. From compilers to libraries and even microprocessor code, data filters its way down through various levels of application software, libraries, and the operating system.


However, these different layers often treat data differently. The languages used to write Web applications are usually high-level languages like ASP, PHP, and Java, but they lean on support from libraries of prewritten code, often in lower-level languages like C and C++. Just like in our canonicalization attack, we can encode characters that have different meaning in different environments. The character in question for this attack is the NULL character, which can be represented as either \0 or %00.


In low-level languages like C, the NULL character signifies the end of a string, whereas in higher-level languages like PHP, its use is unnecessary because all end-of-string syntax is handled automatically.


If we have two different ways of handling the same character, we can start to see cracks in data validation mechanisms. For example, say that we are filtering for XSS by looking for <script> tags, and we know that to actually do the string comparison, our higher-level language uses library functions that are written in a lower-level language. By passing <%00string>, we can fool the library function into not matching the malicious string, because as far as it's concerned, the string ended after the < character. The higher-level language, however, sees the NULL character as nothing, parses it from the data, and writes <script> to the browser.
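The layer mismatch described above can be simulated in a few lines. This is our own sketch, not the book's code: the validator scans the raw string, a C-style layer stops at the first NUL, and the output layer silently drops NULs.

```python
# Simulate three layers that disagree about the NULL character.
def validator_rejects(data):
    return "<script>" in data           # high-level substring check

def c_style_view(data):
    return data.split("\x00", 1)[0]     # what a C string function would see

def render(data):
    return data.replace("\x00", "")     # higher layer drops the NUL

payload = "<\x00script>"
assert not validator_rejects(payload)   # filter finds no "<script>"
assert c_style_view(payload) == "<"     # C code stops at the NUL
assert render(payload) == "<script>"    # ...but the browser gets the tag
```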





When to Apply This Attack


Let's use a simple example to demonstrate the point and to give a nod to one of the pioneers of Web application security testing, RainForrestPuppy,[5] whose example this is.

[5] You can find RFP's Web site at http://www.wiretrip.net/rfp/ until he comes out of retirement.


What if we had a script that allowed people to change passwords unless the user they were trying to change was root, the administrative user? Here's the Perl code that we would be expecting to see:


if ($username ne "root") {
    # change the password
    # do whatever's needed in here
    # - chpasswd, passwd, update database, etc.
}
else {
    # do nothing and print an error message
    die("You cannot change the root password");
}


If the user tries to change the password with the username parameter equal to root, Perl makes the match, and the script exits with an error message. However, if the user supplies root\0, Perl does not match the string and executes the password change. When Perl passes that string down to lower-level functions (like the change-password shell command or the kernel), the NULL is effectively dropped, leaving just root.
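The same bypass can be simulated outside Perl; this is our own sketch of the two layers, not RFP's code. The high-level equality check fails to match "root", yet anything that treats the value as a C string sees exactly "root" again.

```python
# Simulate the root\0 trick: the guard compares the full string,
# but a NUL-terminated view of it is plain "root".
def may_change_password(username):
    return username != "root"            # the high-level guard

def as_c_string(username):
    return username.split("\x00", 1)[0]  # NUL terminates a C string

evil = "root\x00"
assert may_change_password(evil)         # guard is bypassed
assert as_c_string(evil) == "root"       # lower layers act on root anyway
```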


This leads to where you should look to apply this attack. The answer is quite simple. Where you have been performing other attacks (SQL injection, XSS, directory traversal, and so on) and have been thwarted by possible input filtering, try adding a NULL character at various places in the input: at the end, where a direct match is being looked for, or in the middle, where strings are being concatenated together and you want the filtering to stop at a certain point.





How to Conduct This Attack


All this attack entails is putting NULL characters in strings to attempt to overcome filtering. Always try to use the clean attack first (there's no point in making things more complicated than they need to be), but if that fails and you get an error message indicating that some filtering may be going on, try adding NULL characters at different points, like the beginning or the end of the string. Remember: The NULL character can be represented in two ways: the C syntax of \0, or the encoded variety of %00.





How to Protect Against This Attack


The simplest way of protecting against this type of attack is ensuring that you use the same programming language throughout the application, including any other code that it may rely on. This is much more difficult than it sounds, though. You seldom have control of all the application code unless you write it yourself, which is unreasonable. (Why go reinventing the wheel?)


A much simpler approach is to look for and remove NULL characters at the first opportunity you get, to avoid their being misinterpreted elsewhere in the application. Be aware, though, of the canonicalization issues presented earlier and of the <scr<script>ipt>-style find-and-replace trick, which works just as well with doubled sequences like \\00 (stripping one \0 still leaves a \0 behind).


Another protection measure that Perl offers is a feature called taint. When operating in this mode, Perl will not send any user input to certain functions (namely, open(), unlink(), rename(), exec(), and system()) until it has been validated. How does Perl validate it? Does it have some magic validation scheme? Unfortunately, no. Variables become untainted when they pass through a regular expression and groups are parsed out of them. Consider the following:


# tainted input
$email = $form_data{"email"};
# warning - cannot pass tainted inputs to system calls
system("sendmail", $email);
# ok to pass tainted input to some functions!
print($email);
# parse out (and untaint) some of the input
if ($email =~ /(\w{1}[\w\-.]*)\@([\w\-.]+)/) {
    $name = "$1"; # name is now "untainted" as it came from a regex group
}
# name becomes tainted again because it is concatenated with unvalidated input
$name = $name . " from " . $form_data{"state"};


It is up to the developer to perform correct validation checking, but this extra effort does help ensure that inputs are not accidentally passed off to other systems (and thus to languages that have alternative representations and encoding schemes) without the developer being aware of it.

