Wednesday, November 25, 2009

Tour of an Exploit



What happens when a software program is
attacked? We introduce a simple house analogy to guide you through
a software exploit. The "rooms" in our target software correspond
to blocks of code in the software that perform some function. The
job at hand is to understand enough about the rooms to wander
through the house at will.



Each block of code (room) serves a unique
purpose in the program. Some code blocks read data from the
network. If these blocks are rooms in a house and the attacker is
standing outside the door on the porch, then networking code can be
thought of as the foyer. Such network code will be the first code
to examine and respond to a remote attacker's input. In most cases,
the network code merely accepts input and packages it into a data
stream. This stream is then passed deeper into the house to more
complex code segments that parse the data. So the (network code)
foyer is connected by internal doorways to adjacent, more complex
rooms. In the foyer, not much of interest to our attack can be
accomplished, but directly connected to the foyer is a kitchen with
many appliances. We like the kitchen, because the kitchen can, for
example, open files and query databases. The attacker's goal is to
find a path through the foyer into the kitchen.





The Attacker's Viewpoint



An attack starts with breaking rules and
undermining assumptions. One of the key assumptions to test is the
"implicit trust" assumption. Attackers will always break any rule
relating to when, where, and what is "allowed" to be submitted as
input. For the same reasons that software blueprints are rarely
made, software is only rarely subjected to extensive "stress
testing," especially stress testing that involves purposefully
presenting malicious input. The upshot is that users are, for
reasons of inherent laziness, trusted by default. An implicitly
trusted user is assumed to supply correctly formed data that play
by the rules; data from such a user are thus implicitly "trusted"
as well.



To make this clearer, we'll restate what's going
on. The base assumption we'll work against is that trusted users
will not supply "malformed" or "malicious" data! One particular
form of this trust involves client software. If client software is
written to send only certain commands, implicit assumptions are
often made by the architects that a reasonable user will only use
the client software to access the server. The issue that goes
unnoticed is that attackers usually write software. Clever
attackers can write their own client software or hack up an
existing client. An attacker can (and will) craft custom client
software capable of delivering malformed input on purpose and at
just the right time. This is how the fabric of trust unravels.





Why Trusting Users Is Bad



We now present a trivial example that shows how
implicitly trusting a client unravels. Our example involves the
maxlength attribute of a Hypertext Markup Language (HTML)
form. Forms are a common way of querying users on a Web site for
data. They are used extensively in almost every type of Web-based
transaction. Unfortunately, most Web forms expect to receive proper
input.



The developer who constructs a form has the
ability to specify the maximum number of characters that a user is
allowed to submit. For example, the following code limits the
"username" field to ten characters:







<form action="login.cgi" method="GET">
Username: <input type="text" name="username" maxlength="10">
</form>






A designer who misunderstands the underlying
technology might assume that a remote user is limited to submitting
only ten characters in the username field. What the designer might
not realize is that enforcement of the field length takes place on
the remote user's machine, within the user's Web browser itself!
The problem is that the remote user might have a Web browser that
doesn't pay attention to the size restriction. Or the remote user
(if an attacker) might build a malicious browser that ignores the
restriction. Or better yet, the remote user might not use a Web
browser at all. A remote user can simply submit the form request
manually in a specially crafted uniform resource locator (URL):




http://victim/login.cgi?username=billthecat



In any case, the remote user should most
definitely not be trusted, and neither should the remote user's
software! There is absolutely nothing that prevents the remote user
from submitting a URL such as




http://victim/login.cgi?username=THIS_IS_WAY_TOO_LONG_FOR_A_USERNAME
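
In fact, nothing here requires a Web browser at all. Below is a
minimal sketch in Perl that submits a username far longer than the
ten characters the form pretends to allow. The hostname victim and
the login.cgi parameter are carried over from the example above;
LWP::Simple is a standard Perl HTTP client.

#!/usr/bin/perl
# Sketch: bypass the browser-side maxlength check by issuing the
# HTTP GET request directly. No browser means no client-side limit.
use strict;
use warnings;
use LWP::Simple;

my $username = 'A' x 100;   # 100 characters, ten times the "limit"
my $reply = get("http://victim/login.cgi?username=$username");
print $reply if defined $reply;

The server receives exactly the kind of GET request a browser would
send, with no hint that the maxlength restriction was never
enforced.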



Assumptions involving trust, like the one
presented here, make up secret doorways between rooms in the house
of logic. A clever user can use the "implicit trust" doorway to
sneak right through the foyer and into the kitchen.





Like a Lock Pick



An attacker must carefully craft attack input,
presenting data in a particular order. Each bit of data in the
attack is like a key that opens a code path door. The complete
attack is like a set of keys that unlocks the internal code paths
of the program, one door at a time. Note that this set of keys must
be used in the precise order in which it appears on the key chain,
and once a key has been used, it must be discarded. In other words,
an attack consists of presenting exactly the right data in exactly
the right order. In this way, exploiting software is like picking
locks.



Software is a matrix of decisions. The decisions
translate into branches that connect blocks of code to one another.
Think of these branches as the doorways that connect rooms. Doors
will open if the attacker has placed the right data (the key) in
the right order (location on the key chain).



Some of the code locations in the program make
branching decisions based on user-supplied data. This is where you
can try a key. Although finding these code locations can be very
time-consuming, in some cases the process can be automated. Figure 2-2 diagrams the code
branches of a common File Transfer Protocol (FTP) server. The graph
indicates which branches are based on user-supplied data.







Figure 2-2. This graph illustrates the branching logic of a common
FTP server. Blocks indicate continuous code and lines indicate
jumps and conditional branches between code blocks. Blocks outlined
in bold indicate that user-supplied data are being processed.

Graphing of the sort shown in Figure 2-2 is a powerful tool when reverse
engineering software. However, sometimes a more sophisticated view
is needed. Figure 2-3
shows a more sophisticated three-dimensional graph that also
illuminates program structure.







Figure 2-3. This graph is rendered in three dimensions. Each code
location looks like a small room. We used the OpenGL package to
illustrate all the code paths leading toward a vulnerable sprintf
call in a target program.

Inside particular program rooms, different parts
of a user's request are processed. Debugging tools can help you to
determine what sort of processing is being done where. Figure 2-4 shows a
disassembly of a single code location from a target program. Going
by our analogy, this code appears in a single room in the house
(one of the many boxes shown in the earlier figures). The attacker
can use information like this to shape an attack, room by room.







Figure 2-4. Disassembly of one "room" in the target program. The
code at the top of the listing is a set of program instructions.
The instructions that deal with user-supplied data are called out
at the bottom of the listing. Exploiting software usually involves
understanding both how data flow in a program (especially user
data) and how data are processed in given code blocks.

A Simple Example



Consider an exploit in which the attacker
executes a shell command on the target system. The particular
software bug responsible for causing the vulnerability might be a
code snippet like this:







my $username = $ARGV[0];                 # user-supplied data
system("cat /logs/$username" . ".log");  # unchecked shell command






Note that the call to the system()
function takes a parameter that is unchecked. Assume, for this
example, that the username parameter is delivered in an HTTP
cookie. An HTTP cookie is a small piece of data that is controlled
entirely by the remote user (and is typically stored by the Web
browser). Security-savvy developers know that a cookie is
something that should never be
trusted (unless you can cryptographically protect and verify
it).
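
To make the delivery path concrete, here is a minimal sketch of how
such a script might pull the username out of a cookie. The cookie
name username is our assumption for illustration; CGI is the
standard Perl module for this job.

#!/usr/bin/perl
# Sketch: the same unchecked system() call, with the username taken
# from an HTTP cookie. The cookie contents are entirely under the
# remote user's control.
use strict;
use warnings;
use CGI;

my $q        = CGI->new;
my $username = $q->cookie('username');    # assumed cookie name
system("cat /logs/$username" . ".log");   # still completely unchecked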



The vulnerability we exploit in this example
arises because untrusted cookie data are passed into, and used in,
a shell command. Shell commands run with the privileges of the
program that invokes them, and if a clever attacker supplies just
the right sequence of characters as the "username," the attacker
can issue arbitrary commands on the system.



Let's examine this in a bit more detail. If the
remote user types in the string bracken, corresponding to
a name, then the resulting command sent through the
system() call of our code snippet will be







cat /logs/bracken.log






This shell command displays the contents of the
file bracken.log in the /logs directory (the output is sent back to
the remote user's Web browser). If the remote user supplies a
different username, such as
nosuchuser, the resulting command will be







cat /logs/nosuchuser.log






If the file nosuchuser.log does not exist, a
minor "error" occurs and is reported. No other data are displayed.
From the perspective of an attacker, causing a minor error like
this is no big deal, but it does give us an idea. Because we
control the username variable, we can insert whatever characters we
choose as the username we supply. The shell is a full command
interpreter, and it understands lots of special character
sequences. We can take advantage of this fact to have some fun.



Let's explore what happens when we supply just
the right characters in just the right order. Consider the
funny-sounding username "../etc/passwd." This results in the
following command being run for us:







cat /logs/../etc/passwd.log






We are using a classic directory traversal
trick to display the file /etc/passwd.log. As the attacker, we
wield complete control over the filename that is being passed to
the cat command. Too bad there isn't a file called
/etc/passwd.log on most UNIX systems!



Our exploit so far is pretty simple and isn't
getting us very far. With a little more cleverness, though, we can
add another command to the mix. Because we control the contents of
the command string after cat, we can use the shell's command
separator to append commands of our own.



Consider a devious username such as "bracken;
rm -rf /; cat blah", which results in three commands being
run, one after the other. The second command begins after the first
";" and the third after the second ";":







cat /logs/bracken; rm -rf /; cat blah.log






With this simple attack we're using the
multiple-command trick to remove all the files recursively from the
root directory / (the -rf flags make the system "just do it" and
not ask us any Macintosh-like questions). After we do this, the
unfortunate victim will be left with a root directory and perhaps a
lost-and-found directory at most. That's some pretty serious damage
inflicted as the result of one single username vulnerability on a
broken Web site!



It's very important to notice that we chose the
value of the username in an intelligent fashion, so that the final
command string is formatted correctly and the embedded malicious
commands are properly executed. Because the ";" character separates
multiple commands on a UNIX system, we're actually running three
commands here. But this attack isn't all that
smart! The final part of the command, which runs cat
blah.log, is unlikely to succeed: we just deleted all the
files!
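
You can watch the separator trick work without destroying anything.
Here is a minimal sketch with harmless echo commands standing in
for rm; the comments show the three commands the shell actually
runs.

#!/usr/bin/perl
# Sketch: a harmless demonstration of ";" command injection. The
# shell splits the single string at each ";" and runs three
# separate commands.
use strict;
use warnings;

my $username = 'bracken; echo INJECTED; echo blah';
system("cat /logs/$username" . ".log");
# The shell runs, in order:
#   cat /logs/bracken
#   echo INJECTED
#   echo blah.log

The first cat fails with a harmless error, but the injected echo
runs all the same, exactly as rm would have.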



So, all in all, this simple attack is about
controlling strings of data and leveraging the syntax of the
command shell.




Of course our example attack is trivial,
but it shows what can result when the target software runs
commands supplied from an untrusted source. Stated in terms of the
house analogy, there was an overlooked door that allowed a
malicious user to control which commands the program ended up
executing.



In this kind of attack we're only exercising
preexisting capabilities built right into the target. As we will
see, there are far more powerful attacks that completely bypass the
capabilities of the target software using injected code (and even
viruses). As an example, consider buffer overflow attacks that are
so powerful that they, in some sense, blast new doorways into the
house of logic entirely, breaking down the control flow walls with
a giant sledgehammer and chain saw. What we're trying to say here
is that there exist direct attacks on the very structure of a
program, and sometimes these attacks rely on fairly deep knowledge
about how the house is built to begin with. Sometimes the knowledge
required includes machine language and microchip architecture. Of
course, attacks like this are a bit more complicated than the
simple one we showed you here.