Friday, November 13, 2009

Reversing Parser Code










































































A parser breaks a raw string of bytes into individual words and
statements; this activity is called parsing. Standard parsing usually
relies on "punctuation" characters, often called meta-characters because
they carry special meaning. Target software will often scan an
input string looking for these special characters.



Meta-characters are often points of interest for
an attacker. Many times important decisions rely directly on the
presence of these special characters. Filters also tend to rely on
meta-characters for proper operation.



Meta-characters are often quite easy to spot in
a dead listing. Spotting them can be as simple as looking for code
that compares a byte value against a hard-coded character. Use an
ASCII chart to determine the hex values for a given character.
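
In place of a printed ASCII chart, a few lines of Python can produce the hex values to look for. This is only a convenience sketch: the helper name `hex_table` and the particular set of meta-characters are illustrative choices, not taken from the text.

```python
# Illustrative set of characters that parsers and filters often treat specially.
META_CHARACTERS = "/\\.:;|&<>*?%"


def hex_table(chars):
    """Return (character, two-digit hex) pairs, for example ('/', '2F')."""
    return [(c, format(ord(c), "02X")) for c in chars]


for char, hex_value in hex_table(META_CHARACTERS):
    # Disassemblers commonly print such constants with an 'h' suffix.
    print(f"{char!r} -> {hex_value}h")
```

The hex column gives the constants to search for in a listing.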



In the IDA screen shot shown in Figure 6-9, we can see two
locations where data are being compared with the forward slash and
back slash characters—2F and 5C, which map to / and \
respectively. These kinds of comparisons tend to crop up in file
system filters, and thus make interesting starting places for an
attack.
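
The search for hard-coded character comparisons can be roughly automated over a dead listing saved as text. The sketch below assumes IDA-style syntax with an `h` suffix on hex constants (such as `cmp cl, 2Fh`). The function name `find_char_compares` and the sample lines are hypothetical. It is a heuristic: it will also flag comparisons that only happen to use a printable value, and it will miss listings that show the constant as a quoted character.

```python
import re

# Matches lines such as "cmp cl, 2Fh" or "cmp byte ptr [eax], 5Ch".
CMP_IMMEDIATE = re.compile(
    r"\bcmp\s+(?P<operand>[^,]+),\s*(?P<value>[0-9A-Fa-f]{1,2})h\b"
)


def find_char_compares(listing):
    """Return (line number, character, line) for compares against printable ASCII."""
    hits = []
    for line_number, line in enumerate(listing.splitlines(), start=1):
        match = CMP_IMMEDIATE.search(line)
        if match is None:
            continue
        value = int(match.group("value"), 16)
        if 0x20 <= value <= 0x7E:  # keep printable ASCII only
            hits.append((line_number, chr(value), line.strip()))
    return hits


# Hypothetical listing fragment, not copied from any real binary.
sample = """\
mov     cl, [eax]
cmp     cl, 2Fh
jz      short loc_401020
cmp     cl, 5Ch
jz      short loc_401020
cmp     ecx, 10h
"""

for line_number, char, text in find_char_compares(sample):
    print(line_number, repr(char), text)
```

Each hit is only a candidate location; the surrounding code still has to be read to confirm it belongs to a parser or filter.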







Figure 6-9. An IDA disassembly of a
common FTP server showing the comparison for slash characters 2F
and 5C.


























Character Conversion



Character conversions sometimes occur as a
system prepares itself to make an API call. For example, although a
system call may expect a file system path to be supplied using
forward slashes, the program may accept both back slashes and
forward slashes to mean the "same thing." So, the software converts
back slashes to forward slashes before making the call. This kind
of transformation results in equivalent characters: whichever kind
of slash you supply, the system call sees forward slashes.



Why is this important? Consider what happens if
the programmer wants to make sure the user can't supply slashes in
a filename. This might be the case when the programmer is trying to
prevent a relative path traversal bug, for example. The programmer
may filter out forward slashes and believe that the problem is
solved. But if an attacker can insert a back slash, then the
problem may not have been properly handled. In situations in which
characters are converted, an excellent opportunity exists to evade
simple filters and IDSs. Figure 6-10 shows code that converts back
slashes to forward slashes.
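
The filter problem described above can be sketched in a few lines. The function names and the file name are hypothetical, and the model assumes the conversion happens after the filter runs, which is the situation the paragraph describes.

```python
def naive_filter(filename):
    """The incomplete check: accept any name without a forward slash."""
    return "/" not in filename


def normalize_for_api(filename):
    """Later stage: back slashes are converted to forward slashes."""
    return filename.replace("\\", "/")


def stricter_filter(filename):
    """Check the converted form, so both kinds of slash are caught."""
    return "/" not in normalize_for_api(filename)


attack = "..\\..\\secret.txt"

print(naive_filter(attack))       # True: the name passes the naive filter
print(normalize_for_api(attack))  # ../../secret.txt
print(stricter_filter(attack))    # False: rejected once conversion is considered
```

The point of the sketch is the ordering: a filter that runs before a conversion step has to account for every character the conversion treats as equivalent.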







Figure 6-10. The code here uses the API call strchr to find
character 5Ch (\) in a string. Once the character is found, the code
uses mov byte ptr [eax], 2Fh to replace the back slash with character
2Fh (/). This loops until no more back slashes are found (via the
test eax, eax and subsequent jnz, which jumps [if not zero] back to
the beginning of the loop).
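
The loop described in the caption can be modeled as follows. This is a rough reconstruction from the caption alone: the `strchr` below is a simplified stand-in for the C library function (it returns an index or `None` rather than a pointer), and restarting each search from the beginning of the string is an assumption.

```python
BACK_SLASH = 0x5C
FORWARD_SLASH = 0x2F


def strchr(buffer, start, value):
    """Simplified stand-in for C strchr: index of the first `value` byte at
    or after `start`, stopping at the NUL terminator; None if absent."""
    index = start
    while index < len(buffer) and buffer[index] != 0:
        if buffer[index] == value:
            return index
        index += 1
    return None


def convert_slashes(buffer):
    """Replace every back slash in a NUL-terminated bytearray, in place."""
    position = strchr(buffer, 0, BACK_SLASH)
    while position is not None:            # test eax, eax / jnz
        buffer[position] = FORWARD_SLASH   # mov byte ptr [eax], 2Fh
        position = strchr(buffer, 0, BACK_SLASH)
    return buffer


path = bytearray(b"pub\\incoming\\file.txt\x00")
print(convert_slashes(path))
```

The replacement happens in place, one byte at a time, which is why the disassembly shows a single-byte store rather than a string copy.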
























Byte Operations



Parsers built into most programs usually deal
with single characters. A single character is generally encoded as
a single byte (the clear exception being multibyte or Unicode
characters). Because characters are usually represented as bytes,
identifying single-byte operations in a disassembly is a reasonable
undertaking. Single-byte operations are easy to spot because they
use the notation "al," "bl," and so forth. On 32-bit x86 the
general-purpose registers (eax, ebx, and so on) are 32 bits in size.
The byte notation indicates that the operation is performed on the
lowest 8 bits of the register, a single byte.



There is a classic "gotcha" here to keep in mind
when debugging a running program. Remember that only a single byte is being used with notations
like al and bl, regardless of what exists in the rest of the
register. If the register has the value 0x0011222F (as
shown in Figure 6-11), and
the byte notation is being used, the actual value processed is
0x2F, the lowest 8 bits.
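
The same arithmetic can be checked with a bit mask. The helper names below are hypothetical; the register value is the one used in the text.

```python
def low_byte(register_value):
    """Value seen through al, bl, cl, or dl: the lowest 8 bits."""
    return register_value & 0xFF


def write_low_byte(register_value, byte_value):
    """Model a write to the byte register: replace the lowest 8 bits
    and leave the upper 24 bits of the 32-bit register untouched."""
    return (register_value & 0xFFFFFF00) | (byte_value & 0xFF)


eax = 0x0011222F
print(hex(low_byte(eax)))              # 0x2f
print(hex(write_low_byte(eax, 0x5C)))  # 0x11225c
```

When watching a register in a debugger, it is the masked value, not the full 32-bit value, that a byte comparison actually sees.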







Figure 6-11. A single byte (2F) as
represented in a 32-bit register.


















Pointer Operations



Strings are often too large to be stored in a
register. Because of this, a register will usually contain the
address of the string in memory. This is called a pointer. Note that pointers are
addresses that can point to almost anything, not just strings. One
nice trick is to find pointers that increment by a single byte, or
operations that use a pointer to load a single byte.



Byte operations with pointers are easy to spot.
Pointer operations follow the [XXX] notation (for example, [eax],
[ebx], and so on) in combination with the al, bl, cl, and so forth,
notation.



Pointer arithmetic has the notation







[eax + 1], [ebx + 1], etc.






Moving bytes around in memory ends up looking
something like this:







mov dl, [eax+1]






In some cases, the register where the pointer is
stored is modified directly, like this:

inc eax
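
A small model can make these pointer operations concrete. Here memory is a Python `bytearray` and a pointer is just an offset into it. The names and the sample bytes are illustrative only.

```python
def load_byte(memory, pointer, displacement=0):
    """Model `mov dl, [eax+displacement]`: read one byte at pointer + displacement."""
    return memory[pointer + displacement]


memory = bytearray(b"/pub\x00")   # a NUL-terminated string somewhere in memory

eax = 0                           # eax holds the address of the string
dl = load_byte(memory, eax, 1)    # mov dl, [eax+1]: the byte after the first
eax += 1                          # inc eax: the pointer itself moves forward
cl = load_byte(memory, eax)       # mov cl, [eax]: now reads that same byte

print(chr(dl), chr(cl))
```

Both forms reach the same byte: `[eax+1]` leaves the pointer where it was, while `inc eax` moves it.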








NULL Terminators



Because strings are typically NULL terminated
(especially when C is being used), looking for code that compares
with a 0 byte can also be useful. Tests for the NULL character tend
to look something like this:







test al, al
test cl, cl






and so forth.
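
Put together with a pointer, a test for the 0 byte usually ends a loop that walks a string. The sketch below models that loop in Python, with comments naming the instruction each step roughly corresponds to. The function name is hypothetical, and a real routine would not necessarily be laid out this way.

```python
def read_c_string(memory, pointer):
    """Collect bytes from `pointer` up to, but not including, the first 0 byte.

    Raises IndexError if no terminator is found before the end of `memory`.
    """
    collected = bytearray()
    while True:
        cl = memory[pointer]   # mov cl, [eax]
        if cl == 0:            # test cl, cl / jz
            break
        collected.append(cl)
        pointer += 1           # inc eax
    return bytes(collected)


buffer = b"USER anonymous\x00leftover bytes"
print(read_c_string(buffer, 0))
```

Anything after the terminator is ignored, which is worth remembering when a filter and the code behind it disagree about where a string ends.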




Figure 6-12 includes several single-byte operations:

  • cl, byte notation
  • [eax], a pointer
  • inc eax, increment pointer
  • test cl, cl, looking for NULL
  • [eax+1], pointer + 1 byte
  • mov dl, [eax+1], moving a single byte











Figure 6-12. Code with several
interesting 1-byte operations included.























These operations may indicate that the
program is parsing or otherwise processing input.
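
As a rough aid, the patterns from this section can be searched for in a text listing. The sketch assumes IDA-style 32-bit x86 syntax and looks only at al, bl, cl, and dl. The labels, the function name, and the sample listing are all hypothetical. A hit is a hint that parsing may be going on, not proof.

```python
import re

PATTERNS = [
    ("byte load through pointer",
     re.compile(r"\bmov\s+[abcd]l\s*,\s*(?:byte ptr\s+)?\[[^\]]+\]", re.I)),
    ("byte store through pointer",
     re.compile(r"\bmov\s+(?:byte ptr\s+)?\[[^\]]+\]\s*,\s*[abcd]l\b", re.I)),
    ("NULL test",
     re.compile(r"\btest\s+([abcd]l)\s*,\s*\1\b", re.I)),
    ("pointer increment",
     re.compile(r"\binc\s+e(?:ax|bx|cx|dx|si|di)\b", re.I)),
]


def flag_byte_operations(listing):
    """Return (line number, label) for each line matching one of the patterns."""
    findings = []
    for number, line in enumerate(listing.splitlines(), start=1):
        for label, pattern in PATTERNS:
            if pattern.search(line):
                findings.append((number, label))
    return findings


# Hypothetical fragment in the style of the figure, not copied from it.
sample = """\
mov     cl, [eax]
inc     eax
test    cl, cl
jz      short done
mov     dl, [eax+1]
mov     [ebx], dl
"""

for number, label in flag_byte_operations(sample):
    print(number, label)
```

Several hits close together are more interesting than isolated ones, since a parsing loop tends to combine a byte load, a test, and a pointer increment.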





















































