Reversing Parser Code
A parser breaks apart a raw string of bytes into individual words and statements. This activity is called parsing. Standard parsing usually requires "punctuation" characters, often called meta-characters because they have special meaning. Many times, target software will parse through an input string looking for these special characters.
Meta-characters are often points of interest for an attacker. Many times important decisions rely directly on the presence of these special characters. Filters also tend to rely on meta-characters for proper operation.
Meta-characters are often quite easy to spot in a dead listing. Spotting them can be as simple as looking for code that compares a byte value against a hard-coded character. Use an ASCII chart to determine the hex values for a given character.
In the IDA screen shot shown in Figure 6-9, we can see two locations where data are being compared with the forward slash and back slash characters: 2F and 5C, which map to / and \ respectively. These kinds of comparisons tend to crop up in file system filters, and thus make interesting starting places for an attack.
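To make the idea of "looking for code that compares a byte value against a hard-coded character" concrete, here is a rough Python sketch that scans raw machine code for two common x86 encodings of a byte compare: 3C ib (cmp al, imm8) and 80 /7 ib with a register operand (cmp reg8, imm8). The function name find_meta_compares, the META table, and the sample bytes are made up for illustration. A linear byte scan like this can misfire on data or on bytes in the middle of another instruction, so treat hits as leads to confirm in a disassembler.

```python
# Hypothetical helper: flag byte patterns that look like "cmp <reg8>, <meta-char>".
META = {0x2F: "/", 0x5C: "\\"}                       # characters of interest
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]

def find_meta_compares(code: bytes):
    """Return (offset, register, character) for each suspected compare."""
    hits = []
    for i in range(len(code) - 1):
        # 3C ib -> cmp al, imm8
        if code[i] == 0x3C and code[i + 1] in META:
            hits.append((i, "al", META[code[i + 1]]))
        # 80 /7 ib with a register operand -> cmp reg8, imm8 (ModRM F8..FF)
        elif (code[i] == 0x80 and i + 2 < len(code)
              and 0xF8 <= code[i + 1] <= 0xFF and code[i + 2] in META):
            hits.append((i, REG8[code[i + 1] - 0xF8], META[code[i + 2]]))
    return hits

# Made-up sample: mov cl,[eax] / cmp cl,2Fh / jz ... / cmp al,5Ch / jnz ...
sample = bytes([0x8A, 0x08, 0x80, 0xF9, 0x2F, 0x74, 0x05, 0x3C, 0x5C, 0x75, 0x02])
hits = find_meta_compares(sample)
for offset, reg, char in hits:
    print(f"offset {offset}: cmp {reg}, '{char}'")
```

Extending META with other punctuation (for example 0x2E for the dot) widens the search to other filter checks.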
Character Conversion
Character conversions sometimes occur as a system prepares itself to make an API call. For example, although a system call may expect a file system path to be supplied using forward slashes, the program may accept both back slashes and forward slashes to mean the "same thing." So, the software converts back slashes to forward slashes before making the call. This kind of transformation results in equivalent characters. It doesn't matter which kind of slash you supply; both will be treated as forward slashes by the system call.
Why is this important? Consider what happens if the programmer wants to make sure the user can't supply slashes in a filename. This might be the case when the programmer is trying to prevent a relative path traversal bug, for example. The programmer may filter out forward slashes and believe that the problem is solved. But if an attacker can insert a back slash, then the problem may not have been properly handled. In situations in which characters are converted, an excellent opportunity exists to evade simple filters and IDSs. Figure 6-10 shows code that converts back slashes to forward slashes.
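The filter-then-convert mistake described above can be sketched in a few lines of Python. Both functions are hypothetical stand-ins (naive_filter for the programmer's check, normalize_for_syscall for the conversion step that runs afterward); they are not taken from any real product.

```python
def naive_filter(name: str) -> bool:
    """Hypothetical check: accept the name only if it has no forward slash."""
    return "/" not in name

def normalize_for_syscall(name: str) -> str:
    """Hypothetical conversion step: back slashes become forward slashes."""
    return name.replace("\\", "/")

attack = "..\\..\\secret.txt"                 # contains only back slashes
passed_filter = naive_filter(attack)          # the filter sees no '/'
final_path = normalize_for_syscall(attack)    # conversion happens after the check

print(passed_filter, final_path)
```

The input passes the check, yet the path that reaches the system call contains the very characters the filter was meant to block. Checking after conversion, rather than before, closes this particular gap.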
Byte Operations
Parsers built into most programs usually deal with single characters. A single character is generally encoded as a single byte (the clear exception to this rule being multibyte/Unicode characters). Because characters are usually represented as bytes, identifying single-byte operations in a reverse assembly is a reasonable undertaking. Single-byte operations are easy to spot because they use the notation "al," "bl," and so forth. Most registers today are 32 bits in size. This notation indicates that operations are being performed on the lowest 8 bits of the register: a single byte.
There is a classic "gotcha" here to keep in mind when debugging a running program. Remember that only a single byte is being used with notations like al and bl, regardless of what exists in the rest of the register. If the register has the value 0x0011222F (as shown in Figure 6-11), and the byte notation is being used, the actual value processed is 0x2F, the lowest 8 bits.
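The masking can be checked with a couple of lines of Python. This is only an illustration of the arithmetic; the helper name low_byte is made up here.

```python
def low_byte(reg32: int) -> int:
    """Return the value that al/bl/cl/dl would hold for a 32-bit register value."""
    return reg32 & 0xFF

eax = 0x0011222F
al = low_byte(eax)

# The upper 24 bits (0x001122) play no part in a byte-sized comparison.
print(hex(al), chr(al))
```

So a comparison against 2F succeeds here even though the full register value looks nothing like a slash in the debugger's register window.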
Pointer Operations
Strings are often too large to be stored in a register. Because of this, a register will usually contain the address of the string in memory. This is called a pointer. Note that pointers are addresses that can point to almost anything, not just strings. One nice trick is to find pointers that increment by a single byte, or operations that use a pointer to load a single byte.
Byte operations with pointers are easy to spot. Pointer operations follow the [XXX] notation (for example, [eax], [ebx], and so on) in combination with the al, bl, cl, and so forth, notation.
Pointer arithmetic has the notation
[eax + 1], [ebx + 1], etc.
Moving bytes around in memory ends up looking something like this:
mov dl, [eax+1]
In some cases, the register where the pointer is stored is modified directly, like this:
inc eax
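For readers who find the bracket notation opaque, the following Python sketch models the same operations, assuming memory is a flat byte array and a "pointer" is just an offset into it. The buffer contents and the helper load_byte are invented for the example.

```python
memory = bytearray(b"a/b\x00")   # hypothetical buffer holding a C-style string
eax = 0                          # the "pointer": an offset into memory

def load_byte(mem: bytearray, addr: int) -> int:
    """Model of `mov reg8, [addr]`: read exactly one byte from memory."""
    return mem[addr]

cl = load_byte(memory, eax)        # mov cl, [eax]    -> current character
dl = load_byte(memory, eax + 1)    # mov dl, [eax+1]  -> one-byte lookahead
eax += 1                           # inc eax          -> advance the pointer
next_cl = load_byte(memory, eax)   # mov cl, [eax]    -> now reads the next byte

print(hex(cl), hex(dl), hex(next_cl))
```

After the increment, [eax] yields the byte that [eax+1] yielded before it, which is the pattern to look for when tracing a parser as it steps through its input.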
NULL Terminators
Because strings are typically NULL terminated (especially when C is being used), looking for code that compares with a 0 byte can also be useful. Tests for the NULL character tend to look something like this:
test al, al
test cl, cl
and so forth.
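As a rough picture of the loop that such a test usually sits in, here is a Python sketch of a strlen-style scan. It assumes the same flat-memory model as the assembly (an offset standing in for the pointer); the function name c_strlen is made up, and the comments show the instruction each line loosely corresponds to.

```python
def c_strlen(mem: bytes, ptr: int) -> int:
    """Count bytes from ptr up to (not including) the first 0 byte."""
    start = ptr
    while True:
        cl = mem[ptr]      # mov cl, [eax]
        if cl == 0:        # test cl, cl  followed by a conditional jump
            break
        ptr += 1           # inc eax
    return ptr - start

buffer = b"path\x00junk"   # made-up data: the terminator hides the trailing bytes
length = c_strlen(buffer, 0)
print(length)
```

Note that this sketch raises IndexError if no 0 byte is present, whereas the equivalent machine code would simply keep reading adjacent memory.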
Figure 6-12 includes several single-byte operations:
cl, byte notation
[eax], a pointer
inc eax, increment pointer
test cl, cl, looking for NULL
[eax+1], pointer + 1 byte
mov dl, [eax+1], moving a single byte
These operations may indicate that the program is parsing or otherwise processing input.