Inconsistent assembly instructions in the Shellcoder's Handbook

Two years later, I am coming back to it. I am trying to tackle the Shellcoder's Handbook again, but I keep finding inconsistencies. The book provides the following function:

int triangle (int width, int height){
   int array[5] = {0,1,2,3,4};
   int area;
   area = width * height/2;
   return (area);
}

      

and the following disassembly for the function:

0x8048430 <triangle>: push %ebp
0x8048431 <triangle+1>: mov %esp, %ebp
0x8048433 <triangle+3>: push %edi
0x8048434 <triangle+4>: push %esi
0x8048435 <triangle+5>: sub $0x30,%esp
0x8048438 <triangle+8>: lea 0xffffffd8(%ebp), %edi
0x804843b <triangle+11>: mov $0x8049508,%esi
0x8048440 <triangle+16>: cld
0x8048441 <triangle+17>: mov $0x30,%esp
0x8048446 <triangle+22>: repz movsl %ds:( %esi), %es:( %edi)
0x8048448 <triangle+24>: mov 0x8(%ebp),%eax
0x804844b <triangle+27>: mov %eax,%edx
0x804844d <triangle+29>: imul 0xc(%ebp),%edx
0x8048451 <triangle+33>: mov %edx,%eax
0x8048453 <triangle+35>: sar $0x1f,%eax
0x8048456 <triangle+38>: shr $0x1f,%eax
0x8048459 <triangle+41>: lea (%eax, %edx, 1), %eax
0x804845c <triangle+44>: sar %eax
0x804845e <triangle+46>: mov %eax,0xffffffd4(%ebp)
0x8048461 <triangle+49>: mov 0xffffffd4(%ebp),%eax
0x8048464 <triangle+52>: mov %eax,%eax
0x8048466 <triangle+54>: add $0x30,%esp
0x8048469 <triangle+57>: pop %esi
0x804846a <triangle+58>: pop %edi
0x804846b <triangle+59>: pop %ebp
0x804846c <triangle+60>: ret

      

For academic reasons, I am trying to break down and explain each line of the assembly. But a few things are just wrong, for example: lea 0xffffffd8(%ebp), %edi

I understand that the first part means multiplying the base pointer by 0xffffffd8, which seems to be wrong. Another example is mov $0x30, %esp. Why would you move a literal value into the stack pointer register? I could understand it if it were mov $0x30, (%ebp), but that doesn't seem to be the case. Am I wrong, or does it all just seem wrong?



1 answer


But a few things are just wrong

Yes, they are. Typographical errors are not unusual in a book. You should generally look for a published errata list when you see something that makes you scratch your head. The publisher's website is a good place to look, as is the author's. I don't know the exact title of the book, so I cannot search for it myself, but you should be able to find it easily.

Of course, it may not be that easy. Books from less reputable publishers often do not provide an errata list, and less popular books often do not have enough readers to catch the errors. You can contribute by finding the author's email address and reporting the errors you find. Or, if you are not sure whether they are errors, ask the author for clarification. (You shouldn't expect the author to give you a personalized tutorial, but specific questions about things published in their book are always fair game.)

lea 0xffffffd8(%ebp), %edi

I understand that the first part means multiplying the base pointer by 0xffffffd8, which seems to be wrong

In this case, it is your understanding that is wrong, not the code. I blame this crazy AT&T syntax. Translating to Intel syntax:

lea edi, DWORD PTR [ebp + 0FFFFFFD8h]

      

which is equivalent to:

lea edi, DWORD PTR [ebp - 28h]

      

So this is actually equivalent to:

mov  edi, ebp
sub  edi, 28h

      

Now, you are correct that the LEA instruction can do multiplication. Well, sort of. It can scale an index register by certain constants (2, 4, or 8), which has the same effect as multiplication. But this form does not encode a multiplication: it has only a displacement and a base register, so there is no scaled index at all.
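To make the addressing mode concrete, here is a minimal C sketch of how a 32-bit effective address is formed from AT&T's disp(base, index, scale) notation. The function names effective_address and lea_from_listing are hypothetical, chosen only for illustration. The point is that adding 0xffffffd8 wraps around modulo 2^32 and so lands 0x28 bytes below the base.

```c
#include <stdint.h>

/* Hypothetical helper: computes disp(base, index, scale) the way a
 * 32-bit x86 address unit does: base + index * scale + disp, with all
 * arithmetic wrapping modulo 2^32. Only the index is ever scaled. */
static uint32_t effective_address(uint32_t disp, uint32_t base,
                                  uint32_t index, uint32_t scale)
{
    return base + index * scale + disp;
}

/* lea 0xffffffd8(%ebp), %edi has no index register, so nothing is scaled. */
static uint32_t lea_from_listing(uint32_t ebp)
{
    return effective_address(0xffffffd8u, ebp, 0, 1);
}
```

With an assumed EBP of 0xbffff000, lea_from_listing returns 0xbfffefd8, which is EBP - 0x28.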

mov $0x30, %esp

Why would you move a literal value into the stack pointer register? I could understand it if it were mov $0x30, (%ebp), but that doesn't seem to be the case.

Yes, moving a literal into the stack pointer is a very strange thing to do. Never say never, but it should scream "error" (or "typo").

But look at the following instruction:

repz movsl  %ds:(%esi), %es:(%edi)

      



The repeat prefix causes the string instruction (in this case, MOVSL) to be repeated the number of times specified in the ECX register, so the previous instruction was probably supposed to initialize ECX. Initializing ECX to 30h would match the amount of space that was previously allocated on the stack (subl $0x30, %esp), although MOVSL copies 4 bytes per iteration, so a count given in doublewords would normally be smaller than the byte size of the destination.

But there is another mistake here: the REPZ prefix (or the equivalent REPE) does not make sense with a MOVS instruction! The [N]Z/[N]E suffix means that the zero flag is used as a secondary termination condition, but a move does not set any flags, so it makes no sense to write REPZ MOVS. It should just be REP MOVS. (REP and REPZ share the same prefix byte, F3h, which is probably why some disassemblers print REPZ here.)
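As a rough illustration of what REP MOVSL does, here is a small C model. The struct string_regs and the function rep_movsl are hypothetical names invented for this sketch, and it assumes the direction flag is clear (as it is after CLD), so both pointers move upward.

```c
#include <stdint.h>

/* Hypothetical model of the three registers REP MOVSL uses. */
struct string_regs {
    const uint32_t *esi; /* source pointer */
    uint32_t *edi;       /* destination pointer */
    uint32_t ecx;        /* remaining iteration count */
};

/* Sketch of REP MOVSL with the direction flag clear (after CLD):
 * copy one doubleword per iteration, advance both pointers, and
 * decrement ECX until it reaches zero. No flags are consulted. */
static void rep_movsl(struct string_regs *r)
{
    while (r->ecx != 0) {
        *r->edi++ = *r->esi++;
        r->ecx--;
    }
}
```

Copying the five-element array from the C function would take ECX = 5 in this model, since each iteration moves one 4-byte element.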


To be honest, the whole disassembly looks suspect to me. I'm starting to wonder whether the book is even worth the paper it is printed on. Why would you show unoptimized assembly code? If you're trying to learn assembly language, you don't want to learn how to write suboptimal code. If you are trying to learn reverse engineering, there is no point in studying unoptimized code, because that is not what a compiler normally generates. The same goes for exploits. I can't think of a compelling reason why you would ever want to waste your time looking at unoptimized code. It is just a bunch of distracting noise that doesn't teach you anything useful.

For example, you see the tell-tale sign of unoptimized code at the very beginning: the initialization of the base pointer (EBP) has not been elided.

The purpose of the REPZ MOVS instruction (and the instructions required to set it up) is also a complete mystery to me. I don't even see a reason why a compiler would generate them with optimizations disabled.

My guess is that the author had to turn off optimization because the compiler would otherwise have eliminated the allocation and initialization of the array entirely. Not the best of examples.

This sequence must also be an error:

sar $0x1f, %eax
shr $0x1f, %eax

      

An unsigned right shift by 31 makes sense (isolating the sign bit as part of an optimized signed division by 2), but doing so immediately after a signed right shift by 31 does not, since the SHR alone would produce the same result. (The expected sar %eax, which is part of this optimized division, comes later, written in typical GAS style with the immediate $1 left implicit.)
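The shift-based division can be sketched in C as follows. The function name div2_shift is hypothetical, and the sketch assumes that right-shifting a negative signed integer is an arithmetic shift. The C standard leaves this implementation-defined, though the common compilers behave this way. Adding the sign bit before the shift is what makes the result round toward zero, as C's / operator does.

```c
#include <stdint.h>

/* Hypothetical sketch of the shr/add/sar idiom for signed x / 2. */
static int32_t div2_shift(int32_t x)
{
    /* shrl $31: isolate the sign bit (1 if x is negative, else 0). */
    uint32_t sign = (uint32_t)x >> 31;

    /* addl: bias negative values by 1 so the shift rounds toward zero. */
    int32_t biased = (int32_t)((uint32_t)x + sign);

    /* sarl $1: assumed to be an arithmetic shift (implementation-defined). */
    return biased >> 1;
}
```

For example, -3 becomes -2 after the bias and -1 after the shift, matching -3 / 2 in C. Without the bias, the shift alone would give -2.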

If all (or even most) of the code is like this, then I recommend that you either give up on this book and find another, or compile and disassemble the C functions yourself.
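If you go the do-it-yourself route, a starting point might look like the sketch below. The file name triangle.c and the output names are hypothetical, and the commands in the comment assume GCC on an x86 system with 32-bit support installed. Other compilers use different flags.

```c
/* triangle.c (hypothetical file name)
 *
 * Assumed commands, for GCC on x86 with 32-bit support installed:
 *   gcc -m32 -O0 -S triangle.c -o triangle_O0.s    (unoptimized)
 *   gcc -m32 -O2 -S triangle.c -o triangle_O2.s    (optimized)
 */
int triangle(int width, int height)
{
    int array[5] = {0, 1, 2, 3, 4};  /* unused; kept to match the book */
    int area;

    (void)array;                     /* silence unused-variable warnings */
    area = width * height / 2;
    return area;
}
```

Comparing the two .s files side by side shows which instructions exist only because optimization was off.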

A non-broken C compiler, with optimizations enabled, would generate the following code for this C function:

    # Load second parameter from stack into EAX.
    movl    8(%esp), %eax

    # Multiply that second parameter by the first parameter.
    # (Could just as well have used a second movl, and then done a reg-reg imull.)
    imull   4(%esp), %eax

    # Make a copy of that result in EDX.
    movl    %eax, %edx

    # Optimized signed divide-by-2:
    shrl    $31, %eax
    addl    %edx, %eax
    sarl    $1, %eax    # GAS disassembly prints this as 'sarl %eax', leaving the $1 implicit

    ret

      

Or, if optimization were turned off (this varies a bit more among different compilers, which is another reason why looking at unoptimized code is silly, but you can get the basic idea):

    # Set up a stack frame.
    pushl   %ebp
    movl    %esp, %ebp

    # Allocate space on the stack for the pointless "array" array,
    # and store the values in that space.
    # (Why 32 bytes when the array needs only 20 and the result 4 more?
    # To keep the stack pointer aligned.)
    subl    $32, %esp
    movl    $0, -24(%ebp)
    movl    $1, -20(%ebp)
    movl    $2, -16(%ebp)
    movl    $3, -12(%ebp)
    movl    $4, -8(%ebp)

    # Get first parameter from the stack.
    movl    8(%ebp), %eax

    # Multiply it by the second parameter.
    imull   12(%ebp), %eax

    # Make a copy of the result.
    movl    %eax, %edx

    # Optimized signed divide-by-2 (some compilers will always apply this
    # strength-reduction, even when optimizations are disabled; others won't
    # and will go ahead and emit the IDIV instruction you might expect):
    shrl    $31, %edx
    addl    %edx, %eax
    sarl    $1, %eax

    # Store the result in a temporary location in memory.
    movl    %eax, -4(%ebp)

    # Then load it back into EAX so it can be returned.
    movl    -4(%ebp), %eax

    # Tear down the stack frame and deallocate stack space.
    leave

    ret

      
