ctypes: Cast string to function?

I was reading the Antivirus Evasion Tips article while doing penetration testing and was surprised by this Python program:

from ctypes import *
shellcode = '\xfc\xe8\x89\x00\x00....'

memorywithshell = create_string_buffer(shellcode, len(shellcode))
shell = cast(memorywithshell, CFUNCTYPE(c_void_p))


The shellcode is abbreviated. Can someone please explain what's going on? I'm familiar with both Python and C, and I've tried reading the ctypes module documentation, but two main questions remain:

  • What's stored in shellcode?

    I know it has something to do with C (in the article it is shellcode from Metasploit, and a different notation for ASCII was chosen), but I cannot tell whether it is C source (probably not) or the output of some kind of compilation (which?).

  • Depending on the answer to the first question, what magic happens during the cast?


3 answers

Take a look at this shellcode, which I took from here (it pops up a MessageBoxA):

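#include <stdio.h>

typedef void (* function_t)(void);

/* The shellcode bytes are elided in this copy of the answer. */
unsigned char shellcode[] = /* ... */;

void real_function(void) {
    puts("I'm here");
}

int main(int argc, char **argv)
{
    /* Treat the address of the first shellcode byte as a function pointer. */
    function_t function = (function_t) &shellcode[0];

    real_function();   /* ordinary direct call */
    function();        /* indirect call through the pointer */

    return 0;
}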

Compile it and load it in any debugger; I will use gdb:

> gcc shellcode.c -o shellcode
> gdb -q shellcode.exe
Reading symbols from shellcode.exe...done.


Disassemble the main function to see the difference between calling real_function and calling function:


(gdb) disassemble main
Dump of assembler code for function main:
   0x004013a0 <+0>:     push   %ebp
   0x004013a1 <+1>:     mov    %esp,%ebp
   0x004013a3 <+3>:     and    $0xfffffff0,%esp
   0x004013a6 <+6>:     sub    $0x10,%esp
   0x004013a9 <+9>:     call   0x4018e4 <__main>
   0x004013ae <+14>:    movl   $0x402000,0xc(%esp)
   0x004013b6 <+22>:    call   0x40138c <real_function> ; <- here we call our `real_function`
   0x004013bb <+27>:    mov    0xc(%esp),%eax
   0x004013bf <+31>:    call   *%eax                    ; <- here we call the address that is loaded in eax (the address of the beginning of our shellcode)
   0x004013c1 <+33>:    mov    $0x0,%eax
   0x004013c6 <+38>:    leave
   0x004013c7 <+39>:    ret
End of assembler dump.


There are two call instructions. Set a breakpoint at <main+31> to see what is loaded in eax:

(gdb) break *(main+31)
Breakpoint 1 at 0x4013bf
(gdb) run
Starting program: shellcode.exe
[New Thread 2856.0xb24]
I'm here

Breakpoint 1, 0x004013bf in main ()
(gdb) disassemble
Dump of assembler code for function main:
   0x004013a0 <+0>:     push   %ebp
   0x004013a1 <+1>:     mov    %esp,%ebp
   0x004013a3 <+3>:     and    $0xfffffff0,%esp
   0x004013a6 <+6>:     sub    $0x10,%esp
   0x004013a9 <+9>:     call   0x4018e4 <__main>
   0x004013ae <+14>:    movl   $0x402000,0xc(%esp)
   0x004013b6 <+22>:    call   0x40138c <real_function>
   0x004013bb <+27>:    mov    0xc(%esp),%eax
=> 0x004013bf <+31>:    call   *%eax                    ; now we are here
   0x004013c1 <+33>:    mov    $0x0,%eax
   0x004013c6 <+38>:    leave
   0x004013c7 <+39>:    ret
End of assembler dump.


Look at the first 3 bytes of data at the address that eax contains:

(gdb) x/3x $eax
0x402000 <shellcode>:   0xfc    0x33    0xd2
(gdb)                    ^-------^--------^---- the first 3 bytes of the shellcode


So the CPU will call 0x402000, which is the beginning of our shellcode. Let's disassemble whatever is at 0x402000:


(gdb) disassemble 0x402000
Dump of assembler code for function shellcode:
   0x00402000 <+0>:     cld
   0x00402001 <+1>:     xor    %edx,%edx
   0x00402003 <+3>:     mov    $0x30,%dl
   0x00402005 <+5>:     pushl  %fs:(%edx)
   0x00402008 <+8>:     pop    %edx
   0x00402009 <+9>:     mov    0xc(%edx),%edx
   0x0040200c <+12>:    mov    0x14(%edx),%edx
   0x0040200f <+15>:    mov    0x28(%edx),%esi
   0x00402012 <+18>:    xor    %ecx,%ecx
   0x00402014 <+20>:    mov    $0x18,%cl
   0x00402016 <+22>:    xor    %edi,%edi
   0x00402018 <+24>:    xor    %eax,%eax
   0x0040201a <+26>:    lods   %ds:(%esi),%al
   0x0040201b <+27>:    cmp    $0x61,%al
   0x0040201d <+29>:    jl     0x402021 <shellcode+33>


As you can see, shellcode is nothing more than machine instructions. The only difference is in how those instructions are written: shellcode uses special techniques to stay portable, such as never relying on a fixed address.

Python equivalent to the above program:


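from ctypes import *

# The shellcode bytes are elided in this copy of the answer; this is only a placeholder.
shellcode_data = b"..."

# Wrap the bytes in a C char pointer.
shellcode = c_char_p(shellcode_data)

# Reinterpret that pointer as a pointer to a function with no arguments and no return value.
function = cast(shellcode, CFUNCTYPE(None))
function()  # jumps to the first byte, like `call *%eax` in the C program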




  • shellcode contains compiled, architecture-specific code which, if I'm not mistaken, roughly amounts to a function call. (I'm not an architecture expert, and the code is truncated ...)

  • So once you've created a C-style string with create_string_buffer, you can trick Python into thinking it is a function with a call to cast. Python then executes the code originally contained in shellcode.
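To make the "trick" concrete, here is a minimal sketch (Python 3, standard library only) of what cast actually does. It assumes nothing beyond ctypes itself; the names PROTO, double, callback, reborn and buf are made up for illustration. Nothing is compiled or translated by cast: it only stores an address in a function-pointer object. The first half round-trips a harmless ctypes callback through a plain integer address and calls it; the second half shows that casting a buffer yields a function pointer holding the buffer's address, without ever calling it.

```python
import ctypes

# Hypothetical prototype: int f(int)
PROTO = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)

def double(x):
    return 2 * x

# ctypes generates a small native thunk for this callback.
callback = PROTO(double)

# A function pointer is just an address; read it out as an integer.
address = ctypes.cast(callback, ctypes.c_void_p).value

# Reinterpret the raw address as a function pointer again and call it.
reborn = ctypes.cast(ctypes.c_void_p(address), PROTO)
result = reborn(21)

# Casting a byte buffer works the same way: the resulting function
# pointer simply holds the buffer's address. It is NOT called here.
buf = ctypes.create_string_buffer(b"\x00" * 16, 16)
func = ctypes.cast(buf, ctypes.CFUNCTYPE(None))
pointer_value = ctypes.cast(func, ctypes.c_void_p).value

print(result)
print(pointer_value == ctypes.addressof(buf))
```

Whether calling such a pointer does anything sensible depends entirely on what bytes sit at that address and on whether the operating system allows that memory to be executed.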


There is a useful link here: http://www.blackhatlibrary.net/Python#Ctypes



Let's not forget that in order to be executable, code must be converted to a format that your machine understands. What you have there is a sequence of machine-code bytes that your CPU can interpret directly, so you can tell your machine to execute it. You effectively skip the compiler by providing the final bytes yourself; this technique is common in Just-In-Time compilers, which must generate executable code while the program is running. So it really has little to do with C (or Python or any other language), but has a lot to do with the details of the architecture on which this code is supposed to run.
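The point that the string is raw bytes rather than source text can be checked directly. The sketch below (Python 3) uses only the three bytes that appear in the gdb dump in the first answer (0xfc 0x33 0xd2); the variable names are illustrative. The \x notation is just a way of writing arbitrary byte values in a string literal, which is why a byte that happens to be printable ASCII, such as 0x33, is the same as the character "3".

```python
# The three bytes shown by gdb at the start of the first answer's shellcode.
data = b"\xfc\x33\xd2"

# Each \xNN escape is exactly one byte.
length = len(data)

# View the bytes as integers.
values = list(data)

# 0x33 is the ASCII code for "3", so both spellings are the same bytes.
same = (data == b"\xfc3\xd2")

print(length)
print([hex(v) for v in values])
print(same)
```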

The first byte is a CLD instruction (0xfc), followed by a CALL instruction (0xe8) that transfers control to an address computed from the offset given in the next 4 bytes of the sequence, and so on.
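As a rough illustration of how that CALL is decoded, the sketch below (Python 3) reads the 4-byte little-endian offset that follows the 0xe8 opcode and adds it to the address of the next instruction, which is how a relative x86 call computes its target. The byte sequence is the start of the string in the question; the final 0x00 is an assumption, since the question truncates the string after three offset bytes. The helper name decode_call_rel32 and the base address of 0 are made up for the example.

```python
import struct

# Start of the string from the question. The last 0x00 is an ASSUMPTION:
# the question cuts the sequence off after three offset bytes.
prefix = b"\xfc\xe8\x89\x00\x00\x00"

def decode_call_rel32(code, pos, base=0):
    """Decode an x86 CALL rel32 (opcode 0xE8) located at code[pos].

    Returns (offset, target), where target is relative to `base`,
    the hypothetical address at which `code` is loaded.
    """
    if code[pos:pos + 1] != b"\xe8":
        raise ValueError("not a CALL rel32 opcode")
    # The offset is a signed 32-bit little-endian integer.
    (offset,) = struct.unpack_from("<i", code, pos + 1)
    # The offset is relative to the instruction after the 5-byte CALL.
    next_instruction = base + pos + 5
    return offset, next_instruction + offset

offset, target = decode_call_rel32(prefix, 1)
print(hex(offset), hex(target))
```

Under that assumption the offset is 0x89, so the call lands 0x89 bytes past the end of the CALL instruction.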



