How are functions encoded / stored in memory?

I understand how things like numbers and letters are binary encoded and therefore can be stored as 0 and 1.

But how are functions stored in memory? I don't see how they can be stored as 0 and 1, and I don't see how something can be stored in memory as anything but 0 and 1.

+3




2 answers


They are in fact stored in memory as 0s and 1s.

Here's a real world example:

int func(int a, int b) {
    return (a + b);
}

      

Here is an example of the 32-bit x86 machine instructions that a compiler might generate for this function (in a textual representation known as assembly code):

func:
        push    ebp
        mov     ebp, esp
        mov     edx, [ebp+8]
        mov     eax, [ebp+12]
        add     eax, edx
        pop     ebp
        ret

      

Explaining how each of these instructions works is beyond the scope of this question, but each of these symbols (such as add, pop, mov, etc.) and their parameters are encoded into 1s and 0s. This table shows many Intel instructions and a short summary of how they are encoded. See also the tag wiki for links to docs and manuals.


So how does one convert code from text assembly into machine-readable bytes (aka machine code)? Take one instruction as an example: add eax, edx. This page shows how the add instruction is encoded. eax and edx are registers, physical spots in the processor used to hold information for processing. Variables in computer programming are often mapped to registers at some point. Since we are adding registers, and the registers are 32-bit, we choose the opcode 00000001 (see also the entry for ADD in Intel's official instruction set reference manual, which lists all available forms).

The next step is to specify the operands. This section of the same page shows how to do this with the "add ecx, eax" example, which is very similar to ours. The first two bits must be "11" to indicate that both operands are registers. The next 3 bits specify the source register; in our case we select edx rather than the eax in their example, which gives us "010". The final 3 bits specify our destination, eax ("000"), so we have the final result

00000001 11010000

which is 01 D0 in hexadecimal. A similar process can be applied to convert any instruction to binary. The tool that does this automatically is called an assembler.
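The bit-packing described above can be sketched in C. This is only an illustration under stated assumptions: the helper name encode_add_r32_r32 and the reg32 enum are made up for this example, and it covers only the register-to-register form of ADD with opcode 01, not the other forms listed in the manual.

```c
#include <stdint.h>

/* Register numbers as they appear in the 3-bit fields (32-bit x86). */
enum reg32 { EAX = 0, ECX = 1, EDX = 2, EBX = 3, ESP = 4, EBP = 5, ESI = 6, EDI = 7 };

/* Hypothetical helper: encode "add dst, src" for two 32-bit registers.
   Writes two bytes to out: the opcode, then the operand byte. */
void encode_add_r32_r32(enum reg32 dst, enum reg32 src, uint8_t out[2])
{
    const unsigned mod = 0x3;                 /* binary 11: both operands are registers */

    out[0] = 0x01;                            /* opcode 00000001 */
    out[1] = (uint8_t)((mod << 6)             /* top 2 bits: 11 */
                       | ((unsigned)src << 3) /* next 3 bits: source register */
                       | (unsigned)dst);      /* last 3 bits: destination register */
}
```

Calling encode_add_r32_r32(EAX, EDX, out) leaves the bytes 01 D0 in out, and the "add ecx, eax" example from the linked page comes out as 01 C1.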




So, running the above assembly code through an assembler results in the following output:

55 89 E5 8B 55 08 8B 45 0C 01 D0 5D C3

      

Notice 01 D0 near the end of the line: this is our "add" instruction. Converting machine code bytes back to textual assembly mnemonics is called disassembly:

 address | machine code  |  disassembly
   0:      55              push   ebp
   1:      89 e5           mov    ebp, esp
   3:      8b 55 08        mov    edx, [ebp+0x8]
   6:      8b 45 0c        mov    eax, [ebp+0xc]
   9:      01 d0           add    eax, edx
   b:      5d              pop    ebp
   c:      c3              ret    

      

Addresses start at zero because this is only a .o object file, not a linked binary. Therefore, they are only relative to the start of the file's .text section.
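To make the reverse direction concrete, here is a minimal C sketch that turns the two bytes of a register-to-register add back into text. The function name disasm_add_r32_r32 and the name table are hypothetical, and it understands only this one instruction form; a real disassembler handles the whole instruction set.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Register names indexed by their 3-bit number (32-bit x86). */
static const char *const REG32_NAMES[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

/* Hypothetical helper: decode a 2-byte "01 xx" register-to-register ADD.
   Returns 0 and fills buf on success, -1 if the bytes are some other form. */
int disasm_add_r32_r32(const uint8_t code[2], char *buf, size_t buflen)
{
    if (code[0] != 0x01)
        return -1;                            /* not the ADD opcode handled here */

    uint8_t operands = code[1];
    if ((operands >> 6) != 0x3)
        return -1;                            /* top bits not 11: a memory operand */

    unsigned src = (operands >> 3) & 0x7;     /* middle 3 bits: source register */
    unsigned dst = operands & 0x7;            /* low 3 bits: destination register */

    snprintf(buf, buflen, "add %s, %s", REG32_NAMES[dst], REG32_NAMES[src]);
    return 0;
}
```

Given the bytes 01 D0, it writes "add eax, edx" into buf.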

You can see this for any function you like in the Godbolt Compiler Explorer (or on your own machine, on any binary, freshly compiled or not, using a disassembler).


You may notice that there is no mention of the name "func" in the final output. This is because in machine code a function is referred to by its location in RAM, not by its name. The compiler's object file may have an entry for func in its symbol table that refers to that block of machine code, but the symbol table is read by software; it is not something the CPU hardware can decode and run directly. The machine code bit patterns are seen and decoded directly by transistors in the CPU.
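C exposes the "a function is just a location" idea through function pointers. A minimal sketch, reusing func from the example at the top; the binary_fn typedef and call_by_address are names invented for this illustration:

```c
/* The example function from the top of this answer. */
int func(int a, int b)
{
    return a + b;
}

/* A pointer to any function taking two ints and returning an int.
   At run time this holds an address, not a name. */
typedef int (*binary_fn)(int, int);

/* Hypothetical helper: call whatever code lives at the given address. */
int call_by_address(binary_fn fn, int a, int b)
{
    return fn(a, b);
}
```

call_by_address(func, 2, 3) returns 5. Inside call_by_address the name "func" is not needed at all: only the address that was passed in is used.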

Sometimes we find it difficult to understand how computers encode instructions at such a low level, because as programmers or power users we have tools that spare us from ever dealing with them directly. We rely on compilers, assemblers, and interpreters to do this job for us. However, everything a computer ever does must ultimately be specified in machine code.

EDIT: Added an explanation of the exact process of converting an instruction to its opcode.

+13




Functions are made up of instructions, such as bytecode or machine code. Instructions are numbers, which can be encoded in binary.
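As a toy illustration of "instructions are numbers", here is a sketch of a tiny bytecode interpreter in C. The instruction set (OP_PUSH, OP_ADD, OP_HALT) and the function run_bytecode are invented for this example and do not correspond to any real bytecode format.

```c
#include <stddef.h>

/* A made-up instruction set: each instruction is just a small number. */
enum { OP_HALT = 0, OP_PUSH = 1, OP_ADD = 2 };

/* Hypothetical helper: run a program stored as plain bytes on a small stack
   machine. Returns the value on top of the stack at OP_HALT, or -1 if the
   program is malformed. */
int run_bytecode(const unsigned char *code, size_t len)
{
    int stack[16];
    size_t sp = 0;                            /* number of values on the stack */
    size_t pc = 0;                            /* index of the next instruction */

    while (pc < len) {
        switch (code[pc++]) {
        case OP_PUSH:                         /* the next byte is the value to push */
            if (pc >= len || sp >= 16)
                return -1;
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:                          /* replace the top two values with their sum */
            if (sp < 2)
                return -1;
            stack[sp - 2] += stack[sp - 1];
            sp--;
            break;
        case OP_HALT:
            return sp > 0 ? stack[sp - 1] : -1;
        default:
            return -1;                        /* unknown instruction number */
        }
    }
    return -1;                                /* ran off the end without OP_HALT */
}
```

With this encoding, the "function" that adds 2 and 3 is nothing more than the byte sequence 1 2 1 3 2 0, and running it returns 5.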



A good introduction to this is Charles Petzold's book Code .

0








