Referencing registers in machine codes

Question

I am looking at some assembly code and the corresponding memory dump, and I am having a hard time understanding what is going on. I am using this as a reference for x86 opcodes and this as a reference for x86 registers. I ran into these instructions and realized that a large piece of the puzzle is still missing.

8B 45 F8       - mov eax,[ebp-08] 
8B 80 78040000 - mov eax,[eax+00000478]
8B 00          - mov eax,[eax]

Basically, I don't understand what the two bytes after the opcode mean, and I can't find anything that gives the bitwise format of the instructions (if anyone can point me to one, it would be greatly appreciated).

How does the processor know how long each of these instructions is?

As per my link, this 8B mov instruction accepts 32-bit or 16-bit registers, which means there are 16 possible registers (AX, CX, DX, BX, SP, BP, SI, DI and their extended equivalents). This means that you need a whole byte to specify which register to use in each operand.

So far, still fine: the two bytes after the opcode could indicate which registers to use. Then I noticed that these instructions are packed byte by byte in memory, and all three of them use a different number of bytes to specify the offset to use when dereferencing the second operand.

I guess you could restrict the registers so that 16-bit is only used with 16-bit and 32-bit with 32-bit, but that only frees one bit, which is not enough to tell the CPU how many bytes of offset there are.

What values correspond to what registers?

The second thing that worries me is that, although my reference clearly lists register numbers, I don't see any correlation with the bytes after the opcode in these instructions. The instructions don't even seem to be consistent with each other. The second and third instructions both go from eax to eax, but there is a bit in the first byte after the opcode that differs.

Following my link, I would assume 0 is EAX, 1 is ECX, 2 is EDX, etc. This, however, gives me no idea of how you are supposed to distinguish between RAX, EAX, AX, AL, and AH. Some instructions seem to accept only 8-bit registers, while others accept 16-bit or 32-bit registers, and on x86_64 some seem to accept 16-bit, 32-bit, or 64-bit registers. So would you just do something like 0-7 is R, 8-15 is E, 16-23 is non-extended, and 24-31 is H and L? Even if it is something like this, it seems like it should be much easier to find a guide that spells it out.

assembly x86 machine-code

SoggyPancakes 05 Aug 17 at 20:37

1 answer

prl · Accepted Answer · 2017-08-05T21:00:30+0000

The first byte after the opcode is the ModR/M byte. The first reference you linked contains tables for the ModR/M byte at the end of the page. For a memory access instruction such as this, the ModR/M byte indicates the register to be loaded or stored and the addressing mode that will be used to access memory.

The byte(s) following the ModR/M byte depend on the value of the ModR/M byte.
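As a rough illustration of how the ModR/M byte determines what follows it, the sketch below computes how many bytes the ModR/M byte, an optional SIB byte, and the displacement occupy. It assumes 32-bit addressing and no prefixes, and the function name modrm_length is made up for this example. It is not a full instruction-length decoder.

```python
def modrm_length(tail: bytes) -> int:
    """Return the number of bytes used by ModR/M + optional SIB + displacement.

    `tail` is the instruction bytes starting at the ModR/M byte.
    Assumes 32-bit addressing (no 67 address-size prefix).
    """
    modrm = tail[0]
    mod = modrm >> 6        # top two bits: addressing mode
    rm = modrm & 0b111      # low three bits: base register or special case
    length = 1              # the ModR/M byte itself

    if mod == 0b11:
        return length       # register-direct operand, nothing follows

    sib_base = None
    if rm == 0b100:         # rm = 100 means a SIB byte follows
        sib_base = tail[1] & 0b111
        length += 1

    if mod == 0b01:
        length += 1         # disp8
    elif mod == 0b10:
        length += 4         # disp32
    elif rm == 0b101 or sib_base == 0b101:
        length += 4         # mod = 00 special cases: disp32 with no base register

    return length


# The three instructions from the question, without the 8B opcode byte.
for tail in ("45F8", "8078040000", "00"):
    print(tail, "->", 1 + modrm_length(bytes.fromhex(tail)), "bytes including opcode")
```

For the instructions in the question this gives total lengths of 3, 6, and 2 bytes, which matches the dump.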

In the instruction "mov eax,[ebp-08]", the ModR/M byte is 45. From the table for 32-bit ModR/M bytes, this means that Reg is eax and the effective address is [EBP]+disp8. The next byte of the instruction, F8, is the 8-bit displacement (-8 as a signed value).
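The table lookup can also be done by hand by splitting the ModR/M byte into its three bit fields: mod (2 bits), reg (3 bits), and r/m (3 bits). The following is a minimal sketch of that split for the three instructions in the question. The helper names split_modrm and signed8 are invented for this example.

```python
# Register numbers as used in the reg and r/m fields (32-bit operand size).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_modrm(modrm: int):
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    mod = (modrm >> 6) & 0b11
    reg = (modrm >> 3) & 0b111
    rm = modrm & 0b111
    return mod, reg, rm


def signed8(byte: int) -> int:
    """Interpret a byte as a signed 8-bit displacement."""
    return byte - 256 if byte >= 0x80 else byte


# 8B 45 F8: mod=01 (disp8 follows), reg=000 (eax), rm=101 (ebp)
mod, reg, rm = split_modrm(0x45)
print(mod, REG32[reg], REG32[rm], signed8(0xF8))

# 8B 80 78 04 00 00: mod=10 (disp32 follows), reg=000 (eax), rm=000 (eax)
mod, reg, rm = split_modrm(0x80)
disp32 = int.from_bytes(bytes.fromhex("78040000"), "little")
print(mod, REG32[reg], REG32[rm], hex(disp32))

# 8B 00: mod=00 (no displacement), reg=000 (eax), rm=000 (eax)
print(split_modrm(0x00))
```

The second and third instructions differ only in the mod field (10 versus 00), which is why their ModR/M bytes are 80 and 00 even though both use eax for both operands.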

The size of an operand can be implicit in the instruction, or it can be specified by an instruction prefix. For example, the 66 prefix would select 16-bit operands for the mov instructions in your examples. The 48 prefix (REX.W) selects 64-bit operands if you are in 64-bit mode.

8-bit operands are usually indicated by the least significant bit of the opcode. If you change the opcode in your example from 8B to 8A, it becomes an 8-bit move into al.
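Put together, the same 3-bit register number names a different register depending on the operand size, and the operand size comes from the opcode and prefixes rather than from the register field. The sketch below illustrates this for the 8A/8B mov opcodes only. It assumes a default operand size of 32 bits, and the function name operand_size is hypothetical. It also ignores the fact that, with any REX prefix present, 8-bit register numbers 4-7 select spl, bpl, sil, and dil instead of ah, ch, dh, and bh.

```python
# One 3-bit register number, four possible names depending on operand size.
REG_NAMES = {
    8:  ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"],
    16: ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"],
    32: ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"],
    64: ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"],
}


def operand_size(opcode: int, prefixes=()) -> int:
    """Operand size in bits for opcodes 8A/8B, given any prefix bytes.

    Assumes the default operand size is 32 bits.
    """
    if opcode & 1 == 0:
        return 8            # low opcode bit clear: 8-bit form (8A)
    if 0x48 in prefixes:
        return 64           # REX.W, only meaningful in 64-bit mode
    if 0x66 in prefixes:
        return 16           # operand-size override prefix
    return 32


# Register number 0 under each encoding of "mov reg, [mem]".
for opcode, prefixes in ((0x8B, ()), (0x8B, (0x66,)), (0x8B, (0x48,)), (0x8A, ())):
    size = operand_size(opcode, prefixes)
    print(hex(opcode), [hex(p) for p in prefixes], size, REG_NAMES[size][0])
```

So there is no need for register numbers above 7 to distinguish AL, AX, EAX, and RAX: the number stays 0 and the size is selected elsewhere in the encoding.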
