When should you use size directives in x86?

When to use size directives in x86 seems a little confusing. This x86 assembly guide says the following:

In general, the intended size of the data item at a given memory address can be inferred from the assembly instruction in which it is referenced. For example, in all of the above instructions, the size of the memory region could be inferred from the size of the register operand. When we loaded a 32-bit register, the assembler could infer that the memory region we were referring to was 4 bytes wide. When we stored the value of a one-byte register to memory, the assembler could infer that we wanted the address to refer to a single byte in memory.

The examples they give are pretty trivial, like moving an immediate value to a register.
But what about more complex situations like:

mov    QWORD PTR [rip+0x21b520], 0x1

      

In this case, isn't the QWORD PTR size directive redundant, since, according to the guide above, it can be assumed that we want to move 8 bytes to the destination register because RIP is 8 bytes wide? What are the definitive rules for size directives on the x86 architecture? I couldn't find an answer to this anywhere. Thanks.

Update: As Ross pointed out, the destination in the above example is not a register. Here's a better example:

mov    esi, DWORD PTR [rax*4+0x419260] 

      

In this case, can't it be assumed that we want to move 4 bytes, since ESI is 4 bytes wide, which makes the DWORD PTR directive redundant?

+3




3 answers


You're right; this is rather confusing. Assuming we're talking about Intel syntax, it is true that you can often omit size directives. Whenever the assembler can work out the size automatically, they are optional. For example, in the instruction

mov    esi, DWORD PTR [rax*4+0x419260] 

      

the DWORD PTR specifier is optional for exactly the reason you assume: the assembler can figure out that it has to move a DWORD-sized value, because the value is being moved into a DWORD-sized register.

Similarly, in

mov    rsi, QWORD PTR [rax*4+0x419260] 

      

the QWORD PTR specifier is optional for the same reason.

But the specifier is not always optional. Consider your first example:

mov    QWORD PTR [rip+0x21b520], 0x1

      



Here, the QWORD PTR specifier is not optional. Without it, the assembler has no idea what size of value you want to store starting at the address rip+0x21b520. Should 0x1 be stored as a BYTE? Extended to a WORD? A DWORD? A QWORD? Some assemblers might guess, but you cannot be sure of getting the correct result without explicitly specifying what you want.

In other words, when one of the operands is a register, the size specifier is optional, because the assembler can determine the size from the size of the register. However, if you are dealing only with an immediate value and a memory operand, you will probably need a size specifier to ensure you get the results you want.
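
As a rough illustration of that rule, here is a minimal sketch in GNU assembler Intel syntax. The label `size_inference` and the use of `rdi` as a pointer are assumptions made up for this example. The `movzx` line is an extra case not mentioned above: it has a register operand, but the destination width does not determine the source width, so the memory operand still needs a size.

```asm
    .intel_syntax noprefix
    .text
    .globl size_inference          # hypothetical label, for illustration only
size_inference:                    # assumes rdi points to readable, writable memory
    mov esi, [rdi]                 # 32-bit register, so this is a 4-byte load
    mov [rdi], al                  # 8-bit register, so this is a 1-byte store
    inc DWORD PTR [rdi]            # no register operand: the size must be stated
    movzx eax, BYTE PTR [rdi]      # source could be a byte or a word, so state it
    ret
```

The first two instructions assemble without any specifier; the last two have nothing else from which the assembler could work out the memory operand's size.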

Personally, I prefer to always include the size when writing code. It's a couple of extra characters to type, but it forces me to think about it and explicitly state what I want. If I mess up and code a mismatch, the assembler screams loudly at me, which has caught bugs more than once. I also think that having it there improves readability. So I agree with old_timer, although his perspective seems somewhat unpopular.

Disassemblers also tend to be verbose in their output, including size specifiers even when they are optional. Hans Passant theorized in the comments that this is to maintain backward compatibility with old-school assemblers that always required them, but I'm not sure that's true. It may be part of it, but in my experience disassemblers tend to be verbose in general, and I think this is simply to make it easier to analyze code that you are not familiar with.

Note that AT&T syntax takes a slightly different tack. Instead of writing the size as a prefix to the operand, it adds a suffix to the instruction mnemonic: b for byte, w for word, l for dword, and q for qword. So, the three previous examples become:

movl    0x419260(,%rax,4), %esi
movq    0x419260(,%rax,4), %rsi
movq    $0x1, 0x21b520(%rip)

      

Again, in the first two instructions the suffixes l and q are optional, since the assembler can deduce the appropriate size. In the last instruction, just as in Intel syntax, the suffix is not optional. So it's the same in AT&T syntax as in Intel syntax, just with a different format for the size specifiers.
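
A minimal sketch of the same point in AT&T syntax for the GNU assembler follows. The label `att_examples` and the use of `%rdi` as a pointer are hypothetical, chosen only for this illustration.

```asm
    .text
    .globl att_examples            # hypothetical label, for illustration only
att_examples:                      # assumes %rdi points to at least 8 writable bytes
    mov  (%rdi), %esi              # suffix omitted: %esi implies a 4-byte load
    movl (%rdi), %esi              # same instruction with the suffix spelled out
    movb $0x1, (%rdi)              # immediate to memory: suffix selects a 1-byte store
    movq $0x1, (%rdi)              # immediate to memory: suffix selects an 8-byte store
    ret
```

Only the suffix distinguishes the last two stores, which is why it cannot be left out there.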

+3




RIP, or any other register in the address, relates only to the addressing mode, not to the width of the data being transferred. The memory reference [rip+0x21b520] can be used with 1-, 2-, 4-, or 8-byte accesses, and the constant value 0x01 can also be anywhere from 1 to 8 bytes wide (0x01 is the same as 0x00000001, etc.). Thus, in this case, the size of the operand must be specified explicitly.
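
To make that concrete, here is a small sketch in GNU assembler Intel syntax. The label `store_examples` and the use of `rdi` as the address are assumptions for illustration. The address and the immediate are identical in all four lines; only the size specifier decides how many bytes are written.

```asm
    .intel_syntax noprefix
    .text
    .globl store_examples          # hypothetical label, for illustration only
store_examples:                    # assumes rdi points to at least 8 writable bytes
    mov BYTE PTR  [rdi], 0x1       # writes 1 byte:  01
    mov WORD PTR  [rdi], 0x1       # writes 2 bytes: 01 00
    mov DWORD PTR [rdi], 0x1       # writes 4 bytes: 01 00 00 00
    mov QWORD PTR [rdi], 0x1       # writes 8 bytes: 01 00 00 00 00 00 00 00
    ret
```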

With a register as the source or destination, the size of the operand is implicit: with EAX, say, the data is 32 bits, or 4 bytes:

mov    [rip+0x21b520],eax

      



And, of course, in the awfully beautiful AT&T syntax, the operand size is marked as a suffix on the instruction mnemonic (here l):

movl $1, 0x21b520(%rip) 

      

+1




It gets worse: assembly language is defined by the assembler, the program that reads, interprets, and parses it. This is true of x86 in particular, but in general there is no technical reason for any two assemblers for the same target to accept the same assembly language. They are usually similar, but they don't have to be.

You've fallen into a couple of pitfalls: first, the specific syntax your assembler uses for the size directive, and second, what it defaults to when the directive is missing. My recommendation is to ALWAYS use the size directive (or a size-specific instruction mnemonic, if there is one). Then you never have to worry about it, right?

-1

