Assembly: What is the purpose of movl data_items(,%edi,4), %eax in this program?

Question


This program loops through all the numbers stored in memory with .long and leaves the largest one in the EBX register, so that it shows up as the program's exit status (e.g. with echo $?).

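.section .data
data_items:                          # the list; a 0 marks its end
    .long 3, 67, 34, 222, 45, 75, 54, 34, 44, 33, 22, 11, 66, 0

.section .text
.globl _start

_start:
    movl $0, %edi                    # %edi = index of the current element
    movl data_items(,%edi,4), %eax   # load the first element
    movl %eax, %ebx                  # it is the largest seen so far
start_loop:
    cmpl $0, %eax                    # a 0 ends the list
    je loop_exit
    incl %edi                        # advance the index
    movl data_items(,%edi,4), %eax   # load the next element
    cmpl %ebx, %eax                  # compare it with the current maximum
    jle start_loop                   # not larger: keep looping
    movl %eax, %ebx                  # new maximum
    jmp start_loop
loop_exit:
    movl $1, %eax                    # system call 1 = exit; %ebx is the status
    int $0x80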

I'm not sure about the purpose of (,%edi,4) in this program. I read that the commas are separators, and that the 4 tells the computer that each item is 4 bytes long. Since we have already stated with .long that each number is 4 bytes, why do we have to say it again here? Also, can someone explain in more detail what these two commas do here?


assembly x86 comma

Katz_Katz_Katz, Jan 10 '18 at 1:14


1 answer

Matteo Italia · Accepted Answer · 2018-01-10T01:34:17+0000

In AT&T syntax, memory operands have the following form¹:

displacement(base_register, index_register, scale_factor)

The displacement, base, and index components can be used in any combination, and each of them can be omitted (the scale factor can only be 1, 2, 4 or 8). However, if you omit the base register you must still keep the commas, otherwise the assembler would have no way to tell which components you left out.

These components are combined into the address you are referring to (the effective address) with the following formula:

effective_address = displacement + base_register + index_register*scale_factor

(Incidentally, this is almost exactly how you would write it in Intel syntax, where the same load reads mov eax, [data_items + edi*4].)

So, armed with this knowledge, we can decipher your instruction:

movl data_items(,%edi,4), %eax

Following the syntax above, you can see that:

- data_items is the displacement;
- base_register is omitted, so it does not appear in the formula above;
- %edi is the index_register;
- 4 is the scale_factor.

So you are telling the processor to move a long (movl) from the address data_items + %edi*4 into the register %eax.

The *4 is necessary because each element of your array is 4 bytes wide: to convert the index (in %edi) into an offset in bytes from the beginning of the array, you have to multiply it by 4.

Since we have already stated with .long that each number is 4 bytes, why do we have to say it again?

Because assemblers are low-level tools that know nothing about types:

- .long is not an array declaration, just a directive telling the assembler to emit the bytes of the 32-bit representation of each of its arguments;
- data_items is not an array either; it is just a symbol that resolves to some memory address, like any other label. The fact that you put a .long directive after it means nothing to the assembler.

So the instruction that reads the data has to state the element size itself, and that is what the scale factor 4 does.

Notes

¹ Technically there is also an optional segment register prefix, but since we are talking about 32-bit code on Linux, I am leaving segments out entirely, as they would only add confusion.

