Assembling with NASM on x86-64 in 32-bit mode: why does this instruction generate RIP-relative addressing?
[bits 32]
global _start
section .data
str_hello db "HelloWorld", 0xa
str_hello_length db $-str_hello
section .text
_start:
mov ebx, 1 ; stdout file descriptor
mov ecx, str_hello ; pointer to string of characters that will be displayed
mov edx, [str_hello_length] ; count (this is the line that disassembles as RIP-relative)
mov eax, 4 ; sys_write
int 0x80 ; linux kernel system call
mov ebx, 0 ; exit status zero
mov eax, 1 ; sys_exit
int 0x80 ; linux kernel system call
The fundamental issue is that I need to pass the length of the hello string to the Linux sys_write system call. I am well aware that I can just use EQU and everything will be fine, but I am really trying to figure out what is going on here.
So basically, when I use EQU, it loads the value and that's fine.
str_hello_length equ $-str_hello
...
...
mov edx, str_hello_length
However, if I use this line with DB
str_hello_length db $-str_hello
...
...
mov edx, [str_hello_length] ; of course, without the brackets it'll load the address, which I don't want. I want the value stored at that address
Instead of loading the value at that address as I expect, the assembler emits RIP-relative addressing, as shown in the gdb disassembly, and I'm wondering why.
mov 0x6000e5(%rip),%edx # 0xa001a5
Then I tried loading into the eax register instead (and then moving eax to edx), but that gives me another problem: a segmentation fault, at this instruction according to gdb:
movabs 0x4b8c289006000e5,%eax
So apparently different registers produce different code. I guess I need to truncate the upper 32 bits somehow, but I don't know how.
I did find a "solution": load eax with the address str_hello_length, then load the contents of the address that eax points to, and it's all hunky dory.
mov eax, str_hello_length
mov edx, [eax] ; count
; gdb disassembly
mov $0x6000e5,%eax
mov (%rax),%edx
Apparently loading a value indirectly through a register produces different code? I really do not know.
I just need help understanding the syntax and operation of these instructions, so I can better understand how effective addresses are loaded. Yes, I could just switch to EQU and be on my way, but I really feel like I can't continue until I understand what's going on with the DB declaration and loading from it.
Answer: it doesn't. x86-64 has no RIP-relative addressing in 32-bit (compatibility) mode; there is no RIP in 32-bit mode at all, only EIP. What is going on is that NASM is emitting perfectly good 32-bit machine code for you, which you then try to run as 64-bit code. GDB decodes your 32-bit bytes as 64-bit instructions and tells you that, in 64-bit mode, those bytes mean a RIP-relative mov. The 32-bit and 64-bit encodings on x86-64 overlap heavily (so the silicon can share most of its decoding logic), which is why the disassembly GDB shows looks similar to the 32-bit code you wrote and is so confusing. But you are really just feeding garbage bytes to the processor.
This has nothing to do with NASM itself. You are building for the wrong architecture for the process you are running in. Either assemble 32-bit code and run it in a 32-bit process, or write your assembly for [BITS 64].
You asked the assembler for 32-bit mode (with bits 32), but you put that 32-bit machine code into a 64-bit object file and then looked at what happens when it is disassembled as x86-64 machine code.
So what you are seeing are the differences between the x86-32 and x86-64 instruction encodings, i.e. what happens when you decode 32-bit machine code as 64-bit:
mov 0x6000e5(%rip),%edx # 0xa001a5
The key in this case is that 32-bit x86 has two redundant ways to encode a 32-bit absolute address (no registers): with or without the SIB byte. 32-bit mode does not have RIP-relative (or EIP-relative) addressing.
x86-64 repurposed the shorter form (ModR/M + disp32) as the RIP-relative addressing mode, while 32-bit absolute addressing is still available through the longer ModR/M + SIB + disp32 encoding (with a SIB byte that encodes neither a base register nor an index register).
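To make the two encodings concrete, here is a minimal Python sketch that classifies a mod=00 ModR/M byte (plus an optional SIB byte) the way a 32-bit or a 64-bit decoder would. The function name describe_mod00 is hypothetical, and it ignores REX prefixes and every mod value other than 00. The example bytes are the standard 8B /r encodings of mov edx, [0x6000e5].

```python
def describe_mod00(modrm, sib=0, mode=32):
    """Classify the memory operand of a ModR/M byte with mod=00 (sketch only, no REX)."""
    mod, rm = modrm >> 6, modrm & 0b111
    if mod != 0:
        raise ValueError("this sketch only handles mod=00")
    if rm == 0b101:
        # ModR/M + disp32, no SIB byte: absolute in 32-bit mode, RIP-relative in 64-bit mode
        return "RIP-relative disp32" if mode == 64 else "absolute disp32"
    if rm == 0b100:
        # A SIB byte follows. index=100 means "no index"; base=101 with mod=00 means "no base, disp32"
        index, base = (sib >> 3) & 0b111, sib & 0b111
        if index == 0b100 and base == 0b101:
            return "absolute disp32"
        return "SIB with a base and/or index register"
    return "register-indirect, no displacement"

# mov edx, [0x6000e5] uses opcode 8B /r with the reg field = 010 (EDX)
short_form = bytes([0x8B, 0x15, 0xE5, 0x00, 0x60, 0x00])       # ModR/M + disp32
long_form = bytes([0x8B, 0x14, 0x25, 0xE5, 0x00, 0x60, 0x00])  # ModR/M + SIB + disp32

print(describe_mod00(short_form[1], mode=32))               # absolute disp32
print(describe_mod00(short_form[1], mode=64))               # RIP-relative disp32
print(describe_mod00(long_form[1], long_form[2], mode=64))  # absolute disp32
```

The same two bytes 8B 15 mean an absolute address in 32-bit code but a RIP-relative one in 64-bit code. The SIB form means "absolute" in both modes, which is why absolute [disp32] addressing in 64-bit code needs the longer encoding.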
Note that the "offset" from RIP is actually the absolute static address where your data lives in the 64-bit executable, 0x6000e5.
The comment (# 0xa001a5) is the disassembler showing the resulting effective absolute address. RIP-relative addresses are computed from the end of the current instruction, that is, the start of the next instruction.
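As a small worked example of that rule, the Python sketch below computes a RIP-relative target as "end of instruction plus sign-extended disp32". The helper name rip_relative_target is hypothetical. The question does not show the instruction's start address, so it is back-computed here from the gdb output, assuming the 6-byte 8B 15 disp32 encoding.

```python
def rip_relative_target(insn_addr, insn_len, disp32):
    """Effective address of a RIP-relative operand (64-bit mode)."""
    # disp32 is a signed 32-bit field, so sign-extend it first
    if disp32 & 0x8000_0000:
        disp32 -= 1 << 32
    next_insn = insn_addr + insn_len  # RIP already points at the next instruction
    return (next_insn + disp32) & 0xFFFF_FFFF_FFFF_FFFF

# From gdb: displacement 0x6000e5, effective address 0xa001a5.
# With the 6-byte encoding 8B 15 disp32, the instruction must start here:
insn_addr = 0xA001A5 - 0x6000E5 - 6
print(hex(insn_addr))                                    # 0x4000ba
print(hex(rip_relative_target(insn_addr, 6, 0x6000E5)))  # 0xa001a5
```

The extra 0x4000c0 in gdb's comment is simply the address of the next instruction, added to the value you meant as an absolute address.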
movabs 0x4b8c289006000e5,%eax
When the destination register is EAX, your assembler (in 32-bit mode) picks the shorter mov encoding that loads EAX from a 32-bit absolute address without a ModR/M byte: simply A1 disp32. The Intel manual calls this operand a moffs (memory offset) rather than an effective address.
In 64-bit mode, this opcode takes a 64-bit absolute address. (It is unique in being able to load/store from a 64-bit absolute, not RIP-relative, address without first putting that address in a register.) So the decoder consumes the next 4 bytes of code, which belong to the following instruction(s), as the upper half of the 64-bit address; that is where the high bytes in that address come from. The 0x6000e5 in the low 32 bits is correct, and is how it would decode as 32-bit machine code.
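The byte stream below is reconstructed under an assumption: that the asker's variant was mov eax, [str_hello_length] followed by mov edx, eax and mov eax, 4, assembled as 32-bit code. This is consistent with the high bytes gdb printed. The helper moffs_operand is hypothetical. Decoding the A1 operand with the 32-bit and then the 64-bit address width reproduces both the intended address and the value gdb showed.

```python
# Assumed byte stream (32-bit encodings):
#   mov eax, [str_hello_length]   ; A1 moffs32
#   mov edx, eax                  ; 89 C2
#   mov eax, 4                    ; B8 04 00 00 00
code = bytes([0xA1, 0xE5, 0x00, 0x60, 0x00,
              0x89, 0xC2,
              0xB8, 0x04, 0x00, 0x00, 0x00])

def moffs_operand(code, mode):
    """Decode the address operand of opcode A1 (mov eax, moffs) at code[0]."""
    if code[0] != 0xA1:
        raise ValueError("expected opcode A1")
    # moffs is 8 bytes with the default 64-bit address size, 4 bytes in 32-bit mode
    width = 8 if mode == 64 else 4
    addr = int.from_bytes(code[1:1 + width], "little")
    return addr, 1 + width  # (absolute address, instruction length)

addr32, len32 = moffs_operand(code, 32)
addr64, len64 = moffs_operand(code, 64)
print(hex(addr32), len32)  # 0x6000e5 5
print(hex(addr64), len64)  # 0x4b8c289006000e5 9
```

The upper half 0x04b8c289 is just the bytes 89 C2 B8 04 (mov edx, eax and the start of mov eax, 4) read as part of the address. That address is non-canonical, so the load faults, which matches the segmentation fault in the question.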
Change [bits 32] to [bits 64].
See "What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?" ...
It is better to build a 32-bit executable if you are not going to use the native 64-bit system calls. Assemble with nasm -felf32 and link with gcc -m32 -nostdlib -static.
Probably the problem is that the offset of str_hello_length is larger than 32 bits. IA-32 does not support displacements larger than 32 bits. The workaround is RIP-relative addressing, under the (usually correct) assumption that the distance between RIP and the address you are trying to reach fits in 32 bits. In this mode RIP, which points just past the current instruction, serves as the base, so if the instruction already has a base or index register, RIP-relative addressing cannot be used.
Let's take a look at your various attempts:
str_hello_length equ $-str_hello
...
...
mov edx, str_hello_length
There is no memory access here, just a mov with an immediate operand, so there is no addressing mode at all.
Next:
mov eax, str_hello_length
mov edx, [eax] ; count
Now the first instruction is a mov with an immediate, which is still not a memory access. The second instruction does access memory, but it uses eax as the base and has no displacement. RIP-relative addressing only applies to a displacement-only operand, so there is nothing RIP-relative here.
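A hypothetical mini-decoder (Python, handling only these two opcodes) shows that the workaround's bytes decode the same way in 32-bit and 64-bit mode, apart from the name of the base register. The byte values assume NASM's usual encodings, B8 id for mov eax, imm32 and 8B /r for mov edx, [eax]. Both are consistent with the gdb disassembly in the question.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def decode_workaround(code, mode):
    """Decode B8 id (mov eax, imm32) followed by 8B /r with a plain [reg] operand."""
    out = []
    if code[0] != 0xB8:
        raise ValueError("expected B8 (mov eax, imm32)")
    imm = int.from_bytes(code[1:5], "little")  # imm32 in both modes (no REX.W)
    out.append(f"mov eax, {imm:#x}")
    if code[5] != 0x8B:
        raise ValueError("expected 8B (mov r32, r/m32)")
    modrm = code[6]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0 or rm in (0b100, 0b101):
        raise ValueError("sketch only handles [reg] with no SIB and no displacement")
    base = REG64[rm] if mode == 64 else REG32[rm]  # default address size of the mode
    out.append(f"mov {REG32[reg]}, [{base}]")
    return out

code = bytes([0xB8, 0xE5, 0x00, 0x60, 0x00, 0x8B, 0x10])
print(decode_workaround(code, 32))  # ['mov eax, 0x6000e5', 'mov edx, [eax]']
print(decode_workaround(code, 64))  # ['mov eax, 0x6000e5', 'mov edx, [rax]']
```

In 64-bit mode, writing EAX zero-extends into RAX, so [rax] is still 0x6000e5. That is why this version happens to work even though it was assembled as 32-bit code.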
Finally:
str_hello_length db $-str_hello
...
...
mov edx, [str_hello_length] ; of course, without the brackets it'll load the address, which I don't want. I want the value stored at that address
Here you use str_hello_length as your displacement. As I explained above, this results in RIP-relative addressing.