Linux system call from kernel module crashes (strange offset)
I am trying to invoke a system call from a kernel module. I have this code:
set_fs( get_ds() ); // lets our module do the system-calls
// Save everything before systemcalling
asm (" push %rax ");
asm (" push %rdi ");
asm (" push %rcx ");
asm (" push %rsi ");
asm (" push %rdx ");
asm (" push %r10 ");
asm (" push %r8 ");
asm (" push %r9 ");
asm (" push %r11 ");
asm (" push %r12 ");
asm (" push %r15 ");
asm (" push %rbp ");
asm (" push %rbx ");
// Invoke the long sys_mknod(const char __user *filename, int mode, unsigned dev);
asm volatile (" movq $133, %rax "); // system call number
asm volatile (" lea path(%rip), %rdi "); // path is char path[] = ".."
asm volatile (" movq mode, %rsi "); // mode is S_IFCHR | ...
asm volatile (" movq dev, %rdx "); // dev is 70 >> 8
asm volatile (" syscall ");
// POP EVERYTHING
asm (" pop %rbx ");
asm (" pop %rbp ");
asm (" pop %r15 ");
asm (" pop %r12 ");
asm (" pop %r11 ");
asm (" pop %r9 ");
asm (" pop %r8 ");
asm (" pop %r10 ");
asm (" pop %rdx ");
asm (" pop %rsi ");
asm (" pop %rcx ");
asm (" pop %rdi ");
asm (" pop %rax ");
set_fs( savedFS ); // restore the former address-limit value
This code doesn't work and brings the whole system down (it's a kernel module).
Disassembling this piece of code with relocation information gives:
2c: 50 push %rax
2d: 57 push %rdi
2e: 51 push %rcx
2f: 56 push %rsi
30: 52 push %rdx
31: 41 52 push %r10
33: 41 50 push %r8
35: 41 51 push %r9
37: 41 53 push %r11
39: 41 54 push %r12
3b: 41 57 push %r15
3d: 55 push %rbp
3e: 53 push %rbx
3f: 48 c7 c0 85 00 00 00 mov $0x85,%rax
46: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # 4d <init_module+0x4d>
49: R_X86_64_PC32 path-0x4
4d: 48 83 c7 04 add $0x4,%rdi
51: 48 8b 34 25 00 00 00 mov 0x0,%rsi
58: 00
55: R_X86_64_32S mode
59: 48 8b 14 25 00 00 00 mov 0x0,%rdx
60: 00
5d: R_X86_64_32S dev
61: 0f 05 syscall
63: 5b pop %rbx
64: 5d pop %rbp
65: 41 5f pop %r15
67: 41 5c pop %r12
69: 41 5b pop %r11
6b: 41 59 pop %r9
6d: 41 58 pop %r8
6f: 41 5a pop %r10
71: 5a pop %rdx
72: 5e pop %rsi
73: 59 pop %rcx
74: 5f pop %rdi
75: 58 pop %rax
I'm wondering: why does the relocation at 49: R_X86_64_PC32 refer to path-0x4, with an offset of -0x4?
I mean: mode and dev should be resolved automatically without issue, but what about path? Why the -0x4 offset?
I tried to "compensate for this" with
lea 0x0(%rip),%rdi // this somehow adds an offset of -0x4
add $0x4,%rdi
but the code still crashed.
Where am I going wrong?
My guess is that this is a stack issue. Unlike int $0x80, the syscall instruction does not set up a stack for the kernel. If you look at the actual code at system_call:, you will see something like SWAPGS_UNSAFE_STACK. The core of this macro is the SWAPGS instruction - see p. 152 here. When kernel mode is entered, the kernel needs a way to obtain a pointer to its own data structures, and this instruction lets it do just that. It does so by exchanging the user's %gs base with a value stored in a model-specific register, from which the kernel can then locate its kernel-mode stack.
You can imagine what happens when the syscall entry point is reached from code that is already running in kernel mode: the exchange produces the wrong value, and the kernel starts trying to use a bogus stack. You could try executing SWAPGS manually beforehand, so that the kernel's own SWAPGS produces the result it expects, and see if that works.
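To make that reasoning concrete, here is a toy user-space model of the exchange, written as a sketch rather than as kernel code. The struct, the function names, and the two constants USER_GS and KERNEL_GS are all made up for illustration; the only thing modeled is that SWAPGS swaps the active GS base with the value parked in the model-specific register.

```c
#include <stdint.h>

/* Hypothetical values, chosen only so the two are easy to tell apart. */
#define USER_GS   0x1111u   /* GS base that user space was using        */
#define KERNEL_GS 0x2222u   /* pointer to the kernel's per-CPU data     */

/* Toy model of the two values that SWAPGS exchanges. */
struct cpu_model {
    uint64_t gs_base;          /* GS base currently in effect            */
    uint64_t kernel_gs_base;   /* value parked in the model-specific reg */
};

/* SWAPGS: exchange the active GS base with the parked one. */
static void swapgs(struct cpu_model *cpu)
{
    uint64_t tmp = cpu->gs_base;
    cpu->gs_base = cpu->kernel_gs_base;
    cpu->kernel_gs_base = tmp;
}

/* State while user code runs: user GS active, kernel GS parked. */
static struct cpu_model in_user_mode(void)
{
    struct cpu_model cpu = { USER_GS, KERNEL_GS };
    return cpu;
}

/* State while kernel code runs: kernel GS active, user GS parked. */
static struct cpu_model in_kernel_mode(void)
{
    struct cpu_model cpu = { KERNEL_GS, USER_GS };
    return cpu;
}

/*
 * The entry path assumes it was reached from user mode, so it swaps
 * once and then trusts the GS base. Returns the value it would use.
 */
static uint64_t syscall_entry(struct cpu_model *cpu)
{
    swapgs(cpu);
    return cpu->gs_base;
}
```

In this model, syscall_entry() on in_user_mode() yields KERNEL_GS, while on in_kernel_mode() it yields USER_GS, which is the bogus-pointer situation described above. Calling swapgs() once before syscall_entry() in the kernel-mode case gives KERNEL_GS again. The model says nothing about whether the rest of the real entry path would then behave.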
It seems that you cannot do that. See the comment before system_call:
/*
* Register setup:
* rax system call number
* rdi arg0
* rcx return address for syscall/sysret, C arg3
* rsi arg1
* rdx arg2
* r10 arg3 (--> moved to rcx for C)
* r8 arg4
* r9 arg5
* r11 eflags for syscall/sysret, temporary for C
* r12-r15,rbp,rbx saved by C code, not touched.
*
* Interrupts are off on entry.
* Only called from user space.
*
* XXX if we had a free scratch register we could save the RSP into the stack frame
* and report it properly in ps. Unfortunately we haven't.
*
* When user can change the frames always force IRET. That is because
* it deals with uncanonical addresses better. SYSRET has trouble
* with them due to bugs in both AMD and Intel CPUs.
*/
So you cannot execute syscall from inside the kernel. You could try using int $0x80 for this purpose instead; as far as I can see, the kernel_execve stub uses that trick.
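On the offset part of the question: the -0x4 is most likely not a bug. An R_X86_64_PC32 relocation stores S + A - P, where S is the symbol address, A the addend, and P the address of the 4-byte field being patched (0x49 in the listing). The CPU, however, adds the displacement to the address of the next instruction (0x4d), which is P + 4, so the assembler emits an addend of -4 to cancel the difference. The sketch below just redoes that arithmetic in user space; the function names and the address 0x2000 used for path are invented for illustration.

```c
#include <stdint.h>

/* Offsets taken from the disassembly in the question. */
#define RELOC_PLACE 0x49u   /* P: where the 4-byte displacement lives  */
#define NEXT_INSN   0x4du   /* address of the instruction after lea    */

/* Hypothetical address of the symbol `path`, for illustration only. */
#define PATH_ADDR   0x2000u

/* R_X86_64_PC32: the value written into the field is S + A - P. */
static int32_t reloc_pc32(uint64_t sym, int64_t addend, uint64_t place)
{
    return (int32_t)(sym + (uint64_t)addend - place);
}

/* RIP-relative addressing: target = next instruction + displacement. */
static uint64_t rip_relative_target(uint64_t next_insn, int32_t disp)
{
    return next_insn + (uint64_t)(int64_t)disp;
}

/* Address that `lea disp(%rip), %rdi` yields for a given addend. */
static uint64_t lea_result(int64_t addend)
{
    int32_t disp = reloc_pc32(PATH_ADDR, addend, RELOC_PLACE);
    return rip_relative_target(NEXT_INSN, disp);
}
```

With the addend of -4 from the listing, lea_result(-4) comes out as PATH_ADDR exactly; with an addend of 0 it would be PATH_ADDR + 4. Under this reading, the extra add $0x4,%rdi moves %rdi four bytes past the start of path instead of correcting it. The mode and dev loads use R_X86_64_32S, an absolute relocation, so no such adjustment shows up there.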