Linux system call on kernel crash (strange offset)

I am trying to call a system call from a kernel module, I have this code:

    set_fs( get_ds() );    // lets our module do the system-calls 


    // Save everything before systemcalling

    asm ("     push    %rax     "); 
    asm  ("     push    %rdi     "); 
    asm  ("     push    %rcx     "); 
    asm  ("     push    %rsi     "); 
    asm  ("     push    %rdx     "); 
    asm  ("     push    %r10     "); 
    asm  ("     push    %r8      "); 
    asm  ("     push    %r9      "); 
    asm  ("     push    %r11     "); 
    asm  ("     push    %r12     "); 
    asm  ("     push    %r15     "); 
    asm  ("     push    %rbp     "); 
    asm  ("     push    %rbx     "); 


    // Invoke the long sys_mknod(const char __user *filename, int mode, unsigned dev);

    asm volatile ("     movq    $133, %rax     "); // system call number

    asm volatile ("    lea    path(%rip), %rdi     "); // path is char path[] = ".."

    asm volatile ("     movq    mode, %rsi     "); // mode is S_IFCHR | ...

    asm volatile ("     movq    dev, %rdx     ");  // dev is 70 >> 8

    asm volatile ("     syscall     "); 


      // POP EVERYTHING 

    asm ("     pop     %rbx     "); 
    asm ("     pop        %rbp     "); 
    asm ("     pop     %r15     "); 
    asm ("     pop        %r12     "); 
    asm ("     pop        %r11     "); 
    asm ("     pop        %r9      "); 
    asm ("     pop        %r8      "); 
    asm ("     pop        %r10     "); 
    asm ("     pop        %rdx     "); 
    asm ("     pop        %rsi     "); 
    asm ("     pop        %rcx     "); 
    asm ("     pop        %rdi     "); 
    asm ("     pop     %rax     "); 



    set_fs( savedFS );    // restore the former address-limit value 

      

This code doesn't work and flushes the system down (it's a kernel module).

Resetting this piece of code with information about movement:

  2c:    50                      push  %rax 
  2d:    57                      push  %rdi 
  2e:    51                      push  %rcx 
  2f:    56                      push  %rsi 
  30:    52                      push  %rdx 
  31:    41 52                    push  %r10 
  33:    41 50                    push  %r8 
  35:    41 51                    push  %r9 
  37:    41 53                    push  %r11 
  39:    41 54                    push  %r12 
  3b:    41 57                    push  %r15 
  3d:    55                      push  %rbp 
  3e:    53                      push  %rbx 
  3f:    48 c7 c0 85 00 00 00     mov    $0x85,%rax 
  46:    48 8d 3d 00 00 00 00     lea    0x0(%rip),%rdi        # 4d <init_module+0x4d> 
            49: R_X86_64_PC32    path-0x4 
  4d:    48 83 c7 04              add    $0x4,%rdi 
  51:    48 8b 34 25 00 00 00     mov    0x0,%rsi 
  58:    00 
            55: R_X86_64_32S    mode 
  59:    48 8b 14 25 00 00 00     mov    0x0,%rdx 
  60:    00 
            5d: R_X86_64_32S    dev 
  61:    0f 05                    syscall 
  63:    5b                      pop    %rbx 
  64:    5d                      pop    %rbp 
  65:    41 5f                    pop    %r15 
  67:    41 5c                    pop    %r12 
  69:    41 5b                    pop    %r11 
  6b:    41 59                    pop    %r9 
  6d:    41 58                    pop    %r8 
  6f:    41 5a                    pop    %r10 
  71:    5a                      pop    %rdx 
  72:    5e                      pop    %rsi 
  73:    59                      pop    %rcx 
  74:    5f                      pop    %rdi 
  75:    58                      pop    %rax 

      

I'm wondering ... why at offset -0x4 49: R_X86_64_PC32 the path is 0x4?

I mean: mode and dev should be resolved automatically without issue, but what about the path? Why is -0x4 offset?

I tried to "compensate for this" with

lea 0x0 (% rip),% rdi // this somehow adds an offset of -0x4 add $ 0x4,% rdi ....

but the code still crashed.

Where am I going wrong?

+3


source to share


2 answers


My guess about what's going on here is a stack issue. In contrast int $0x80

, the command syscall

does not set up a stack for the kernel. If you look at the actual code from system_call:

, you will see something like SWAPGS_UNSAFE_STACK

. The ball of this macro is the SwapGS instruction - see p. 152 here . When kernel mode is entered, the kernel needs a way to deduce a pointer to its data structures, and this instruction allows it to do just that. It does this by replacing the user register with %gs

a value stored in a model-specific register from which it can pop the kernel-mode stack.



You can imagine that after the entry point is called syscall

, this exchange produces the wrong value since you were already in kernel mode and the kernel starts trying to use a dummy stack. You can try starting SwapGS manually, which will make the kernel SwapGS output as it expects, and see if that works.

0


source


It seems that you cannot do that. See comment before system_call

:

 /*
  * Register setup:
  * rax  system call number
  * rdi  arg0
  * rcx  return address for syscall/sysret, C arg3
  * rsi  arg1
  * rdx  arg2
  * r10  arg3    (--> moved to rcx for C)
  * r8   arg4
  * r9   arg5
  * r11  eflags for syscall/sysret, temporary for C
  * r12-r15,rbp,rbx saved by C code, not touched.
  *
  * Interrupts are off on entry.
  * Only called from user space.
  *
  * XXX  if we had a free scratch register we could save the RSP into the stack frame
  *      and report it properly in ps. Unfortunately we haven't.
  *
  * When user can change the frames always force IRET. That is because
  * it deals with uncanonical addresses better. SYSRET has trouble
  * with them due to bugs in both AMD and Intel CPUs.
  */

      



So, you cannot call syscall

from the kernel. But you can try using int $0x80

for these purposes. As I can see, kernel_execve

stub uses the trick

0


source







All Articles