Analyzing processor registers during a kernel crash

I was debugging an issue and hit the error below, which also generated a crash dump. To some extent, I know how to get to the exact line in the code where the error occurred, using the gdb command l *(debug_fucntion + 0x19).

<1>BUG: unable to handle kernel paging request at ffffc90028213000
<1>IP: [<ffffffffa0180279>] debug_fucntion+0x19/0x160 [dise]
<4>PGD 103febe067 PUD 103febf067 PMD fd54e1067 PTE 0
<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/kernel/mm/ksm/run
<4>CPU 7
<4>Modules linked in: dise(P)(U) ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge autofs4 8021q garp stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vhost_net macvtap macvlan tun kvm uinput ipmi_devintf power_meter microcode iTCO_wdt iTCO_vendor_support dcdbas sg ses enclosure serio_raw lpc_ich mfd_core i7core_edac edac_core bnx2 ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dise]
<4>
<4>Pid: 1126, comm: diseproc Tainted: P        W  ---------------    2.6.32-431.el6.x86_64 #1 Dell Inc. PowerEdge R710/0MD99X
<4>RIP: 0010:[<ffffffffa0180279>]  [<ffffffffa0180279>] debug_fucntion+0x19/0x160 [dise]
<4>RSP: 0018:ffff880435fc5b88  EFLAGS: 00010282
<4>RAX: 0000000000000000 RBX: 0000000000010000 RCX: ffffc90028213000
<4>RDX: 0000000000010040 RSI: 0000000000010000 RDI: ffff880fe36a0000
<4>RBP: ffff880435fc5b88 R08: ffffffffa025d8a3 R09: 0000000000000000
<4>R10: 0000000000000004 R11: 0000000000000004 R12: 0000000000010040
<4>R13: 000000000000b101 R14: ffffc90028213010 R15: ffff880fe36a0000
<4>FS:  00007fbe6040b700(0000) GS:ffff8800618e0000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>CR2: ffffc90028213000 CR3: 0000000fc965b000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process diseproc (pid: 1126, threadinfo ffff880435fc4000, task ffff8807f8be8ae0)
<4>Stack:
<4> ffff880435fc5be8 ffffffffa0180498 0000000081158f46 00000c200000fd26
<4><d> ffffc90028162000 0000fec635fc5bc8 0000000000000018 ffff881011d80000
<4><d> ffffc90028162000 ffff8802f18fe440 ffff880fc80b4000 ffff880435fc5cec
<4>Call Trace:
<4> [<ffffffffa0180498>] cmd_dump+0x1c8/0x360 [dise]
<4> [<ffffffffa01978e1>] debug_log_show+0x91/0x160 [dise]
<4> [<ffffffffa013afb9>] process_debug+0x5a9/0x990 [dise]
<4> [<ffffffff810792c7>] ? current_fs_time+0x27/0x30
<4> [<ffffffffa013bc38>] dise_ioctl+0xd8/0x300 [dise]
<4> [<ffffffff8105a501>] ? hotplug_hrtick+0x21/0x60
<4> [<ffffffff8119db42>] vfs_ioctl+0x22/0xa0
<4> [<ffffffff8119dce4>] do_vfs_ioctl+0x84/0x580
<4> [<ffffffff8119e261>] sys_ioctl+0x81/0xa0
<4> [<ffffffff810e1e5e>] ? __audit_syscall_exit+0x25e/0x290
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>Code: be c4 10 e1 48 8b 5d d8 44 01 f0 4c 8b 65 e0 4c 8b 6d e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 0f 1f 44 00 00 55 48 89 e5 0f 1f 44 00 00 <48> 8b 01 48 c1 e8 3c 83 f8 08 76 0b e8 f6 fb ff ff c9 c3 0f 1f
<1>RIP  [<ffffffffa0180279>] debug_fucntion+0x19/0x160 [dise]
<4> RSP <ffff880435fc5b88>
<4>CR2: ffffc90028213000

      

Questions:

  • Can the contents of the CPU registers provide additional information? How do I decode them?

  • Can I find out from the crash dump the values of the variables or data structures that are causing the crash?

  • What does "Code: be c4 10 e1 48 8b 5d ..." mean here?

+1




3 answers


You should understand that you are inspecting (not debugging) at the assembly level (not at the source-code level). Keep this in mind whenever you inspect crash dumps.

You should carefully read the crash report line by line, because it contains a lot of information and it is all you have.

Once you have found the place where your code crashed, you should figure out why it happened by reading the report and inspecting the crash dump.

The first line of the crash report states

BUG: unable to handle kernel paging request at ffffc90028213000

      

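This means that your code accessed an invalid (unmapped) memory address. The faulting address, ffffc90028213000, is the same value the report shows in CR2.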
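The "Oops: 0000" line of the report narrows this down further. A minimal sketch, assuming the number after "Oops:" is the standard x86 page-fault error code in hex; the function name decode_oops_code is made up for illustration:

```python
# Bit meanings of the x86 page-fault error code printed after "Oops:".
# Each entry: (mask, meaning if the bit is set, meaning if it is clear).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]

def decode_oops_code(oops_line):
    """Decode a line such as 'Oops: 0000 [#1] SMP' (log-level prefix removed)."""
    code = int(oops_line.split()[1], 16)
    facts = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            facts.append(if_set)
        elif if_clear is not None:
            facts.append(if_clear)
    return facts

print(decode_oops_code("Oops: 0000 [#1] SMP"))
# ['page not present', 'read access', 'kernel mode']
```

For this report the result is a kernel-mode read from a page that is not mapped, which agrees with the "PTE 0" at the end of the PGD/PUD/PMD line.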

Line

Process diseproc (pid: 1126, threadinfo ffff880435fc4000, task ffff8807f8be8ae0)

      

reports which user-space process was running at the time of the crash. It looks like the user-space process diseproc issued some command to your driver that caused the crash.

Very important line

IP: [<ffffffffa0180279>] debug_fucntion+0x19/0x160 [dise]

      



This is the instruction pointer at the time of the crash. Try issuing the command dis debug_function (in the crash utility; use the symbol exactly as it is spelled in the report, debug_fucntion) to disassemble the function, find debug_function+25 (0x19 hex = 25 dec), and look around. Read it alongside the C source code for debug_function. You can usually locate the failure in the C code by comparing the callq instructions: the disassembly shows the printable names of the called functions.
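The symbol+offset/size notation in the IP line can be pulled apart mechanically. A small sketch, assuming the usual "name+0xOFFSET/0xSIZE [module]" layout; parse_symbol is a hypothetical helper, not part of any tool:

```python
import re

def parse_symbol(text):
    """Split 'name+0xOFF/0xSIZE [module]' into (name, offset, size, module)."""
    m = re.search(r"(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)(?: \[(\w+)\])?", text)
    if m is None:
        raise ValueError("no symbol+offset/size found")
    name, off, size, module = m.groups()
    return name, int(off, 16), int(size, 16), module

ip_line = "IP: [<ffffffffa0180279>] debug_fucntion+0x19/0x160 [dise]"
name, offset, size, module = parse_symbol(ip_line)
# The absolute address is the hex number inside [<...>].
address = int(re.search(r"\[<([0-9a-f]+)>\]", ip_line).group(1), 16)

print(f"{name} in module {module}: offset {offset} of {size} bytes")
print(f"function starts at {address - offset:#x}")
```

The offset (25) is where to look inside the disassembly, the size (352 bytes) is the length of the whole function, and subtracting the offset from the absolute address gives the address at which the function was loaded.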

Next and most important is the call trace:

Call Trace:
 [<ffffffffa0180498>] cmd_dump+0x1c8/0x360 [dise]
 [<ffffffffa01978e1>] debug_log_show+0x91/0x160 [dise]
 [<ffffffffa013afb9>] process_debug+0x5a9/0x990 [dise]
 [<ffffffff810792c7>] ? current_fs_time+0x27/0x30
 [<ffffffffa013bc38>] dise_ioctl+0xd8/0x300 [dise]
 [<ffffffff8105a501>] ? hotplug_hrtick+0x21/0x60
 [<ffffffff8119db42>] vfs_ioctl+0x22/0xa0
 [<ffffffff8119dce4>] do_vfs_ioctl+0x84/0x580
 [<ffffffff8119e261>] sys_ioctl+0x81/0xa0
 [<ffffffff810e1e5e>] ? __audit_syscall_exit+0x25e/0x290
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

      

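Read it from the bottom up: the kernel got an ioctl (from diseproc, apparently) and invoked the ioctl handler dise_ioctl in the dise module, which called process_debug, then debug_log_show and, finally, cmd_dump. Entries marked with "?" (here current_fs_time, hotplug_hrtick and __audit_syscall_exit) are addresses that were found on the stack but could not be confirmed as part of the call chain, so treat them as unreliable.

Now you know:

  • Code path: dise_ioctl → process_debug → debug_log_show → cmd_dump → debug_function.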
  • Approximate place in C code that caused the crash
  • Cause of failure: Invalid memory access
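The call path can also be extracted from the trace with a few lines of script. This is only a sketch: it assumes the usual kernel convention that "?" marks a stack entry the unwinder could not confirm, and call_chain is a made-up name:

```python
import re

# Call trace copied from the report, log-level prefixes removed.
TRACE = """\
 [<ffffffffa0180498>] cmd_dump+0x1c8/0x360 [dise]
 [<ffffffffa01978e1>] debug_log_show+0x91/0x160 [dise]
 [<ffffffffa013afb9>] process_debug+0x5a9/0x990 [dise]
 [<ffffffff810792c7>] ? current_fs_time+0x27/0x30
 [<ffffffffa013bc38>] dise_ioctl+0xd8/0x300 [dise]
 [<ffffffff8105a501>] ? hotplug_hrtick+0x21/0x60
 [<ffffffff8119db42>] vfs_ioctl+0x22/0xa0
 [<ffffffff8119dce4>] do_vfs_ioctl+0x84/0x580
 [<ffffffff8119e261>] sys_ioctl+0x81/0xa0
 [<ffffffff810e1e5e>] ? __audit_syscall_exit+0x25e/0x290
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
"""

def call_chain(trace):
    """Return function names, outermost caller first, skipping '?' entries."""
    chain = []
    for line in trace.splitlines():
        m = re.match(r"\s*\[<[0-9a-f]+>\] (\? )?(\w+)\+0x", line)
        if m and not m.group(1):
            chain.append(m.group(2))
    chain.reverse()  # the trace lists the innermost frame first
    return chain

print(" -> ".join(call_chain(TRACE)))
```

The last name printed is cmd_dump; the faulting function itself is not in the trace because it is reported separately on the IP/RIP lines.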

With this information, you must use your last and most powerful method: thinking. Try to understand which variables or structures caused the crash. Maybe some of them had already been freed by the time you arrived at debug_function? Maybe your pointer arithmetic is wrong?

Answers to your questions:

  • In most cases, the CPU register values are meaningless on their own because they have no direct connection to your C code. They are just values pointing to some memory. Yes, some registers are extremely useful, such as RIP / EIP and RSP / ESP, but most of them are far out of context.

  • Very unlikely. You are not actually debugging; you are inspecting a dump, so you have no debugging context.

  • I agree with @user2699113 that this is just the memory contents around the address in RIP; the byte in angle brackets is the one RIP points to.
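In this particular report the Code: bytes and the registers can be tied together by hand. The sketch below is deliberately narrow: it decodes only the one encoding that appears at the marked byte (REX.W 8B /r with a plain register-indirect operand) and raises for anything else. The helper names are made up; for real work a disassembler, or the kernel tree's scripts/decodecode, is the better tool.

```python
# "Code:" line and selected registers copied from the report.
CODE = ("be c4 10 e1 48 8b 5d d8 44 01 f0 4c 8b 65 e0 4c 8b 6d e8 4c 8b 75 f0 "
        "4c 8b 7d f8 c9 c3 0f 1f 44 00 00 55 48 89 e5 0f 1f 44 00 00 "
        "<48> 8b 01 48 c1 e8 3c 83 f8 08 76 0b e8 f6 fb ff ff c9 c3 0f 1f")
REGS = {"RAX": 0x0, "RCX": 0xffffc90028213000, "RDI": 0xffff880fe36a0000}
CR2 = 0xffffc90028213000

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def faulting_bytes(code_line):
    """Return the bytes starting at the one marked <xx>, i.e. at RIP."""
    tokens = code_line.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return [int(t.strip("<>"), 16) for t in tokens[start:]]

def decode_mov_load(b):
    """Decode 'REX.W 8B /r' (64-bit load) with a simple [reg] operand only."""
    rex, opcode, modrm = b[0], b[1], b[2]
    if rex != 0x48 or opcode != 0x8B:
        raise NotImplementedError("only REX.W + 8B is handled in this sketch")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0 or rm in (4, 5):  # SIB byte or displacement: not handled
        raise NotImplementedError("only a plain [reg] operand is handled")
    return REG64[reg], REG64[rm]  # (destination, base register)

dest, base = decode_mov_load(faulting_bytes(CODE))
print(f"faulting instruction: mov (%{base}),%{dest}")
print("base register equals CR2:", REGS[base.upper()] == CR2)
```

So the crash is a 64-bit load through RCX, and RCX holds exactly the faulting address from CR2. Which C variable was living in RCX at that point still has to be worked out from the disassembly and the source.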

And remember, the best debugging tool is your brain.

+8




See here ... It has good documentation on how to debug kernel crashes; see the section on objdump.

What it says is that you can disassemble your kernel image by running objdump on the vmlinux image. This command outputs a large text file containing the disassembly of your kernel. You can then grep the generated output file for the EIP (RIP on x86_64) address that caused the problem.

PS: I would recommend running objdump on vmlinux once and saving the output locally.
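A rough sketch of that workflow, assuming GNU objdump is on the PATH and that its -d output puts the address at the start of each instruction line followed by a colon. The function names and the "vmlinux" path are placeholders. Note that this only works directly for addresses inside vmlinux (for example vfs_ioctl at ffffffff8119db42 from the trace); an address inside a loadable module such as dise is assigned at load time, so for those you would disassemble the module object and look for function+offset instead.

```python
import subprocess

def disassemble(path):
    """Run 'objdump -d' on an object file and return its text output."""
    result = subprocess.run(["objdump", "-d", path], check=True,
                            capture_output=True, text=True)
    return result.stdout

def grep_address(disassembly, address, context=2):
    """Return the line for 'address' plus 'context' lines on either side."""
    lines = disassembly.splitlines()
    needle = f"{address:x}:"
    for i, line in enumerate(lines):
        if line.strip().startswith(needle):
            return lines[max(0, i - context): i + context + 1]
    return []  # address not present in this disassembly

# Usage (the path is a placeholder for your own uncompressed kernel image):
#   text = disassemble("vmlinux")
#   print("\n".join(grep_address(text, 0xffffffff8119db42)))
```

Saving the objdump output once and searching the saved text avoids re-running the disassembler on every lookup.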

+1




1 and 2: It is quite difficult to determine how processor registers relate to parameters and variable values.

3: This is machine code. You can find it in your disassembled program and work out where the problem started. Note the <48> 8b 01 48 ... part: AFAIK the trap happened at that instruction. This means that you need to debug it by disassembling your code. If you compile your program (module) with debug symbols, you can find the line number where the problem occurred.

0








