How to debug an ARM Linux kernel hang (msleep() never returns)?
I am primarily looking for debugging tips. If someone can point to one line of code to change, or one bit of peripheral configuration to set, that fixes the problem, that would be awesome. But that's not what I'm hoping for; I'm looking more for how I should debug it.
Googling "msleep hang linux kernel site:stackoverflow.com" gives 13 results and none fit, so I think I'm safe to ask.
I am rebuilding an ARM Linux kernel for a TI AM1808 ARM embedded processor (Sitara / DaVinci?). I can see the entire boot log up to the login: prompt on the serial port, but a login attempt gets no response; it doesn't even echo what I typed.
After a lot of debugging, I got down to the kernel and added debug code between lines 828 and 830 (yes, the kernel version is 2.6.37). At this point the code is still in kernel mode, prior to calling '/sbin/init':
http://lxr.linux.no/linux+v2.6.37/init/main.c#L815
Right before line 830 I added a forever loop with a printk, and I can see the output. I let it run for a couple of hours and the counter reached about 2 million. Example line:
dbg:init/main.c:1202: 2088430
Thus, it has output about 60 million bytes without any problems.
However, if I add msleep(1000) to the loop, it prints only once, i.e. msleep() does not return.
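For reference, the test loop has roughly the following shape. This is a minimal sketch, not the exact code added to init/main.c: the helper name dbg_loop and its iteration bound are hypothetical (the real loop ran forever), and the non-kernel stand-ins exist only so the fragment also compiles as ordinary C.

```c
#ifdef __KERNEL__
#include <linux/kernel.h>   /* printk() */
#include <linux/delay.h>    /* msleep() */
#else
/* Userspace stand-ins so the sketch can be compiled outside the kernel. */
#include <stdio.h>
#define KERN_INFO ""
#define printk printf
static void msleep(unsigned int msecs) { (void)msecs; }
#endif

/*
 * Hypothetical helper: print a running counter in the same
 * "dbg:<file>:<line>: <count>" format as the sample line above,
 * optionally sleeping between prints. Returns the number of lines printed.
 */
static unsigned long dbg_loop(unsigned long iterations, unsigned int sleep_ms)
{
	unsigned long count;

	for (count = 0; count < iterations; count++) {
		printk(KERN_INFO "dbg:%s:%d: %lu\n", __FILE__, __LINE__, count);
		if (sleep_ms)
			msleep(sleep_ms);   /* on the failing board this call never returns */
	}
	return count;
}
```

With sleep_ms set to 0 the loop corresponds to the variant that printed about 2 million lines; with sleep_ms set to 1000 it corresponds to the variant that prints once and then hangs.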
More details: I added a conditional printk at line 4073 in the scheduler, enabled by a flag that is set at the start of the forever test loop described above. It shows that schedule() is no longer called once the hang occurs:
http://lxr.linux.no/linux+v2.6.37/kernel/sched.c#L4064
The only options enabled in .config under 'Device Drivers' are: Block devices, I2C support, SPI support.
The kernel and its ramdisk are loaded using U-Boot over TFTP. I don't think it tries to use Ethernet after that. Since all this happens before '/sbin/init' is called, very little should need to be running.
More details: I have a very similar board with the same processor. I can run the same uImage and the same ramdisk on it and it works fine. I can log in and do the usual things.
I ran a memory test (only 64 MB; I limited the kernel to 32 MB and tested the other 32 MB; it is a single DDR2 chip) and found no problems. One board uses UART0 and the other UART2, but the boot log comes out of both, so that shouldn't be the problem.
Any advice on debugging is greatly appreciated. I don't have a matching JTAG debugger, so I can't use that.
If msleep doesn't return or reach schedule, then we can follow the call stack to debug it.
msleep calls schedule_timeout_uninterruptible(timeout), which calls schedule_timeout(timeout). In its default case, schedule_timeout exits without calling schedule if the timeout in jiffies passed to it is < 0, so that's one thing to check.
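To make that check concrete, here is a small userspace model of how msleep arrives at the value it hands down to schedule_timeout. It is a sketch under assumptions: the model_* function names are hypothetical, HZ is passed as a parameter because the board's CONFIG_HZ is not stated in the question, and the conversion only covers the common case where HZ divides 1000 evenly.

```c
/*
 * Model of msecs_to_jiffies() for HZ values that divide 1000 evenly:
 * round the millisecond count up to a whole number of ticks.
 */
static unsigned long model_msecs_to_jiffies(unsigned int msecs, unsigned int hz)
{
	unsigned int ms_per_tick = 1000u / hz;

	return ((unsigned long)msecs + ms_per_tick - 1) / ms_per_tick;
}

/* msleep() adds one jiffy so the sleep is never shorter than requested. */
static long model_msleep_timeout(unsigned int msecs, unsigned int hz)
{
	return (long)(model_msecs_to_jiffies(msecs, hz) + 1);
}

/*
 * Mirrors the guard in schedule_timeout(): a negative timeout is rejected
 * and schedule() is never reached.
 */
static int model_reaches_schedule(long timeout)
{
	return timeout >= 0;
}
```

Assuming HZ is 100, msleep(1000) would pass 101 jiffies, which is positive, so the early-exit path would not explain the hang in that configuration.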
If timeout is positive, schedule_timeout calls setup_timer_on_stack(&timer, process_timeout, (unsigned long)current); and then __mod_timer(&timer, expire, false, TIMER_NOT_PINNED); before calling schedule.
If we don't get to schedule, then something must be going wrong in either setup_timer_on_stack or __mod_timer.
The call trace for setup_timer_on_stack is as follows: setup_timer_on_stack calls setup_timer_on_stack_key, which calls init_timer_on_stack_key. That function is either external, if CONFIG_DEBUG_OBJECTS_TIMERS is enabled, or it calls init_timer_key(timer, name, key);, which calls debug_init followed by __init_timer(timer, name, key).
__mod_timer first calls timer_stats_timer_set_start_info(timer); and then a whole series of other functions.
I would suggest starting by putting a printk or two in schedule_timeout, perhaps on either side of the call to setup_timer_on_stack or on either side of the call to __mod_timer.
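One way to arrange those printk calls is as numbered checkpoints, so that the last number seen on the serial console identifies the call that did not return. The following is only a sketch of the idea: DBG_CHECKPOINT, dbg_last_checkpoint, traced_schedule_timeout and the fake_* functions are all hypothetical names, and the fake_* functions are empty stand-ins for setup_timer_on_stack, __mod_timer and schedule so that the fragment compiles outside the kernel.

```c
#ifdef __KERNEL__
#include <linux/kernel.h>   /* printk() */
#else
/* Userspace stand-ins so the sketch can be compiled outside the kernel. */
#include <stdio.h>
#define KERN_ERR ""
#define printk printf
#endif

/* Number of the last checkpoint passed; also printed as it is passed. */
static int dbg_last_checkpoint;

#define DBG_CHECKPOINT(n) do {                                   \
		dbg_last_checkpoint = (n);                       \
		printk(KERN_ERR "dbg:%s:%d: checkpoint %d\n",    \
		       __FILE__, __LINE__, (n));                 \
	} while (0)

/* Empty stand-ins for the real calls made inside schedule_timeout(). */
static void fake_setup_timer(void) { }
static void fake_mod_timer(void) { }
static void fake_schedule(void) { }

/* Bracket each call with a checkpoint; returns the last one reached. */
static int traced_schedule_timeout(void)
{
	DBG_CHECKPOINT(1);
	fake_setup_timer();     /* setup_timer_on_stack(...) in the kernel */
	DBG_CHECKPOINT(2);
	fake_mod_timer();       /* __mod_timer(...) in the kernel */
	DBG_CHECKPOINT(3);
	fake_schedule();        /* schedule() in the kernel */
	DBG_CHECKPOINT(4);
	return dbg_last_checkpoint;
}
```

In the real kernel the checkpoints would go around the actual calls in schedule_timeout. If the output stops at checkpoint 3, for example, the hang is inside schedule or in whatever task it switched to.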
This issue has been resolved.
With liberal use of printk, it was determined that schedule() does indeed switch to another task: the idle task. In this case, being embedded Linux, the original code base that I copied from has an idle task installed. That idle task seems to be inappropriate for my board; it locked up the processor and thus caused the hang. Commenting out the call to the idle task
http://lxr.linux.no/linux+v2.6.37/arch/arm/mach-davinci/cpuidle.c#L93
works around the problem.