How do I debug a multithreaded C++ application that hangs (deadlock)?

In Java, debugging a hung application is easy. You can take a memory dump of the application and use the Eclipse JVM dump analyzer to view the state of the threads and see where each thread is blocked.

Does something like this exist for C++?

+2




9 replies


You can do the same with C++: take a core dump and inspect it afterwards.



Or, if you are using MSVC, you can simply attach a debugger to your application while it is running. Hit "Break All" and step through the threads.

+3




The magic invocation in gdb:

thread apply all bt

This runs the bt (backtrace) command on all threads. Unless you have completely stripped your program, you should be able to see the name of each function.

This works for both live and post-mortem debugging (i.e. running gdb against a core file).

+4




For native Windows applications, WinDbg is my tool of choice. If possible, I debug the hung process live; failing that, a full process memory dump will usually get you there.

My approach is to draw a wait graph documenting the relationships between threads and resources. I usually start by running the !locks command to determine which threads hold critical sections in the hung process.

Then I start drawing the wait graph, picking the critical section with the highest contention count (if there is a deadlock, there will be a cycle in the graph, so it doesn't really matter where you start). Find its owning thread and select it in the debugger: the ~ command lets you map thread IDs to the thread numbers used by the debugger; use ~threadnumber s to select the thread and kbn to display its stack.

If the process is deadlocked, the thread will most likely be in some kind of blocking operation, so look for calls to RtlEnterCriticalSection, WaitForSingleObject and the like. In deadlock situations, the arguments to these calls usually let you identify the next resource being waited on. Add this information to the wait graph and continue until you get back to where you started.
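The wait graph described above can also be modelled in a few lines of code. The following is only a sketch under simplifying assumptions: every thread waits on at most one resource and every resource has at most one owner. The names WaitGraph, waits_on, owned_by and find_cycle are hypothetical and are not part of WinDbg or any library; the data would have to be entered by hand from the debugger output.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a hand-drawn wait graph.
struct WaitGraph {
    std::map<int, std::string> waits_on;  // thread id -> resource it is blocked on
    std::map<std::string, int> owned_by;  // resource  -> thread id that holds it

    // Follow thread -> resource -> owner links starting at `start`.
    // Returns the threads that form a cycle if the walk revisits a thread,
    // or an empty vector if the chain ends without looping.
    std::vector<int> find_cycle(int start) const {
        std::vector<int> path;
        int current = start;
        while (true) {
            // If this thread is already on the path, path[i..] is the cycle.
            for (std::size_t i = 0; i < path.size(); ++i) {
                if (path[i] == current) {
                    return std::vector<int>(
                        path.begin() + static_cast<std::ptrdiff_t>(i), path.end());
                }
            }
            path.push_back(current);
            auto w = waits_on.find(current);
            if (w == waits_on.end()) return {};  // thread is not blocked
            auto o = owned_by.find(w->second);
            if (o == owned_by.end()) return {};  // resource has no known owner
            current = o->second;
        }
    }
};
```

For example, if thread 1 waits on a critical section owned by thread 2 and thread 2 waits on one owned by thread 1, find_cycle started from any thread blocked on either of them returns threads 1 and 2 as the cycle.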

If your wait graph crosses process boundaries, you may need to find out who owns a kernel object in another process (which is why I debug live if I can). The Sysinternals Process Explorer tool is useful for this purpose.

Once you have identified the participants in the deadlock, you need to put on your thinking cap to figure out where to go next. This could mean changing the order in which resources are acquired (as someone else pointed out), but there really is no generic method; you need more information about the application's design to work out how to remove the circular dependency in the wait graph.
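As an illustration of fixing the acquisition order, here is a minimal sketch in portable C++11. It is not the answerer's code and it uses standard mutexes rather than Windows critical sections. The Accounts struct and the function names are hypothetical. The point it shows is that std::lock acquires several mutexes with a deadlock-avoidance algorithm, so two code paths that name the same locks in opposite orders can no longer form a cycle.

```cpp
#include <mutex>
#include <thread>

// Hypothetical shared state guarded by two mutexes.
struct Accounts {
    std::mutex m_a;
    std::mutex m_b;
    int a = 100;
    int b = 100;
};

// Moves `amount` from a to b. std::lock takes both mutexes without risk of
// deadlock; the lock_guards adopt them so they are released on scope exit.
void transfer_a_to_b(Accounts& acc, int amount) {
    std::lock(acc.m_a, acc.m_b);
    std::lock_guard<std::mutex> guard_a(acc.m_a, std::adopt_lock);
    std::lock_guard<std::mutex> guard_b(acc.m_b, std::adopt_lock);
    acc.a -= amount;
    acc.b += amount;
}

// Opposite direction; names the mutexes in the opposite order on purpose.
void transfer_b_to_a(Accounts& acc, int amount) {
    std::lock(acc.m_b, acc.m_a);
    std::lock_guard<std::mutex> guard_b(acc.m_b, std::adopt_lock);
    std::lock_guard<std::mutex> guard_a(acc.m_a, std::adopt_lock);
    acc.b -= amount;
    acc.a += amount;
}

// Runs both directions concurrently and returns the final total.
int run_transfers(int iterations) {
    Accounts acc;
    std::thread t1([&] { for (int i = 0; i < iterations; ++i) transfer_a_to_b(acc, 1); });
    std::thread t2([&] { for (int i = 0; i < iterations; ++i) transfer_b_to_a(acc, 1); });
    t1.join();
    t2.join();
    return acc.a + acc.b;
}
```

With plain lock() calls in those two opposite orders, the two threads could each hold one mutex and wait forever for the other. In C++17, std::scoped_lock expresses the same idea in a single line.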

There are situations where a cycle is not the cause of the problem. For example, your system might be waiting for user input that never arrives (hands up, anyone who has seen a MessageBox call in a process running as a service).

Of course, there is more to it than that, but I hope this can lead you in the right direction.

+2




Some platforms support pstack.

+1




We can use the GDB commands below to debug a deadlock.

  • Attach to the running process that is hung or deadlocked using the command below

    gdb -p <PID>

  • Once attached to the process, you can list all of its LWPs (threads) using the command below

(gdb) info threads
Id   Target Id         Frame
16   Thread 0xfff06111f0 (LWP 2791) "abc.d" 0x000000fff0f0104c in select () from /lib64/libc.so.6
15   Thread 0xffefdf01f0 (LWP 2792) "abc.d" 0x000000fff0f0104c in select () from /lib64/libc.so.6
14   Thread 0xffef5bb1f0 (LWP 2793) "abc.d" 0x000000fff26feb4c in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
13   Thread 0xffeed351f0 (LWP 2794) "abc.d" 0x000000fff2703924 in nanosleep () from /lib64/libpthread.so.0
12   Thread 0xffee5351f0 (LWP 2795) "abc.d" 0x000000fff26fe76c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
11   Thread 0xffec8a71f0 (LWP 2796) "abc.d" 0x000000fff26fe76c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
10   Thread 0xffd7cd11f0 (LWP 2797) "abc.d" 0x000000fff0f0104c in select () from /lib64/libc.so.6
9    Thread 0xffd74d11f0 (LWP 2798) "abc.d" 0x000000fff0f0104c in select () from /lib64/libc.so.6
8    Thread 0xffd6cd11f0 (LWP 2801) "abc.d" 0x000000fff27022f4 in __lll_lock_wait () from /lib64/libpthread.so.0
7    Thread 0xffd64d11f0 (LWP 2802) "abc.d" 0x000000fff0f0104c in select () from /lib64/libc.so.6
6    Thread 0xffd5cd11f0 (LWP 2803) "abc.d" 0x000000fff0f0104c in select () from /lib64/libc.so.6
5    Thread 0xffd54d11f0 (LWP 2804) "abc.d" 0x000000fff0f0104c in select () from /lib64/libc.so.6
4    Thread 0xffd4cd11f0 (LWP 2805) "abc.d" 0x000000fff0f0104c in select () from /lib64/libc.so.6
3    Thread 0xffc7fff1f0 (LWP 2928) "abc.d" 0x000000fff0f0104c in select () from /lib64/libc.so.6
2    Thread 0xffc77ff1f0 (LWP 2929) "abc.d" 0x000000fff0f0104c in select () from /lib64/libc.so.6
1    Thread 0xfff0a62000 (LWP 2744) "abc.d" 0x000000fff0f19b9c in __lll_lock_wait_private () from /lib64/libc.so.6

      

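  • We can see that thread 1 and thread 8 are both waiting on a lock (__lll_lock_wait_private and __lll_lock_wait). We can switch to each of these threads and print its backtrace as shown below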

    (gdb) thread 1

    (gdb) bt

The output of the above command will be as shown below:

(gdb) thread 1
[Switching to thread 1 (Thread 0xfff0a62000 (LWP 2744))]
#0  0x000000fff0f19b9c in __lll_lock_wait_private () from /lib64/libc.so.6
(gdb) bt
#0  0x000000fff0f19b9c in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x000000fff0ea3238 in malloc () from /lib64/libc.so.6
#2  0x000000fff115df0c in operator new(unsigned long) () from /lib64/libstdc++.so.6
#3  0x000000fff11ceddc in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) () from /lib64/libstdc++.so.6
#4  0x000000fff11d165c in char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) () from /lib64/libstdc++.so.6
#5  0x000000fff11d1760 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) () from /lib64/libstdc++.so.6
#6  0x000000fff1eeac1c in getTime() () from /usr/sbin/dir/sharedobj/liblibLite.so
#7  0x000000fff1eeb18c in Logging::logBegin() () from /usr/sbin/dir/sharedobj/liblibLite.so
#8  0x000000fff1f324f8 in sigsegv_handler(int, siginfo_t*, void*) () from /usr/sbin/dir/sharedobj/liblibLite.so
#9  <signal handler called>
#10 0x000000fff0e9f530 in malloc_consolidate () from /lib64/libc.so.6
#11 0x000000fff0ea0160 in _int_free () from /lib64/libc.so.6
#12 0x000000fff115b184 in operator delete(void*) () from /lib64/libstdc++.so.6
#13 0x000000fff115b1f4 in operator delete[](void*) () from /lib64/libstdc++.so.6
#14 0x000000fff20cfd60 in pstream::~pstream() () from /usr/sbin/dir/sharedobj/libconnV2.so
#15 0x000000fff208ffd8 in ifaceSocket::dispatchMsg(pstream&) () from /usr/sbin/dir/sharedobj/libsockIf.so
#16 0x000000fff207d5a4 in socketInterface::socket_callback(ConnectionEvent, char*, int) () from /usr/sbin/dir/sharedobj/libsockIf.so
#17 0x000000fff208f43c in ifaceSocket::Callback(ConnectionEvent, char*, int) () from /usr/sbin/dir/sharedobj/libsockIf.so
#18 0x000000fff20c4674 in ConnectionOS::ProcessReadEvent() () from /usr/sbin/dir/sharedobj/libconnV2.so
#19 0x000000fff20cc808 in ConnectionOSManager::ProcessConns(fd_set*, fd_set*) () from /usr/sbin/dir/sharedobj/libconnV2.so
#20 0x000000fff20cf3bc in SocketsManager::ProcessFds(bool) () from /usr/sbin/dir/sharedobj/libconnV2.so
#21 0x000000fff1e54aa8 in EventReactorBase::IO() () from /usr/sbin/dir/sharedobj/libthreadlib.so
#22 0x000000fff1e5406c in EventReactorBase::React() () from /usr/sbin/dir/sharedobj/libthreadlib.so
#23 0x000000fff1e50508 in Task::Run() () from /usr/sbin/dir/sharedobj/libthreadlib.so
#24 0x000000fff1e50584 in startTask(void*) () from /usr/sbin/dir/sharedobj/libthreadlib.so
#25 0x00000000104a421c in TaskMgr::Start() ()
#26 0x00000000100ddddc in main ()

      

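  • We can inspect the pthread_mutex_t structure to find out which thread owns the lock that is being waited on. On this platform, the mutex address is taken from the r8 register in the output of

    (gdb) info reg

    and the first three int fields at that address (__lock, __count and __owner in glibc) are then printed:

    (gdb) print *((int*)(0x0000000019ff3d30))
    $1 = 2 // lock word
    (gdb) print *((int*)(0x0000000019ff3d30)+1)
    $2 = 0 // count
    (gdb) print *((int*)(0x0000000019ff3d30)+2)
    $3 = 2744 // owner thread ID (LWP); 2744 is thread 1 in the listing above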
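One possible reading of the backtrace above (an interpretation, not something the answer states): frames 8 and 9 show a SIGSEGV handler being invoked while the thread was inside free(), and the handler's logging code then calls malloc, which waits on an allocator lock that the same thread already holds. If that reading is right, the usual remedy is to restrict the handler to async-signal-safe calls. The sketch below assumes a POSIX system; safe_handler, install_safe_handler and g_signal_seen are hypothetical names, not taken from the application in the trace.

```cpp
#include <signal.h>
#include <string.h>
#include <unistd.h>

// Set by the handler so that normal code can see the signal arrived.
volatile sig_atomic_t g_signal_seen = 0;

// Handler restricted to async-signal-safe operations: no malloc, no
// std::string, no iostream. write(2) is on the POSIX async-signal-safe list.
extern "C" void safe_handler(int signo) {
    static const char msg[] = "signal caught\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;  // nothing sensible can be done about a failed write here
    g_signal_seen = signo;
}

// Installs safe_handler for `signo`; returns true on success.
bool install_safe_handler(int signo) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = safe_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, nullptr) == 0;
}
```

A handler for a genuinely fatal signal such as SIGSEGV would normally also end the process after reporting (for example with _exit) instead of returning; that part is left out of the sketch.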

      

+1




I haven't done this myself, but I think you can use gdb to generate a core file of your application when it hangs.

You can then debug that core file with gdb itself and see which threads are blocked where.

The above is possible on Linux platforms. I am not sure whether Cygwin on Windows can be used for the same purpose.

0




Of course, strategically placed cout statements (or other output alternatives) are always an option, but often far from ideal.
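If you do go the output-statement route in a multithreaded program, it helps to tag each line with the thread that wrote it and to flush immediately, so that the last lines before the hang are not lost in a buffer. A minimal sketch follows; trace is a hypothetical helper name, not a standard function.

```cpp
#include <iostream>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>

// Hypothetical helper: writes one complete line per call, tagged with the
// calling thread's id, and flushes so the line survives a later hang.
void trace(std::ostream& out, const std::string& what) {
    static std::mutex out_mutex;  // keeps lines from different threads apart
    std::ostringstream line;
    line << "[thread " << std::this_thread::get_id() << "] " << what << '\n';
    std::lock_guard<std::mutex> guard(out_mutex);
    out << line.str() << std::flush;
}
```

A call such as trace(std::cerr, "before locking A") placed just before each lock acquisition shows which thread stopped where. Note that the helper takes a lock of its own and changes timing, which is one reason this approach is far from ideal.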

If you build with g++, compile with -g and use gdb. You can attach to a running process (with the source code at hand) or simply run the program in the debugger from the start. Then look at the stack.

On Windows, just pause your program and look at the stack.

0




On Linux systems, you can use gdb to view the state of the threads.

0




  • Find the critical sections held by stuck thread #1
  • Find the critical sections held by stuck thread #2
  • Determine the correct order in which the critical sections should be acquired
-1








