Hung processes resume when strace is attached to them

I have a networking program written in C using TCP sockets. Sometimes the client program hangs forever, waiting for input from the server. Specifically, the client hangs in a select() call, waiting for the socket fd to become readable with characters sent by the server.

I am using strace to find out where the process is stuck. However, sometimes when I attach strace to a hung client process, the process immediately resumes execution and exits normally. Not all hung processes behave this way; some stay stuck in select() even after I attach strace. But most of them resume as soon as strace attaches.

I'm curious about what makes processes resume when strace attaches to them. This might give me a hint as to why the client processes are hanging.

Any ideas? What makes a hung process resume execution when strace attaches to it?

Update:

Here's the strace output for some of the hung processes.

> sudo strace -p 25645
Process 25645 attached - interrupt to quit
--- SIGSTOP (Stopped (signal)) @ 0 (0) ---
--- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[ Process PID=25645 runs in 32 bit mode. ]
select(6, [3 5], NULL, NULL, NULL)      = 2 (in [3 5])
read(5, "\0", 8192)                     = 1
write(2, "", 0)                         = 0
read(3, "====Setup set_oldtempbehaio"..., 8192) = 555
write(1, "====Setup set_oldtempbehaio"..., 555) = 555
select(6, [3 5], NULL, NULL, NULL)      = 2 (in [3 5])
read(5, "", 8192)                       = 0
read(3, "", 8192)                       = 0
close(5)                                = 0
kill(25652, SIGKILL)                    = 0
exit_group(0)                           = ?
Process 25645 detached

> sudo strace -p 14462
Process 14462 attached - interrupt to quit
[ Process PID=14462 runs in 32 bit mode. ]
read(0, 0xff85fdbc, 8192)               = -1 EIO (Input/output error)
shutdown(3, 1 /* send */)               = 0
exit_group(0)                           = ?

> sudo strace -p 7517
Process 7517 attached - interrupt to quit
--- SIGSTOP (Stopped (signal)) @ 0 (0) ---
--- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[ Process PID=7517 runs in 32 bit mode. ]
connect(3, {sa_family=AF_INET, sin_port=htons(300), sin_addr=inet_addr("100.64.220.98")}, 16) = -1 ETIMEDOUT (Connection timed out)
close(3)                                = 0
dup(2)                                  = 3
fcntl64(3, F_GETFL)                     = 0x1 (flags O_WRONLY)
close(3)                                = 0
write(2, "dsd13: Connection timed out\n", 30) = 30
write(2, "Error code : 110\n", 17)      = 17
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
exit_group(1)                           = ?
Process 7517 detached

It is not just select(): processes of the same program get stuck in various system calls before I attach strace to them, and they suddenly resume once strace is attached. If I don't attach strace, they just hang there forever.

Update 2:

I found out that strace can resume a process that was previously stopped (a process in the "T" state). Now I'm trying to understand why these processes went into the "T" state in the first place. Here is the /proc/<pid>/status information:

> cat /proc/12554/status
Name:   someone
State:  T (stopped)
SleepAVG:       88%
Tgid:   12554
Pid:    12554
PPid:   9754
TracerPid:      0
Uid:    5000    5000    5000    5000
Gid:    48986   48986   48986   48986
FDSize: 256
Groups: 9149 48986
VmPeak:     1992 kB
VmSize:     1964 kB
VmLck:         0 kB
VmHWM:       608 kB
VmRSS:       608 kB
VmData:      156 kB
VmStk:        20 kB
VmExe:        16 kB
VmLib:      1744 kB
VmPTE:        20 kB
Threads:        1
SigQ:   54/73728
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000006
SigCgt: 0000000000004000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
Cpus_allowed:   00000000,00000000,00000000,0000000f
Mems_allowed:   00000000,00000001

Answer:

strace uses ptrace. The ptrace man page says the following:

Since attaching sends SIGSTOP and the tracer usually suppresses it,
this may cause a stray EINTR return from the currently executing system
call in the tracee, as described in the "Signal injection and
suppression" section.

Do you see select return EINTR?
