Find which thread is causing the "too many open files" issue, and why there are duplicate node ids in lsof output

Our Java application is throwing a "Too many open files" error after running for a long time. While debugging the problem, we saw a huge number of open fds in the lsof output:

# lsof -p pid | grep "pipe" | wc -l
698962

# lsof -p pid | grep "anon_inode" | wc -l
349481

-------------- Some lsof data -----------

COMMAND   PID  USER   FD  TYPE             DEVICE SIZE/OFF       NODE NAME
java    23994  app 464u  0000                0,9        0       3042 anon_inode
java    23994  app 465u  0000                0,9        0       3042 anon_inode
java    23994  app 466r  FIFO                0,8      0t0  962495977 pipe
java    23994  app 467w  FIFO                0,8      0t0  962495977 pipe
java    23994  app 468r  FIFO                0,8      0t0  963589016 pipe
java    23994  app 469w  FIFO                0,8      0t0  963589016 pipe
java    23994  app 470u  0000                0,9        0       3042 anon_inode
java    23994  app 471u  0000                0,9        0       3042 anon_inode

How do I find the root cause of so many open FDs of type FIFO and 0000? Our application does not do much file reading or writing, but it does read a lot of TCP messages from the network using the Apache MINA framework, which uses NIO internally.

These are my questions:

  • We checked the /proc/pid/task/ folder. It contains many subfolders. Do these correspond to thread IDs (see the sketch after this list)? According to jstack there are 141 threads, whereas this folder has 209 subfolders.
  • How do I find which thread is causing the fd leak? In our case, most of the subfolders under the task folder show many fds, i.e. /proc/pid/task/threadid/fd has many fd entries.
  • What are the possible reasons for pipe and anon_inode entries in lsof?
  • What does the FD type 0000 mean?
  • All anon_inode entries have the same node id, 3042. What does that mean?
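
A minimal sketch of how the decimal task IDs can be compared with jstack's hex nid values (the class name is made up for illustration; it assumes Linux with HotSpot, where the nid=0x... printed by jstack is the native thread id):

    import java.io.IOException;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    // Illustrative helper: list the task IDs of a target JVM process and print
    // them in hex, so they can be matched against jstack's nid=0x... values.
    public class TaskIdsToHex {
        public static void main(String[] args) throws IOException {
            String pid = args[0]; // pid of the target JVM
            try (DirectoryStream<Path> tasks =
                         Files.newDirectoryStream(Paths.get("/proc", pid, "task"))) {
                for (Path task : tasks) {
                    long tid = Long.parseLong(task.getFileName().toString());
                    System.out.printf("tid %d -> nid=0x%x%n", tid, tid);
                }
            }
        }
    }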




2 answers


Chances are you are opening resources and then not closing them properly. Make sure you use a construct such as try-with-resources or try-finally to clean them up.
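
For example, with try-with-resources the underlying file descriptor is released automatically even when an exception is thrown (the file name below is just a placeholder):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class TryWithResourcesExample {
        public static void main(String[] args) throws IOException {
            // The reader (and its fd) is closed automatically when the try block
            // exits, even if readLine() throws.
            try (BufferedReader reader = Files.newBufferedReader(Paths.get("example.txt"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }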



To find the problem, you could route all your IO through a class of your own and keep track of opens and closes, perhaps even recording the stack trace at the point where each resource is opened. Then you can query that and see where the resources are leaking.
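
A minimal sketch of such a tracking wrapper (the class and method names are made up for illustration; a real version would probably also keep counters and log periodically):

    import java.io.Closeable;
    import java.io.IOException;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative wrapper: remembers where each resource was opened so that
    // anything still open (i.e. leaked) can be dumped with its stack trace.
    public class TrackedCloseable implements Closeable {
        private static final Map<TrackedCloseable, Throwable> OPEN = new ConcurrentHashMap<>();

        private final Closeable delegate;

        public TrackedCloseable(Closeable delegate) {
            this.delegate = delegate;
            OPEN.put(this, new Throwable("opened here")); // record the opening stack trace
        }

        @Override
        public void close() throws IOException {
            OPEN.remove(this);
            delegate.close();
        }

        // Call this from a timer, JMX bean, etc. to see what was never closed.
        public static void dumpLeaks() {
            for (Throwable openedAt : OPEN.values()) {
                openedAt.printStackTrace();
            }
        }
    }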





We found the problem. There was a code path where org.apache.mina.transport.socket.nio.NioSocketConnector was created but never disposed under some conditions (a sketch of the pattern is shown after the steps below). To find the problem, we did the following:



  • We enabled strace on our Linux server.
  • We ran it against this process for a couple of minutes.
  • From the strace output we could identify the thread ID causing the problem.
  • From jstack we found the class that thread was running.
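
To illustrate the kind of bug, here is a sketch of the leaky pattern and its fix, assuming MINA 2.x; the host/port handling and session usage are placeholders:

    import java.net.InetSocketAddress;
    import org.apache.mina.core.future.ConnectFuture;
    import org.apache.mina.transport.socket.nio.NioSocketConnector;

    public class ConnectorLeakExample {
        // Leaky variant: each NioSocketConnector typically owns an NIO selector
        // and its wakeup pipe, which show up in lsof as anon_inode and FIFO entries.
        public void connectLeaky(String host, int port) {
            NioSocketConnector connector = new NioSocketConnector();
            ConnectFuture future = connector.connect(new InetSocketAddress(host, port));
            future.awaitUninterruptibly();
            if (!future.isConnected()) {
                return; // connector.dispose() is never called: fds leak on every failed attempt
            }
            // ... use the session ...
        }

        // Fixed variant: dispose() releases the connector's selector and pipe fds.
        public void connectFixed(String host, int port) {
            NioSocketConnector connector = new NioSocketConnector();
            try {
                ConnectFuture future = connector.connect(new InetSocketAddress(host, port));
                future.awaitUninterruptibly();
                if (!future.isConnected()) {
                    return;
                }
                // ... use the session ...
            } finally {
                connector.dispose();
            }
        }
    }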








