How to parse logs written by multiple threads?

I have an interesting problem and would be grateful for your thoughts on a better solution. I need to parse a set of logs. The logs are created by a multi-threaded program, and one process cycle produces multiple lines of log output.

When analyzing these logs, I need to pull out specific pieces of information from each process. Naturally, this information is spread across several lines (I want to condense it into one line). Because the application is multithreaded, the block of lines belonging to one process can be fragmented, as other processes write to the same log file at the same time.

Fortunately, each line gives a process ID, so I can tell which logs belong to which process.

There are already several parsers that extend the same class, but they were designed to read logs written by a single thread (without fragmentation, from the original system) and use the readLine() method in the superclass. These parsers keep reading lines until all of their regular expressions have been matched for a block of lines (i.e., the lines written in one process cycle).

So, what can I do with the superclass so that it can handle fragmented logs while keeping changes to the existing parsers minimal?

0




5 answers


It sounds like you already have parser classes that you want to keep using. In that case, I would write a decorator for the parser that filters out lines unrelated to the process you are tracking.

I imagine your classes look something like this:

abstract class Parser {
    public abstract void parse( ... );
    protected String readLine() { ... }
}

class SpecialPurposeParser extends Parser {
    public void parse( ... ) { 
        // ... special stuff
        readLine();
        // ... more stuff
    }
}

      

And I would write something like:



class SingleProcessReadingDecorator extends Parser {
    private Parser parser;
    private String processId;
    public SingleProcessReadingDecorator( Parser parser, String processId ) {
        this.parser = parser;
        this.processId = processId;
    }

    public void parse( ... ) { parser.parse( ... ); }

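    public String readLine() {
        String text;
        // Skip lines from other processes in a loop rather than by recursion,
        // so a long run of foreign lines cannot overflow the stack.
        // Assumes super.readLine() returns null at end of input.
        while( (text = super.readLine()) != null ) {
            if( /*text is for processId */ ) {
                return text;
            }
        }
        return null;
    }
}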

      

Then any call site you want to change would look like this:

//old way
Parser parser = new SpecialPurposeParser();
//changes to
Parser parser = new SingleProcessReadingDecorator( new SpecialPurposeParser(), "process1234" );

      

This piece of code is simple and incomplete, but it gives you an idea of how the decorator pattern might work here.
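A related sketch, under the assumption that the superclass's readLine() ultimately reads from a java.io.BufferedReader: since a wrapped parser calls its own readLine(), it may be easier to filter at the reader the superclass uses. ProcessFilteringReader, FilterDemo and the "[pid] message" line format are hypothetical names and conventions, not something from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader that only returns lines belonging to one process ID. */
class ProcessFilteringReader extends BufferedReader {
    private final String prefix;

    ProcessFilteringReader(Reader in, String processId) {
        super(in);
        // Assumed line format: "[<processId>] message"
        this.prefix = "[" + processId + "] ";
    }

    @Override
    public String readLine() throws IOException {
        String line;
        // Skip lines from other processes; null signals end of input.
        while ((line = super.readLine()) != null) {
            if (line.startsWith(prefix)) {
                return line;
            }
        }
        return null;
    }
}

class FilterDemo {
    /** Collects every line of the given log text that belongs to processId. */
    static List<String> linesFor(String log, String processId) {
        List<String> out = new ArrayList<>();
        try (BufferedReader r = new ProcessFilteringReader(new StringReader(log), processId)) {
            String line;
            while ((line = r.readLine()) != null) {
                out.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

With this approach the log is read once per process ID, which is simple but may be slow for large files with many processes.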

+2




I would write a simple distributor that reads the log file line by line and stores the lines in separate VirtualLog objects in memory. A VirtualLog is a kind of virtual file, really just a string or something similar that the existing parsers can be applied to. The VirtualLogs are stored in a map with the process ID (PID) as the key. When you read a line from the log, check whether its PID is already in the map. If so, add the line to that PID's VirtualLog. If not, create a new VirtualLog object, add it to the map, and add the line to it. Parsers run as separate threads, one per VirtualLog. Each VirtualLog object is destroyed as soon as it has been fully parsed.
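The distribution step could be sketched roughly as follows. This is only an illustration: LogDistributor and the pidOf extractor are hypothetical, a plain list of lines stands in for a VirtualLog, and the per-VirtualLog parser threads are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class LogDistributor {
    /**
     * Groups lines by process ID, preserving the original order within each group.
     * pidOf is a caller-supplied (hypothetical) function that extracts the PID of a line.
     */
    static Map<String, List<String>> distribute(Iterable<String> lines,
                                                Function<String, String> pidOf) {
        Map<String, List<String>> virtualLogs = new LinkedHashMap<>();
        for (String line : lines) {
            // Create the per-PID list the first time the PID is seen, then append.
            virtualLogs.computeIfAbsent(pidOf.apply(line), k -> new ArrayList<>()).add(line);
        }
        return virtualLogs;
    }
}
```

Each resulting list could then be handed to an existing parser, for example through a java.io.StringReader over the joined lines.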



+1




You need to buffer the lines temporarily in a queue, where one thread consumes them and passes them on once each set is complete. If you have no way of knowing whether a set is complete, either from the number of lines or from their contents, you might consider a sliding-window approach, in which you do not emit a set until a certain amount of time has passed.
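For the case where the contents of a line reveal that a set is complete, the buffering might look like the sketch below. BlockAssembler and the endsBlock predicate are hypothetical; the time-based sliding window is not shown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical buffer that collects lines per process until a set is complete. */
class BlockAssembler {
    private final Map<String, List<String>> pending = new HashMap<>();
    private final Predicate<String> endsBlock;

    /** endsBlock is an assumed test that recognizes the last line of a set. */
    BlockAssembler(Predicate<String> endsBlock) {
        this.endsBlock = endsBlock;
    }

    /** Buffers the line; returns the whole set once its final line arrives. */
    Optional<List<String>> accept(String processId, String line) {
        pending.computeIfAbsent(processId, k -> new ArrayList<>()).add(line);
        if (endsBlock.test(line)) {
            // Hand over the completed set and forget it.
            return Optional.of(pending.remove(processId));
        }
        return Optional.empty();
    }
}
```

A single consumer thread could call accept for every line it takes from the queue and pass each completed set to an existing parser.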

0




Will something like this do? It starts a new thread for each process ID in the log file.

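class Parser {
   // Shared by every parser thread, so the pending line and its lock are static
   private static final Object LOCK = new Object();
   private static String currentLine;

   Parser() {
      //Construct parser
   }

   String readLine(String processId) throws InterruptedException {
      synchronized (LOCK) {
         if (currentLine == null)
            currentLine = readLinefromLog();

         // Wait while the pending line belongs to another process
         while (currentLine != null && !getProcessIdFromLine(currentLine).equals(processId))
            LOCK.wait();

         String line = currentLine;
         if (line != null)
            currentLine = readLinefromLog();
         LOCK.notifyAll(); // wake all waiting readers so the right one can proceed
         return line;
      }
   }
}

class ProcessParser extends Parser implements Runnable {
   String processId;
   ProcessParser(String processId) {
      super();
      this.processId = processId;
   }

   void startParser() {
       new Thread(this).start();
   }

   public void run() {
      try {
         String line;
         while ((line = readLine()) != null) {
             // process log line here
         }
      } catch (InterruptedException e) {
         Thread.currentThread().interrupt(); // restore the flag and stop
      }
   }

   String readLine() throws InterruptedException {
      return super.readLine(processId);
   }
}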

      

0




One simple solution might be to read the file line by line and write multiple files, one per process ID. The list of process IDs can be kept in an in-memory hash map, to determine whether a new file is needed or which already-created file should receive the lines for a particular process ID. Once all the (temporary) files have been written, the existing parsers can do their job on each of them.
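A minimal sketch of that splitting step, assuming the PID is safe to use in a file name. LogSplitter and the pidOf extractor are hypothetical, and appending line by line is chosen for brevity; keeping one open writer per file would likely be faster.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class LogSplitter {
    /**
     * Appends each line to a per-process temporary file inside dir and returns
     * the PID-to-file map. pidOf is a caller-supplied (hypothetical) extractor.
     */
    static Map<String, Path> split(List<String> lines, Path dir,
                                   Function<String, String> pidOf) {
        Map<String, Path> files = new HashMap<>();
        try {
            for (String line : lines) {
                String pid = pidOf.apply(line);
                Path file = files.get(pid);
                if (file == null) {
                    // First line for this PID: create its temporary file.
                    file = Files.createTempFile(dir, "pid-" + pid + "-", ".log");
                    files.put(pid, file);
                }
                // Files.write adds the line separator after the line.
                Files.write(file, List.of(line), StandardOpenOption.APPEND);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }
}
```

Each file in the returned map then contains an unfragmented log for one process, which an existing parser can read unchanged.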

0

