How to parse a log line in Java delimited by keywords?

I am working on a log parser that should parse a string like this:

ID1 : 0     ID2 : 214 TYPE : ERROR      DATE : 2012-01-11 14:08:07.432 CLASS : Maintenance    SUBCLASS : Operations

      

ID1, ID2, TYPE, DATE, CLASS and SUBCLASS are all keywords and I want to have something like this:

ID1 : 0  
ID2 : 214  
TYPE : ERROR  
DATE : 2012-01-11 14:08:07.432  
CLASS : Maintenance  
SUBCLASS : Operations

      

I am really new to regex and I have the following:

(ID1|ID2|TYPE|DATE|CLASS|SUBCLASS)\\s*:\\s*(.+?)\\s*[(ID1|ID2|TYPE|DATE|CLASS|SUBCLASS)]

      

Of course it doesn't work.

Any advice would be much appreciated.

+3




5 answers


The main problem with your expression is the square brackets: they create a character class, which matches exactly one of the characters listed inside it rather than one of the keywords.

(ID1|ID2|TYPE|DATE|CLASS|SUBCLASS)\\s*:\\s*(.+?)\\s*[(ID1|ID2|TYPE|DATE|CLASS|SUBCLASS)]
                                                    ^                                  ^
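To see the effect in isolation, here is a small sketch that tests only the bracketed part of the expression. The class name CharClassDemo and the constant BRACKETS are made up for this illustration.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Only the bracketed tail of the original expression: a character class,
    // i.e. "one character out of ( I D 1 | 2 T Y P E A C L S U B )".
    static final Pattern BRACKETS =
            Pattern.compile("[(ID1|ID2|TYPE|DATE|CLASS|SUBCLASS)]");

    public static void main(String[] args) {
        System.out.println(BRACKETS.matcher("T").matches());    // true: a single listed character
        System.out.println(BRACKETS.matcher("|").matches());    // true: the pipe is just a literal here
        System.out.println(BRACKETS.matcher("TYPE").matches()); // false: four characters, not one
    }
}
```

Inside the brackets, the parentheses and pipes lose their special meaning, which is why the expression never looks for a whole keyword at that point.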

      

I turned the alternation at the end into a positive lookahead assertion (the group starting with ?=), so it is not part of the match; it only ensures that one of these alternatives comes next. I also added the end of the line, $, to the alternation.



(ID1|ID2|TYPE|DATE|CLASS|SUBCLASS)\\s*:\\s*(.+?)\\s*(?=ID1|ID2|TYPE|DATE|CLASS|SUBCLASS|$)
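As a rough sketch of how this expression could be used from Java, the following collects each keyword and its value into a map. The class name LogLineParser, the method parse and the constants KEYWORDS and FIELD are assumptions made for the example, not part of the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogLineParser {
    private static final String KEYWORDS = "ID1|ID2|TYPE|DATE|CLASS|SUBCLASS";

    // Keyword, colon, lazily matched value, then a lookahead that requires
    // the next keyword or the end of the input without consuming it.
    private static final Pattern FIELD = Pattern.compile(
            "(" + KEYWORDS + ")\\s*:\\s*(.+?)\\s*(?=" + KEYWORDS + "|$)");

    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps the order of the line
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            fields.put(m.group(1), m.group(2)); // group 1 = keyword, group 2 = value
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "ID1 : 0     ID2 : 214 TYPE : ERROR      DATE : 2012-01-11 14:08:07.432 CLASS : Maintenance    SUBCLASS : Operations";
        parse(line).forEach((key, value) -> System.out.println(key + " : " + value));
    }
}
```

One limitation to keep in mind: a value that itself contains one of the keywords would be cut short at that point, because the lookahead cannot tell a keyword from the same letters inside a value.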

      

You can try it out on Regexr, a good online regex tester.

+3




You can try this:

        // Needs java.util.regex.Pattern and java.util.regex.Matcher.
        String s = "ID1 : 0     ID2 : 214 TYPE : ERROR      DATE : 2012-01-11 14:08:07.432 CLASS : Maintenance    SUBCLASS : Operations";
        // One alternative per keyword, each followed by the value format it is expected to have.
        Pattern pattern = Pattern.compile("(ID1 :\\s+\\d+|ID2 :\\s+\\d+|TYPE :\\s+\\w+|DATE :\\s+\\d{4}-\\d{2}-\\d{2}\\s\\d{2}:\\d{2}:\\d{2}\\.\\d{3}|CLASS :\\s+\\w+|SUBCLASS :\\s+\\w+)");
        Matcher matcher = pattern.matcher(s);
        StringBuilder res = new StringBuilder();
        while (matcher.find()) {
            // group(0) is the whole "KEY : value" match
            res.append(matcher.group(0)).append(System.lineSeparator());
        }
        System.out.println(res);

      

I am assuming that ID1 and ID2 are always numbers and that TYPE, CLASS and SUBCLASS are single words.



Output

ID1 : 0
ID2 : 214
TYPE : ERROR
DATE : 2012-01-11 14:08:07.432
CLASS : Maintenance
SUBCLASS : Operations

+1




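// Insert a line break in front of each keyword that follows the first one.
StringBuilder s = new StringBuilder("ID1 : 0     ID2 : 214 TYPE : ERROR      DATE : 2012-01-11 14:08:07.432 CLASS : Maintenance    SUBCLASS : Operations");
int i = s.indexOf("ID2");
if (i >= 0) s.insert(i, "\n"); // indexOf returns -1 when the keyword is missing
i = s.indexOf("TYPE");
if (i >= 0) s.insert(i, "\n");
// ... repeat the same two lines for the remaining keywords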

      

NOTE: This is only a quick workaround; I know there may be more efficient approaches.

0




Perhaps you can use a regular expression like this: "(\w*)\s\:\s([\w\.\-\,]*)\s*" and apply it with a Matcher like this:

 // s is the log line to parse.
 // Note: the value class has no space or colon, so the time part of DATE is not captured.
 Pattern pattern = Pattern.compile("(\\w*)\\s\\:\\s([\\w\\.\\-\\,]*)\\s*");
 Matcher matcher = pattern.matcher(s);

 while (matcher.find()) {
     // the whole "key : value" pair
     System.out.println(matcher.group(0));
     // the key
     System.out.println(matcher.group(1));
     // the value
     System.out.println(matcher.group(2));
 }

      

0




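// Puts each keyword on its own line by inserting a newline in front of it.
// ID1 is assumed to start the line, so it gets no line break (the original
// version produced a leading empty line). replaceFirst takes a regex, which
// is harmless here because the keywords contain no special characters.
public static String format(String line) {
    return
    line.replaceFirst("ID2", "\nID2")
    .replaceFirst("TYPE", "\nTYPE")
    .replaceFirst("DATE", "\nDATE")
    .replaceFirst("CLASS", "\nCLASS")
    .replaceFirst("SUBCLASS", "\nSUBCLASS");
}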
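A possible variation on the same idea, shown only as a sketch: a single replaceAll with a lookahead can swap the whitespace in front of every keyword for a newline, which also removes the padding spaces at the end of each value. The class name LogLineSplitter is made up for this example, and the pattern assumes that every keyword is followed by a colon.

```java
class LogLineSplitter {
    // Replaces any run of whitespace that is directly followed by
    // "KEYWORD :" with a single newline. The lookahead does not consume
    // the keyword, and \b keeps CLASS from matching inside SUBCLASS.
    static String format(String line) {
        return line.replaceAll(
                "\\s+(?=\\b(?:ID1|ID2|TYPE|DATE|CLASS|SUBCLASS)\\b\\s*:)", "\n");
    }

    public static void main(String[] args) {
        String line = "ID1 : 0     ID2 : 214 TYPE : ERROR      DATE : 2012-01-11 14:08:07.432 CLASS : Maintenance    SUBCLASS : Operations";
        System.out.println(format(line));
    }
}
```

Because ID1 has no whitespace in front of it, no empty first line is produced, and the space between the date and the time is left alone since no keyword follows it.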

      

0

