How to parse a log line in Java delimited by keywords?
I am working on a log parser that should parse a string like this:
ID1 : 0 ID2 : 214 TYPE : ERROR DATE : 2012-01-11 14:08:07.432 CLASS : Maintenance SUBCLASS : Operations
ID1, ID2, TYPE, DATE, CLASS and SUBCLASS are all keywords and I want to have something like this:
ID1 : 0
ID2 : 214
TYPE : ERROR
DATE : 2012-01-11 14:08:07.432
CLASS : Maintenance
SUBCLASS : Operations
I am really new to regex and I have the following:
(ID1|ID2|TYPE|DATE|CLASS|SUBCLASS)\\s*:\\s*(.+?)\\s*[(ID1|ID2|TYPE|DATE|CLASS|SUBCLASS)]
Of course it doesn't work.
Any advice would be much appreciated.
The main problem with the expression is the square brackets at the end: they create a character class, which matches exactly one of the characters listed inside it, not one of the keywords.
(ID1|ID2|TYPE|DATE|CLASS|SUBCLASS)\\s*:\\s*(.+?)\\s*[(ID1|ID2|TYPE|DATE|CLASS|SUBCLASS)]
                                                    ^                                  ^
I replaced the character class at the end with a positive lookahead assertion (a group starting with ?=). It does not consume anything; it only ensures that one of these alternatives follows. I also added the end-of-line anchor $ to the alternation, so that the last value in the line is matched as well.
(ID1|ID2|TYPE|DATE|CLASS|SUBCLASS)\\s*:\\s*(.+?)\\s*(?=ID1|ID2|TYPE|DATE|CLASS|SUBCLASS|$)
Have a look at Regexr, a good online tool for testing regular expressions!
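To show how the lookahead version could be used from Java, here is a minimal sketch. The class name LogLineParser and the method parse are made up for illustration, and it assumes that no value contains one of the keywords itself (such a value would be cut short).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from the answer above.
final class LogLineParser {
    private static final String KEYWORDS = "ID1|ID2|TYPE|DATE|CLASS|SUBCLASS";

    // keyword, colon, lazily matched value, then a lookahead for the next keyword or the end
    private static final Pattern FIELD = Pattern.compile(
            "(" + KEYWORDS + ")\\s*:\\s*(.+?)\\s*(?=" + KEYWORDS + "|$)");

    // Returns the keyword/value pairs in the order they appear in the line.
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            fields.put(m.group(1), m.group(2)); // group 1 = keyword, group 2 = value
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "ID1 : 0 ID2 : 214 TYPE : ERROR DATE : 2012-01-11 14:08:07.432 "
                + "CLASS : Maintenance SUBCLASS : Operations";
        // Prints one "KEYWORD : value" pair per line
        parse(line).forEach((key, value) -> System.out.println(key + " : " + value));
    }
}
```

Because the value is matched lazily and the lookahead does not consume the next keyword, the DATE value keeps its embedded space and colons.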
You can try this:
String s = "ID1 : 0 ID2 : 214 TYPE : ERROR DATE : 2012-01-11 14:08:07.432 CLASS : Maintenance SUBCLASS : Operations";
Pattern pattern = Pattern.compile("(ID1 :\\s+\\d+|ID2 :\\s+\\d+|TYPE :\\s+\\w+|DATE :\\s+\\d{4}-\\d{2}-\\d{2}\\s\\d{2}:\\d{2}:\\d{2}\\.\\d{3}|CLASS :\\s+\\w+|SUBCLASS :\\s+\\w+)");
Matcher matcher = pattern.matcher(s);
StringBuilder res = new StringBuilder();
while (matcher.find()) {
    // group(0) is the whole "KEYWORD : value" match
    res.append(matcher.group(0)).append(System.lineSeparator());
}
System.out.println(res);
I am assuming that ID1 and ID2 are always numbers and that TYPE, CLASS and SUBCLASS are single words.
Output
ID1 : 0
ID2 : 214
TYPE : ERROR
DATE : 2012-01-11 14:08:07.432
CLASS : Maintenance
SUBCLASS : Operations
StringBuffer s = new StringBuffer("ID1 : 0 ID2 : 214 TYPE : ERROR DATE : 2012-01-11 14:08:07.432 CLASS : Maintenance SUBCLASS : Operations");
int i = s.indexOf("ID2");
s.insert(i, "\n");
i = s.indexOf("TYPE");
s.insert(i, "\n");
// ... the same for the remaining keywords
Note: this is only a quick workaround; there is probably a more efficient way to do it.
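The repeated indexOf calls could be folded into a loop. The following is only a sketch of that idea, not the code from the answer: the class name KeywordSplitter is made up, it assumes exactly one space on each side of every keyword's colon, and it replaces the separating space with a line break instead of inserting one. Searching for " KEYWORD :" rather than the bare keyword keeps CLASS from being found inside SUBCLASS.

```java
// Hypothetical helper class; the name is illustrative only.
final class KeywordSplitter {
    // Puts each "KEYWORD : value" pair on its own line.
    static String split(String line, String... keywords) {
        StringBuilder sb = new StringBuilder(line);
        for (String keyword : keywords) {
            // Assumes the form " KEYWORD :" with single spaces.
            int i = sb.indexOf(" " + keyword + " :");
            if (i >= 0) {
                sb.setCharAt(i, '\n'); // turn the separating space into a line break
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = "ID1 : 0 ID2 : 214 TYPE : ERROR DATE : 2012-01-11 14:08:07.432 "
                + "CLASS : Maintenance SUBCLASS : Operations";
        System.out.println(split(line, "ID1", "ID2", "TYPE", "DATE", "CLASS", "SUBCLASS"));
    }
}
```

A keyword at the very start of the line (ID1 here) has no leading space, so it is simply skipped.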
Perhaps you can use a regular expression like (\w*)\s\:\s([\w\.\-\,]*)\s* together with Pattern and Matcher, as shown here. Note that the value group does not allow spaces or colons, so for DATE only the date part (2012-01-11) is captured.
Pattern pattern = Pattern.compile("(\\w*)\\s\\:\\s([\\w\\.\\-\\,]*)\\s*");
// s is the log line to parse
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    // the whole "key : value" pair
    System.out.println(matcher.group(0));
    // the key
    System.out.println(matcher.group(1));
    // the value
    System.out.println(matcher.group(2));
}