Efficiently parsing a huge string response
I have a service that returns data in the following format. I've shortened it here for readability, but the full response is quite large. The format will always be the same.
process=true
version=2
DataCenter=dc2
Total:2
prime:{0=1, 1=2, 2=3, 3=4, 4=1, 5=2}
obvious:{0=6, 1=7, 2=8, 3=5, 4=6}
mapping:{3=machineA.dc2.com, 2=machineB.dc2.com}
Machine:[machineA.dc2.com, machineB.dc2.com]
DataCenter=dc1
Total:2
prime:{0=1, 1=2, 2=3, 3=4, 4=1, 5=2, 6=3}
obvious:{0=6, 1=7, 2=8, 3=5, 4=6, 5=7}
mapping:{3=machineP.dc1.com, 2=machineQ.dc1.com}
Machine:[machineP.dc1.com, machineQ.dc1.com]
DataCenter=dc3
Total:2
prime:{0=1, 1=2, 2=3, 3=4, 4=1, 5=2}
obvious:{0=6, 1=7, 2=8, 3=5, 4=6}
mapping:{3=machineO.dc3.com, 2=machineR.dc3.com}
Machine:[machineO.dc3.com, machineR.dc3.com]
I am trying to parse the above data and store it in three different maps.
- Prime map:
Map<String, Map<Integer, Integer>> prime = new HashMap<String, Map<Integer, Integer>>();
- Obvious map:
Map<String, Map<Integer, Integer>> obvious = new HashMap<String, Map<Integer, Integer>>();
- Mapping map:
Map<String, Map<Integer, String>> mapping = new HashMap<String, Map<Integer, String>>();
Here is what each map should contain:
- In the prime map, the key will be dc2 and the value will be {0=1, 1=2, 2=3, 3=4, 4=1, 5=2}.
- In the obvious map, the key will be dc2 and the value will be {0=6, 1=7, 2=8, 3=5, 4=6}.
- In the mapping map, the key will be dc2 and the value will be {3=machineA.dc2.com, 2=machineB.dc2.com}.
Likewise for other datacenters.
What is the best way to parse the above string response? Should I be using regex here or simple string parsing?
public class DataParser {
    public static void main(String[] args) {
        String response = getDataFromURL();
        // here response will contain above string
        parseResponse(response);
    }
    // static so that it can be called from main
    private static void parseResponse(final String response) {
        // what is the best way to parse the response?
    }
}
Any example would be very helpful.
You can do as ShellFish recommends and split the response on "\n", then process each line.
One regex approach would be similar to the following (it's incomplete, but enough to get you started):
public static void main(String[] args) throws Exception {
String response = "process=true\n" +
"version=2\n" +
"DataCenter=dc2\n" +
" Total:2\n" +
" prime:{0=1, 1=2, 2=3, 3=4, 4=1, 5=2}\n" +
" obvious:{0=6, 1=7, 2=8, 3=5, 4=6}\n" +
" mapping:{3=machineA.dc2.com, 2=machineB.dc2.com}\n" +
" Machine:[machineA.dc2.com, machineB.dc2.com]\n" +
"DataCenter=dc1\n" +
" Total:2\n" +
" prime:{0=1, 1=2, 2=3, 3=4, 4=1, 5=2, 6=3}\n" +
" obvious:{0=6, 1=7, 2=8, 3=5, 4=6, 5=7}\n" +
" mapping:{3=machineP.dc1.com, 2=machineQ.dc1.com}\n" +
" Machine:[machineP.dc1.com, machineQ.dc1.com]\n" +
"DataCenter=dc3\n" +
" Total:2\n" +
" prime:{0=1, 1=2, 2=3, 3=4, 4=1, 5=2}\n" +
" obvious:{0=6, 1=7, 2=8, 3=5, 4=6}\n" +
" mapping:{3=machineO.dc3.com, 2=machineR.dc3.com}\n" +
" Machine:[machineO.dc3.com, machineR.dc3.com]";
Map<String, Map<Integer, Integer>> prime = new HashMap<>();
Map<String, Map<Integer, Integer>> obvious = new HashMap<>();
Map<String, Map<Integer, String>> mapping = new HashMap<>();
String outerMapKey = "";
int findCount = 0;
Matcher matcher = Pattern.compile("(?<=DataCenter=)(.*)|(?<=prime:)(.*)|(?<=obvious:)(.*)|(?<=mapping:)(.*)").matcher(response);
while(matcher.find()) {
switch (findCount) {
case 0:
outerMapKey = matcher.group();
break;
case 1:
prime.put(outerMapKey, new HashMap());
String group = matcher.group().replaceAll("[\\{\\}]", "").replaceAll(", ", ",");
String[] groupPieces = group.split(",");
for (String groupPiece : groupPieces) {
String[] keyValue = groupPiece.split("=");
// keyValue[0] is the inner key, keyValue[1] is the inner value
prime.get(outerMapKey).put(Integer.parseInt(keyValue[0]), Integer.parseInt(keyValue[1]));
}
break;
// Add additional cases for obvious and mapping
}
findCount++;
if (findCount == 4) {
findCount = 0;
}
}
System.out.println("Primes:");
prime.keySet().stream().forEach(k -> System.out.printf("Key: %s Value: %s\n", k, prime.get(k)));
// Add additional outputs for obvious and mapping
}
Results:
Primes:
Key: dc2 Value: {0=1, 1=2, 2=3, 3=4, 4=1, 5=2}
Key: dc1 Value: {0=1, 1=2, 2=3, 3=4, 4=1, 5=2, 6=3}
Key: dc3 Value: {0=1, 1=2, 2=3, 3=4, 4=1, 5=2}
Links to explain the regex pattern: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
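The obvious and mapping cases need the same brace-splitting logic as the prime case, so one option is to factor it into a helper. The following is a minimal sketch rather than part of the answer above: the class name BracePairs and the methods parsePairs and parseIntPairs are hypothetical, and it assumes that keys are non-negative integers and that values never contain a comma or a closing brace.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the name and shape are assumptions, not from the answer above.
class BracePairs {
    // One "key=value" pair: an integer key, then everything up to the next ',' or '}'.
    private static final Pattern PAIR = Pattern.compile("(\\d+)=([^,}]+)");

    // Parses text such as "{3=a, 2=b}" into a map, keeping the order of appearance.
    static Map<Integer, String> parsePairs(String body) {
        Map<Integer, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(body);
        while (m.find()) {
            result.put(Integer.parseInt(m.group(1)), m.group(2).trim());
        }
        return result;
    }

    // Same input, with the values converted to integers (for prime and obvious).
    static Map<Integer, Integer> parseIntPairs(String body) {
        Map<Integer, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> e : parsePairs(body).entrySet()) {
            result.put(e.getKey(), Integer.parseInt(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parseIntPairs("{0=6, 1=7, 2=8, 3=5, 4=6}"));
        System.out.println(parsePairs("{3=machineA.dc2.com, 2=machineB.dc2.com}"));
    }
}
```

With a helper like this, each case in the switch reduces to a single put call on the matching outer map, for example mapping.put(outerMapKey, BracePairs.parsePairs(matcher.group())).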
The answer depends on how much you trust the format to be fixed and precise. A very simple approach scans the string line by line and does minimal string comparison to find each key and value:
private static final String DATA_CENTER = "DataCenter=";
private static final int DATA_CENTER_LEN = DATA_CENTER.length();
private static final String PRIME = " prime:";
private static final int PRIME_LEN = PRIME.length();
// etc.
Map<String, Map<Integer, Integer>> prime = new HashMap<>();
// etc.
String response = "...";
Scanner scanner = new Scanner( response );
while(scanner.hasNextLine()){
String line = scanner.nextLine();
if( line.startsWith( DATA_CENTER ) ){
String dc = line.substring( DATA_CENTER_LEN );
line = scanner.nextLine(); // skip Total
prime.put( dc, str2map(scanner.nextLine().substring(PRIME_LEN)) );
obvious.put( dc, str2map(scanner.nextLine().substring(OBVIOUS_LEN)) );
mapping.put( dc, str2mapis(scanner.nextLine().substring(MAPPING_LEN)) );
}
}
With more explicit calls to nextLine(), even the test for "DataCenter" could be avoided.
Here are two nearly identical methods that strip the curly braces and build a map:
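// Converts "{0=1, 1=2}" into a map with Integer keys and Integer values.
private static Map<Integer,Integer> str2map( String str ){
    Map<Integer,Integer> map = new HashMap<>();
    str = str.substring( 1, str.length()-1 ); // drop the surrounding braces
    String[] pairs = str.split( ", " );
    for( String pair: pairs ){
        String[] kv = pair.split( "=" );
        map.put( Integer.parseInt(kv[0]),Integer.parseInt(kv[1]) );
    }
    return map;
}
// Converts "{3=hostA, 2=hostB}" into a map with Integer keys and String values.
private static Map<Integer,String> str2mapis( String str ){
    Map<Integer,String> map = new HashMap<>();
    str = str.substring( 1, str.length()-1 ); // same splitting as in str2map
    String[] pairs = str.split( ", " );
    for( String pair: pairs ){
        String[] kv = pair.split( "=" );
        map.put( Integer.parseInt(kv[0]),kv[1] );
    }
    return map;
}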
If the leading whitespace might vary, you can play it safe by trimming each line first:
private static final String PRIME = "prime:";
// ...
prime.put( dc, str2map(scanner.nextLine().trim().substring( PRIME_LEN )) );
If the order or completeness of the lines is not guaranteed, each line may need to be tested:
line = scanner.nextLine().trim();
if( line.startsWith( PRIME ) ){
    // use the line that was just read; do not call nextLine() again
    prime.put( dc, str2map(line.substring( PRIME_LEN )) );
}
With even less trust in the stability of the format, regex parsing may be warranted.
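Putting the defensive variant together, the following is a minimal sketch, not the answerer's code: the class TolerantScannerParser and its helpers split and toIntMap are hypothetical names. It assumes each data line still looks like label:{k=v, ...}, but it does not rely on the amount of indentation, on the order of lines inside a data center block, or on every line being present.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

// Sketch only: class and method names are assumptions, not from the answer above.
class TolerantScannerParser {
    final Map<String, Map<Integer, Integer>> prime = new HashMap<>();
    final Map<String, Map<Integer, Integer>> obvious = new HashMap<>();
    final Map<String, Map<Integer, String>> mapping = new HashMap<>();

    void parse(String response) {
        String dc = null; // data center of the block currently being read
        try (Scanner scanner = new Scanner(response)) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine().trim();
                if (line.startsWith("DataCenter=")) {
                    dc = line.substring("DataCenter=".length());
                } else if (dc == null) {
                    continue; // header lines before the first data center
                } else if (line.startsWith("prime:")) {
                    prime.put(dc, toIntMap(split(line.substring("prime:".length()))));
                } else if (line.startsWith("obvious:")) {
                    obvious.put(dc, toIntMap(split(line.substring("obvious:".length()))));
                } else if (line.startsWith("mapping:")) {
                    mapping.put(dc, split(line.substring("mapping:".length())));
                }
                // Total and Machine lines fall through and are ignored.
            }
        }
    }

    // "{3=a, 2=b}" becomes {3=a, 2=b}; an empty "{}" yields an empty map.
    private static Map<Integer, String> split(String braces) {
        Map<Integer, String> map = new HashMap<>();
        String inner = braces.substring(1, braces.length() - 1).trim();
        if (inner.isEmpty()) {
            return map;
        }
        for (String pair : inner.split(",\\s*")) {
            String[] kv = pair.split("=", 2);
            map.put(Integer.parseInt(kv[0].trim()), kv[1].trim());
        }
        return map;
    }

    // Converts the String values to Integer values (for prime and obvious).
    private static Map<Integer, Integer> toIntMap(Map<Integer, String> in) {
        Map<Integer, Integer> out = new HashMap<>();
        in.forEach((k, v) -> out.put(k, Integer.parseInt(v)));
        return out;
    }
}
```

The trade-off is one startsWith test per line instead of positional nextLine() calls, which costs a little speed in exchange for tolerating reordered or missing lines.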
I would do simple string parsing in this case, applying a regex to each line. Something like this in pseudocode:
for line in response
    if line matches /^DataCenter/
        key = datacenter name
    else if line matches / *prime/
        prime.put(key, prime value)
    else if line matches / *obvious/
        obvious.put(key, obvious value)
    else if line matches / *mapping/
        mapping.put(key, mapping value)
    else
        skip line
You can optimize here by checking the first char of the line first. If it is something other than a space or D, you can go to the next line. If the format is always the same, you can even hard-code the line positions for parsing. In the given example, you can do:
skip 2 lines
repeat
    extract datacenter name
    skip 1 line
    extract prime
    extract obvious
    extract mapping
    add above stuff to the maps
    skip 1 line
until EOF
It will be much faster, but it will break if the format changes.
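The first pseudocode variant (one regex test per line) could look roughly like this in Java. It is a sketch under assumptions: the class LineRegexParser and the method index are made-up names, and the brace contents are deliberately left as raw text so that they can be split afterwards with whichever helper you prefer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: the names here are illustrative assumptions.
class LineRegexParser {
    // Group 1 is the line label, group 2 is the rest of the line.
    private static final Pattern LINE =
            Pattern.compile("\\s*(DataCenter=|prime:|obvious:|mapping:)(.*)");

    // Returns label -> (data center -> raw "{...}" text); values are left unparsed.
    static Map<String, Map<String, String>> index(String response) {
        Map<String, Map<String, String>> byLabel = new HashMap<>();
        byLabel.put("prime", new HashMap<>());
        byLabel.put("obvious", new HashMap<>());
        byLabel.put("mapping", new HashMap<>());

        String key = null; // current data center name
        for (String line : response.split("\r?\n")) {
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                continue; // process, version, Total and Machine lines
            }
            String label = m.group(1);
            if (label.equals("DataCenter=")) {
                key = m.group(2).trim();
            } else if (key != null) {
                // Strip the trailing ':' to get "prime", "obvious" or "mapping".
                String name = label.substring(0, label.length() - 1);
                byLabel.get(name).put(key, m.group(2).trim());
            }
        }
        return byLabel;
    }

    public static void main(String[] args) {
        String sample = "DataCenter=dc1\n prime:{0=1, 1=2}\n mapping:{3=machineP.dc1.com}";
        System.out.println(index(sample).get("prime").get("dc1"));
    }
}
```

Using a single alternation pattern keeps the dispatch in one place. For a very large response, reading line by line from a reader instead of calling split on the whole string would avoid building a second copy of the data in memory.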
You can use a parser generator like ANTLR, or you can write the parser code by hand. Depending on how much output you have to process and how often, you may find that bringing in a tool like that is not really worth it, and that simply going through each line and parsing it manually (with a regex or indexOf) is sufficient and clear enough.