Efficiently parsing a huge string response

I have a service that returns data in the following format. I've shortened it for readability, but the real response is quite large. The format will always be the same.

process=true
version=2
DataCenter=dc2
    Total:2
    prime:{0=1, 1=2, 2=3, 3=4, 4=1, 5=2}
    obvious:{0=6, 1=7, 2=8, 3=5, 4=6}
    mapping:{3=machineA.dc2.com, 2=machineB.dc2.com}
    Machine:[machineA.dc2.com, machineB.dc2.com]
DataCenter=dc1
    Total:2
    prime:{0=1, 1=2, 2=3, 3=4, 4=1, 5=2, 6=3}
    obvious:{0=6, 1=7, 2=8, 3=5, 4=6, 5=7}
    mapping:{3=machineP.dc1.com, 2=machineQ.dc1.com}
    Machine:[machineP.dc1.com, machineQ.dc1.com]
DataCenter=dc3
    Total:2
    prime:{0=1, 1=2, 2=3, 3=4, 4=1, 5=2}
    obvious:{0=6, 1=7, 2=8, 3=5, 4=6}
    mapping:{3=machineO.dc3.com, 2=machineR.dc3.com}
    Machine:[machineO.dc3.com, machineR.dc3.com]

      

I am trying to parse the above data and store it in three different Maps.

  • Prime map: Map<String, Map<Integer, Integer>> prime = new HashMap<String, Map<Integer, Integer>>();

  • Obvious map: Map<String, Map<Integer, Integer>> obvious = new HashMap<String, Map<Integer, Integer>>();

  • Mapping map: Map<String, Map<Integer, String>> mapping = new HashMap<String, Map<Integer, String>>();

Below is the description:

  • In the prime map, the key will be dc2, and the value will be {0=1, 1=2, 2=3, 3=4, 4=1, 5=2}.

  • In the obvious map, the key will be dc2, and the value will be {0=6, 1=7, 2=8, 3=5, 4=6}.

  • In the mapping map, the key will be dc2, and the value will be {3=machineA.dc2.com, 2=machineB.dc2.com}.

Likewise for other datacenters.

What is the best way to parse the above string response? Should I use a regex here, or simple string parsing?

public class DataParser {
    public static void main(String[] args) {
        String response = getDataFromURL();
        // here response will contain above string
        parseResponse(response);            
    }

    // static so that it can be called from main
    private static void parseResponse(final String response) {
        // what is the best way to parse the response?
    }   
}

      

Any example would be very helpful.



4 answers


You can do as ShellFish recommends and split the response on "\n", then process each line.

One regex approach would be similar to the following (it's incomplete, but enough to get you started):

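// Requires: import java.util.HashMap; import java.util.Map;
//           import java.util.regex.Matcher; import java.util.regex.Pattern;
public static void main(String[] args) throws Exception {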
    String response = "process=true\n" +
        "version=2\n" +
        "DataCenter=dc2\n" +
        "    Total:2\n" +
        "    prime:{0=1, 1=2, 2=3, 3=4, 4=1, 5=2}\n" +
        "    obvious:{0=6, 1=7, 2=8, 3=5, 4=6}\n" +
        "    mapping:{3=machineA.dc2.com, 2=machineB.dc2.com}\n" +
        "    Machine:[machineA.dc2.com, machineB.dc2.com]\n" +
        "DataCenter=dc1\n" +
        "    Total:2\n" +
        "    prime:{0=1, 1=2, 2=3, 3=4, 4=1, 5=2, 6=3}\n" +
        "    obvious:{0=6, 1=7, 2=8, 3=5, 4=6, 5=7}\n" +
        "    mapping:{3=machineP.dc1.com, 2=machineQ.dc1.com}\n" +
        "    Machine:[machineP.dc1.com, machineQ.dc1.com]\n" +
        "DataCenter=dc3\n" +
        "    Total:2\n" +
        "    prime:{0=1, 1=2, 2=3, 3=4, 4=1, 5=2}\n" +
        "    obvious:{0=6, 1=7, 2=8, 3=5, 4=6}\n" +
        "    mapping:{3=machineO.dc3.com, 2=machineR.dc3.com}\n" +
        "    Machine:[machineO.dc3.com, machineR.dc3.com]";

    Map<String, Map<Integer, Integer>> prime = new HashMap<>();
    Map<String, Map<Integer, Integer>> obvious = new HashMap<>();
    Map<String, Map<Integer, String>> mapping = new HashMap<>();

    String outerMapKey = "";
    int findCount = 0;
    Matcher matcher = Pattern.compile("(?<=DataCenter=)(.*)|(?<=prime:)(.*)|(?<=obvious:)(.*)|(?<=mapping:)(.*)").matcher(response);
    while(matcher.find()) {
        switch (findCount) {
            case 0:
                outerMapKey = matcher.group();
                break;
            case 1:
                prime.put(outerMapKey, new HashMap<>());
                String group = matcher.group().replaceAll("[\\{\\}]", "").replaceAll(", ", ",");
                String[] groupPieces = group.split(",");
                for (String groupPiece : groupPieces) {
                    String[] keyValue = groupPiece.split("=");
                    // keyValue[0] is the key, keyValue[1] is the value
                    prime.get(outerMapKey).put(Integer.parseInt(keyValue[0]), Integer.parseInt(keyValue[1]));
                }
                break;
            // Add additional cases for obvious and mapping
        }

        findCount++;
        if (findCount == 4) {
            findCount = 0;
        }
    }

    System.out.println("Primes:");
    prime.keySet().stream().forEach(k -> System.out.printf("Key: %s Value: %s\n", k, prime.get(k)));
    // Add additional outputs for obvious and mapping
}

      

Results:



Primes:
Key: dc2 Value: {0=1, 1=2, 2=3, 3=4, 4=1, 5=2}
Key: dc1 Value: {0=1, 1=2, 2=3, 3=4, 4=1, 5=2, 6=3}
Key: dc3 Value: {0=1, 1=2, 2=3, 3=4, 4=1, 5=2}

      

Links to explain the regex pattern: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

http://www.regular-expressions.info/lookaround.html
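
For reference, here is one way the remaining cases could be filled in. This is only a sketch under assumptions: the class name RegexResponseParser and its fields are hypothetical, and it swaps the lookbehind pattern above for a single multiline pattern with capturing groups, so the field name decides which map receives the data instead of a counter.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class RegexResponseParser {
    // Group 1: data center name. Groups 2 and 3: field name and the text inside {...}.
    private static final Pattern LINE = Pattern.compile(
        "^DataCenter=(.*)$|^[ \\t]*(prime|obvious|mapping):\\{(.*)\\}$",
        Pattern.MULTILINE);

    final Map<String, Map<Integer, Integer>> prime = new HashMap<>();
    final Map<String, Map<Integer, Integer>> obvious = new HashMap<>();
    final Map<String, Map<Integer, String>> mapping = new HashMap<>();

    static RegexResponseParser parse(String response) {
        RegexResponseParser result = new RegexResponseParser();
        String dc = null;
        Matcher m = LINE.matcher(response);
        while (m.find()) {
            if (m.group(1) != null) {       // a DataCenter line starts a new block
                dc = m.group(1).trim();
                continue;
            }
            if (dc == null) {
                continue;                   // field before any DataCenter line: ignore
            }
            String field = m.group(2);
            Map<Integer, Integer> ints = new HashMap<>();
            Map<Integer, String> strings = new HashMap<>();
            for (String pair : m.group(3).split(",\\s*")) {
                if (pair.isEmpty()) {
                    continue;               // "{}" yields one empty piece
                }
                String[] kv = pair.split("=", 2);
                int key = Integer.parseInt(kv[0].trim());
                if (field.equals("mapping")) {
                    strings.put(key, kv[1].trim());
                } else {
                    ints.put(key, Integer.parseInt(kv[1].trim()));
                }
            }
            if (field.equals("prime")) {
                result.prime.put(dc, ints);
            } else if (field.equals("obvious")) {
                result.obvious.put(dc, ints);
            } else {
                result.mapping.put(dc, strings);
            }
        }
        return result;
    }
}
```

Lines such as Total and Machine simply never match the pattern, so they are skipped without any extra bookkeeping.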



The answer depends on how much you trust the format to be fixed and precise. A very simple approach reads the string line by line and does only minimal string comparison to identify each key:

private static final String DATA_CENTER = "DataCenter=";
private static final int DATA_CENTER_LEN = DATA_CENTER.length();
private static final String PRIME = "    prime:";
private static final int PRIME_LEN = PRIME.length();
// etc.
Map<String, Map<Integer, Integer>> prime = new HashMap<>();
// etc.
String response = "...";
Scanner scanner = new Scanner( response );
while(scanner.hasNextLine()){
    String line = scanner.nextLine();
    if( line.startsWith( DATA_CENTER ) ){
        String dc = line.substring( DATA_CENTER_LEN );
        line = scanner.nextLine(); // skip Total 
        prime.put( dc, str2map(scanner.nextLine().substring(PRIME_LEN)) );
        obvious.put( dc, str2map(scanner.nextLine().substring(OBVIOUS_LEN)) );
        mapping.put( dc, str2mapis(scanner.nextLine().substring(MAPPING_LEN)) );
    }
}

      

With more explicit calls to nextLine(), you could even avoid the test for "DataCenter=".

Here are two nearly identical helper methods that strip the curly braces and build a map:

private static Map<Integer,Integer> str2map( String str ){
    Map<Integer,Integer> map = new HashMap<>();
    str = str.substring( 1, str.length()-1 );
    String[] pairs = str.split( ", " );
    for( String pair: pairs ){
        String[] kv = pair.split( "=" );
        map.put( Integer.parseInt(kv[0]),Integer.parseInt(kv[1]) );
    }
    return map;
}

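private static Map<Integer,String> str2mapis( String str ){
    Map<Integer,String> map = new HashMap<>();
    // Same splitting as str2map, but the value stays a String
    str = str.substring( 1, str.length()-1 );
    String[] pairs = str.split( ", " );
    for( String pair: pairs ){
        String[] kv = pair.split( "=" );
        map.put( Integer.parseInt(kv[0]),kv[1] );
    }
    return map;
}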
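
Since the two helpers differ only in how the value is converted, they could be folded into one generic method. The following is a minimal sketch, not part of the original answer: the class name BraceMaps and the valueParser parameter are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical utility; the names are illustrative only.
final class BraceMaps {
    // Parses "{k1=v1, k2=v2}" and converts each value with valueParser.
    static <V> Map<Integer, V> parse(String str, Function<String, V> valueParser) {
        Map<Integer, V> map = new HashMap<>();
        String body = str.trim();
        body = body.substring(1, body.length() - 1); // drop '{' and '}'
        if (body.isEmpty()) {
            return map;                              // "{}" has no pairs
        }
        for (String pair : body.split(", ")) {
            String[] kv = pair.split("=", 2);
            map.put(Integer.parseInt(kv[0].trim()), valueParser.apply(kv[1].trim()));
        }
        return map;
    }
}
```

Under these assumptions, BraceMaps.parse(text, Integer::valueOf) would play the role of str2map, and BraceMaps.parse(text, s -> s) the role of str2mapis.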

      



If the leading whitespace might vary, you can play it safe by trimming the line first:

private static final String PRIME = "prime:";
// ...
prime.put( dc, str2map(scanner.nextLine().trim().substring( PRIME_LEN )) );

      

If the order or completeness of the lines is not guaranteed, a check may be required:

line = scanner.nextLine().trim();
if( line.startsWith( PRIME ) ){
     // Use the line just read; calling nextLine() again would skip it
     prime.put( dc, str2map(line.substring( PRIME_LEN )) );
}

      

With even less trust in the format, regex-based parsing may be indicated.



I would use simple string parsing in this case, processing each line in turn. Something like this in pseudocode:

for line in response
    if line matches /^DataCenter/
         key = datacenter name
    else if line matches / *prime/
         prime.put(key, prime value)
    else if line matches / *obvious/
         obvious.put(key, obvious value)
    else if line matches / *mapping/
         mapping.put(key, mapping value)
    else
         getline
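
The pseudocode above might translate to Java roughly as follows. This is a sketch under assumptions: the class LineDispatchParser and its helper methods are hypothetical names, and the caller is assumed to pass in the three target maps.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class; the names are illustrative only.
final class LineDispatchParser {
    static void parse(String response,
                      Map<String, Map<Integer, Integer>> prime,
                      Map<String, Map<Integer, Integer>> obvious,
                      Map<String, Map<Integer, String>> mapping) {
        String key = null; // current data center name
        for (String raw : response.split("\n")) {
            String line = raw.trim(); // also removes a trailing '\r'
            if (line.startsWith("DataCenter=")) {
                key = line.substring("DataCenter=".length());
            } else if (key == null) {
                continue; // header lines such as process=true
            } else if (line.startsWith("prime:")) {
                prime.put(key, toIntMap(line.substring("prime:".length())));
            } else if (line.startsWith("obvious:")) {
                obvious.put(key, toIntMap(line.substring("obvious:".length())));
            } else if (line.startsWith("mapping:")) {
                mapping.put(key, toStringMap(line.substring("mapping:".length())));
            }
            // Any other line (Total, Machine) falls through and is skipped.
        }
    }

    // Turns "{3=a, 2=b}" into a map with Integer keys and String values.
    private static Map<Integer, String> toStringMap(String braces) {
        Map<Integer, String> map = new HashMap<>();
        String body = braces.substring(1, braces.length() - 1);
        if (body.isEmpty()) {
            return map;
        }
        for (String pair : body.split(", ")) {
            String[] kv = pair.split("=", 2);
            map.put(Integer.parseInt(kv[0]), kv[1]);
        }
        return map;
    }

    // Same as toStringMap, but converts every value to an Integer.
    private static Map<Integer, Integer> toIntMap(String braces) {
        Map<Integer, Integer> map = new HashMap<>();
        for (Map.Entry<Integer, String> e : toStringMap(braces).entrySet()) {
            map.put(e.getKey(), Integer.valueOf(e.getValue()));
        }
        return map;
    }
}
```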

      

You can optimize this by checking the first character of each line first. If it is anything other than a space or D, you can skip to the next line. If the format is always the same, you can even hard-code the line order for parsing. For the given example, you could do:

skip 2 lines
repeat
    extract datacenter name
    skip 1 line
    extract prime
    extract obvious
    extract mapping
    add above stuff to the maps
    skip 1 line
until EOF

      

This will be much faster, but it will break if the format changes.
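
A sketch of the fixed-order variant, assuming the layout is exactly two header lines followed by six-line blocks. The class FixedOrderParser is a hypothetical name. It reads from a Reader so that a large response can be consumed line by line, and it returns the raw {...} text per data center in the order prime, obvious, mapping; converting that text into maps is left out here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class; the names are illustrative only.
final class FixedOrderParser {
    // Maps each data center name to { prime text, obvious text, mapping text }.
    static Map<String, String[]> readBlocks(Reader source) {
        Map<String, String[]> blocks = new HashMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            in.readLine(); // skip process=...
            in.readLine(); // skip version=...
            String line;
            while ((line = in.readLine()) != null) {
                String dc = value(line, "DataCenter=");
                in.readLine(); // skip Total
                String prime = value(in.readLine(), "prime:");
                String obvious = value(in.readLine(), "obvious:");
                String mapping = value(in.readLine(), "mapping:");
                in.readLine(); // skip Machine
                blocks.put(dc, new String[] {prime, obvious, mapping});
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return blocks;
    }

    // Returns the text after the label, or fails loudly if the layout differs.
    private static String value(String line, String label) {
        if (line == null || !line.trim().startsWith(label)) {
            throw new IllegalStateException("Expected '" + label + "' but got: " + line);
        }
        return line.trim().substring(label.length());
    }
}
```

The label check costs one startsWith call per line. It is there so that a format change produces an immediate exception rather than silently wrong maps.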



You can use a parser generator like ANTLR, or you can hand-code the parser. Depending on how much output you have to process and how often, you may find that bringing in such a tool is not really worth it for a problem like this, and that simply going through each line and parsing it manually (e.g. with a regex or indexOf) is sufficient and clear enough.
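
As an illustration of the manual route, here is a minimal sketch that walks one {key=value, ...} string with indexOf instead of split or a regex. The class IndexOfPairs is a hypothetical name, and the sketch assumes that keys are integers and that values contain neither ',' nor '}'.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class; the names are illustrative only.
final class IndexOfPairs {
    // Parses "{k1=v1, k2=v2}" by scanning for '=' and ',' positions.
    static Map<Integer, String> parse(String braces) {
        Map<Integer, String> map = new HashMap<>();
        int pos = braces.indexOf('{') + 1;  // first character after '{'
        int end = braces.lastIndexOf('}');  // index of the closing brace
        while (pos < end) {
            int eq = braces.indexOf('=', pos);
            if (eq < 0 || eq > end) {
                break;                      // no further key=value pair
            }
            int comma = braces.indexOf(',', eq);
            int stop = (comma < 0 || comma > end) ? end : comma;
            int key = Integer.parseInt(braces.substring(pos, eq).trim());
            map.put(key, braces.substring(eq + 1, stop).trim());
            pos = stop + 1;                 // continue after the comma
        }
        return map;
    }
}
```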
