Split on commas that are not inside braces, ignoring anything inside them

I know this might be yet another regex question, but even though I searched for it, I couldn't get a clear answer. So here's my problem: I have a line like this:

{1,2,{3,{4},5},{5,6}}

      

I remove the outermost braces (they come from the input and I don't need them), so now I have this:

1,2,{3,{4},5},{5,6}

      

And now I need to split this string into an array of elements, treating everything inside braces as a single element:

Arr[0]    1
Arr[1]    2
Arr[2]    {3,{4},5}
Arr[3]    {5,6}

      

I've tried doing this with lookahead, but so far I am failing (sorry). What's the easiest way to handle these things in terms of regex?

+3




3 answers


I couldn't find a regex solution, but here is a non-regex one. It reads each number (outside curly braces) up to the next comma (or to the end of the line for the last number) and reads each group (inside curly braces) until the group's matching closing brace is found.

If a regex solution is found I would like to see it.

import java.util.ArrayList;
import java.util.List;

public class SplitGroups {
    public static void main(String[] args) {
        String data = "1,2,{3,{4},5},{5,6},-7,{7,8},{8,{9},10},11";
        List<String> list = new ArrayList<>();
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            // A number starts with a digit, or with '-' followed by a digit
            boolean startsNumber = Character.isDigit(c)
                    || (c == '-' && i + 1 < data.length() && Character.isDigit(data.charAt(i + 1)));
            if (startsNumber) {
                // Take everything up to the next comma, or to the end for the last number
                int commaIndex = data.indexOf(',', i);
                String number = commaIndex > -1
                        ? data.substring(i, commaIndex)
                        : data.substring(i);
                list.add(number);
                // Land on the comma; the loop's i++ then steps past it
                i += number.length();
            } else if (c == '{') {
                // Collect the group up to its matching closing curly brace
                StringBuilder sb = new StringBuilder();
                int openCount = 0;
                int closeCount = 0;
                do {
                    if (data.charAt(i) == '{') {
                        openCount++;
                    } else if (data.charAt(i) == '}') {
                        closeCount++;
                    }
                    sb.append(data.charAt(i));
                    i++;
                } while (closeCount < openCount);
                // i now sits on the comma after the group (or past the end)
                list.add(sb.toString());
            }
        }

        for (int i = 0; i < list.size(); i++) {
            System.out.printf("Arr[%d]: %s%n", i, list.get(i));
        }
    }
}

      



Results:

Arr[0]: 1
Arr[1]: 2
Arr[2]: {3,{4},5}
Arr[3]: {5,6}
Arr[4]: -7
Arr[5]: {7,8}
Arr[6]: {8,{9},10}
Arr[7]: 11

      

0




You cannot do that with a regex if elements such as {{1},{2}} should be kept together. The reason is that the task is equivalent to parsing a balanced-parentheses language. That language is context-free rather than regular, so it cannot be parsed with a regular expression. The best way to deal with this is not to use a regex, but to use a for loop with a stack (the stack is what makes it possible to parse context-free languages). In pseudocode, we could do:

last = 0
for index, char in input
    if stack is empty and char is ','
        add substring(last, index) to output array
        last = index + 1
    if char is '{'
         push '{' on stack
    if char is '}'
         pop from stack
add substring(last, end of input) to output array

      



This pseudocode builds the array as required. Note that it is best to iterate over the character indices of the string rather than over the characters themselves, since you need the indices to define the boundaries of the substrings added to the array.
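The loop described above can be sketched in Java roughly as follows. This is only an illustration, not the answerer's own code: the class and method names (TopLevelSplitter, splitTopLevel) are made up for the example, a plain depth counter stands in for the stack (enough when there is only one kind of bracket), and the input is assumed to have balanced braces.

```java
import java.util.ArrayList;
import java.util.List;

public class TopLevelSplitter {

    // Hypothetical helper: splits on commas at nesting depth 0.
    // Assumes the braces in the input are balanced.
    public static List<String> splitTopLevel(String input) {
        List<String> parts = new ArrayList<>();
        int depth = 0; // stands in for the size of the stack
        int last = 0;  // start index of the element being read
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
            } else if (c == ',' && depth == 0) {
                // Top-level comma: close the current element
                parts.add(input.substring(last, i));
                last = i + 1; // skip the comma itself
            }
        }
        // The final element has no trailing comma
        parts.add(input.substring(last));
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitTopLevel("1,2,{3,{4},5},{5,6}"));
        // prints [1, 2, {3,{4},5}, {5,6}]
    }
}
```

An input such as {{1},{2}} comes back as a single element, because its only comma sits at depth 1.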

+3




This comes close to the requirement. I ran out of time and will finish the rest later (one comma is still split incorrectly).
Regex: ,(?=[^}]*(?:{|$))


To check if a regular expression is correct: go to http://regexr.com/


To use this pattern in Java, there is a slight difference: a backslash must be added before { and }, and each backslash is written as \\ inside a Java string literal.

Hence the regex for Java input: ,(?=[^\\}]*(?:\\{|$))

String numbers = "{1,2,{3,{4},5},{5,6}}";
numbers = numbers.substring(1, numbers.length()-1);
String[] separatedValues = numbers.split(",(?=[^\\}]*(?:\\{|$))");
System.out.println(separatedValues[0]);
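To see which comma this pattern gets wrong, a small check like the one below can help. It is only a sketch: the class and method names (RegexSplitCheck, splitWithLookahead) are invented for the example, and it assumes standard java.util.regex behaviour.

```java
import java.util.Arrays;
import java.util.List;

public class RegexSplitCheck {

    // Hypothetical helper wrapping the lookahead pattern from this answer
    public static List<String> splitWithLookahead(String input) {
        return Arrays.asList(input.split(",(?=[^\\}]*(?:\\{|$))"));
    }

    public static void main(String[] args) {
        String numbers = "{1,2,{3,{4},5},{5,6}}";
        // Drop the outermost braces before splitting
        numbers = numbers.substring(1, numbers.length() - 1);
        for (String part : splitWithLookahead(numbers)) {
            System.out.println(part);
        }
        // Prints five lines: 1 / 2 / {3 / {4},5} / {5,6}
    }
}
```

The comma right after {3 is followed immediately by {, so the lookahead succeeds there even though that comma lies inside a group. This is the one incorrect split: {3,{4},5} comes back as two pieces.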

      

+1

