Split on commas that are not inside braces, ignoring anything inside them
I know this may be yet another regex question, but I searched and couldn't find a clear answer. Here's my problem: I have a line like this:
{1,2,{3,{4},5},{5,6}}
I remove the outermost braces (they come from the input and I don't need them), so now I have this:
1,2,{3,{4},5},{5,6}
Now I need to split this string into an array of elements, treating everything inside a pair of braces as a single element:
Arr[0] 1
Arr[1] 2
Arr[2] {3,{4},5}
Arr[3] {5,6}
I've tried doing this with a lookahead, but so far without success. What is the easiest way to handle this with a regex?
I couldn't find a regex solution, but here is a non-regex one. It reads each number (outside curly braces) up to the next comma (or to the end of the line for the last number), and reads each group (inside curly braces) until that group's closing curly brace is found.
If a regex solution is found I would like to see it.
import java.util.ArrayList;
import java.util.List;

// The class name is arbitrary; it is only here so the snippet compiles.
public class SplitGroups {
    public static void main(String[] args) {
        String data = "1,2,{3,{4},5},{5,6},-7,{7,8},{8,{9},10},11";
        List<String> list = new ArrayList<>();
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            // Include negative numbers: a '-' directly followed by a digit
            boolean startsNegative = c == '-'
                    && i + 1 < data.length()
                    && Character.isDigit(data.charAt(i + 1));
            if (Character.isDigit(c) || startsNegative) {
                // Get the number before the comma, unless it is the last number
                int commaIndex = data.indexOf(",", i);
                String number = commaIndex > -1
                        ? data.substring(i, commaIndex)
                        : data.substring(i);
                list.add(number);
                // Move to the comma; the loop's i++ then steps past it
                i += number.length();
            } else if (c == '{') {
                // Get the group of numbers until you reach the final
                // closing curly brace
                StringBuilder sb = new StringBuilder();
                int openCount = 0;
                int closeCount = 0;
                do {
                    if (data.charAt(i) == '{') {
                        openCount++;
                    } else if (data.charAt(i) == '}') {
                        closeCount++;
                    }
                    sb.append(data.charAt(i));
                    i++;
                    // The bounds check stops the loop on unbalanced input
                } while (closeCount < openCount && i < data.length());
                list.add(sb.toString());
            }
        }
        for (int i = 0; i < list.size(); i++) {
            System.out.printf("Arr[%d]: %s%n", i, list.get(i));
        }
    }
}
Results:
Arr[0]: 1
Arr[1]: 2
Arr[2]: {3,{4},5}
Arr[3]: {5,6}
Arr[4]: -7
Arr[5]: {7,8}
Arr[6]: {8,{9},10}
Arr[7]: 11
You cannot do this with a regular expression if elements such as {{1},{2}} must be kept together. The reason is that the task is equivalent to parsing a language of balanced parentheses. That language is context-free, not regular, so it cannot be parsed with a regular expression. The best way to deal with this is not to use a regex, but to use a for loop with a stack (a stack is what makes it possible to parse context-free languages). In pseudocode, we could do:
last = 0
for each index i and char c in input
    if stack is empty and c is ','
        add substring(last, i) to output array
        last = i + 1
    if c is '{'
        push '{' on stack
    if c is '}'
        pop from stack
add substring(last, end of input) to output array
This pseudocode builds the array as it goes. Note that it is best to iterate over the character indices of the string rather than over the characters themselves, because the indices mark the boundaries of the substrings added to the array.
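As a rough illustration, the loop described above could be written in Java along these lines. This is only a sketch: the class name BraceSplitter and the method splitTopLevel are made up for the example, and it assumes curly braces are the only kind of bracket, so a depth counter stands in for the stack.

```java
import java.util.ArrayList;
import java.util.List;

public class BraceSplitter {

    // Hypothetical helper: splits on commas that are outside all braces.
    static List<String> splitTopLevel(String input) {
        List<String> parts = new ArrayList<>();
        int depth = 0; // number of '{' currently open (plays the role of the stack)
        int last = 0;  // start index of the element currently being read
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth < 0) {
                    throw new IllegalArgumentException("Unmatched '}' at index " + i);
                }
            } else if (c == ',' && depth == 0) {
                // Top-level comma: close the current element, skip the comma
                parts.add(input.substring(last, i));
                last = i + 1;
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("Unclosed '{' in input");
        }
        // The last element has no trailing comma, so add it after the loop
        parts.add(input.substring(last));
        return parts;
    }

    public static void main(String[] args) {
        List<String> parts = splitTopLevel("1,2,{3,{4},5},{5,6}");
        for (int i = 0; i < parts.size(); i++) {
            System.out.printf("Arr[%d]: %s%n", i, parts.get(i));
        }
    }
}
```

With only one bracket type, the stack would only ever hold '{' characters, so counting them is enough. If several bracket types had to be matched against each other, a real stack (for example an ArrayDeque<Character>) would be needed.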
This comes close to the requirement. I ran out of time and will finish the rest later (one comma is still split incorrectly).
Regex: ,(?=[^}]*(?:{|$))
To test the regular expression, go to http://regexr.com/
To use this pattern in Java, there is a slight difference: { and } must be escaped, which in a Java string literal means writing \\ before each of them.
Hence the regex for Java input: ,(?=[^\\}]*(?:\\{|$))
String numbers = "{1,2,{3,{4},5},{5,6}}";
numbers = numbers.substring(1, numbers.length()-1);
String[] separatedValues = numbers.split(",(?=[^\\}]*(?:\\{|$))");
System.out.println(separatedValues[0]);
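To see where the lookahead pattern falls short, it may help to print every piece rather than only the first. The sketch below does that for the question's input; the class name LookaheadSplitDemo and the method splitWithLookahead are invented for the example, and the pattern is the one given above, unchanged.

```java
public class LookaheadSplitDemo {

    // Hypothetical wrapper around the lookahead pattern from this answer.
    static String[] splitWithLookahead(String input) {
        return input.split(",(?=[^\\}]*(?:\\{|$))");
    }

    public static void main(String[] args) {
        String numbers = "{1,2,{3,{4},5},{5,6}}";
        // Drop the outermost braces, as in the question
        numbers = numbers.substring(1, numbers.length() - 1);
        String[] parts = splitWithLookahead(numbers);
        for (int i = 0; i < parts.length; i++) {
            System.out.printf("Arr[%d]: %s%n", i, parts[i]);
        }
        // Prints five pieces instead of four:
        // Arr[0]: 1
        // Arr[1]: 2
        // Arr[2]: {3
        // Arr[3]: {4},5}
        // Arr[4]: {5,6}
    }
}
```

The comma between 3 and {4} is directly followed by an opening brace, so the lookahead accepts it even though it sits inside a group. The lookahead only inspects what follows the comma and cannot tell how many braces are still open.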