Scala regex - How to match inside curly braces but avoid curly braces

I have text that looks something like this:

text {text10}
text {text1, text9}
anotherText [
{text2, text5}
{text3, text6}
{test4, text8}
]

      

This regex matches everything I want:

val regex =  """(.*?) (\[.*?\]|\{(.*?)\})""".r

      

However, I have a small problem: I don't want the enclosing braces in the captured group. Currently I get this result:

val line = regex findAllIn configByLines
line.matchData foreach {
  m => println("output: "+m.group(2))
}
#output: {text10}
#output: {text1, text9}
#output: [{text2, text5} {text3, text6} {test4, text8}]

      

But I would like to get the output for group (2) as

#output: text10
#output: text1, text9
#output: {text2, text5} {text3, text6} {text4, text8}

      

How can I fix my regex?



3 answers


This is quite doable, although you might want to make sure you really need to do this with a regex, since the result is not exactly pretty and is nearly unreadable:

val regex =  """[^\{\[]*[\{\[](((?<=\{)[^}]*)|((?<=\[)[^\]]*))[\}\]]""".r

      

The main trick is to use a zero-width positive lookbehind, for example (?<=\{), which checks that the preceding character is '{' without making it part of the captured group.



The matched text is in group 1.
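To see the lookbehind in isolation, here is a minimal sketch. The pattern is a simplified one written for illustration (it handles only curly braces, not the square brackets), and the names inner and found are made up for this example.

```scala
// (?<=\{) and (?=\}) assert that the braces are present without consuming them,
// so the match itself is only the text between the braces.
val inner = """(?<=\{)[^}]*(?=\})""".r

val found = inner.findAllIn("text {text1, text9}").toList
println(found)  // List(text1, text9)
```

Because the braces are never part of the match, no capture group is needed here at all.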

Obligatory REPL session:

scala> val configByLines = """text {text10}
     | text {text1, text9}
     | anotherText [
     | {text2, text5}
     | {text3, text6}
     | {test4, text8}
     | ]"""
configByLines: String =
text {text10}
text {text1, text9}
anotherText [
{text2, text5}
{text3, text6}
{test4, text8}
]

scala> val regex =  """[^\{\[]*[\{\[](((?<=\{)[^}]*)|((?<=\[)[^\]]*))[\}\]]""".r
regex: scala.util.matching.Regex = [^\{\[]*[\{\[](((?<=\{)[^}]*)|((?<=\[)[^\]]*))[\}\]]

scala> val line = regex findAllIn configByLines.replace("\n", " ")
line: scala.util.matching.Regex.MatchIterator = non-empty iterator

scala> line.matchData foreach {
     |   m => println("output: "+m.group(1))
     | }
output: text10
output: text1, text9
output:  {text2, text5} {text3, text6} {test4, text8}

      



You can use the \G anchor, if Scala supports it.

(?:^(.*?) \[?|(?<!^)\G){?([\w]*)}?
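Scala's Regex is a wrapper around java.util.regex, which does support \G (the end of the previous match). The sketch below shows the idea with a deliberately simpler pattern than the one above; the pattern and the names items and braceItems are assumptions made for illustration, not the answerer's regex.

```scala
// Illustrative pattern: an item either follows an opening '{', or continues
// from the end of the previous match (\G) after a comma.
// (?!\A) stops \G from matching at the very start of the input.
val items = """(?:\{|(?!\A)\G,\s*)(\w+)""".r

// Collects group 1 of every successive match.
def braceItems(s: String): List[String] =
  items.findAllMatchIn(s).map(_.group(1)).toList

println(braceItems("text {text1, text9}"))  // List(text1, text9)
```

Unlike the other approaches, this yields each comma-separated item as its own match, which may or may not be what you want.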

      





Regular expressions are overkill for this. Perl uses them for this sort of parsing because its regex engine is powerful and brings performance benefits, but on the JVM you don't really gain anything from a regex unless you need its expressive power. I therefore recommend parsing by hand for this particular example.

Take your string and split it on the opening curly brace:

scala> "anotherText [{text2} {text3}]" split '{'
res1: Array[String] = Array(anotherText [, "text2} ", text3}])

      

Drop the first element, as it was not preceded by an opening brace:

scala> ("anotherText [{text2} {text3}]" split '{').tail
res2: Array[String] = Array("text2} ", text3}])

      

This works even if the line starts with an opening brace, because split will then produce an empty first element.

Now split each remaining element on the closing curly brace and keep the part before it:

scala> ("anotherText [{text2} {text3}]" split '{').tail map (_.split('}').head)
res3: Array[String] = Array(text2, text3)
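The three steps can be wrapped into one small helper. This is only a sketch: the name braceContents is made up, and it uses takeWhile instead of the second split, a small variation so that an empty pair of braces yields an empty string instead of failing on head.

```scala
// Hypothetical helper: the text between each '{' and the next '}'.
def braceContents(line: String): List[String] =
  line.split('{')                // cut at every opening brace
    .tail                        // the first piece precedes any brace
    .map(_.takeWhile(_ != '}'))  // keep everything up to the closing brace
    .toList

println(braceContents("text {text1, text9}"))            // List(text1, text9)
println(braceContents("anotherText [{text2} {text3}]"))  // List(text2, text3)
```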

      

Note that this is not at all robust against unbalanced braces, which includes cases where the braced text itself contains curly braces. Try my last example against some lines like that. To handle them, you need to build a (trivial) parser and decide how you are going to escape or otherwise encode the inline curly braces. The same applies if your example is actually a simplified version of a more complex language.
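As a rough illustration of what such a trivial parser could look like, the sketch below tracks nesting depth and returns the contents of each top-level brace group. The name topLevelBraces and the decision to silently ignore unbalanced braces are assumptions; escaping of inline braces is not handled.

```scala
// Hypothetical helper: contents of every top-level {...} group.
// Nested braces are kept verbatim; a stray '}' or an unclosed '{' is dropped.
def topLevelBraces(s: String): List[String] = {
  val out = List.newBuilder[String]
  val cur = new StringBuilder
  var depth = 0
  for (c <- s) c match {
    case '{' =>
      if (depth > 0) cur.append(c)  // nested brace belongs to the content
      depth += 1
    case '}' if depth > 0 =>
      depth -= 1
      if (depth == 0) { out += cur.toString; cur.clear() }
      else cur.append(c)
    case _ =>
      if (depth > 0) cur.append(c)  // text outside any braces is skipped
  }
  out.result()
}

println(topLevelBraces("a {x {y} z} b {w}"))  // List(x {y} z, w)
```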
