Parsing Optional Groups

I am trying to write a single regex that pulls data from report files. The tricky part is that this one regex has to match several report content formats, and it should still match when some of the optional groups are not present.

Consider the contents of the following report files (note: file #2 is missing the "val2" part):

  • File #1: "-val1-test-val2-result-val3-done-"
    • Expected Result:
      • Val1 group: test
      • Val2 group: result
      • Val3 group: done
  • File #2: "-val1-test-val3-done-"
    • Expected Result:
      • Val1 group: test
      • Val2 group: (empty)
      • Val3 group: done

I tried the following regexes:

Regex #1 (normal): "-val1-(?<val1>.+?)-val2-(?<val2>.+?)-val3-(?<val3>.+?)-"

      

Problem: File #1 works fine, but for file #2 the regex does not match at all, so I get no group values.

Regex #2 (non-greedy): "-val1-(?<val1>.+?)(-val2-(?<val2>.+?))?-val3-(?<val3>.+?)-"
Regex #3 (boolean OR): "-val1-(?<val1>.+?)(-val2-(?<val2>.+?)|(.*?))-val3-(?<val3>.+?)-"
Regex #4 (conditional): "-val1-(?<val1>.+?)(?(-val2-(?<val2>.+?))|(.+?))-val3-(?<val3>.+?)-"
Regex #5 (conditional): "-val1-(?<val1>.+?)(?(-val2-(?<val2>.+?))(-val2-(?<val2>.+?)))-val3-(?<val3>.+?)-"
Regex #6 (conditional): "-val1-(?<val1>.+?)(?(-val2-(?<val2>.+?))(-val2-(?<val2>.+?))|(.+?))-val3-(?<val3>.+?)-"

      

Problem: File #2 works as expected, but the val2 group for file #1 is always empty.

Conclusion: It seems that even when the optional part is present, the regex prefers the empty group value over the actual value. Is there a way to force the optional groups to capture their value when they are present and return (empty) when they are not?

Note: I am using the latest .NET Framework, and the code will be ported to Java (Android). I want to avoid running multiple operations because of performance and bandwidth concerns.

Can anyone help me with this?

1 answer


Perhaps this can work if we make some assumptions:

  • values may be missing, but they always appear in the same order
  • the first value is always present
  • there is a separator before and after the part we are looking for

 



-val1-([^-]+)(?:-val2-([^-]+)|)(?:-val3-([^-]+)|)-

      

https://regex101.com/r/yY6vF9/1
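Since the question mentions a port to Java, here is a minimal sketch of how the pattern above might be used there. It assumes the report contents carry no stray spaces around the dashes. The group names, the `OptionalGroups` class, and the `parse` helper are illustrative additions, not part of the answer; the answer's pattern uses plain numbered groups.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalGroups {
    // The answer's pattern with group names added (an adaptation for readability).
    // Each optional part is an alternation with an empty branch: "(?:...|)".
    static final Pattern REPORT = Pattern.compile(
        "-val1-(?<val1>[^-]+)(?:-val2-(?<val2>[^-]+)|)(?:-val3-(?<val3>[^-]+)|)-");

    // Hypothetical helper: returns {val1, val2, val3}, or null when nothing matches.
    // A group that did not take part in the match is reported as "".
    static String[] parse(String content) {
        Matcher m = REPORT.matcher(content);
        if (!m.find()) {
            return null;
        }
        String[] values = new String[3];
        for (int i = 0; i < 3; i++) {
            // Matcher.group(name) returns null for a group that did not participate.
            String v = m.group("val" + (i + 1));
            values[i] = (v == null) ? "" : v;
        }
        return values;
    }

    public static void main(String[] args) {
        // File #1: all three parts present
        System.out.println(String.join("|", parse("-val1-test-val2-result-val3-done-")));
        // prints "test|result|done"

        // File #2: the val2 part is missing
        System.out.println(String.join("|", parse("-val1-test-val3-done-")));
        // prints "test||done"
    }
}
```

Because `[^-]+` cannot run across a dash, val1 stops at the end of "test" instead of being extended by backtracking, which is why the optional val2 part gets a chance to match when it is present. This relies on the assumption that the values themselves never contain a dash.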
