Regex to capture an optional group in the middle of an input block

I am stuck on a RegEx problem that seems very simple and yet I cannot get it to work.

Suppose I have an input like this:

Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit%
Some text %interestingbit% lots of random text OPTIONAL_THING lots and lots more %anotherinterestingbit%
Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit%

      

There are many repeating blocks in the input, and in each block I want to capture some things that are always there (%interestingbit% and %anotherinterestingbit%). There is also some text that may or may not occur in between (OPTIONAL_THING), and I want to capture it if it's there.

This regex matches only the blocks that contain OPTIONAL_THING (and the named capture works):

%interestingbit%.+?((?<OptionalCapture>OPTIONAL_THING)).+?%anotherinterestingbit%

      

So it looks like it's just a matter of making the whole group optional, right? This is what I tried:

%interestingbit%.+?((?<OptionalCapture>OPTIONAL_THING))?.+?%anotherinterestingbit%

      

But I find that while this matches all 3 blocks, the named capture (OptionalCapture) is empty in all of them! How can I get this to work?

Note that there can be a lot of text inside each block, including newlines, which is why I used ".+?" rather than something more specific. I am using .NET regex, testing with Regulator.

+1




3 answers


My thoughts are along the same lines as Niko's. However, I would suggest putting the second .+? inside the optional group, rather than the first, as follows:

%interestingbit%.+?(?:(?<optionalCapture>OPTIONAL_THING).+?)?%anotherinterestingbit%

      



This avoids unnecessary backtracking. If the first .+? is inside the optional group and OPTIONAL_THING does not exist in the search string, the regex will not know this until it reaches the end of the string. It will then need to backtrack, perhaps quite a lot, to match %anotherinterestingbit%, which you said will always exist.

Also, since OPTIONAL_THING, when it exists, always comes before %anotherinterestingbit%, the text after it is in effect optional as well, and fits more naturally inside the optional group.
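A minimal sketch of the pattern above, run against the sample input from the question. It uses Python's re module as a stand-in for .NET, which is an assumption: Python spells named groups (?P<name>...) where .NET uses (?<name>...), and re.DOTALL is used here in the role of RegexOptions.Singleline so that "." can cross newlines.

```python
import re

# Sample input from the question: only the second block has OPTIONAL_THING.
text = (
    "Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit%\n"
    "Some text %interestingbit% lots of random text OPTIONAL_THING lots and lots more %anotherinterestingbit%\n"
    "Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit%\n"
)

# The second lazy .+? sits inside the optional non-capturing group,
# so it is only attempted once OPTIONAL_THING has actually matched.
pattern = re.compile(
    r"%interestingbit%.+?"
    r"(?:(?P<optionalCapture>OPTIONAL_THING).+?)?"
    r"%anotherinterestingbit%",
    re.DOTALL,
)

# One entry per matched block; None means the group did not participate.
captures = [m.group("optionalCapture") for m in pattern.finditer(text)]
print(captures)  # [None, 'OPTIONAL_THING', None]
```

In .NET the unmatched case would show up as a group whose Success property is false rather than as None.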

+2




Why do you have an extra set of parentheses?

Try the following:

%interestingbit%.+?(?<OptionalCapture>OPTIONAL_THING)?.+?%anotherinterestingbit%

      



Or maybe this will work:

%interestingbit%.+?(?<OptionalCapture>OPTIONAL_THING|).+?%anotherinterestingbit%

      

In this example, the group captures either OPTIONAL_THING or nothing.
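It is worth checking these variants against a block that does contain OPTIONAL_THING. In a backtracking engine the lazy .+? in front is satisfied after a single character, the optional group (or its empty alternative) then succeeds without consuming anything, and the second .+? swallows OPTIONAL_THING along with the rest, so the capture can still come back empty, just as in the question. The sketch below tries this with Python's re module as a stand-in for .NET, which is an assumption; the patterns are rewritten with Python's (?P<name>...) spelling, and the dictionary labels are made up for illustration.

```python
import re

block = (
    "Some text %interestingbit% lots of random text OPTIONAL_THING "
    "lots and lots more %anotherinterestingbit%"
)

# The question's attempt plus the two variants suggested above.
patterns = {
    "question": r"%interestingbit%.+?((?P<OptionalCapture>OPTIONAL_THING))?.+?%anotherinterestingbit%",
    "optional group": r"%interestingbit%.+?(?P<OptionalCapture>OPTIONAL_THING)?.+?%anotherinterestingbit%",
    "empty alternative": r"%interestingbit%.+?(?P<OptionalCapture>OPTIONAL_THING|).+?%anotherinterestingbit%",
}

results = {}
for label, pattern in patterns.items():
    match = re.search(pattern, block, re.DOTALL)
    # None: the group did not participate. "": it matched the empty string.
    # Either way, OPTIONAL_THING itself was not captured.
    results[label] = match.group("OptionalCapture")

print(results)
# {'question': None, 'optional group': None, 'empty alternative': ''}
```

In this check all three patterns match the block, but none of them captures OPTIONAL_THING.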

0




Try the following:

%interestingbit%(?:(.+)(?<optionalCapture>OPTIONAL_THING))?(.+?)%anotherinterestingbit%

      

First comes a non-capturing group that matches .+OPTIONAL_THING or nothing. If it matches, the named group inside it captures OPTIONAL_THING for you. The rest is matched by .+?%anotherinterestingbit%.

[edit]: I added a couple of parentheses for additional capturing groups, so the captured groups now match the following:

  • $1: the text before OPTIONAL_THING, or nothing
  • ${optionalCapture}: OPTIONAL_THING, or nothing (.NET numbers named groups after the unnamed ones, so by number this is $3 rather than $2)
  • $2: the text after OPTIONAL_THING or, if OPTIONAL_THING is not found, the full text between %interestingbit% and %anotherinterestingbit%

Are these the three matches you are looking for?
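Group numbers are easy to get wrong when named and unnamed groups are mixed, so the sketch below gives all three groups a name. The names before and after are introduced here for illustration only; the pattern above leaves those two groups unnamed. Python's re module is again a stand-in for .NET (an assumption), with named groups spelled (?P<name>...).

```python
import re

text = (
    "Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit%\n"
    "Some text %interestingbit% lots of random text OPTIONAL_THING lots and lots more %anotherinterestingbit%\n"
    "Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit%\n"
)

# "before" and "after" are hypothetical names for the two unnamed groups.
pattern = re.compile(
    r"%interestingbit%"
    r"(?:(?P<before>.+)(?P<optionalCapture>OPTIONAL_THING))?"
    r"(?P<after>.+?)"
    r"%anotherinterestingbit%"
)

# No re.DOTALL here: "." stops at newlines, so the greedy .+ stays on one line.
rows = [
    (m.group("before"), m.group("optionalCapture"), m.group("after"))
    for m in pattern.finditer(text)
]
for row in rows:
    print(row)
# (None, None, ' lots of random text lots and lots more ')
# (' lots of random text ', 'OPTIONAL_THING', ' lots and lots more ')
# (None, None, ' lots of random text lots and lots more ')
```

The sketch deliberately matches line by line. If "." were allowed to cross newlines, the greedy .+ could run past the end of a block that has no OPTIONAL_THING and pick one up from a later block.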

0








