Regex to capture an optional group in the middle of an input block
I am stuck on a RegEx problem that seems very simple and yet I cannot get it to work.
Suppose I have an input like this:
Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit%
Some text %interestingbit% lots of random text OPTIONAL_THING lots and lots more %anotherinterestingbit%
Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit%
There are a lot of repeating blocks in the input, and in each block I want to capture some things that are always there (%interestingbit% and %anotherinterestingbit%). There is also a bit of text that may or may not occur in between them (OPTIONAL_THING), and I want to capture it if it's there.
This regex matches only the blocks that contain OPTIONAL_THING (and there the named capture works):
%interestingbit%.+?((?<OptionalCapture>OPTIONAL_THING)).+?%anotherinterestingbit%
So it looks like it's just a matter of making the whole group optional, right? This is what I tried:
%interestingbit%.+?((?<OptionalCapture>OPTIONAL_THING))?.+?%anotherinterestingbit%
But I find that while this matches all 3 blocks, the named capture (OptionalCapture) is empty in all of them! How can I get this to work?
Note that there can be a lot of text inside each block, including newlines, which is why I put ".+?" rather than something more specific. I am using .NET regex, testing with Regulator.
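For anyone who wants to reproduce the symptom, here is a minimal sketch. It uses Python's re module as a stand-in for .NET, which is an assumption on my part: Python spells named groups (?P<name>...) instead of (?<name>...), and re.DOTALL plays the role of RegexOptions.Singleline. The names ATTEMPT, BLOCKS and captures are just for illustration.

```python
import re

# The second pattern from the question, with Python's named-group spelling.
ATTEMPT = re.compile(
    r"%interestingbit%.+?((?P<OptionalCapture>OPTIONAL_THING))?.+?%anotherinterestingbit%",
    re.DOTALL,  # let "." cross newlines, comparable to RegexOptions.Singleline
)

# The three sample blocks from the question, one per line.
BLOCKS = (
    "Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit%\n"
    "Some text %interestingbit% lots of random text OPTIONAL_THING lots and lots more %anotherinterestingbit%\n"
    "Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit%\n"
)

# One entry per matched block; None means the named group did not take part.
captures = [m.group("OptionalCapture") for m in ATTEMPT.finditer(BLOCKS)]
print(captures)  # [None, None, None]
```

All three blocks match, but the first lazy .+? is satisfied after a single character, so the optional group is only ever tried right there and gets skipped.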
My thoughts are similar to Niko's. However, I would suggest putting the second .+? inside the optional group, rather than the first, as follows:
%interestingbit%.+?(?:(?<optionalCapture>OPTIONAL_THING).+?)?%anotherinterestingbit%
This avoids unnecessary backtracking. If the first .+? is inside the optional group and OPTIONAL_THING does not exist in the search string, the regex won't know that until it reaches the end of the string. It will then need to backtrack, perhaps quite a bit, to match %anotherinterestingbit%, which you said will always exist.
Also, since OPTIONAL_THING, when it exists, will always come before %anotherinterestingbit%, the text after it is in effect optional too, and fits more naturally inside the optional group.
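Here is a small sketch of that pattern run against the three sample blocks. It is written for Python's re module rather than .NET, so the named group is spelled (?P<name>...) and re.DOTALL stands in for RegexOptions.Singleline; FIXED, BLOCKS and captures are illustrative names only.

```python
import re

# Same pattern as above, with Python's named-group spelling.
FIXED = re.compile(
    r"%interestingbit%.+?(?:(?P<optionalCapture>OPTIONAL_THING).+?)?%anotherinterestingbit%",
    re.DOTALL,  # let "." cross newlines, comparable to RegexOptions.Singleline
)

# The three sample blocks from the question, one per line.
BLOCKS = (
    "Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit%\n"
    "Some text %interestingbit% lots of random text OPTIONAL_THING lots and lots more %anotherinterestingbit%\n"
    "Some text %interestingbit% lots of random text lots and lots more %anotherinterestingbit%\n"
)

# One entry per matched block; None means OPTIONAL_THING was absent.
captures = [m.group("optionalCapture") for m in FIXED.finditer(BLOCKS)]
print(captures)  # [None, 'OPTIONAL_THING', None]
```

At every position the first .+? grows to, the engine tries the optional group and then %anotherinterestingbit%, so OPTIONAL_THING is picked up as soon as it is reached, and a block without it simply ends at its own %anotherinterestingbit%.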
Why do you have an extra set of parentheses?
Try the following:
%interestingbit%.+?(?<OptionalCapture>OPTIONAL_THING)?.+?%anotherinterestingbit%
Or maybe this will work:
%interestingbit%.+?(?<OptionalCapture>OPTIONAL_THING|).+?%anotherinterestingbit%
In this version, the group captures either OPTIONAL_THING or nothing.
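It is worth checking both variants against a block that does contain OPTIONAL_THING before relying on them. The sketch below does that with Python's re module as a stand-in for .NET (named groups become (?P<name>...), re.DOTALL replaces RegexOptions.Singleline); the variable names are made up for the example.

```python
import re

BLOCK = (
    "Some text %interestingbit% lots of random text OPTIONAL_THING "
    "lots and lots more %anotherinterestingbit%"
)

# Variant 1: the named group itself is made optional.
optional_group = re.compile(
    r"%interestingbit%.+?(?P<OptionalCapture>OPTIONAL_THING)?.+?%anotherinterestingbit%",
    re.DOTALL,
)
# Variant 2: the named group has an empty alternative.
empty_alternative = re.compile(
    r"%interestingbit%.+?(?P<OptionalCapture>OPTIONAL_THING|).+?%anotherinterestingbit%",
    re.DOTALL,
)

first = optional_group.search(BLOCK).group("OptionalCapture")
second = empty_alternative.search(BLOCK).group("OptionalCapture")
print(repr(first), repr(second))  # None ''
```

In this sketch both variants match the block but leave the capture empty: the lazy first .+? stops after one character, the group is tried only at that spot, and the second .+? then swallows OPTIONAL_THING along with the rest.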
Try the following:
%interestingbit%(?:(.+)(?<optionalCapture>OPTIONAL_THING))?(.+?)%anotherinterestingbit%
First comes a non-capturing group that matches .+OPTIONAL_THING or nothing. If it matches, the named group inside it captures OPTIONAL_THING for you. The rest is matched by .+?%anotherinterestingbit%.
[edit]: I added a couple of parentheses for additional capturing groups, so the captured groups now match the following:
- $1: text before OPTIONAL_THING, or nothing
- $2 or ${optionalCapture}: OPTIONAL_THING, or nothing
- $3: text after OPTIONAL_THING or, if OPTIONAL_THING is not found, the full text between %interestingbit% and %anotherinterestingbit%
Note that .NET numbers named groups after all unnamed ones, so in .NET the text after OPTIONAL_THING ends up in $2 and the named group is best read by name.
Are these the three matches you are looking for?
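A rough sketch of what the groups hold, one block at a time. This uses Python's re module as a stand-in (named group spelled (?P<name>...)), where groups are numbered by opening parenthesis, so group 1 is the text before and group 3 the text after; PATTERN, with_optional and without_optional are illustrative names.

```python
import re

# Same pattern as above, with Python's named-group spelling.
PATTERN = re.compile(
    r"%interestingbit%(?:(.+)(?P<optionalCapture>OPTIONAL_THING))?(.+?)%anotherinterestingbit%"
)

with_optional = PATTERN.search(
    "Some text %interestingbit% lots of random text OPTIONAL_THING "
    "lots and lots more %anotherinterestingbit%"
)
without_optional = PATTERN.search(
    "Some text %interestingbit% lots of random text "
    "lots and lots more %anotherinterestingbit%"
)

# Block with OPTIONAL_THING: text before, the capture, text after.
print(with_optional.group(1, "optionalCapture", 3))
# (' lots of random text ', 'OPTIONAL_THING', ' lots and lots more ')

# Block without it: the optional group is skipped, group 3 holds everything.
print(without_optional.group(1, "optionalCapture", 3))
# (None, None, ' lots of random text lots and lots more ')
```

Because the first (.+) is greedy, this sketch deliberately feeds the pattern a single block per call; on input where "." can run across several blocks the greedy group may reach into a later block.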