Recursive PCRE Search with Patterns
This question is related to PCRE.
I've seen recursive search for nested parentheses used with this construct:
\(((?>[^()]+)|(?R))*\)
The problem is that although '[^()]+' can match any character, including a newline, the delimiters themselves are limited to single characters such as curly braces, brackets, punctuation, single letters, etc.
What I am trying to do is replace the '(' and ')' characters with ANY pattern (such as "BEGIN" and "END", for example).
I came up with the following construction:
(?xs) (?# <-- 'xs' ignore whitespace in the search term, and allows '.'
to match newline )
(?P<pattern1>BEGIN)
(
(?> (?# <-- "once only" search )
(
(?! (?P=pattern1) | (?P<pattern2>END)).
)+
)
| (?R)
)*
END
This works on input that looks like this:
BEGIN <<date>>
<<something>
BEGIN
<<something>>
END <<comment>>
BEGIN <<time>>
<<more somethings>>
BEGIN(cause we can)END
BEGINEND
END
<<something else>>
END
This matches any nested BEGIN..END pairs successfully.
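To cross-check which blocks a pattern like this should find, here is a minimal sketch in Python that uses a plain stack instead of regex recursion (Python's built-in re module has no (?R)). It is a different technique from the pattern above, not a translation of it. The function name find_blocks and its parameters are hypothetical, and the delimiters are assumed to be simple regex fragments.

```python
import re

def find_blocks(text, open_pat="BEGIN", close_pat="END"):
    """Return (start, end) spans of every balanced block, nested ones included.

    open_pat and close_pat are assumed to be plain regex fragments.
    """
    token = re.compile(f"(?P<open>{open_pat})|(?P<close>{close_pat})")
    stack, spans = [], []
    for m in token.finditer(text):
        if m.group("open") is not None:
            stack.append(m.start())              # remember where this block opened
        elif stack:
            spans.append((stack.pop(), m.end())) # closer completes the latest opener
        # a closer with no pending opener is ignored
    return sorted(spans)

sample = "x BEGIN a BEGIN b END c END y BEGINEND"
blocks = [sample[start:end] for start, end in find_blocks(sample)]
```

For this sample, blocks holds the outer block, the block nested inside it, and the trailing "BEGINEND", in order of their start positions.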
I have set up named patterns pattern1 and pattern2 for BEGIN and END respectively. Using pattern1 in the search term works great. However, I cannot use pattern2 at the end of the search: I have to write "END" literally.
Any idea how I can rewrite this regex so that I only need to specify the patterns once and can use them "everywhere" in the expression? In other words, I don't want to have to write END both in the middle of the search and at the very end.
Building on @Kobi's answer, see the following regex:
(?xs)
(?(DEFINE)
(?<pattern1>BEGIN)
(?<pattern2>END)
)
(?=((?&pattern1)
(?:
(?> (?# <-- "once only" search )
(?:
(?! (?&pattern1) | (?&pattern2)) .
)+
)*
| (?3)
)*
(?&pattern2)
))
This regex even lets you capture every single block, nested ones included, because the whole match sits inside a lookahead. Use the third capture group, since the first two are defined in the DEFINE block.
This looks like a good use case for a (?(DEFINE)...) block, which exists for creating exactly such constructs. Perl example:
(?xs)
(?(DEFINE)
(?<pattern1>BEGIN)
(?<pattern2>END)
)
(?&pattern1)
(
(?> (?# <-- "once only" search )
(
(?! (?&pattern1) | (?&pattern2)).
)+
)
| (?R)
)*
(?&pattern2)
Example: http://ideone.com/8o9cg
(note that I don't really know Perl, and couldn't get it to work in PHP on any of the online testers)
See also: http://www.pcre.org/pcre.txt (search for (?(DEFINE); the document doesn't seem to be split into pages)
A low-tech solution that works in most flavors is to use a lookahead at the beginning of the pattern:
(?=.*?(?P<pattern1>BEGIN))
(?=.*?(?P<pattern2>END))
...
(?P=pattern1) (?# should work - it was captured )
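As a rough illustration of the lookahead trick, here is a sketch using Python's built-in re module, which supports (?P<name>...) and (?P=name) but not recursion, so it only finds innermost blocks. The group names open, close and body are made up for this example. Note that a backreference matches the exact text that was captured, not the sub-pattern, so this only behaves like "define once" when the delimiters are fixed strings.

```python
import re

# Hypothetical delimiters BEGIN/END; the group names are illustrative only.
BLOCK = re.compile(
    r"""
    (?=.*?(?P<open>BEGIN))    # look ahead and capture the opening delimiter
    (?=.*?(?P<close>END))     # look ahead and capture the closing delimiter
    (?P=open)                 # backreference to the captured opener text
    (?P<body>
        (?: (?! (?P=open) | (?P=close) ) . )*   # any char not starting a delimiter
    )
    (?P=close)                # backreference to the captured closer text
    """,
    re.VERBOSE | re.DOTALL,
)

match = BLOCK.search("BEGIN a BEGIN b END c END")
```

Here match.group(0) is the innermost block "BEGIN b END", because the body refuses to cross another delimiter. Matching the outer block as well still needs recursion, as in the PCRE patterns above.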