Recursive PCRE Search with Patterns

This question is related to PCRE.

I've seen recursive search for nested parentheses used with this construct:

\(((?>[^()]+)|(?R))*\)

      

The problem is that although [^()]+ can match any character, including a newline, the delimiters themselves are limited to single characters such as curly braces, brackets, punctuation, single letters, etc.

What I am trying to do is replace the characters '(' and ')' with ANY pattern (such as "BEGIN" and "END", for example).

I came up with the following construction:

(?xs)  (?# <-- 'xs' ignore whitespace in the search term, and allows '.'
               to match newline )
(?P<pattern1>BEGIN)
(
   (?> (?# <-- "once only" search )
      (
         (?! (?P=pattern1) | (?P<pattern2>END)).
      )+
   )
   | (?R)
)*
END

      

This will work on what looks like this:

BEGIN <<date>>
  <<something>
    BEGIN
      <<something>>
    END <<comment>>
    BEGIN <<time>>
      <<more somethings>>
      BEGIN(cause we can)END
      BEGINEND
    END
  <<something else>>
END

      

This matches any nested BEGIN..END pairs successfully.
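As a cross-check for what the recursive pattern is expected to match, here is a minimal sketch in Python. Python's built-in re module has no (?R), so this swaps in a plain depth counter over the delimiter tokens instead of a recursive regex. The function name outermost_block and its parameters open_tok and close_tok are made up for illustration, and the sketch assumes literal delimiters and well-nested input.

```python
import re


def outermost_block(text, open_tok="BEGIN", close_tok="END"):
    """Return the first balanced open_tok..close_tok span in text, or None.

    Hypothetical helper: literal delimiters only, well-nested input assumed.
    """
    # Tokenize on the two delimiters; everything else is body text.
    token = re.compile("|".join(re.escape(t) for t in (open_tok, close_tok)))
    depth = 0
    start = None
    for m in token.finditer(text):
        if m.group() == open_tok:
            if depth == 0:
                start = m.start()  # remember where the outermost block opens
            depth += 1
        elif depth > 0:  # a closing token with nothing open is ignored
            depth -= 1
            if depth == 0:
                return text[start:m.end()]
    return None


sample = "x BEGIN a BEGIN b END c END y BEGIN z END"
print(outermost_block(sample))
```

For balanced input this should return the same span as the first match of the regex above. For unbalanced input the two can differ, because the regex retries from a later BEGIN while this sketch does not.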

I have defined the named patterns pattern1 and pattern2 for BEGIN and END respectively. Reusing pattern1 inside the regex works great. However, I cannot reuse pattern2 at the end of the regex: I have to write "END" literally.

Any idea how I can rewrite this regex so that I only need to specify each pattern once and can reuse it everywhere in the expression? In other words, I don't want to write END both in the middle of the regex and at the very end.

2 answers


Building on @Kobi's answer, see the following regex:

(?xs)
(?(DEFINE)
        (?<pattern1>BEGIN)
        (?<pattern2>END)
)
(?=((?&pattern1)
(?:
   (?> (?# <-- "once only" search )
      (?:
         (?! (?&pattern1) | (?&pattern2)) .
      )+
   )*
   | (?3)
)*
(?&pattern2)
))

      



This regex will even let you get the contents of every single block, including nested ones! Use the third capture group, since the first two are the ones defined in the DEFINE block.

Demo: http://regex101.com/r/bX8mB6
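To make "every single block" concrete, the sketch below lists the blocks that a global search with the lookahead regex would be expected to report in group 3 for well-nested input. It is not a port of the regex: Python's built-in re module cannot recurse, so a stack of open positions is used instead. The name all_blocks and its parameters are hypothetical, and the delimiters are assumed to be literal strings.

```python
import re


def all_blocks(text, open_tok="BEGIN", close_tok="END"):
    """Return every balanced block, outer and inner, ordered by start position.

    Hypothetical helper: literal delimiters only, well-nested input assumed.
    """
    token = re.compile("|".join(re.escape(t) for t in (open_tok, close_tok)))
    open_starts = []  # start offsets of blocks that are still open
    spans = []
    for m in token.finditer(text):
        if m.group() == open_tok:
            open_starts.append(m.start())
        elif open_starts:  # a closing token pairs with the most recent opener
            spans.append((open_starts.pop(), m.end()))
    spans.sort()  # the regex reports blocks in order of their starting offset
    return [text[s:e] for s, e in spans]


text = "BEGIN a BEGIN b END BEGINEND END"
for block in all_blocks(text):
    print(repr(block))
```

The outermost block comes first, followed by each nested block in the order in which its BEGIN appears.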



This looks like a good use case for the (?(DEFINE)...) block, which is used to create such constructs. Perl example:

(?xs)
(?(DEFINE)
        (?<pattern1>BEGIN)
        (?<pattern2>END)
)
(?&pattern1)
(
   (?> (?# <-- "once only" search )
      (
         (?! (?&pattern1) | (?&pattern2)).
      )+
   )
   | (?R)
)*
(?&pattern2)

      

Example: http://ideone.com/8o9cg

(Note that I don't really know Perl, and I could not get it to work in PHP on any of the online testers.)



See also: http://www.pcre.org/pcre.txt (search for "(?(DEFINE)"; it doesn't look like they have pages)


A low-tech solution that works in most flavors is to use a lookahead at the beginning of the pattern:

(?=.*?(?P<pattern1>BEGIN))
(?=.*?(?P<pattern2>END))
...
(?P=pattern1) (?# should work - it was captured )
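The lookahead trick does not need recursion, so it can be tried with Python's built-in re module, which supports the same (?P<name>...) and (?P=name) syntax. The sketch below is a simplified, non-nested illustration only: the body group and the sample string are assumptions added for the demo, not part of the answer above.

```python
import re

# Each delimiter is written once, inside a lookahead that captures it
# without consuming input; later uses are backreferences to that capture.
PATTERN = re.compile(
    r"""
    (?=.*?(?P<pattern1>BEGIN))   # capture the opening delimiter
    (?=.*?(?P<pattern2>END))     # capture the closing delimiter
    (?P=pattern1)                # must repeat the captured opening text
    (?P<body>.*?)                # shortest body; nesting is not handled here
    (?P=pattern2)                # must repeat the captured closing text
    """,
    re.VERBOSE | re.DOTALL,
)

m = PATTERN.search("xx BEGIN hello END yy")
print(m.group(0))        # BEGIN hello END
print(m.group("body"))
```

One caveat: a backreference repeats the text that was captured, not the sub-pattern. For literal delimiters such as BEGIN and END the two are the same, but for a delimiter like B\w+ every later occurrence would have to be the exact same string as the first one captured.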

      
