PCRE regex: repeating the pattern works, but subroutines do not

I am trying to match the texts:

1. "HeyHey HeyHey"

2. "HeyHey HeyHeyy"

with regular expressions:

a /(\w+) \1\w/

b /(\w+) (\w+)\w/

c /(\w+) (?1)\w/


  • Regex a matches 1 and 2 exactly, except for the last 'y'.
  • Regex b matches 1 and 2 exactly.
  • Regex c doesn't match 1 or 2.

Based on http://www.rexegg.com/regex-disambiguation.html#subroutines I thought b and c were equivalent, but apparently they are not.

What is the difference? Why doesn't the subroutine work while copying the same regex works?

I experimented here: https://regex101.com/#pcre



1 answer


This is because in PCRE a subpattern reference (here (?1)) is atomic by default.

(Note that this behavior is specific to PCRE; Perl does not share it.)

The subpattern \w+ (with a greedy quantifier) matches all the word characters (HeyHeyy in the second string), but since (?1) is atomic, the regex engine cannot backtrack and give back the last y to make the trailing \w succeed.

You can reproduce the same behavior with this pattern:

/(\w+) (?>\w+)\w/
     # ^-----^-- atomic group

      



which does not match the strings, whereas the same pattern without the atomic group succeeds:

/(\w+) \w+\w/
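
The contrast can also be reproduced outside PCRE. Below is a minimal sketch in Python, under some assumptions: the standard `re` module has no `(?1)` subroutine syntax, so the atomic behavior is emulated here with the lookahead-plus-backreference idiom `(?=(\w+))\2` (a lookahead is not re-entered on backtracking, so whatever it captured is fixed). The variable names `copied` and `atomic_like` are made up for this illustration, and this is an approximation of what PCRE does with `(?1)`, not PCRE itself.

```python
import re

texts = ["HeyHey HeyHey", "HeyHey HeyHeyy"]

# Plain copy of the subpattern: the second \w+ is free to backtrack
# and give back one character so the trailing \w can match.
copied = re.compile(r"(\w+) (\w+)\w")

# Emulated atomic group: the lookahead captures the whole word into
# group 2 and is never re-entered, so \2 consumes the full word and
# nothing is left for the trailing \w.
atomic_like = re.compile(r"(\w+) (?=(\w+))\2\w")

for text in texts:
    print(
        repr(text),
        "copied:", copied.search(text) is not None,
        "atomic-like:", atomic_like.search(text) is not None,
    )
```

With both test strings the plain copy should match and the emulated atomic version should not, which mirrors the difference between regex b and regex c in the question.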

      

More on atomic groups: http://regular-expressions.info/atomic.html

This feature is also described here (but only in a recursive context): http://www.rexegg.com/regex-recursion.html (see "Recursion depths are atomic")
