How does this regex match groups?

Looking at this regex from injector.js: ^\s*(_?)(\S+?)\1\s*$

I was able to figure out how the string _non_ matches. The first capture group gets _, the second capture group gets non, and the backreference to the result of the first capture group gets _. So the first group is _, the second group is non, and the third part (the backreference) is _.

However, I have not been able to figure out how the strings _, _non and __ are matched by the second group, given the \1 backreference in the expression, which I would have expected to require _ at the end, given the _ at the beginning.



1 answer


Pattern: ^\s*(_?)(\S+?)\1\s*$

Overall, this pattern:

^ : start at the beginning of the line
\s* : match 0 or more whitespace characters
(_?) : match and capture 0 or 1 underscore (capture group 1)
(\S+?) : lazily (non-greedily) match and capture 1 or more non-whitespace characters (capture group 2)
\1 : match whatever was captured by capture group 1
\s* : match 0 or more whitespace characters
$ : end of the line
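To try the breakdown above, here is a minimal JavaScript sketch. JavaScript is assumed only because the pattern comes from injector.js, and the helper name `captures` is made up for illustration. It runs the pattern against a string and returns the two capture groups.

```javascript
// The pattern from the question, written as a JavaScript regex literal.
const pattern = /^\s*(_?)(\S+?)\1\s*$/;

// Hypothetical helper: returns [group1, group2], or null when nothing matches.
function captures(subject) {
  const m = pattern.exec(subject);
  return m === null ? null : [m[1], m[2]];
}

// Group 1 is "_" and group 2 is "non"; the trailing "_" is consumed by \1.
console.log(captures("_non_"));

// Leading and trailing whitespace is absorbed by the two \s* parts.
console.log(captures("  _non_  "));
```

Both calls should report the same captures, since the surrounding whitespace never reaches either group.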

Subject: _

Group 1: (empty)

Group 2: _

The _ is initially matched in the first capturing group. But then the engine moves on to the second capturing group, which needs at least one character to match, so the engine backtracks and takes the character away from the first capturing group. Because of the ?, the first capturing group does not have to match anything, and _ is a non-whitespace character, so group 2 can take it. Then, since nothing was matched in capture group 1 (because group 2 had to be satisfied), the \1 backreference matches nothing (the empty string).

Subject: _non

Group 1: (empty)

Group 2: _non

Initially _ matches in group 1, then non matches in group 2. Then the engine looks for a _ to satisfy the \1 backreference and there is none, so the engine backtracks, removes the _ from group 1, and matches it in group 2 instead.



Subject: _non_

Group 1: _

Group 2: non

Same as before: initially _ matches in group 1, then non matches in group 2. Then the engine looks for a _ to satisfy the \1 backreference, and this time it finds one, so group 1 keeps its _ and group 2 just has non.

Subject: __

Group 1: (empty)

Group 2: __

This is essentially the same as the first example, _. Initially the first _ is matched in group 1. Then the second _ is matched in group 2. Then \1 tries to match another _, since group 1 captured one, but there isn't one. Group 2 requires at least 1 character but may take more, so the regex engine backtracks and puts group 1's match into group 2.
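The four matching inputs walked through above can be checked in one pass. This is a small sketch in JavaScript (assumed because the pattern comes from injector.js); the variable name `results` is illustrative only.

```javascript
const pattern = /^\s*(_?)(\S+?)\1\s*$/;

// Run the pattern over each input and collect both capture groups.
const results = ["_", "_non", "_non_", "__"].map((subject) => {
  const [, group1, group2] = pattern.exec(subject);
  return { subject, group1, group2 };
});

// Only "_non_" leaves "_" in group 1; for the other three inputs the
// engine backtracks, group 1 ends up empty, and group 2 takes everything.
console.log(results);
```

Note that an empty group 1 shows up as the empty string "", not as undefined, because the group did take part in the match and simply captured nothing.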

Subject: _ _

Group 1: (no match)

Group 2: (no match)

This does not produce a match. The engine starts by putting the first _ in group 1, but then fails to place the space in group 2. So it backtracks and tries putting the first _ in group 2 instead. Since group 1 is now empty, \1 has nothing to match. The space is then matched by \s*, but after that the match fails on the final _, because the pattern only allows whitespace before the end of the line.
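The failing case can be contrasted with a near-identical input that does match. The sketch below is JavaScript (assumed because the pattern comes from injector.js), and the variable names are illustrative.

```javascript
const pattern = /^\s*(_?)(\S+?)\1\s*$/;

// Whitespace in the middle: nothing in the pattern can consume the final "_".
const noMatch = pattern.exec("_ _");

// Whitespace only at the end: the trailing \s* absorbs it.
const trailing = pattern.exec("_ ");

console.log(noMatch); // null
console.log(trailing && [trailing[1], trailing[2]]); // group 1 is "", group 2 is "_"
```

The only difference between the two inputs is the last character, which shows that the failure comes from the end-of-line anchor rather than from the backreference.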

Sidenote

You asked in the comment:

"If it matches _ for the first group, it must match _ in \1. Does \1 refer to the expression or to the result of the expression?"

It refers to the result of the expression (the text that was actually captured), not the expression itself.
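A short way to see the difference is to compare a backreference with a repeated copy of the same sub-pattern. The two patterns below are made up for illustration and do not come from injector.js.

```javascript
// \1 must repeat the exact text that group 1 captured.
const withBackreference = /^(a|b)\1$/;

// A second (a|b) re-applies the sub-pattern, so it may pick the other branch.
const withRepeatedGroup = /^(a|b)(a|b)$/;

const checks = {
  backrefAA: withBackreference.test("aa"),  // true: "a" captured, "a" repeated
  backrefAB: withBackreference.test("ab"),  // false: "b" is not the captured "a"
  repeatedAB: withRepeatedGroup.test("ab"), // true: each group matches on its own
};

console.log(checks);
```

The same rule explains the examples above: when group 1 captures the empty string, \1 only has to match the empty string.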
