Positive lookbehind vs non-capturing group: different behavior

Question

I use Python's re module in my code and noticed different behavior in these two cases:

re.findall(r'\s*(?:[a-z]\))?[^.)]+', 'a) xyz. b) abc.') # non-capturing group
# results in ['a) xyz', ' b) abc']

and

re.findall(r'\s*(?<=[a-z]\))?[^.)]+', 'a) xyz. b) abc.') # lookbehind
# results in ['a', ' xyz', ' b', ' abc']

I only need to get ['xyz', 'abc']. Why do the examples behave differently, and how do I get the desired result?

python regex lookbehind capturing-group

aplavin 04 Feb 13 at 17:46

2 answers

The regex you're looking for is:

re.findall(r'(?<=[a-z]\) )[^) .]+', 'a) xyz. b) abc.')

I believe Anirudha's currently accepted answer explains the difference between your use of a positive lookbehind and a non-capturing group. However, the suggestion to remove the ? after the positive lookbehind actually results in [' xyz', ' abc'] (note the leading spaces).

This is because the positive lookbehind does not match the space character, and the space is not excluded by the main character class either, so it ends up in each match.
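
Both claims can be checked by running the two patterns side by side. This is a minimal sketch using only the standard re module; the variable names are illustrative and do not come from the original posts.

```python
import re

text = 'a) xyz. b) abc.'

# Lookbehind without the trailing "?": the space after "a)" is not part of
# the lookbehind, and [^.)] allows spaces, so each match keeps a leading space.
without_question_mark = re.findall(r'\s*(?<=[a-z]\))[^.)]+', text)
print(without_question_mark)  # [' xyz', ' abc']

# Space moved into the lookbehind and excluded from the character class.
with_space_in_lookbehind = re.findall(r'(?<=[a-z]\) )[^) .]+', text)
print(with_space_in_lookbehind)  # ['xyz', 'abc']
```

One caveat with the second pattern: because [^) .]+ stops at the first space, an item that contains a space, such as 'foo bar', would be truncated to 'foo'.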

TobalJackson 10 Aug 17 at 13:49

Anirudha · Accepted Answer · 2013-02-04T17:53:04+0000

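The reason a and b are included in the second case is that the lookbehind (?<=[a-z]\)) is optional and, like every lookaround, does not consume any characters. At the start of the string there is no a) behind the current position, so the optional lookbehind is simply skipped and [^.)]+ matches a.

You are now at ), which [^.)]+ cannot match, so the search resumes after it. Since you made (?<=[a-z]\)) optional, it imposes no constraint, and \s*[^.)]+ goes on to match ' xyz'.

The same is repeated with b) abc.

Remove the ? from the second case and a and b are no longer matched: you get [' xyz', ' abc'], which is the expected ['xyz', 'abc'] apart from a leading space in each match.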
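
Because a lookbehind is zero-width, making it optional turns it into a no-op: the pattern behaves as if the lookbehind were not there at all. The sketch below illustrates this with the standard re module. It also shows one alternative that uses a capturing group instead of a lookbehind; that alternative is an assumption added for illustration, not something proposed in the answers.

```python
import re

text = 'a) xyz. b) abc.'

# A lookbehind consumes nothing: the match is empty and sits right after "a)".
m = re.search(r'(?<=[a-z]\))', text)
print(m.span())  # (2, 2)

# With the optional lookbehind dropped entirely, the result is the same as
# the one reported in the question for \s*(?<=[a-z]\))?[^.)]+
no_lookbehind = re.findall(r'\s*[^.)]+', text)
print(no_lookbehind)  # ['a', ' xyz', ' b', ' abc']

# Alternative: consume the "a)" label and any following whitespace, and
# capture only the item; findall returns just the group when there is one.
items = re.findall(r'[a-z]\)\s*([^.)]+)', text)
print(items)  # ['xyz', 'abc']
```

The capturing-group version keeps spaces inside an item, since only the whitespace directly after the label is skipped.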
