Python Regular Expressions - Is "+?" equivalent to "*"?
* : 0 or more occurrences of the pattern to its left
+ : 1 or more occurrences of the pattern to its left
? : 0 or 1 occurrences of the pattern to its left
How is "+?" equivalent to "*"?
Consider searching for any 3-letter word if it exists.
re1 = re.search(r'(\w\w\w)*', "abc")
In the case of re1, * tries to match 0 or more occurrences of the pattern to its left, which in this case is a group of 3 letters. So it will either find a three-letter word or fail.
re2 = re.search(r'(\w\w\w)+?', "abc")
In the case of re2, it should give the same result, but I'm confused about why "*" and "+?" are equivalent. Could you explain this?
* and +? are not equivalent. ? takes on a special meaning if it follows a quantifier: it makes that quantifier lazy. Usually, quantifiers are greedy, that is, they will try to match as many repetitions as they can; lazy quantifiers match as few as possible. But a+? will still match at least one a.
In [1]: re.search("(a*)(.*)", "aaaaaa").groups()
Out[1]: ('aaaaaa', '')
In [2]: re.search("(a+?)(.*)", "aaaaaa").groups()
Out[2]: ('a', 'aaaaa')
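As a small sketch of the greedy/lazy distinction described above, the following compares all four variants on the same input. The variable names are only illustrative, and re.match is used so that every pattern is tried at the start of the string.

```python
import re

text = "aaaaaa"

# Greedy quantifiers take as many repetitions as they can.
greedy_star = re.match(r"a*", text).group()
greedy_plus = re.match(r"a+", text).group()

# Lazy quantifiers take as few repetitions as they are allowed to.
lazy_star = re.match(r"a*?", text).group()  # zero repetitions is allowed
lazy_plus = re.match(r"a+?", text).group()  # one repetition is the minimum

print(repr(greedy_star), repr(greedy_plus), repr(lazy_star), repr(lazy_plus))
# prints: 'aaaaaa' 'aaaaaa' '' 'a'
```

Note that only the lazy form of * stops at the empty string; the lazy form of + still has to consume one a.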
In your example, both regular expressions match the same text, because both (\w\w\w)* and (\w\w\w)+? can match three letters, and your input is exactly three letters. But they behave differently on other strings:
In [12]: re.search(r"(\w\w\w)+?", "abcdef")
Out[12]: <_sre.SRE_Match object; span=(0, 3), match='abc'>
In [13]: re.search(r"(\w\w\w)+?", "ab") # No match
In [14]: re.search(r"(\w\w\w)*", "abcdef")
Out[14]: <_sre.SRE_Match object; span=(0, 6), match='abcdef'>
In [15]: re.search(r"(\w\w\w)*", "ab")
Out[15]: <_sre.SRE_Match object; span=(0, 0), match=''>
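The same comparison can be run as a standalone script. This is only a sketch: matched_text is a hypothetical helper introduced here for illustration, not something from the question.

```python
import re

def matched_text(pattern, text):
    """Return the text matched by re.search, or None when nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

star = r"(\w\w\w)*"        # greedy: zero or more groups of three
lazy_plus = r"(\w\w\w)+?"  # lazy: at least one group, as few as possible

for text in ["abc", "abcdef", "ab", ""]:
    print(repr(text), repr(matched_text(star, text)), repr(matched_text(lazy_plus, text)))
# prints:
# 'abc' 'abc' 'abc'
# 'abcdef' 'abcdef' 'abc'
# 'ab' '' None
# '' '' None
```

The two patterns agree only on "abc". On longer input * keeps going while +? stops after one group, and on shorter input * succeeds with an empty match while +? does not match at all.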
If you use a simpler expression, you will see that they are not the same:
>>> import re
>>> re.search("[0-9]*", "1")
<_sre.SRE_Match object; span=(0, 1), match='1'>
>>> re.search("[0-9]*", "")
<_sre.SRE_Match object; span=(0, 0), match=''>
>>> re.search("[0-9]+", "")
>>> re.search("[0-9]+", "1")
<_sre.SRE_Match object; span=(0, 1), match='1'>
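The problem in your code is the reading of (\w\w\w)+? as "one or more, or nothing". After a quantifier, ? does not make the group optional; +? still requires one or more occurrences and just matches as few as possible.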