Python Regular Expressions - Is "+?" equivalent to "*"?

   * : 0 or more occurrences of the pattern to its left
   + : 1 or more occurrences of the pattern to its left     
   ? : 0 or 1 occurrences of the pattern to its left
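As a quick check of these three definitions, here is a minimal sketch. The patterns `ab*`, `ab+`, `ab?` and the helper name `quantifier_table` are made up for illustration; `re.fullmatch` is used so that a result is true only when the whole string fits the pattern.

```python
import re

def quantifier_table(patterns=("ab*", "ab+", "ab?"), texts=("a", "ab", "abb")):
    """Map each pattern to {text: whether the whole text matches it}."""
    return {p: {t: re.fullmatch(p, t) is not None for t in texts} for p in patterns}

table = quantifier_table()
for pattern, row in table.items():
    # "ab*" accepts zero b's, "ab+" needs at least one, "ab?" allows at most one
    print(pattern, row)
```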

      

How is "+?" equivalent to "*"?

Consider searching for any 3-letter word if it exists.

re1 = re.search(r'(\w\w\w)*', "abc")

      

In the case of re1, * matches 0 or more occurrences of the pattern to its left, which here is a group of 3 letters. So it will either find a three-letter word or match nothing.

re2 = re.search(r'(\w\w\w)+?', "abc")

      

In the case of re2, it should give the same result, but I'm confused about why "*" and "+?" are equivalent. Could you explain this?

+3




2 answers


* and +? are not equivalent. ? takes on special meaning if it follows a quantifier, making that quantifier lazy.

Usually, quantifiers are greedy, that is, they will try to match as many repetitions as they can; lazy quantifiers match as few as possible. But a+? will still match at least one a.

In [1]: re.search("(a*)(.*)", "aaaaaa").groups()
Out[1]: ('aaaaaa', '')

In [2]: re.search("(a+?)(.*)", "aaaaaa").groups()
Out[2]: ('a', 'aaaaa')
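"As little as possible" still means "as much as the rest of the pattern needs". A small sketch of that, using an input string `"aaab"` chosen only for illustration: the lazy `a+?` stops after one `a` when nothing follows it, but grows when a `b` must be reached.

```python
import re

# Lazy with nothing after it: stops at the minimum of one "a".
alone = re.search(r"a+?", "aaab").group()

# Lazy, but the pattern must still reach "b", so it expands one "a" at a time.
forced = re.search(r"a+?b", "aaab").group()

# Greedy * at the same position, for comparison.
greedy = re.search(r"a*", "aaab").group()

print(repr(alone), repr(forced), repr(greedy))
```

So laziness changes which match is preferred, not whether at least one repetition is required.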

      



In your example, both regular expressions match the same text, because both (\w\w\w)* and (\w\w\w)+? can match three letters, and your input is exactly three letters. But they behave differently on other inputs:

In [12]: re.search(r"(\w\w\w)+?", "abcdef")
Out[12]: <_sre.SRE_Match object; span=(0, 3), match='abc'>

In [13]: re.search(r"(\w\w\w)+?", "ab") # No match

In [14]: re.search(r"(\w\w\w)*", "abcdef")
Out[14]: <_sre.SRE_Match object; span=(0, 6), match='abcdef'>

In [15]: re.search(r"(\w\w\w)*", "ab")
Out[15]: <_sre.SRE_Match object; span=(0, 0), match=''>

      

+4




If you use a simpler expression, you will see that they are not the same:

>>> import re
>>> re.search("[0-9]*", "1")
<_sre.SRE_Match object; span=(0, 1), match='1'>
>>> re.search("[0-9]*", "")
<_sre.SRE_Match object; span=(0, 0), match=''>
>>> re.search("[0-9]+", "")
>>> re.search("[0-9]+", "1")
<_sre.SRE_Match object; span=(0, 1), match='1'>

      



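The problem is in how you read your pattern: you are treating (\w\w\w)+? as "one or more, or nothing", but a ? placed directly after + does not make the group optional.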
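For what it's worth, the reading "one or more, or nothing" does correspond to a real pattern, but the ? has to apply to a group that contains the +, not to the + itself. The sketch below compares the three spellings; the variable names and the non-capturing groups `(?:...)` are my own choices for illustration, not something from the question.

```python
import re

star = re.compile(r"(?:\w\w\w)*")                # zero or more groups of three
optional_plus = re.compile(r"(?:(?:\w\w\w)+)?")  # (one or more groups), or nothing
lazy_plus = re.compile(r"(?:\w\w\w)+?")          # one or more groups, as few as possible

def spans(text):
    """Return the first match span of each pattern on text (None if no match)."""
    result = []
    for pattern in (star, optional_plus, lazy_plus):
        m = pattern.search(text)
        result.append(m.span() if m else None)
    return result

for text in ("abc", "abcdef", "ab"):
    print(repr(text), spans(text))
```

On these inputs the first two patterns report the same spans, while the lazy one stops after a single group and finds nothing at all in "ab".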

-1








