Python regex with *?

What does this Python regular expression match?

.*?[^\\]\n

      

I am confused as to why `.` is followed by both `*` and `?`.

+3




3 answers


`*` means "match the previous element as many times as possible (zero or more times)".

`*?` means "match the previous element as few times as possible (zero or more times)".
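As a minimal sketch of the two definitions above (the sample strings are made up for illustration), the same input can be matched with a greedy and a lazy quantifier:

```python
import re

text = "aaab"

# Greedy: a* takes as many "a" characters as it can.
greedy = re.match(r"a*", text).group()

# Lazy: a*? takes as few as it can, which here is zero.
lazy = re.match(r"a*?", text).group()

# A lazy quantifier still expands when the rest of the pattern needs it to.
lazy_forced = re.match(r"a*?b", text).group()

print(repr(greedy))       # 'aaa'
print(repr(lazy))         # ''
print(repr(lazy_forced))  # 'aaab'
```

The last case shows that "as few as possible" is relative to the rest of the pattern: `a*?` only stays empty when nothing after it forces it to grow.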

The other answers have already addressed this, but what they don't call out is how it changes the regex. If the `re.DOTALL` flag is provided, it makes a huge difference, because `.` will match line break characters with this enabled. Then `.*[^\\]\n` will match from the start of the string up to the last newline that is not preceded by a backslash (so it can span multiple lines).
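To illustrate the effect of `re.DOTALL` described above, here is a small sketch; the four-line sample string (with a backslash continuation on the second line) is an assumption chosen for the demonstration:

```python
import re

# Lines: "one", "two\" (ends with a backslash), "three", "four" (no newline).
s = "one\ntwo\\\nthree\nfour"

# Greedy with DOTALL: one match running to the last newline
# that is not preceded by a backslash.
greedy_dotall = re.findall(r".*[^\\]\n", s, re.DOTALL)

# Lazy with DOTALL: stops at each such newline, but still crosses
# the backslash-newline pair.
lazy_dotall = re.findall(r".*?[^\\]\n", s, re.DOTALL)

# Greedy without DOTALL: "." cannot cross a newline, so the
# backslash-terminated line is skipped entirely.
greedy_plain = re.findall(r".*[^\\]\n", s)

print(greedy_dotall)  # ['one\ntwo\\\nthree\n']
print(lazy_dotall)    # ['one\n', 'two\\\nthree\n']
print(greedy_plain)   # ['one\n', 'three\n']
```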

If the `re.DOTALL` flag is not specified, the difference is more subtle: `[^\\]` matches any character other than a backslash, including line breaks. Consider the following example:



>>> import re
>>> s = "foo\n\nbar"
>>> re.findall(r'.*?[^\\]\n', s)
['foo\n']
>>> re.findall(r'.*[^\\]\n', s)
['foo\n\n']

      

So the purpose of this regex is to find nonblank lines that don't end with a backslash, but if you use `.*` instead of `.*?`, you will match an additional `\n` whenever an empty line follows a nonblank line.

This is because `.*?` only matches `fo`, `[^\\]` matches the second `o`, and `\n` matches at the end of the first line. However, `.*` will match `foo`, `[^\\]` will match the `\n` that ends the first line, and the next `\n` will match because the second line is empty.
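One way to check the breakdown above is to wrap each part of the pattern in a capturing group. The groups are added here purely for inspection and are not part of the original regex:

```python
import re

s = "foo\n\nbar"

# Group 1 is the ".*?" / ".*" part, group 2 is "[^\\]", group 3 is "\n".
lazy_parts = re.match(r"(.*?)([^\\])(\n)", s).groups()
greedy_parts = re.match(r"(.*)([^\\])(\n)", s).groups()

print(lazy_parts)    # ('fo', 'o', '\n')
print(greedy_parts)  # ('foo', '\n', '\n')
```

In the greedy case the character class consumes the first newline, which is why the match runs one line too far.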

+5




`.` is a wildcard. It matches any character except `\n`, unless the `re.DOTALL` flag is used.

`*` means the preceding element can occur zero or more times.

`?` after a quantifier makes that quantifier lazy: it consumes as few characters as possible, stopping at the first position where the rest of the pattern can match.
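The three pieces can be tried out separately. This is only a sketch, and the sample strings are invented for the purpose:

```python
import re

s = "ab\ncd"

# Without a flag, "." stops at the newline.
dot_default = re.match(r".*", s).group()

# With re.DOTALL, "." also matches "\n".
dot_all = re.match(r".*", s, re.DOTALL).group()

# Lazy vs. greedy: stop at the first "b" or run on to the last one.
first_b = re.search(r"a.*?b", "a-b-b").group()
last_b = re.search(r"a.*b", "a-b-b").group()

print(repr(dot_default))  # 'ab'
print(repr(dot_all))      # 'ab\ncd'
print(repr(first_b))      # 'a-b'
print(repr(last_b))       # 'a-b-b'
```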

+4




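From the Python documentation on `*?`:

`*?`, `+?`, `??`:

The `*`, `+`, and `?` qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn't desired; if the RE `<.*>` is matched against `<H1>title</H1>`, it will match the entire string, and not just `<H1>`. Adding `?` after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using `.*?` in the previous expression will match only `<H1>`.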
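The `<H1>title</H1>` string mentioned above is easy to try in an interpreter. A short sketch:

```python
import re

html = "<H1>title</H1>"

# Greedy: "<.*>" runs to the last ">" in the string.
greedy_tag = re.match(r"<.*>", html).group()

# Lazy: "<.*?>" stops at the first ">".
lazy_tag = re.match(r"<.*?>", html).group()

# With findall, the lazy form picks out each tag separately.
all_tags = re.findall(r"<.*?>", html)

print(greedy_tag)  # <H1>title</H1>
print(lazy_tag)    # <H1>
print(all_tags)    # ['<H1>', '</H1>']
```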

+4








