Why does Python regex ".*PATTERN*" match "XXPATTERXX"?

Suppose I want to search for "PATTERN" in a string, where "PATTERN" can be anywhere in the string. My first attempt was *PATTERN*, but that raises a "nothing to repeat" error, which I can accept. I then tried .*PATTERN*. However, this regex does not give the expected result, see below:

import re
p = re.compile(".*PATTERN*")
s = "XXPATTERXX"
if p.match(s):
    print(s + " match with '.*PATTERN*'")

      

Result

XXPATTERXX match with '.*PATTERN*'

      

Why does "PATTER" match?

Note: I know that I could use .*PATTERN.* to get the expected result, but I'm curious why the asterisk on its own doesn't give that result.



1 answer


The * applies only to the character immediately before it, so your pattern matches PATTER followed by 0 or more N characters at the end, and it says nothing about what comes after those N characters.
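To see exactly what the pattern consumed, it can help to inspect the match object. The snippet below is a small sketch using the pattern and input from the question; the capturing group around N* and the variable names are additions for illustration only.

```python
import re

# Same pattern and input as in the question.
p = re.compile(".*PATTERN*")
m = p.match("XXPATTERXX")

# The greedy .* first consumes the whole string, then backtracks until the
# rest of the pattern (PATTER followed by zero or more N) can match.
matched_text = m.group()

# Wrapping the final N* in a group (added for illustration) shows that it
# matched the empty string: zero N characters.
g = re.match(r".*PATTER(N*)", "XXPATTERXX")
trailing_n = g.group(1)

print(repr(matched_text), repr(trailing_n))
```

The match stops after PATTER, and the remaining XX is simply left unmatched, which match() permits because it only anchors at the start of the string.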

You can add $ to the pattern to anchor it to the end of the input string, which disallows the trailing XX:



>>> import re
>>> re.compile(".*PATTERN*$")
<_sre.SRE_Pattern object at 0x10029fb90>
>>> p = re.compile(".*PATTERN*$")
>>> p.match("XXPATTERXX") is None
True
>>> p.match("XXPATTER") is None
False
>>> p.match("XXPATTER")
<_sre.SRE_Match object at 0x1004627e8>

      

You might want to explore the different types of anchors. \b may also suit your needs; it matches word boundaries (so between a \w and a \W character, or between a \W and a \w character), or you can use negative look-arounds to disallow other characters around your PATTERN string.
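As a rough illustration of those two options, the sketch below compares a \b-anchored pattern with a negative look-around pattern. The sample strings and the choice to reject the letter X specifically are assumptions made for the example, not something from the question. It uses search() rather than match() so that PATTERN can appear anywhere in the string without a leading .*.

```python
import re

# \b on both sides: PATTERN must stand alone as a whole word.
word = re.compile(r"\bPATTERN\b")

# Negative look-behind and look-ahead: no X directly before or after PATTERN.
# Rejecting X in particular is an illustrative choice.
no_x = re.compile(r"(?<!X)PATTERN(?!X)")

# Hypothetical sample inputs.
samples = ["XXPATTERNXX", "XX PATTERN XX", "YYPATTERNYY", "XXPATTERXX"]

# Map each sample to (matches word pattern, matches look-around pattern).
results = {s: (bool(word.search(s)), bool(no_x.search(s))) for s in samples}

for s in samples:
    print(s, results[s])
```

The two approaches differ on an input like YYPATTERNYY: \b rejects it because Y is a word character, while the look-around version only rejects the specific characters it names.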
