Python regex match multiple words anywhere
I am trying to use Python regex to match multiple words in a string. For example, in the line "These are oranges, apples and pears, but not pinups, or ..." I want to find the words "and", "or" and "not", regardless of their order or position.
I tried r'AND | OR | NOT' but it doesn't work.
I also tried r'.*?\bAND\b.*?\bOR\b.*?\bNOT\b.*?$' but it still doesn't work...
I'm not good with regex. Any hints? Thanks!
You have several problems here.
First, matches are case-sensitive unless you use the re.IGNORECASE (re.I) flag. So 'AND' does not match 'and'.
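A minimal sketch of the difference, reusing the sample sentence from this answer (the variable names are illustrative):

```python
import re

s = "These are oranges and apples and pears, but not pinapples or .."

# Without a flag, the uppercase pattern finds nothing in lowercase text.
case_sensitive = re.findall(r"\bAND\b", s)
# re.IGNORECASE (short form re.I) lets 'AND' match 'and', 'And', 'AND', ...
case_insensitive = re.findall(r"\bAND\b", s, flags=re.IGNORECASE)

print(case_sensitive)    # []
print(case_insensitive)  # ['and', 'and']
```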
Also, unless you use the re.VERBOSE (re.X) flag, those spaces are part of the pattern. So you are checking for 'AND ', not 'AND'. If you wanted spaces, you probably needed them on both sides, not just one (otherwise 'band leader' would match...), and really you probably wanted \b, not a space (otherwise a sentence starting with 'And another thing' won't match).
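To make the spacing issue concrete, here is a small comparison of a trailing space, spaces on both sides, and \b word boundaries, all with re.IGNORECASE. The sample texts and the patterns dictionary are illustrative, not taken from the question:

```python
import re

texts = ["band leader", "And another thing", "apples and pears"]

patterns = {
    # Without re.VERBOSE the trailing space is literal, so this is 'AND '.
    "one space": re.compile(r"AND ", re.IGNORECASE),
    # Spaces on both sides reject 'band leader' but miss a sentence-initial 'And'.
    "two spaces": re.compile(r" AND ", re.IGNORECASE),
    # Word boundaries handle both cases.
    "word boundary": re.compile(r"\bAND\b", re.IGNORECASE),
}

# For each pattern, record whether it finds a match in each sample text.
results = {name: [bool(p.search(t)) for t in texts] for name, p in patterns.items()}

for name, found in results.items():
    print(name, found)
```

Only the \b version rejects 'band leader' while still accepting 'And another thing'.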
Finally, if you think you need .* before and after your pattern, or ^ and $ around it, there's a good chance you should be using search, findall or finditer rather than match.
So:
>>> import re
>>> s = "These are oranges and apples and pears, but not pinapples or .."
>>> r = re.compile(r'\bAND\b | \bOR\b | \bNOT\b', flags=re.I | re.X)
>>> r.findall(s)
['and', 'and', 'not', 'or']
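The match/search distinction can be seen directly with the same pattern written without re.VERBOSE. This is a sketch; the variable names are arbitrary:

```python
import re

s = "These are oranges and apples and pears, but not pinapples or .."
pattern = re.compile(r"\band\b|\bor\b|\bnot\b", re.IGNORECASE)

# match() only tries position 0, where the text starts with 'These', so it fails.
first_match = pattern.match(s)
# search() scans forward and returns the first match anywhere in the string.
first_search = pattern.search(s)
# finditer() yields a match object for every hit, including its start offset.
hits = [(m.group(), m.start()) for m in pattern.finditer(s)]

print(first_match)           # None
print(first_search.group())  # and
print(hits)
```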
Try the following:
>>> import re
>>> re.findall(r"\band\b|\bor\b|\bnot\b", "These are oranges and apples and pears, but not pinapples or ..")
['and', 'and', 'not', 'or']
a|b means match either a or b
\b represents a word boundary
re.findall(pattern, string) returns a list of all non-overlapping matches of pattern in string
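The question also asks about finding the words regardless of order. findall only lists the matches; if the goal is instead to check that every word appears at least once (an assumption about the intent), one possible approach is to compare the set of matches against the set of wanted words. The helper name contains_all_words is hypothetical:

```python
import re

def contains_all_words(text, words):
    """Hypothetical helper: True if every word in `words` appears in `text` as a whole word."""
    # re.escape guards against words containing regex metacharacters.
    pattern = r"\b(?:" + "|".join(map(re.escape, words)) + r")\b"
    # With a non-capturing group, findall returns the whole matched words.
    found = {m.lower() for m in re.findall(pattern, text, flags=re.IGNORECASE)}
    # Superset check: order and repetition in the text do not matter.
    return found >= {w.lower() for w in words}

s = "These are oranges and apples and pears, but not pinapples or .."
print(contains_all_words(s, ["and", "or", "not"]))                   # True
print(contains_all_words("apples and pears", ["and", "or", "not"]))  # False
```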