Python regex match multiple words anywhere

I am trying to use a Python regex to match multiple words in a string. For example, take the line "These are oranges, apples and pears, but not pinups, or ..." The words I want to find are "and", "or" and "not", regardless of order or position.

I tried r'AND | OR | NOT'

but it doesn't work.

I also tried r'.*?\bAND\b.*?\bOR\b.*?\bNOT\b.*?$'

and it still doesn't work ...

I'm not good with regex. Any hints? Thanks!



2 answers


You have several problems here.

First, matches are case-sensitive unless you use the IGNORECASE / I flag to ignore case. Thus, 'AND' does not match 'and'.
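To make the case issue concrete, here is a small sketch using the sample sentence from the answer's example further down. The variable names are only for illustration.

```python
import re

s = "These are oranges and apples and pears, but not pinapples or .."

# Without a flag, the uppercase pattern finds nothing in the lowercase text.
case_sensitive = re.findall(r"AND", s)

# With re.IGNORECASE (short form: re.I), the same pattern matches 'and'.
case_insensitive = re.findall(r"AND", s, flags=re.IGNORECASE)

print(case_sensitive)    # []
print(case_insensitive)  # ['and', 'and']
```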

Also, unless you use the VERBOSE / X flag, those spaces are part of the pattern. So you are checking for 'AND ', not 'AND'. If you wanted that, you probably wanted spaces on each side, not just on one side (otherwise 'band leader' would match ...), and really, you probably wanted \b, not a space (otherwise a sentence starting with 'And another thing' will not match).
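A quick way to see the difference between literal spaces and \b is to run the variants side by side. This is only a sketch; the compiled pattern names (spaced, padded, bounded) are made up for the illustration.

```python
import re

# Without re.VERBOSE the spaces are literal, so the alternatives
# here are 'AND ', ' OR ' and ' NOT'.
spaced = re.compile(r"AND | OR | NOT", flags=re.I)
spaced_hits = spaced.findall("band leader")
print(spaced_hits)    # ['and '] (taken from inside 'band ')

# Spaces on both sides avoid 'band', but miss a word at the start of the text.
padded = re.compile(r" AND | OR | NOT ", flags=re.I)
padded_hits = padded.findall("And another thing")
print(padded_hits)    # []

# \b matches at a word boundary without consuming any character.
bounded = re.compile(r"\bAND\b|\bOR\b|\bNOT\b", flags=re.I)
bounded_band = bounded.findall("band leader")
bounded_start = bounded.findall("And another thing")
print(bounded_band)   # []
print(bounded_start)  # ['And']
```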

Finally, if you think you need .* before and after your pattern, and ^ and $ around it, there is a good chance you wanted search, findall or finditer rather than match.
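As a rough illustration of why match is the odd one out: it only tries the pattern at the start of the string, while the other three scan the whole string. The sketch below assumes the word-boundary pattern and the flags discussed above.

```python
import re

s = "These are oranges and apples and pears, but not pinapples or .."
pattern = re.compile(r"\bAND\b|\bOR\b|\bNOT\b", flags=re.I)

# match() anchors at position 0, where the text is 'These', so it fails.
at_start = pattern.match(s)
print(at_start)           # None

# search() returns the first match found anywhere in the string.
first = pattern.search(s)
print(first.group())      # and

# findall() returns every non-overlapping match as a list of strings.
all_words = pattern.findall(s)
print(all_words)          # ['and', 'and', 'not', 'or']

# finditer() yields match objects, which also carry positions.
positions = [(m.group(), m.start()) for m in pattern.finditer(s)]
```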



So:

>>> import re
>>> s = "These are oranges and apples and pears, but not pinapples or .."
>>> r = re.compile(r'\bAND\b | \bOR\b | \bNOT\b', flags=re.I | re.X)
>>> r.findall(s)
['and', 'and', 'not', 'or']
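If "regardless of order" in the question means that all three words must be present somewhere in the line (an assumption, since the question does not say so explicitly), one possible sketch is to search for each word separately. The helper name contains_all is hypothetical.

```python
import re

WORDS = ("and", "or", "not")

def contains_all(text, words=WORDS):
    """Return True if every word occurs as a whole word in text, in any order."""
    # Hypothetical helper: one case-insensitive, word-bounded search per word.
    return all(
        re.search(r"\b" + re.escape(word) + r"\b", text, flags=re.I)
        for word in words
    )

s = "These are oranges and apples and pears, but not pinapples or .."
print(contains_all(s))                   # True
print(contains_all("apples and pears"))  # False
```

Searching per word keeps the pattern simple and avoids hard-coding an order the way '.*?\bAND\b.*?\bOR\b.*?\bNOT\b' does.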

      



Try the following:

>>> import re
>>> re.findall(r"\band\b|\bor\b|\bnot\b", "These are oranges and apples and pears, but not pinapples or ..")
['and', 'and', 'not', 'or']

      

a|b means match either a or b.

\b represents a word boundary.

re.findall(pattern, string) returns a list of all non-overlapping matches of the pattern in the string.
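When the word list is longer or comes from a variable, the same alternation can be assembled programmatically. This is a minimal sketch; build_word_pattern is a made-up helper name, and it assumes every word starts and ends with a letter, digit or underscore so that \b behaves as expected.

```python
import re

def build_word_pattern(words):
    """Compile a case-insensitive pattern matching any of the words as whole words."""
    # re.escape guards against regex metacharacters in the words.
    alternation = "|".join(re.escape(word) for word in words)
    # One pair of \b around a non-capturing group is equivalent to
    # writing \b...\b around every alternative.
    return re.compile(r"\b(?:" + alternation + r")\b", flags=re.IGNORECASE)

pattern = build_word_pattern(["and", "or", "not"])
found = pattern.findall("These are oranges and apples and pears, but not pinapples or ..")
print(found)  # ['and', 'and', 'not', 'or']
```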
