Search in string and get 2 words before and after match in Python

Question

I am using Python to search for some words (possibly multi-token) in a description (a string).

For this I am using a regex like this:

    import re

    result = re.search(word, description, re.IGNORECASE)
    if result:
        print("Found: " + result.group())

But I need to get the 2 words before and after the match. For example, if I have something like this:

Parking here is horrible, this shop sucks.

"here is" is the term I'm looking for. So after matching it with my regex, I need the 2 words (if they exist) before and after the match.

In the example: Parking here is horrible, this

"Parking" and "horrible, this" are the words I need.

ATTENTION: The description can be very long and the pattern "here is" may appear multiple times.

+3

python string regex

Usi usi 30 jul. 15 at 1:03

4 answers

How about string operations?

    line = 'Parking here is horrible, this shop sucks.'

    before, term, after = line.partition('here is')
    before = before.rsplit(maxsplit=2)[-2:]
    after = after.split(maxsplit=2)[:2]

Result:

    >>> before
    ['Parking']
    >>> after
    ['horrible,', 'this']
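
Note that `partition` only finds the first occurrence and is case-sensitive. Below is a minimal sketch of the same split-based idea applied to every occurrence, assuming `re.finditer` is used to locate the matches. The helper name `words_around` and its parameters are made up for illustration and are not part of the answer above.

```python
import re

def words_around(text, term, n=2):
    """Return (before, after) word lists for every occurrence of term.

    Assumes n >= 1; the term is matched literally and case-insensitively.
    """
    results = []
    # re.escape treats the term as plain text; IGNORECASE mirrors the question.
    for m in re.finditer(re.escape(term), text, re.IGNORECASE):
        # Up to n whitespace-separated words directly before the match.
        before = text[:m.start()].rsplit(maxsplit=n)[-n:]
        # Up to n whitespace-separated words directly after the match.
        after = text[m.end():].split(maxsplit=n)[:n]
        results.append((before, after))
    return results

line = 'Parking here is horrible, this shop sucks.'
print(words_around(line, 'here is'))
# [(['Parking'], ['horrible,', 'this'])]
```

This sketch also matches the term inside longer words (for example "where is"); wrapping the escaped term in `\b` word boundaries would be one way to avoid that.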

+1

TigerhawkT3 30 jul. At 1:11

Try this regex: ((?:[a-z,]+\s+){0,2})here is\s+((?:[a-z,]+\s*){0,2})

with re.findall and re.IGNORECASE set.
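
A minimal sketch of how that pattern could be applied, assuming the sentence from the question as input (the variable names are illustrative). With two capture groups, `re.findall` returns one tuple per match, and splitting each group yields the individual words:

```python
import re

# Group 1: up to two words before "here is"; group 2: up to two words after.
pattern = r'((?:[a-z,]+\s+){0,2})here is\s+((?:[a-z,]+\s*){0,2})'
line = 'Parking here is horrible, this shop sucks.'

contexts = [(before.split(), after.split())
            for before, after in re.findall(pattern, line, re.IGNORECASE)]
print(contexts)
# [(['Parking'], ['horrible,', 'this'])]
```

Because the character class is `[a-z,]`, only letters and commas are captured; digits, apostrophes and other punctuation in neighboring words are not.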

+1

Carsten Hagemann 30 jul. 15 at 3:08

Based on your explanation, this gets a little tricky. The solution below handles cases where the pattern you are searching for also appears among the two words before or after a match.

    import re

    line = "Parking here is horrible, here is great here is mediocre here is here is "
    print(line)
    pattern = "here is"
    r = re.search(pattern, line, re.IGNORECASE)
    output = []
    if r:
        while line:
            # Split off everything up to the next occurrence of the pattern.
            before, match, line = line.partition(pattern)
            if match:
                if not output:
                    before = before.split()[-2:]
                else:
                    # The previous match itself counts as preceding words.
                    before = ' '.join([pattern, before]).split()[-2:]
                after = line.split()[:2]
                output.append((before, after))
    print(output)

The result from my example would be:

    [(['Parking'], ['horrible,', 'here']), (['is', 'horrible,'], ['great', 'here']), (['is', 'great'], ['mediocre', 'here']), (['is', 'mediocre'], ['here', 'is']), (['here', 'is'], [])]

0

vonnetworking 30 jul. '15 at 3:30

maraca · Accepted Answer · 2015-07-30T04:09:14+0000

I would do it like this (edit: added anchors to cover most cases):

(\S+\s+|^)(\S+\s+|)here is(\s+\S+|)(\s+\S+|$)

This way you always get 4 groups (which may need trimming) with the following behavior:

If group 1 is empty, there was no word before (group 2 is also empty)
If group 2 is empty, there was only one word before (group 1)
If groups 1 and 2 are not empty, they are the words before, in order
If group 3 is empty, there were no words after
If group 4 is empty, there was only one word after (group 3)
If groups 3 and 4 are not empty, they are the words after, in order
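
The group rules above can be sketched as follows, assuming `re.finditer` with `re.IGNORECASE` as in the question. The helper name `context_words` is hypothetical and only serves the illustration.

```python
import re

PATTERN = re.compile(r'(\S+\s+|^)(\S+\s+|)here is(\s+\S+|)(\s+\S+|$)',
                     re.IGNORECASE)

def context_words(text):
    """Yield (before, after) word lists for each non-overlapping match."""
    for m in PATTERN.finditer(text):
        # Trim the whitespace that the groups capture along with each word.
        g1, g2, g3, g4 = (g.strip() for g in m.groups())
        # An empty group means "no word in this slot", so drop it.
        yield [w for w in (g1, g2) if w], [w for w in (g3, g4) if w]

print(list(context_words('Parking here is horrible, this shop sucks.')))
# [(['Parking'], ['horrible,', 'this'])]
```

Since `finditer` yields non-overlapping matches, words consumed as context for one match are not reused for the next one.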
