Problem with joining a list of multiple lines to a list of one line in Python

I am trying to write a Python program that counts how often the phrases listed in one file appear in a document. My program works great until it hits a phrase like "happy (+) feet". I think the error is caused by the "(+)" in the phrase, but I am not sure how to revise my regex to make it work.

This is my code:

import re

with open('document.txt', 'r') as handle:
    text = handle.read()

lst = []
with open('phrases.txt', 'r') as phrases:
    for phrase in phrases:
        phrase = phrase.rstrip()
        if len(phrase) > 0 and phrase not in lst:
            lst.append(phrase)

counts = {}
for each_phrase in lst:
    word = each_phrase.rsplit()
    # Join the words with "one or more whitespace" so extra spaces or line breaks still match
    pattern = re.compile(r'\s+'.join(word), re.IGNORECASE)
    counts[each_phrase] = len(pattern.findall(text))

for key, value in counts.items():
    if value > 0:
        print(key, ',', value)
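A minimal reproduction, assuming Python 3 and hard-coding the phrase instead of reading it from phrases.txt, confirms the suspicion: joining the unescaped words gives a pattern in which "(+)" is a group that starts with a bare "+" quantifier, so re.compile raises re.error. The helper name compiles is hypothetical.

```python
import re

def compiles(pattern):
    """Return True if `pattern` is a valid regular expression, False otherwise."""
    try:
        re.compile(pattern, re.IGNORECASE)
        return True
    except re.error as exc:
        print("re.error:", exc)
        return False

words = "happy (+) feet".split()   # ['happy', '(+)', 'feet']
unescaped = r'\s+'.join(words)     # happy\s+(+)\s+feet

# "(+)" is parsed as a group whose first token is "+", a quantifier with nothing to repeat
result = compiles(unescaped)
```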


1 answer


You need to use re.escape when declaring word:

word = map(re.escape, each_phrase.rsplit())
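Here is a small sketch of what re.escape does to each word, assuming Python 3 and a made-up sample text: "(+)" becomes \(\+\), so the parentheses and the plus sign are matched literally, while \s+ still allows any run of whitespace between the words.

```python
import re

phrase = "happy (+) feet"
words = map(re.escape, phrase.rsplit())   # escape each word separately
pattern = re.compile(r'\s+'.join(words), re.IGNORECASE)

print(pattern.pattern)                    # happy\s+\(\+\)\s+feet

# Hypothetical sample text: the second occurrence spans extra spaces and a line break
text = "Happy (+) feet, happy   (+)\nfeet, happy feet"
count = len(pattern.findall(text))
print(count)                              # 2
```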

And maybe change \s+ to \s* to make the whitespace between words optional:

pattern = re.compile(r'\s*'.join(word), re.IGNORECASE)
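To illustrate the difference between the two joiners, here is a sketch using three hypothetical spellings of the phrase: \s+ requires at least one whitespace character between words, while \s* also accepts words written with no space in between.

```python
import re

words = [re.escape(w) for w in "happy (+) feet".split()]
strict = re.compile(r'\s+'.join(words), re.IGNORECASE)  # at least one whitespace between words
loose = re.compile(r'\s*'.join(words), re.IGNORECASE)   # whitespace between words is optional

samples = ["happy (+) feet", "happy(+)feet", "happy (+)feet"]
strict_hits = [bool(strict.search(s)) for s in samples]  # [True, False, False]
loose_hits = [bool(loose.search(s)) for s in samples]    # [True, True, True]
print(strict_hits, loose_hits)
```

The trade-off is that \s* also matches words run together with no space at all, which may or may not be what you want for your phrase list.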

The parentheses "(" and ")", the "+", and the other regular expression special characters must be escaped when they appear outside a character class if you want them to match literally.
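A short sketch of that rule, with made-up sample strings: outside a character class the metacharacters need a backslash, inside a class such as [(+)] they are ordinary characters, and re.escape inserts the backslashes for you.

```python
import re

# Outside a character class, ( ) and + are metacharacters and must be backslash-escaped
escaped = re.findall(r'\(\+\)', "a (+) b (+)")   # ['(+)', '(+)']

# Inside a character class, ( ) and + are treated as literal characters
in_class = re.findall(r'[(+)]', "a (+) b")       # ['(', '+', ')']

# re.escape produces the escaped form of a literal string
escaped_phrase = re.escape("(+)")                # \(\+\)
print(escaped, in_class, escaped_phrase)
```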

Example IDEONE demo
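Putting the fix together, here is a sketch of the whole script in Python 3. It assumes a hypothetical function count_phrases that takes the document text and an iterable of phrase lines instead of opening document.txt and phrases.txt, so it runs standalone; you could pass it open('phrases.txt') and the contents of document.txt instead. It keeps \s+, so swap in \s* if you want the whitespace to be optional.

```python
import re

def count_phrases(text, phrases):
    """Count case-insensitive occurrences of each phrase in text.

    Each word is passed through re.escape so characters such as "(", "+"
    and ")" are matched literally; words are joined with a
    one-or-more-whitespace pattern so line breaks between them still match.
    """
    counts = {}
    for phrase in phrases:
        phrase = phrase.strip()
        if not phrase or phrase in counts:
            continue  # skip blank lines and duplicate phrases
        words = map(re.escape, phrase.split())
        pattern = re.compile(r'\s+'.join(words), re.IGNORECASE)
        counts[phrase] = len(pattern.findall(text))
    return counts

if __name__ == "__main__":
    # Hypothetical inputs standing in for document.txt and phrases.txt
    document = "Happy (+) feet dance.\nThey have happy (+)\nfeet and happy feet."
    phrase_lines = ["happy (+) feet\n", "happy feet\n", "\n", "sad feet\n"]
    for key, value in count_phrases(document, phrase_lines).items():
        if value > 0:
            print(key, ',', value)
```

With these inputs it prints "happy (+) feet , 2" and "happy feet , 1"; "sad feet" is counted as 0 and therefore not printed.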
