Python extract words from a string based on a large list of words

First, I have a large list of words:

words = ['about', 'black', 'red', ...]  # 20,000+ words

      

Then if given a string like:

s = 'blackingabouthahah'

      

I want to receive ['black', 'about']

I tried using regex for this:

import re
pattern = re.compile('|'.join(map(re.escape, words)))  # escape so each word matches literally
print(pattern.findall(s))

      

This works, but I'm worried about the speed and memory usage of this method.

Is there a better solution?



2 answers


You can take a non-regex approach, using .find in a list comprehension:

words = ['about', 'black', 'red']
s = 'blackingabouthahah'
print([x for x in words if s.find(x) > -1])  # ['about', 'black'], in list order

      




This returns each matching word once, no matter how often it occurs in the string. If you need to count all occurrences:

words = ['about', 'black', 'red']
s = 'blackingabouthahahabout'
print([s.count(x) for x in words])  # [2, 1, 0]

      

The counts are per word, since I don't see a difference between the first about and the second about.
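If the two occurrences of about do need to be told apart, their positions in the string can be reported instead of a count. The following is only a sketch: find_all is a hypothetical helper name, not something from the answer, and it simply repeats str.find from successive offsets.

```python
def find_all(words, s):
    """Return a sorted list of (start_index, word) for every occurrence.

    Hypothetical helper for illustration only.
    """
    hits = []
    for word in words:
        start = s.find(word)
        while start != -1:
            hits.append((start, word))
            # Advance by one character so overlapping occurrences are kept
            start = s.find(word, start + 1)
    return sorted(hits)


words = ['about', 'black', 'red']
s = 'blackingabouthahahabout'
hits = find_all(words, s)
print(hits)  # [(0, 'black'), (8, 'about'), (18, 'about')]
```

Sorting the pairs puts the matches in the order they appear in the string, which is the order the question expects.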



If you just want to print the matching words, here is a solution:

import re

words = ['about', 'black', 'red']
s = 'dsjhdgblackingabouthahah'

for word in words:
    # re.escape makes the word match literally, even if it contains regex metacharacters
    if re.search(re.escape(word), s):
        print(word)

      



If you want results in a new list, you can try this:

import re

words = ['about', 'black', 'red']
s = 'dsjhdgblackingabouthahah'
mylist = []
for word in words:
    # re.escape makes the word match literally, even if it contains regex metacharacters
    if re.search(re.escape(word), s):
        mylist.append(word)

print(mylist)
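The loop above runs one regex search per word. An alternative is a single compiled alternation, as in the question. The sketch below assumes the words should be matched literally (hence re.escape) and that a longer word should win when two words start at the same position (hence the sort by length). build_pattern is a hypothetical helper name, and whether this is actually faster for 20000+ words would need to be measured.

```python
import re


def build_pattern(words):
    """Compile one alternation from the word list (hypothetical helper)."""
    # Alternatives are tried left to right, so put longer words first
    ordered = sorted(set(words), key=len, reverse=True)
    return re.compile('|'.join(re.escape(w) for w in ordered))


words = ['about', 'black', 'red']
s = 'dsjhdgblackingabouthahah'
pattern = build_pattern(words)
mylist = pattern.findall(s)
print(mylist)  # ['black', 'about']
```

Note that findall returns non-overlapping matches in the order they occur in the string, so a word that only appears inside another match is not reported.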

      
