Python extract words from a string based on a large list of words
First, I have a large list of words:
words = ['about', 'black', 'red', ...]  # 20000+ words in total
Then if given a string like:
s = 'blackingabouthahah'
I want to receive ['black', 'about']
I tried using regex for this:
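import re
# re.escape makes each word match literally, even if it contains regex metacharacters
pattern = re.compile('|'.join(re.escape(w) for w in words))
print(pattern.findall(s))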
This works, but I'm worried about the speed and memory usage of this method.
Is there a better solution?
You can take a non-regex approach, using str.find in a list comprehension:
words = ['about', 'black', 'red']
s = 'blackingabouthahah'
print([x for x in words if s.find(x) > -1])
This returns each word from the list at most once, however many times it occurs in the string, and in the order of the list rather than the order of appearance. If you need to count all occurrences:
words = ['about', 'black', 'red']
s = 'blackingabouthahahabout'
print([s.count(x) for x in words])
This gives one count per word, since I don't see the difference between the first about and the second about.
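If the position of each occurrence does matter, so that the first about can be told apart from the second, one possible sketch is to repeat str.find with a moving start offset. The helper name find_positions is made up for this illustration and is not part of the answer above.

```python
def find_positions(words, s):
    """Map each word that occurs in s to the list of its start indexes."""
    positions = {}
    for word in words:
        start = s.find(word)
        while start != -1:
            positions.setdefault(word, []).append(start)
            # Move one character forward so overlapping repeats are found too.
            start = s.find(word, start + 1)
    return positions


words = ['about', 'black', 'red']
s = 'blackingabouthahahabout'
print(find_positions(words, s))
```

Words that never occur are simply left out of the result, so 'red' has no entry here.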
If you just want to print the matches, here is a solution:
import re
words = ['about', 'black', 'red']
s = 'dsjhdgblackingabouthahah'
for item in words:
    # each word is used as a regex pattern and searched for anywhere in s
    if re.search(item, s):
        print(item)
If you want results in a new list, you can try this:
import re
words = ['about', 'black', 'red']
s = 'dsjhdgblackingabouthahah'
mylist = []
for item in words:
    # collect every word that is found somewhere in s
    if re.search(item, s):
        mylist.append(item)
print(mylist)
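The loops above test every word in the list against the string, one at a time. With 20000+ words and a short string, it may be cheaper to go the other way: put the words in a set and look up the substrings of s. The following is only a sketch under that assumption. The function name extract_words is made up for this example, and whether it is actually faster depends on the data, in particular on how many distinct word lengths the list contains.

```python
def extract_words(words, s):
    """Return the words that occur in s, in order of first appearance."""
    word_set = set(words)
    # Only substrings whose length equals some word's length can match.
    lengths = sorted({len(w) for w in word_set if w})
    found = []
    seen = set()
    for start in range(len(s)):
        for length in lengths:
            if start + length > len(s):
                break  # longer candidates would run past the end of s
            candidate = s[start:start + length]
            if candidate in word_set and candidate not in seen:
                seen.add(candidate)
                found.append(candidate)
    return found


words = ['about', 'black', 'red']
s = 'blackingabouthahah'
print(extract_words(words, s))
```

For the string from the question this yields the words in the order they appear in s, black before about, which the list-driven loops do not guarantee.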