Python: check partial string match between two lists
I have two lists as shown below:
c = ['John', 'query 989877 forcast', 'Tamm']
isl = ['My name is Anne Query 989877', 'John', 'Tamm Ju']
I want to check every element in isl against every element in c to get all partial string matches. The result I want looks like this:
out = ["john", "query 989877", "tamm"]
As you can see, partial matches such as "query 989877" should be included too.
I've tried the following:
out = []
for word in c:
    for w in isl:
        if word.lower() in w.lower():
            out.append(word)
But this only gives me:
out = ["John", "Tamm"]
I've also tried the following:
print [word for word in c if word.lower() in (e.lower() for e in isl)]
But that only outputs "John". How do I get what I want?
Perhaps something like this:
def get_sub_strings(s):
    words = s.split()
    # i is the number of words per substring, n is the start index
    for i in xrange(1, len(words)+1):  # reverse this order to try the longest substrings first
        for n in xrange(0, len(words)+1-i):
            yield ' '.join(words[n:n+i])

>>> out = []
>>> for word in c:
...     for sub in get_sub_strings(word.lower()):
...         for s in isl:
...             if sub in s.lower():
...                 out.append(sub)
...
>>> out
['john', 'query', '989877', 'query 989877', 'tamm']
If you only want to keep the largest match, generate the substrings in reverse order (longest first) and break as soon as a match is found in isl:
def get_sub_strings(s):
    words = s.split()
    # Start with the longest substrings (i words each) and work down to single words
    for i in xrange(len(words), 0, -1):
        for n in xrange(0, len(words)+1-i):
            yield ' '.join(words[n:n+i])

out = []
for word in c:
    for sub in get_sub_strings(word.lower()):
        if any(sub in s.lower() for s in isl):
            out.append(sub)
            break  # keep only the longest match for this element of c
print out
# ['john', 'query 989877', 'tamm']
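The code above is written for Python 2 (xrange, the print statement). For Python 3, the same longest-match idea could be sketched roughly as below. The wrapper name longest_partial_matches and its parameter names are made up for this example and are not part of the original answer.

```python
def get_sub_strings(s):
    """Yield every run of consecutive words in s, longest runs first."""
    words = s.split()
    for size in range(len(words), 0, -1):
        for start in range(len(words) - size + 1):
            yield ' '.join(words[start:start + size])


def longest_partial_matches(candidates, targets):
    """Hypothetical helper: for each candidate, keep the longest run of its
    words that occurs (case-insensitively) inside any target string."""
    lowered = [t.lower() for t in targets]
    out = []
    for candidate in candidates:
        for sub in get_sub_strings(candidate.lower()):
            if any(sub in t for t in lowered):
                out.append(sub)
                break  # longest match found, skip the shorter ones
    return out


c = ['John', 'query 989877 forcast', 'Tamm']
isl = ['My name is Anne Query 989877', 'John', 'Tamm Ju']
print(longest_partial_matches(c, isl))  # ['john', 'query 989877', 'tamm']
```

Note that this is still a plain substring test, so 'tamm' would also match inside a longer word such as 'tammy'. If only whole-word matches should count, the comparison would need to be done on split words instead.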
OK, I came up with this! It's a very hacky way to do it, and I don't like the method itself, but it gives me the result I want:
Step 1:
in:
c1 = []
for r in c:
    c1.append(r.split())
out:
c1 = [['John'], ['query', '989877', 'forcast'], ['Tamm']]
Step 2:
in:
p = []
for w in isl:
    for word in c1:
        for w1 in word:
            if w1.lower() in w.lower():
                p.append(w1)
out:
p = ['query', '989877', 'John', 'Tamm']
Step 3:
in:
out = []
for word in c:
    t = []
    for i in p:
        if i in word:
            t.append(i)
    out.append(t)
out:
out = [['John'], ['query', '989877'], ['Tamm']]
Step 4:
in:
out_final = []
for i in out:
    out_final.append(" ".join(i))
out:
out_final = ['John', 'query 989877', 'Tamm']
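The four steps above could probably be collapsed into a single pass. The following is only a sketch of that idea, not the exact same algorithm: it keeps each word of an element of c if that word appears (case-insensitively) in any element of isl, and it preserves the word order of c rather than the order of p. The function name matching_tokens is invented for this example.

```python
def matching_tokens(candidates, targets):
    """Hypothetical helper: for each candidate string, keep the words that
    occur (case-insensitively) as a substring of at least one target."""
    lowered = [t.lower() for t in targets]
    result = []
    for candidate in candidates:
        kept = [tok for tok in candidate.split()
                if any(tok.lower() in t for t in lowered)]
        result.append(' '.join(kept))
    return result


c = ['John', 'query 989877 forcast', 'Tamm']
isl = ['My name is Anne Query 989877', 'John', 'Tamm Ju']
print(matching_tokens(c, isl))  # ['John', 'query 989877', 'Tamm']
```

One caveat with any word-by-word approach like this: the kept words do not have to be adjacent, or even come from the same element of isl, so it can join words that never appear together. For example, matching_tokens(['a x b'], ['a', 'b']) gives ['a b'].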