Checking for overlap in two long lists of elements in Python
I have two lists (list1 and list2) that contain 10 million company names. There are no duplicates within either list, but some companies appear in both, and I want to find out which ones. I wrote the code below:
list_matched = []
for i in range(len(list1)):
    for j in range(len(list2)):
        if list1[i] == list2[j]:
            list_matched.append(list1[i])
The problem with this code is that it never finishes executing. What can I do to complete this task within a reasonable amount of time? 10 million names seems too large to handle.
2 answers
I would recommend making a set from one list and checking the other list against it, since set membership tests take constant time on average. For example:
inlist1 = set(list1)
list_matched = [x for x in list2 if x in inlist1]
Of course, you can do it differently, depending on which list's order (if any) you want to keep. This snippet preserves the order of list2.
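To make the difference concrete: with 10 million names in each list, the nested loops in the question perform up to 10^7 × 10^7 = 10^14 comparisons, while the set-based approach needs roughly one hash operation per element, so on the order of 2 × 10^7. The sketch below wraps both variants in small functions. The function names and the sample company names are made up for illustration and are not from the question.

```python
def matched_in_order(list1, list2):
    """Return the items of list2 that also occur in list1, in list2's order."""
    in_list1 = set(list1)  # built in one pass; lookups are O(1) on average
    return [x for x in list2 if x in in_list1]


def matched_unordered(list1, list2):
    """Return the common items as a set; the original order is not kept."""
    return set(list1) & set(list2)


# Hypothetical sample data standing in for the real 10-million-name lists
list1 = ["Acme", "Globex", "Initech", "Umbrella"]
list2 = ["Umbrella", "Hooli", "Acme"]

print(matched_in_order(list1, list2))   # ['Umbrella', 'Acme']
print(matched_unordered(list1, list2))  # a set containing 'Acme' and 'Umbrella'
```

If order does not matter, the set intersection is probably the simplest option. Either way, the names must match exactly (same case and whitespace) to count as equal.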