Checking for overlap in two long lists of elements in Python

I have two lists (list1 and list2) that contain 10 million company names. There are no duplicates within either list, but some companies appear in both lists, and I want to find which ones. I wrote the code below:

list_matched = []
for i in range(len(list1)):
    for j in range(len(list2)):
        if list1[i] == list2[j]:
            list_matched.append(list1[i])

      

The problem with this code is that it never finishes executing. What can I do to complete this task within a reasonable amount of time? A size of 10 million names seems too large to handle.



2 answers


Use set logic. Sets are designed for exactly this kind of task.

a = set(list1)
b = set(list2)

companies_in_both = a & b

      



(You end up with a set. If you need it as a list, just pass it to list().)
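
To illustrate why this helps: the nested loops compare every name in list1 with every name in list2, which is on the order of 10^14 comparisons if each list holds about 10 million names. Building two sets and intersecting them takes roughly linear time on average. Below is a minimal sketch of the same idea on a tiny example; the company names are made up for illustration and stand in for the real data.

```python
# Hypothetical sample data standing in for the real 10-million-item lists.
list1 = ["Acme", "Globex", "Initech", "Umbrella"]
list2 = ["Hooli", "Initech", "Acme", "Stark"]

a = set(list1)
b = set(list2)

# Set intersection; equivalent to a.intersection(b).
companies_in_both = a & b

# Sets are unordered, so sort the result if a stable order is needed.
matched_sorted = sorted(companies_in_both)
print(matched_sorted)  # ['Acme', 'Initech']
```

Note that the result has no particular order, so sorting (or converting with list()) is only needed if the rest of your code expects a list.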



I would recommend making a set from one list and checking the other list against it, for example:

inlist1 = set(list1)
list_matched = [x for x in list2 if x in inlist1]

      



Of course, you can do it differently, depending on which list's order (if any) you want to keep. This snippet preserves the order of list2.
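
As a rough sketch of how the order-preserving variant could be wrapped up for reuse: the helper name common_in_order and the sample company names below are hypothetical, chosen only for illustration. Swapping the arguments switches which list's order is kept.

```python
def common_in_order(ordered, other):
    """Return the items of `ordered` that also occur in `other`,
    keeping the order in which they appear in `ordered`."""
    lookup = set(other)  # membership tests on a set are O(1) on average
    return [x for x in ordered if x in lookup]


# Hypothetical sample data standing in for the real lists.
list1 = ["Acme", "Globex", "Initech", "Umbrella"]
list2 = ["Hooli", "Initech", "Acme", "Stark"]

print(common_in_order(list2, list1))  # ['Initech', 'Acme']  (order of list2)
print(common_in_order(list1, list2))  # ['Acme', 'Initech']  (order of list1)
```

Only one of the two lists is turned into a set here, so this uses somewhat less extra memory than building two sets, at the cost of a Python-level loop over the other list.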
