Python: Something Faster Than `not in` For Large Lists?

I am doing a project with word lists. I want to concatenate two word lists, but only store unique words.

I am reading words from a file and it seems to be taking a long time to read the file and save it as a list. I intend to copy the same block of code and run it using a second (or any subsequent) text file. The slow part of the code looks like this:

    while inLine != "":
        inLine = inLine.strip()
        if inLine not in inList:
            inList.append(inLine)
        inLine = inFile.readline()

Please correct me if I am wrong, but I think the slowest part of the program is the `not in` comparison. What are some ways I can rewrite this to make it faster?


2 answers


Judging by these lines:

    if inLine not in inList:
        inList.append(inLine)

It looks like you are enforcing uniqueness in the container `inList`. You should consider using a more efficient data structure, such as a set (`inSet`). The `not in` check can then be discarded as redundant, since the container prevents duplicates anyway.

If the order of insertion must be preserved, you can achieve a similar result by using an `OrderedDict` with null values.
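Both suggestions can be sketched as follows; `inFile` is simulated here with `io.StringIO`, and the word list is a hypothetical example:

```python
import io
from collections import OrderedDict

# Stand-in for the question's file handle (hypothetical word list).
inFile = io.StringIO("apple\nbanana\napple\ncherry\n")

# Membership tests on a set are O(1) on average, versus O(n) on a list.
inSet = set()
for line in inFile:
    inSet.add(line.strip())  # duplicates are simply ignored

# If insertion order matters: an OrderedDict with null values.
inFile.seek(0)
inDict = OrderedDict()
for line in inFile:
    inDict[line.strip()] = None  # re-inserting a key keeps its first position
inList = list(inDict)  # unique words, in first-seen order
```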


If you want to concatenate two lists and remove duplicates, you can try something like this:



    combined_list = list(set(first_list) | set(second_list))
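For example, with hypothetical word lists (note that a set union does not preserve the original order, so sort the result if a stable order is needed):

```python
first_list = ["apple", "banana", "cherry"]   # hypothetical word lists
second_list = ["banana", "date", "apple"]

# The union of two sets keeps each word exactly once.
combined_list = list(set(first_list) | set(second_list))

# Set iteration order is arbitrary; sort for a predictable result.
combined_list.sort()
print(combined_list)  # ['apple', 'banana', 'cherry', 'date']
```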







