Remove unique values from the list and keep only duplicates
I want to run through a list of ids and return a list of all ids that occur more than once. This is what I came up with, and it works:
singles = list(ids)
duplicates = []
while len(singles) > 0:
    elem = singles.pop()
    if elem in singles:
        duplicates.append(elem)
But the list of ids is likely to be quite long, and I don't want the while loop to rely on an expensive len call on every iteration if I can avoid it. (I could go the inelegant route of calling len once and decrementing a counter each iteration, but I'd rather avoid that if I could.)
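For reference, the "inelegant route" mentioned above would look something like the following sketch (it assumes the same ids list as before):

singles = list(ids)
duplicates = []
remaining = len(singles)   # call len exactly once
while remaining > 0:
    elem = singles.pop()
    remaining -= 1         # track the length manually instead of re-calling len
    if elem in singles:
        duplicates.append(elem)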
A smart way to do this is to use a data structure that makes it simple and efficient, such as a Counter:
>>> import random
>>> ids = [random.randrange(100) for _ in range(200)]
>>> from collections import Counter
>>> counts = Counter(ids)
>>> dupids = [id for id in ids if counts[id] > 1]
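Note that dupids keeps every occurrence of a duplicated id, including the repeats themselves. If each duplicated id should appear only once, a small variation (a sketch reusing the same counts) iterates over the Counter's items instead:

>>> unique_dupids = [id for id, count in counts.items() if count > 1]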
Constructing the Counter takes O(N) time, as opposed to O(N log N) for sorting or O(N^2) for counting each element from scratch every time.
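For comparison, the sorting-based approach mentioned above might look something like this sketch (not part of the original answer): once the list is sorted, equal ids sit next to each other, so a single linear pass finds them.

def duplicates_by_sorting(ids):
    # The sort dominates at O(N log N); the scan itself is O(N).
    s = sorted(ids)
    # An id is a duplicate if it equals its predecessor; the set
    # comprehension collapses runs of three or more into one entry.
    return sorted({s[i] for i in range(1, len(s)) if s[i] == s[i - 1]})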
As a side note:
But the list of ids is likely to be quite long, and I don't want the while loop to rely on an expensive len call on every iteration if I can avoid it.
len is not expensive. It runs in constant time, and (at least for built-in types like list) it is about as fast as a function call in Python can be while still doing anything at all.
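To see this empirically, a quick timeit comparison (a sketch; the absolute figures depend on the machine) shows that len takes essentially the same time on a ten-element list and a million-element list:

import timeit

# Both measurements should come out nearly identical,
# because len on a list is O(1) regardless of its size.
small = timeit.timeit('len(xs)', setup='xs = list(range(10))', number=10**6)
large = timeit.timeit('len(xs)', setup='xs = list(range(10**6))', number=10**6)
print(small, large)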
The expensive part of your code is the elem in singles test inside the loop: for each element, you potentially have to compare it against every other element, which makes the whole thing quadratic time.
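A common way around that (a sketch, not from the original answer) is to replace the list membership test with set lookups, which are O(1) on average, so the whole scan becomes a single linear pass:

def find_duplicates(ids):
    seen = set()  # everything encountered so far; set membership is O(1)
    dups = set()  # ids already flagged, so each duplicate is reported once
    result = []
    for elem in ids:
        if elem in seen and elem not in dups:
            dups.add(elem)
            result.append(elem)
        seen.add(elem)
    return result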