List comprehension vs filter vs remove

How are the last two better than the first solution?

primes = (1, 2, 3, 5, 7)

# Classic solution
items = list(range(10))
for prime in primes:
    items.remove(prime)
items

# List comprehension
items = list(range(10))
[item for item in items if item not in primes]

# Filter
items = list(range(10))
list(filter(lambda item: item not in primes, items))


These three examples are what I came across in the book, which says the first solution takes O(n * m) time (n = len(items), m = len(primes)), whereas the last two take O(n * 1) ... That works out to 50 comparisons for the first solution (slightly better in practice: 40 comparisons) and only 10 for the last two.

I do not understand this: how can the last two be more efficient in time or memory?

Here's a paragraph in the book that explains it:

To remove or insert a single item from/into a list, Python needs to copy the entire list, which is especially heavy with large lists. When doing it only once, of course, it's not that bad. But when doing a lot of deletions, a filter or list comprehension is much faster because, if structured correctly, it only needs to copy the list once. .... then examples ... The last two are much faster for large lists of items, because the individual operations are much faster. Comparing with n = len(items) and m = len(primes), the former takes O(m * n) = 5 * 10 = 50 operations, whereas the latter two take O(n * 1) = 10 * 1 = 10 operations.
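
For reference, the 40-comparison figure can be reproduced by hand. Here is a small sketch (count_comparisons is just an illustrative helper, not from the book) that mimics how the in operator scans a tuple and counts the element comparisons:

def count_comparisons(items, primes):
    """Mimic 'item in primes' on a tuple, counting element comparisons."""
    comparisons = 0
    kept = []
    for item in items:              # n iterations
        for p in primes:            # up to m comparisons per item
            comparisons += 1
            if item == p:
                break               # 'in' stops at the first match
        else:
            kept.append(item)       # no match found: keep the item
    return kept, comparisons

kept, c = count_comparisons(list(range(10)), (1, 2, 3, 5, 7))
print(kept, c)                      # [0, 4, 6, 8, 9] 40

With the early exit, items found near the front of primes cost fewer than m comparisons, which is where 40 instead of 50 comes from.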

EDIT: The book is not wrong. primes = set((1, 2, 3, 5, 7))

is the correct definition, not primes = (1, 2, 3, 5, 7).
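
With a set, each membership test is an average O(1) hash lookup, so the whole pass is O(n). A minimal sketch of the corrected version:

primes = set((1, 2, 3, 5, 7))       # or simply {1, 2, 3, 5, 7}
items = list(range(10))
# one pass over items; each 'not in' check is O(1) on average
[item for item in items if item not in primes]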



2 answers


If the code in the book is exactly the same as what you posted, then the book is wrong. The first example has time complexity O(n*m), but so do the other two.

If primes were a set (or a dict), it would be true: an existence check with the in operator on a hash map has O(1) time complexity, but on a list or tuple it is O(n)! Hence the total complexity is O(n*m).



Let's check it out with some measurements:

t = tuple(range(10000))
l = list(t)
s = set(t)
d = {i:1 for i in l}

In [16]: %%timeit
4738 in t
   ....: 
10000 loops, best of 3: 45.5 µs per loop

In [17]: %%timeit
4738 in l
   ....: 
10000 loops, best of 3: 45.4 µs per loop

In [18]: %%timeit
4738 in s
   ....: 
10000000 loops, best of 3: 36.9 ns per loop

In [19]: %%timeit
4738 in d
   ....: 
10000000 loops, best of 3: 38 ns per loop


Note that the set lookup takes ~37 ns (same for the dict), three orders of magnitude faster than the list / tuple lookup at ~45 µs.
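
To see the same gap for the whole operation rather than a single lookup, one can time the remove() loop against the comprehension. A rough sketch (sizes chosen arbitrarily; absolute timings are machine-dependent):

import timeit

setup = "primes = set(range(0, 10000, 3)); items = list(range(10000))"

# classic solution: every remove() scans the list and shifts the tail
t_remove = timeit.timeit(
    "xs = list(items)\nfor p in primes:\n    xs.remove(p)",
    setup=setup, number=10)

# comprehension: one pass with O(1) set lookups
t_comp = timeit.timeit(
    "[x for x in items if x not in primes]",
    setup=setup, number=10)

print(t_remove, t_comp)             # expect t_remove >> t_comp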



The main problem is items.remove(prime). Python lists are variable-length arrays, not linked lists: they use a contiguous block of memory holding references to the objects. When an element is inserted into or deleted from an arbitrary position, all the elements after it have to be shifted within that block (appending and popping at the end are optimized and cheap); see the documentation. You scan the list len(primes) times, and each successful remove() shifts up to len(items) elements to close the gap, which is the hidden cost on top of the scan itself.
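
The shifting cost is easy to observe directly. A small sketch (timings machine-dependent) comparing removal from the front, which shifts everything, with removal from the end, which shifts nothing:

import timeit

# pop(0) shifts every remaining element; pop() touches only the last slot
t_front = timeit.timeit("xs.pop(0)", setup="xs = list(range(100000))", number=1000)
t_back = timeit.timeit("xs.pop()", setup="xs = list(range(100000))", number=1000)
print(t_front, t_back)              # expect t_front to be far slower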



The other two examples traverse the list once, leave the existing items where they are, and return a new list based on the provided filter.
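
Spelled out, the comprehension is equivalent to a single pass like this (a sketch):

result = []
for item in items:
    if item not in primes:          # O(1) per check when primes is a set
        result.append(item)         # amortized O(1); nothing is shifted
# result is a brand-new list; items itself is never modified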
