Expand a list while making a list in python
Here is the code:
import itertools, time
x = {1:[1, 2], 2:[2, 3], 3:[3, 4]}
l = [1, 2, 3] * 10000000
start = time.time()
y = [x[i] for i in l]
y = list(itertools.chain.from_iterable(y))
print(y[:6])
print(time.time() - start)
start = time.time()
y = []
[y.extend(x[i]) for i in l]
print(y[:6])
print(time.time() - start)
The first approach is to let the inner lists be nested and then flatten them out when the comprehension is complete. The second approach is to create a flat list during comprehension.
The first approach seems to be slightly faster:
3.8479559421539307
4.469805955886841
I am wondering if there are better (faster) ways to do this?
source to share
Timings closer to my car:
>>> x = {1: [1, 2], 2: [2, 3], 3: [3, 4]}
>>> l = [1, 2, 3] * 10000000
>>> import timeit
>>> import itertools
>>> def by_chaining():
... y = [x[i] for i in l]
... return list(itertools.chain.from_iterable(y))
...
>>> timeit.timeit(by_chaining, number=1)
5.491670315985385
>>> def by_extension():
... y = []
... [y.extend(x[i]) for i in l]
... return y
...
>>> timeit.timeit(by_extension, number=1)
5.656423713273028
The point is that the approach is .extend
really meant to be used in an explicit loop, because it is a function with side effects. By comprehending a list, you create a separate list full of values None
that are then thrown away, which adds quite a lot of overhead. Cm:
>>> def by_extension_loop():
... y = []
... for i in l: y.extend(x[i])
... return y
...
>>> timeit.timeit(by_extension_loop, number=1)
4.62852763674681
Let's try other approaches:
>>> def by_nested_comprehension():
... return [e for i in l for e in x[i]]
...
>>> timeit.timeit(by_nested_comprehension, number=1)
5.102275806385393
Do not do this. In theory, it gets the right answer, but will use excessive memory forever. (This is probably part of the reason why it sum
was explicitly truncated to avoid string summing - although I would like them to redirect it to a more efficient technique).
>>> def by_sum():
... return sum((x[i] for i in l), [])
How about providing a from_iterable
generator to work with?
>>> def by_chaining_generator():
... return list(itertools.chain.from_iterable(x[i] for i in l))
...
>>> timeit.timeit(by_chaining_generator, number=1)
5.420730297100135
How to use map
instead of understanding?
>>> def by_chaining_map():
... return list(itertools.chain.from_iterable(map(x.get, l)))
...
>>> timeit.timeit(by_chaining_map, number=1)
4.707590696974194
>>> def by_chaining_map_2():
... return list(itertools.chain.from_iterable(map(x.__getitem__, l)))
...
>>> timeit.timeit(by_chaining_map_2, number=1)
4.576851915207953
It looks like it might be a winner, but it's close. It is also possible to optimize the version list.append
by preselecting and pasting slices, IIRC. But my creativity seems to be exhausted at the moment :)
source to share