Grouping continuation items in a list. Is saving state in an itertools.groupby key function bad practice?
I'm new to Python and I'm trying to write a function that groups list items, where None
items signal a continuation of the current group, like this:
>>> g([1, None, 1, 1, None, None, 1])
[[1, None], [1], [1, None, None], [1]]
My real data has much more complex elements, but I've simplified the whole thing for this question.
This is my solution:
import itertools

# input
x = [1, None, 1, 1, None, None, 1]
# desired output from g(x)
y = [[1, None], [1], [1, None, None], [1]]

def f(x):
    if x is None:
        f.lastx = x
    else:
        if x != f.lastx:
            f.counter += 1
    return f.counter

def g(x):
    f.lastx = None
    f.counter = 0
    z = [list(g) for _, g in itertools.groupby(x, f)]
    return z

assert y == g(x)
It works, but I know it is very ugly.
Is there a better (and more Pythonic) way to do this? For example, without a stateful key function.
You can combine itertools.groupby and itertools.accumulate:
>>> from itertools import accumulate, groupby
>>> dat = [1, None, 1, 1, None, None, 1]
>>> it = iter(dat)
>>> acc = accumulate(x is not None for x in dat)
>>> [[next(it) for _ in g] for _, g in groupby(acc)]
[[1, None], [1], [1, None, None], [1]]
This works because accumulate yields a running total that increases by one at the start of each new group, so all items in a group share the same int-like key:
>>> list(accumulate(x is not None for x in dat))
[True, 1, 2, 3, 3, 3, 4]
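To see how those keys line up with the items, it can help to zip them together and group on the key explicitly. This is only a minimal sketch using the standard library; the names `keys`, `pairs` and `groups` are chosen for illustration and are not part of the answer above.

```python
from itertools import accumulate, groupby

dat = [1, None, 1, 1, None, None, 1]

# Running count of non-None items: it goes up by one exactly where a new group starts.
# The first value is the bool True, which compares equal to 1.
keys = list(accumulate(x is not None for x in dat))

# Pair each item with its key, then group consecutive pairs that share a key.
pairs = list(zip(keys, dat))
groups = [[item for _, item in grp]
          for _, grp in groupby(pairs, key=lambda p: p[0])]
print(groups)
```

Compared with the `next(it)` version, this keeps each item next to its key at the cost of building the intermediate `pairs` list.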
If you want to be able to process a stream, just tee the iterator. The additional memory used depends only on the size of a single group.
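from itertools import accumulate, groupby, tee

def cgroup(source):
    # it feeds the running count of non-None items; it2 supplies the actual values
    it, it2 = tee(iter(source), 2)
    acc = accumulate(x is not None for x in it)
    for _, g in groupby(acc):
        yield [next(it2) for _ in g]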
It still gives:
>>> list(cgroup([1, None, 1, 1, None, None, 1]))
[[1, None], [1], [1, None, None], [1]]
but it also works with infinite sources:
>>> from itertools import chain, islice, repeat
>>> stream = chain.from_iterable(repeat([1, 1, None]))
>>> list(islice(cgroup(stream), 10))
[[1], [1, None], [1], [1, None], [1], [1, None], [1], [1, None], [1], [1, None]]
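For comparison, the same grouping can probably also be written as a plain generator without itertools. This is a sketch of an alternative technique, not the tee/accumulate approach above; the name `cgroup_loop` is made up for illustration, and it assumes that any None items before the first non-None item should form their own leading group.

```python
def cgroup_loop(source):
    """Yield lists that each start at a non-None item, followed by its trailing Nones."""
    group = []
    for item in source:
        # A non-None item starts a new group, so emit the one collected so far.
        if item is not None and group:
            yield group
            group = []
        group.append(item)
    # Emit whatever is left once the source is exhausted.
    if group:
        yield group


print(list(cgroup_loop([1, None, 1, 1, None, None, 1])))
```

Because it is a generator that holds only the current group, it should also cope with streams, at the price of an explicit loop instead of composed iterators.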
It's not ideal because it requires a third-party extension (iteration_utilities.split) and some slicing, but it gives the desired output:
>>> from iteration_utilities import split, is_not_None
>>> lst = [1, None, 1, 1, None, None, 1]
>>> list(split(lst, is_not_None, keep_after=True))[1:]
[[1, None], [1], [1, None, None], [1]]
The first element must be discarded with this approach (hence the [1:]), because otherwise the result would start with an empty sublist.
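The empty leading sublist can be illustrated with a small pure-Python stand-in. This is a hypothetical helper named `split_before`, written only to mimic the behaviour described here; it is not the implementation or API of iteration_utilities.

```python
def split_before(items, pred):
    """Hypothetical stand-in: start a new sublist at every item for which pred is true."""
    result = [[]]  # there is always a (possibly empty) chunk before the first match
    for item in items:
        if pred(item):
            result.append([])
        result[-1].append(item)
    return result


lst = [1, None, 1, 1, None, None, 1]
chunks = split_before(lst, lambda x: x is not None)
print(chunks[0])   # the chunk before the first non-None item
print(chunks[1:])  # the groups that are actually wanted
```

Since the list starts with a non-None item, the chunk before the first split point has nothing in it, which is why it gets sliced off.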