Grouping list items with continuation markers. Is keeping state in an itertools.groupby key function un-Pythonic?

I'm new to Python and I'm trying to write a function that groups list items, where None marks an item that continues the current group, like this:

>>> g([1, None, 1, 1, None, None, 1])
[[1, None], [1], [1, None, None], [1]]

      

My real data has much more complex elements, but I've simplified the whole thing for this question.

This is my solution:

import itertools

# input
x = [1, None, 1, 1, None, None, 1]

# desired output from g(x)
y = [[1, None], [1], [1, None, None], [1]]


def f(x):
    if x is None:
        f.lastx = x
    else:
        if x != f.lastx:
            f.counter += 1
    return f.counter


def g(x):
    f.lastx = None
    f.counter = 0
    z = [list(g) for _, g in itertools.groupby(x, f)]
    return z


assert y == g(x)

      

It works, but I know it is very ugly.

Is there a better (and more Pythonic) way to do this? For example, one without a stateful key function.



2 answers


You can combine itertools.groupby and itertools.accumulate:

>>> from itertools import accumulate, groupby
>>> dat = [1, None, 1, 1, None, None, 1]
>>> it = iter(dat)
>>> acc = accumulate(x is not None for x in dat)
>>> [[next(it) for _ in g] for _, g in groupby(acc)]
[[1, None], [1], [1, None, None], [1]]

      

This works because accumulate sums the booleans as integers, so it gives us an int-like running count that increases at the start of each new group and stays constant for every None that follows:

>>> list(accumulate(x is not None for x in dat))
[True, 1, 2, 3, 3, 3, 4]
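
As a rough illustration of the same idea, the running count can be zipped with the items themselves and used directly as the groupby key, which avoids the separate iterator. This is only a sketch: the name group_with_ids is made up for this example, and it assumes the input fits in memory because it is turned into a list first.

```python
from itertools import accumulate, groupby
from operator import itemgetter


def group_with_ids(items):
    """Group items so that each None stays with the preceding non-None item.

    group_with_ids is a hypothetical helper name used for illustration only.
    """
    items = list(items)  # materialised because the data is read twice
    # Running count of non-None items: goes up by one at each group start.
    ids = accumulate(item is not None for item in items)
    # Pair every item with its group id and group on that id.
    paired = zip(ids, items)
    return [[item for _, item in grp]
            for _, grp in groupby(paired, key=itemgetter(0))]


print(group_with_ids([1, None, 1, 1, None, None, 1]))
# [[1, None], [1], [1, None, None], [1]]
```

The first id is True rather than 1, but True == 1 in Python, so groupby still treats the first two ids as the same key.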

      


If you want to be able to process a stream, just tee the iterator. The maximum increase in memory usage depends only on the size of one group.



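from itertools import accumulate, groupby, tee


def cgroup(source):
    # Two independent views of the stream: one to compute keys, one to emit items.
    it, it2 = tee(iter(source), 2)
    acc = accumulate(x is not None for x in it)
    for _, g in groupby(acc):
        yield [next(it2) for _ in g]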

      

It still gives the same result:

>>> list(cgroup([1, None, 1, 1, None, None, 1]))
[[1, None], [1], [1, None, None], [1]]

      

but will work even with infinite sources:

>>> from itertools import chain, islice, repeat
>>> stream = chain.from_iterable(repeat([1, 1, None]))
>>> list(islice(cgroup(stream), 10))
[[1], [1, None], [1], [1, None], [1], [1, None], [1], [1, None], [1], [1, None]]
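
For comparison, here is a minimal sketch of a plain generator that needs neither itertools nor a stateful key function. It is a different technique from the groupby/accumulate combination above, and the name cgroup_plain is made up for this example. It assumes the same rule: every non-None item starts a new group, and each None is appended to the current one.

```python
def cgroup_plain(source):
    """Yield groups lazily; each non-None item starts a new group.

    cgroup_plain is a hypothetical name used for illustration only.
    """
    group = []
    for item in source:
        # A non-None item closes the group collected so far (if any).
        if item is not None and group:
            yield group
            group = []
        group.append(item)
    # Emit whatever is left once the source is exhausted.
    if group:
        yield group


print(list(cgroup_plain([1, None, 1, 1, None, None, 1])))
# [[1, None], [1], [1, None, None], [1]]
```

Because it only ever holds the current group, it should also cope with an unbounded source as long as the groups are consumed lazily.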

      



It's not ideal because it requires a third-party extension (iteration_utilities.split) and some extra fiddling, but it gives the desired output:

>>> from iteration_utilities import split, is_not_None

>>> lst = [1, None, 1, 1, None, None, 1]

>>> list(split(lst, is_not_None, keep_after=True))[1:]
[[1, None], [1], [1, None, None], [1]]

      



The first element must be discarded (hence the [1:]) with this approach, because otherwise the result will start with an empty sublist.
