More efficient use of itertools.groupby()

Question


I am trying to improve my knowledge of the itertools library, as it is so helpful. To this end, I am trying to solve an interview puzzle I ran into. Most of it involves counting, from left to right, how many times each digit repeats consecutively within a number. For example, for the number:

1223444556

I want to get:

[(1,1),(2,2),(1,3),(3,4),(2,5),(1,6)]

that is, reading from left to right, there is one 1, two 2s, one 3, three 4s, and so on.
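For reference, what the puzzle asks for is a run-length count. A minimal plain-Python sketch without itertools (the function name run_lengths_loop is hypothetical, chosen for illustration) makes explicit the work that groupby() takes over:

```python
def run_lengths_loop(n):
    """Count runs of consecutive equal digits in n, left to right.

    Returns a list of (count, digit) pairs, e.g. 1223 -> [(1, 1), (2, 2), (1, 3)].
    """
    pairs = []
    for ch in str(n):
        if pairs and pairs[-1][1] == int(ch):
            # Same digit as the current run: bump that run's count.
            pairs[-1] = (pairs[-1][0] + 1, pairs[-1][1])
        else:
            # A different digit starts a new run of length 1.
            pairs.append((1, int(ch)))
    return pairs


print(run_lengths_loop(1223444556))  # [(1, 1), (2, 2), (1, 3), (3, 4), (2, 5), (1, 6)]
```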

Here is my current code:

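from itertools import groupby

# First pass: the digit of each run of consecutive equal characters
groups_first = [int(''.join(v)[0]) for k, v in groupby(str(1223444556))]
# Second pass: the length of each run
counts = [len(''.join(v)) for k, v in groupby(str(1223444556))]
# Pair each count with its digit (list() is needed on Python 3, where zip is lazy)
list(zip(counts, groups_first))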

This works, but I would like to know if there is a more compact way to do this, without building two separate lists and then zipping them together. Any thoughts? I suspect the answer involves passing some kind of lambda function to groupby(), but I can't see it yet.
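On the lambda idea: with no key argument, groupby() already compares the items themselves, so a key function is not needed for this puzzle. The sketch below contrasts the default behaviour with an illustrative key (the ch < '4' threshold is an arbitrary assumption, not part of the puzzle) to show that a key only changes what is compared between neighbouring items:

```python
from itertools import groupby

digits = str(1223444556)

# With no key, groupby() compares the characters themselves, so each
# group is a run of identical consecutive digits. No lambda is needed.
runs = [(k, ''.join(g)) for k, g in groupby(digits)]
print(runs)  # [('1', '1'), ('2', '22'), ('3', '3'), ('4', '444'), ('5', '55'), ('6', '6')]

# A key function changes what is compared: here, "below 4" versus "4 or above".
halves = [(k, ''.join(g)) for k, g in groupby(digits, key=lambda ch: ch < '4')]
print(halves)  # [(True, '1223'), (False, '444556')]
```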

Thanks!


python itertools

verbsintransit, Jan 31 '13 at 4:05


3 answers

I would just write

>>> from itertools import groupby
>>> n = 1223444556
>>> [(len(list(g)), int(k)) for k,g in groupby(str(n))]
[(1, 1), (2, 2), (1, 3), (3, 4), (2, 5), (1, 6)]
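A small sketch wrapping this one-liner in a function (the name run_lengths is hypothetical) and also running it on 100, to show that a run of zeros is counted like any other digit:

```python
from itertools import groupby


def run_lengths(n):
    """Return (count, digit) pairs for each run of consecutive equal digits in n."""
    # len(list(g)) builds a short list for each group just to measure it.
    return [(len(list(g)), int(k)) for k, g in groupby(str(n))]


print(run_lengths(1223444556))  # [(1, 1), (2, 2), (1, 3), (3, 4), (2, 5), (1, 6)]
print(run_lengths(100))         # [(1, 1), (2, 0)]
```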


DSM, Jan 31 at 4:11 am


I would use Counter from the collections module instead:

>>> from collections import Counter
>>> c = Counter('1223444556')
>>> c.items()
[('1', 1), ('3', 1), ('2', 2), ('5', 2), ('4', 3), ('6', 1)]

If order is important (as you say in your comment), this may not be the best method. But for completeness, you can sort the items:

>>> t = c.items()
>>> t = sorted(t)

And if you want each pair listed as (count, digit) rather than (digit, count), you could do this:

>>> t = [(y, x) for x, y in t]
>>> print(t)
[(1, '1'), (2, '2'), (1, '3'), (3, '4'), (2, '5'), (1, '6')]

One side benefit of this method is that the repeated digit stays a string, so there is no confusion about which value is the digit from the original number and which is its frequency.
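A caveat worth checking: Counter tallies every occurrence of a digit, not consecutive runs, and sorting restores left-to-right order here only because each digit of 1223444556 occurs in a single run and the digits are ascending. A quick comparison on 1221 (an input chosen for illustration) shows where the two approaches diverge:

```python
from collections import Counter
from itertools import groupby

digits = '1221'

# Counter tallies every occurrence of each digit, wherever it appears.
totals = sorted(Counter(digits).items())
# groupby() counts each run of consecutive equal digits separately.
runs = [(k, len(list(g))) for k, g in groupby(digits)]

print(totals)  # [('1', 2), ('2', 2)]
print(runs)    # [('1', 1), ('2', 2), ('1', 1)]
```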


πόδας ὠκύς, Jan 31 at 4:13 am


David Robinson · Accepted Answer · 2013-01-31T04:11:25+0000

What about:

from itertools import groupby
[(sum(1 for _ in v), int(k)) for k, v in groupby(str(1223444556))]
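Here sum(1 for _ in v) counts a group by consuming its iterator, without building the intermediate list that len(list(v)) creates; for groups of a few digits the saving is negligible. A sketch of the same idea as a generator function (iter_run_lengths is a hypothetical name), so that pairs are produced only as they are requested:

```python
from itertools import groupby


def iter_run_lengths(n):
    """Lazily yield (count, digit) pairs for runs of consecutive equal digits in n."""
    for k, v in groupby(str(n)):
        # Count the group by consuming it; no intermediate list is built.
        yield sum(1 for _ in v), int(k)


pairs = iter_run_lengths(1223444556)  # a generator: nothing is computed yet
print(list(pairs))  # [(1, 1), (2, 2), (1, 3), (3, 4), (2, 5), (1, 6)]
```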

