Split list by sequential common item

I have the following list, which contains only the two characters "N" and "C":

ls = ['N', 'N', 'N', 'C', 'C', 'C', 'C', 'N', 'C', 'C']

      

What I want to do is extract each run of consecutive "C"s and return it together with its indices in the list.

Yielding something like:

  chunk1 = [('C', 'C', 'C', 'C'), [3,4,5,6]]
  chunk2 = [('C', 'C'), [8,9]]

  # and when there is no C, it returns an empty list.

      

How can I achieve this in Python?

I tried this, but it didn't do what I hoped:

from itertools import groupby
from operator import itemgetter
tmp = (list(g) for k, g in groupby(enumerate(ls), itemgetter(1)) if k == 'C')
zip(*tmp)

      

+3




3 answers


Move zip(*...) inside the list comprehension:

import itertools as IT
import operator

ls = ['N', 'N', 'N', 'C', 'C', 'C', 'C', 'N', 'C', 'C']

[list(zip(*g))[::-1] 
 for k, g in IT.groupby(enumerate(ls), operator.itemgetter(1)) 
 if k == 'C']

      

gives



[[('C', 'C', 'C', 'C'), (3, 4, 5, 6)], [('C', 'C'), (8, 9)]]
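The question also asks for an empty list when there is no "C". As a minimal sketch, the comprehension above can be wrapped in a helper to check that case. The name `c_runs` and its `target` parameter are hypothetical, not part of the original answer:

```python
from itertools import groupby
from operator import itemgetter

def c_runs(seq, target='C'):
    """Return [[items, indices], ...] for each run of consecutive `target`.

    Hypothetical helper wrapping the list comprehension from the answer.
    """
    return [
        list(zip(*group))[::-1]  # (indices, items) reversed to [items, indices]
        for key, group in groupby(enumerate(seq), itemgetter(1))
        if key == target
    ]

print(c_runs(['N', 'N', 'N', 'C', 'C', 'C', 'C', 'N', 'C', 'C']))
print(c_runs(['N', 'N']))  # no 'C' at all, so the result is an empty list
```

Because the filter `key == target` never matches when the list has no "C", the comprehension produces `[]` without any special handling.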

      


In Python 2, list(zip(...)) can be replaced with zip(...), but since zip returns an iterator in Python 3, we need list(zip(...)) there. To make the solution compatible with both Python 2 and Python 3, use list(zip(...)) here.
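The difference can be seen in a small sketch, assuming Python 3. The `pairs` list is made-up sample data standing in for one group produced by groupby:

```python
# Made-up sample: (index, item) pairs like those groupby yields for one run.
pairs = [(3, 'C'), (4, 'C')]

z = zip(*pairs)
try:
    z[::-1]  # a zip iterator cannot be sliced in Python 3
except TypeError as exc:
    print("slicing zip directly fails:", exc)

# Materializing the iterator first makes the reversal work.
print(list(zip(*pairs))[::-1])
```

Wrapping the call in list(...) costs little here because each group is small, and it keeps the same expression valid on both Python versions.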

+5




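Use a generator function. All you have to do is unpack the group when zipping it, so use yield list(zip(*group))[::-1] (the list() call is needed in Python 3, where zip returns an iterator).

from itertools import groupby
from operator import itemgetter

def solve(ls):
    # Group (index, item) pairs by item, so each group is one consecutive run.
    for key, group in groupby(enumerate(ls), itemgetter(1)):
        if key == 'C':
            # zip(*group) gives (indices, items); reverse to get [items, indices].
            yield list(zip(*group))[::-1]

ls = ['N', 'N', 'N', 'C', 'C', 'C', 'C', 'N', 'C', 'C']
print(list(solve(ls)))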


[[('C', 'C', 'C', 'C'), (3, 4, 5, 6)], [('C', 'C'), (8, 9)]]

      

+2




ls = ['N', 'N', 'N', 'C', 'C', 'C', 'C', 'N', 'C', 'C']

def whereMyCharsAt(haystack, needle):
    start = None  # index where the current run of needle began, if any
    for ii, char in enumerate(haystack):
        if char == needle:
            if start is None:
                start = ii
        else:
            if start is not None:
                # The run ended just before ii: emit it and reset.
                yield [needle] * (ii - start), list(range(start, ii))
                start = None

    # Emit a run that extends to the end of the list.
    if start is not None:
        yield [needle] * (len(haystack) - start), list(range(start, len(haystack)))

for indexes in whereMyCharsAt(ls, 'C'):
    print(indexes)

      

Prints:

(['C', 'C', 'C', 'C'], [3, 4, 5, 6])
(['C', 'C'], [8, 9])

      

+1

