Slice pandas dataframe in groups of sequential values

Question

I have a block of data containing runs of consecutive values that occasionally "skip" (i.e. increase by more than 1). I would like to split the dataframe at each skip, the way a groupby function would (the alphabetical index is just for show):

    A
a   1
b   2
c   3
d   6
e   7
f   8
g   11
h   12
i   13

# would return

a   1
b   2
c   3
-----
d   6
e   7
f   8
-----
g   11
h   12
i   13

+3

python pandas slice

heltonbiker Sep 30 14 at 13:06

3 answers

My two cents, just for fun:

In [15]:

for grp, val in df.groupby((df.diff()-1).fillna(0).cumsum().A):
    print(val)
   A
a  1
b  2
c  3
   A
d  6
e  7
f  8
    A
g  11
h  12
i  13
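
To see why this grouping key works, here is a minimal sketch. It assumes a frame rebuilt from the question's data (single column A, labels a to i); the variable name key is only for illustration.

```python
import pandas as pd

# Assumed reconstruction of the question's frame
df = pd.DataFrame({"A": [1, 2, 3, 6, 7, 8, 11, 12, 13]},
                  index=list("abcdefghi"))

# diff() is 1 inside a run, so (diff - 1) is 0 there and positive at a skip.
# The cumulative sum therefore stays constant within each run and
# steps up at every skip, which makes it usable as a groupby key.
key = (df.diff() - 1).fillna(0).cumsum().A
print(key.tolist())
```

Each distinct value in key marks one run, so groupby yields one sub-frame per run.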

+1

CT Zhu Sep 30 14 at 14:52

We can use shift to check whether the difference between consecutive rows is greater than 1, and then build a list of tuple pairs of the required index labels:

In [128]:
# labels of the rows where the value jumps by more than 1; the first row's label has to be added as well
index_list = [df.iloc[0].name] + list(df[(df.value - df.value.shift()) > 1].index)
index_list
Out[128]:
['a', 'd', 'g']

Next we build a list of (start, end) label pairs for the ranges we are interested in. Note that label-based slicing in pandas includes both the start and the end label, so for each range we need the label of the row just before the next start label:

In [170]:

import numpy as np

final_range = []
for i in range(len(index_list)):
    # handle last range value
    if i == len(index_list) -1:
        final_range.append((index_list[i], df.iloc[-1].name ))
    else:
        final_range.append( (index_list[i], df.iloc[ np.searchsorted(df.index, df.loc[index_list[i + 1]].name) -1].name))

final_range

Out[170]:
[('a', 'c'), ('d', 'f'), ('g', 'i')]

I am using numpy searchsorted to find the integer position at which the label would be inserted (this assumes the index is sorted), and then subtracting 1 from that to get the label of the previous row.

In [171]:
# now print
for r in final_range:
    print(df[r[0]:r[1]])
       value
index       
a          1
b          2
c          3
       value
index       
d          6
e          7
f          8
       value
index       
g         11
h         12
i         13

0

EdChum Sep 30 14 at 13:21

ZJS · Accepted Answer · 2014-09-30T14:58:03+0000

A slight speed improvement:

import numpy as np

for k, g in df.groupby(df['A'] - np.arange(df.shape[0])):
    print(g)
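
A minimal sketch of why this key works, assuming the question's frame and a column whose values increase from row to row (the names key and groups are illustrative only):

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frame
df = pd.DataFrame({"A": [1, 2, 3, 6, 7, 8, 11, 12, 13]},
                  index=list("abcdefghi"))

# Inside a run both the value and the row position increase by 1,
# so their difference is constant; a skip makes the difference jump.
key = df["A"] - np.arange(df.shape[0])

# Collect the sub-frames instead of printing them
groups = [g for _, g in df.groupby(key)]
print(key.tolist())
```

If the column is not increasing, two separate runs could end up with the same key, so this shortcut is best kept for sorted data like the example.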
