How to shuffle Pandas dataframe rowgroups?

Question

Suppose I have a dataframe df:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(11, 99, (12, 4)))

print(df)

     0   1   2   3
0   71  64  84  20
1   48  60  83  61
2   48  78  71  46
3   65  88  66  77
4   71  22  42  58
5   66  76  64  80
6   67  28  74  87
7   32  90  55  78
8   80  42  52  14
9   54  76  73  17
10  32  89  42  36
11  85  78  61  12

How can I shuffle df in blocks of three rows, i.e. randomly swap the first three rows (0, 1, 2) with the second (3, 4, 5), third (6, 7, 8) or fourth (9, 10, 11) group? This could be a possible result:

print(df)

     0   1   2   3
3   65  88  66  77
4   71  22  42  58
5   66  76  64  80
9   54  76  73  17
10  32  89  42  36
11  85  78  61  12
6   67  28  74  87
7   32  90  55  78
8   80  42  52  14
0   71  64  84  20
1   48  60  83  61
2   48  78  71  46

So the new order has the second group of 3 rows from the original data frame, then the last, then the third, and finally the first group.

+3

python numpy pandas shuffle

Archie May 24 '17 at 13:11

3 answers

A solution similar to @Divakar's, but arguably simpler because it shuffles the dataframe index directly:

import numpy as np
import pandas as pd

df = pd.DataFrame([np.arange(0, 12)]*4).T
len_group = 3

index_list = np.array(df.index)
np.random.shuffle(np.reshape(index_list, (-1, len_group)))

shuffled_df = df.loc[index_list, :]

Output example:

shuffled_df
Out[82]: 
     0   1   2   3
9    9   9   9   9
10  10  10  10  10
11  11  11  11  11
3    3   3   3   3
4    4   4   4   4
5    5   5   5   5
0    0   0   0   0
1    1   1   1   1
2    2   2   2   2
6    6   6   6   6
7    7   7   7   7
8    8   8   8   8
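
As a minimal sketch, the same index-shuffling idea can be wrapped in a function. The name shuffle_row_groups and its seed parameter are hypothetical, introduced here only for illustration. The sketch assumes the number of rows is a multiple of the group length, and it works on row positions rather than labels, so it should not depend on the index being 0..n-1:

```python
import numpy as np
import pandas as pd

def shuffle_row_groups(df, len_group, seed=None):
    """Return df with consecutive blocks of len_group rows in random order."""
    if len(df) % len_group != 0:
        raise ValueError("number of rows must be a multiple of len_group")
    rng = np.random.default_rng(seed)
    # Row positions laid out one group per row: [[0, 1, 2], [3, 4, 5], ...]
    positions = np.arange(len(df)).reshape(-1, len_group)
    # permutation() on a 2D array reorders along the first axis only,
    # so rows inside a group keep their relative order
    shuffled_positions = rng.permutation(positions).ravel()
    return df.iloc[shuffled_positions]

df = pd.DataFrame([np.arange(0, 12)] * 4).T
shuffled_df = shuffle_row_groups(df, len_group=3, seed=0)
```

Passing a seed makes the shuffle reproducible; leaving it as None draws a fresh order on every call.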

+2

FLab May 24 '17 at 14:02

This does the same as the other two answers, but uses integer division to create the group column.

nrows_df = len(df)
nrows_group = 3

shuffled = (
    df
    .assign(group_var=df.index // nrows_group)
    .set_index("group_var")
    .loc[np.random.permutation(nrows_df // nrows_group)]  # integer number of groups
)
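
The snippet above replaces the original index with the group label. If the original row labels should be kept, one possible variant (a sketch, not part of the answer) computes the same integer-division labels from row positions and reorders with iloc. The helper names group_labels, order and rank are assumptions made for this illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(48).reshape(12, 4))
nrows_group = 3

# Integer division maps positions 0,1,2 -> 0; 3,4,5 -> 1; and so on
group_labels = np.arange(len(df)) // nrows_group
n_groups = len(df) // nrows_group

# Random order in which the groups should appear
order = np.random.permutation(n_groups)

# rank[g] is the slot that group g takes in the new order
rank = np.empty(n_groups, dtype=int)
rank[order] = np.arange(n_groups)

# A stable sort by rank keeps the rows inside each group in their original order
shuffled = df.iloc[np.argsort(rank[group_labels], kind="stable")]
```

Here shuffled keeps the original row labels, so it is easy to see which group ended up where.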

0

Manje brinkhuis 08 Mar 18 at 16:16

Divakar · Accepted Answer · 2017-05-24T13:30:40+0000

You can reshape the underlying data into a 3D array by splitting the first axis in two, with the second of these having length 3 to match the group length. The first axis then has one entry per group, so shuffling along it with np.random.shuffle, which works in place, reorders whole groups and gives the desired result, e.g.:

np.random.shuffle(df.values.reshape(-1,3,df.shape[1]))

Explanation

To make this clearer, we can instead use np.random.permutation to generate the random indices along the first axis and then index into the 3D array version.

1] Input df:

In [199]: df
Out[199]: 
     0   1   2   3
0   71  64  84  20
1   48  60  83  61
2   48  78  71  46
3   65  88  66  77
4   71  22  42  58
5   66  76  64  80
6   67  28  74  87
7   32  90  55  78
8   80  42  52  14
9   54  76  73  17
10  32  89  42  36
11  85  78  61  12

2] Get the 3D array version:

In [200]: arr_3D = df.values.reshape(-1,3,df.shape[1])

In [201]: arr_3D
Out[201]: 
array([[[71, 64, 84, 20],
        [48, 60, 83, 61],
        [48, 78, 71, 46]],

       [[65, 88, 66, 77],
        [71, 22, 42, 58],
        [66, 76, 64, 80]],

       [[67, 28, 74, 87],
        [32, 90, 55, 78],
        [80, 42, 52, 14]],

       [[54, 76, 73, 17],
        [32, 89, 42, 36],
        [85, 78, 61, 12]]])

3] Get the shuffling indices and index into the first axis of the 3D version:

In [202]: shuffle_idx = np.random.permutation(arr_3D.shape[0])

In [203]: shuffle_idx
Out[203]: array([0, 3, 1, 2])

In [204]: arr_3D[shuffle_idx]
Out[204]: 
array([[[71, 64, 84, 20],
        [48, 60, 83, 61],
        [48, 78, 71, 46]],

       [[54, 76, 73, 17],
        [32, 89, 42, 36],
        [85, 78, 61, 12]],

       [[65, 88, 66, 77],
        [71, 22, 42, 58],
        [66, 76, 64, 80]],

       [[67, 28, 74, 87],
        [32, 90, 55, 78],
        [80, 42, 52, 14]]])

We then assign these values back to the input data frame.

With np.random.shuffle, we do everything in place, which hides the work of explicitly creating the shuffle indices and assigning the values back.
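
The explicit steps above, including the assignment back, can be sketched as follows. This is an illustrative variant rather than the answer's own code: it builds a new frame instead of modifying df in place, and the names group_len and shuffled_df are assumptions. A non-in-place version may be preferable because the in-place one-liner relies on df.values being a writable view of the frame's data, which holds for a single-dtype frame like this one but is not guaranteed in general (for example with mixed dtypes, and it may not hold with pandas' Copy-on-Write behaviour).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(11, 99, (12, 4)))
group_len = 3

# Copy the values so the original frame is left untouched
arr_3D = df.to_numpy().copy().reshape(-1, group_len, df.shape[1])

# One random position per group
shuffle_idx = np.random.permutation(arr_3D.shape[0])

# Reorder the groups, then flatten back to the original 2D shape
shuffled_values = arr_3D[shuffle_idx].reshape(df.shape)

# Build a new frame; as in the in-place version, the index labels stay 0..11
shuffled_df = pd.DataFrame(shuffled_values, index=df.index, columns=df.columns)
```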

Example run -

In [181]: df = pd.DataFrame(np.random.randint(11,99,(12,4)))

In [182]: df
Out[182]: 
     0   1   2   3
0   82  49  80  20
1   19  97  74  81
2   62  20  97  19
3   36  31  14  41
4   27  86  28  58
5   38  68  24  83
6   85  11  25  88
7   21  31  53  19
8   38  45  14  72
9   74  63  40  94
10  69  85  53  81
11  97  96  28  29

In [183]: np.random.shuffle(df.values.reshape(-1,3,df.shape[1]))

In [184]: df
Out[184]: 
     0   1   2   3
0   85  11  25  88
1   21  31  53  19
2   38  45  14  72
3   82  49  80  20
4   19  97  74  81
5   62  20  97  19
6   36  31  14  41
7   27  86  28  58
8   38  68  24  83
9   74  63  40  94
10  69  85  53  81
11  97  96  28  29
