Pandas multiindex sort

Question

In Pandas 0.19, I have a large frame with a multi-index of the following kind

          C0     C1     C2
A   B
bar one   4      2      4
    two   1      3      2
foo one   9      7      1
    two   2      1      3

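I want to reorder the values within bar and foo (and many other two-row groups like them) so that each group's "two" row is sorted in ascending order, with the "one" row following the same column order, to get this: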

          C0     C1     C2
A   B
bar one   4      4      2
    two   1      2      3
foo one   7      9      1
    two   1      2      3

I'm interested in speed, since I have many columns and many row pairs. I'm also happy to restructure the data if that makes sorting faster. Many thanks.

+3

sorting pandas multi-index

hoelder 04 Apr 17 at 14:50

2 answers

Here's a solution, though it is kludgy:

Input data frame:

         C0  C1  C2
A   B              
bar one   4   2   4
    two   1   3   2
foo one   9   7   1
    two   2   1   3
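
For anyone who wants to run the answers, here is a minimal sketch of how the frame above could be built. The values and labels are taken from the question; the variable names idx and df are just the ones the answers assume.

```python
import pandas as pd

# Two-level row index: outer level "A", inner level "B"
idx = pd.MultiIndex.from_product(
    [["bar", "foo"], ["one", "two"]], names=["A", "B"]
)

# Values copied from the example frame in the question
df = pd.DataFrame(
    [[4, 2, 4], [1, 3, 2], [9, 7, 1], [2, 1, 3]],
    index=idx,
    columns=["C0", "C1", "C2"],
)
print(df)
```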

Custom sort function:

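def sortit(x):
    # Work on a copy so the group passed in by groupby is not mutated
    x = x.copy()
    xcolumns = x.columns.values
    # Drop level "A" so the rows are labelled just "one" and "two"
    x.index = x.index.droplevel()
    # Reorder the columns by the values in the "two" row
    x = x.sort_values(by='two', axis=1)
    # Restore the original column labels in their original order
    x.columns = xcolumns
    return x

df.groupby(level=0).apply(sortit)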

Output:

         C0  C1  C2
A   B              
bar one   4   4   2
    two   1   2   3
foo one   7   9   1
    two   1   2   3

+2

Scott Boston 04 Apr 17 at 16:30

Ted Petrou · Accepted Answer · 2017-04-05T04:23:00+0000

This is mostly a NumPy-based solution that should give good performance. First, it selects only the "two" rows and argsorts them. It then assigns that column order to every row of the original frame that belongs to the same group. Next, it adds a constant to each row's order (row number times the number of columns) so that the positions index into the flattened values, and flattens both the order and the original values. Finally, it reorders the flattened values with this offset order and reshapes them into a new dataframe with the expected sort order.

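import numpy as np
import pandas as pd
rows, cols = df.shape
# argsort each group's "two" row (explicit axis=1 on the raw array)
two = df.xs('two', level=1)
df_a = pd.DataFrame(np.argsort(two.values, axis=1), index=two.index)
# repeat each group's column order for every row of that group
order = df_a.reindex(df.index.droplevel(-1)).values
# offset each row so the positions index into the flattened values
offset = np.arange(len(df)) * cols
order_final = order + offset[:, np.newaxis]
pd.DataFrame(df.values.ravel()[order_final.ravel()].reshape(rows, cols), index=df.index, columns=df.columns)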

Output

         C0  C1  C2
A   B              
bar one   4   4   2
    two   1   2   3
foo one   7   9   1
    two   1   2   3
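
To make the offset step easier to follow, here is a small sketch on the raw values of the example frame. It assumes every group has exactly two rows in the order "one", "two" (a simplification; the answer's reindex step does not need that), and the variable names are only illustrative. It also shows np.take_along_axis, available in NumPy 1.15 and later, as an alternative way to do the same gather without computing offsets.

```python
import numpy as np

# Rows: (bar, one), (bar, two), (foo, one), (foo, two)
values = np.array([[4, 2, 4],
                   [1, 3, 2],
                   [9, 7, 1],
                   [2, 1, 3]])
rows, cols = values.shape

# Column order that sorts each "two" row (rows 1 and 3 here),
# repeated so both rows of a pair share the same order
two_order = np.argsort(values[1::2], axis=1, kind="stable")
order = np.repeat(two_order, 2, axis=0)

# Adding row_number * cols turns per-row column positions
# into positions in the flattened array
offset = np.arange(rows) * cols
flat_positions = order + offset[:, np.newaxis]
result = values.ravel()[flat_positions.ravel()].reshape(rows, cols)

# Same gather, letting NumPy handle the row bookkeeping
alt = np.take_along_axis(values, order, axis=1)

print(flat_positions)
print(result)
```

For the first pair, the "two" row [1, 3, 2] argsorts to [0, 2, 1], so row 0 picks flat positions 0, 2, 1 and row 1 picks 3, 5, 4.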

Some speed tests

# create a much larger frame
import string
import numpy as np
import pandas as pd
idx = pd.MultiIndex.from_product((list(string.ascii_letters), list(string.ascii_letters) + ['two']))
df1 = pd.DataFrame(index=idx, data=np.random.rand(len(idx), 3), columns=['C0', 'C1', 'C2'])

# Scott Boston
%timeit df1.groupby(level=0).apply(sortit)
10 loops, best of 3: 199 ms per loop

# Ted Petrou (the code above, run on df1)
1000 loops, best of 3: 5 ms per loop
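
The timings above use IPython's %timeit. Outside IPython, a rough equivalent can be sketched with the standard timeit module. The wrapper name sort_numpy is hypothetical (it is not part of the original answer) and simply packages the NumPy approach as a function so it can be timed; absolute numbers will vary with the machine and library versions.

```python
import string
import timeit

import numpy as np
import pandas as pd


def sort_numpy(df):
    """Hypothetical wrapper: order columns by each group's "two" row."""
    rows, cols = df.shape
    two = df.xs("two", level=1)
    # Column order per group, then repeated for every row of that group
    order = pd.DataFrame(np.argsort(two.values, axis=1), index=two.index)
    order = order.reindex(df.index.droplevel(-1)).values
    # Offset each row so positions index into the flattened values
    flat = order + (np.arange(rows) * cols)[:, np.newaxis]
    data = df.values.ravel()[flat.ravel()].reshape(rows, cols)
    return pd.DataFrame(data, index=df.index, columns=df.columns)


# Same larger frame as in the speed test
letters = list(string.ascii_letters)
idx = pd.MultiIndex.from_product((letters, letters + ["two"]))
df1 = pd.DataFrame(
    np.random.rand(len(idx), 3), index=idx, columns=["C0", "C1", "C2"]
)

# Best of 3 repeats, 10 calls each
best = min(timeit.repeat(lambda: sort_numpy(df1), number=10, repeat=3)) / 10
print(f"{best * 1e3:.2f} ms per call")
```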
