Pandas multiindex sort
In Pandas 0.19, I have a large DataFrame with a MultiIndex of the following kind:
         C0  C1  C2
A   B
bar one   4   2   4
    two   1   3   2
foo one   9   7   1
    two   2   1   3
I want to sort the columns within bar and foo (and within many other row pairs like them) by the values in each pair's "two" row, to get this:
         C0  C1  C2
A   B
bar one   4   4   2
    two   1   2   3
foo one   7   9   1
    two   1   2   3
I'm interested in speed (since I have many columns and many row pairs). I'm also happy to rearrange the data if that makes sorting faster. Many thanks.
This is basically a layered solution that should perform well. First, it selects only the "two" rows and argsorts them. It then assigns that order to every row of the original frame. Next, it ravels (flattens) that order, after adding a constant to offset each row, along with the original frame's values. Finally, it reorders all of the original values using this raveled, offset, argsorted array and builds a new DataFrame with the expected sort order.
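import numpy as np
import pandas as pd

rows, cols = df.shape
# argsort each "two" row (np.argsort works along the last axis by default)
df_a = np.argsort(df.xs('two', level=1))
# repeat each group's column order for every row of that group
order = df_a.reindex(df.index.droplevel(-1)).values
# position of each row's first element in the flattened values
offset = np.arange(len(df)) * cols
order_final = order + offset[:, np.newaxis]
pd.DataFrame(df.values.ravel()[order_final.ravel()].reshape(rows, cols),
             index=df.index, columns=df.columns)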
Output
         C0  C1  C2
A   B
bar one   4   4   2
    two   1   2   3
foo one   7   9   1
    two   1   2   3
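To check the offset arithmetic by hand, here is a minimal self-contained sketch of the same steps on the four-row frame from the question. It is not the answer's exact code: it assumes the index levels are named A and B, and it calls np.argsort on the underlying array and rebuilds the frame explicitly, so it does not depend on np.argsort returning a DataFrame. The names two, flat_pos and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (level names A and B are assumed).
idx = pd.MultiIndex.from_product([["bar", "foo"], ["one", "two"]], names=["A", "B"])
df = pd.DataFrame([[4, 2, 4], [1, 3, 2], [9, 7, 1], [2, 1, 3]],
                  index=idx, columns=["C0", "C1", "C2"])

rows, cols = df.shape

# Column positions that sort each "two" row: one row of positions per group in A.
two = df.xs("two", level="B")
order = pd.DataFrame(np.argsort(two.values, axis=1, kind="stable"), index=two.index)

# Repeat each group's positions for every row that belongs to the group.
order = order.reindex(df.index.get_level_values("A")).values

# Convert per-row column positions into positions in the flattened value array:
# row i starts at i * cols.
flat_pos = order + (np.arange(rows) * cols)[:, np.newaxis]

result = pd.DataFrame(df.values.ravel()[flat_pos.ravel()].reshape(rows, cols),
                      index=df.index, columns=df.columns)
print(result)
```

For this frame, flat_pos is [[0, 2, 1], [3, 5, 4], [7, 6, 8], [10, 9, 11]], so each row picks its own values in the column order dictated by the group's "two" row.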
Some speed tests
# create much larger frame
import string
idx = pd.MultiIndex.from_product((list(string.ascii_letters), list(string.ascii_letters) + ['two']))
df1 = pd.DataFrame(index=idx, data=np.random.rand(len(idx), 3), columns=['C0', 'C1', 'C2'])
#scott boston
%timeit df1.groupby(level=0).apply(sortit)
10 loops, best of 3: 199 ms per loop
#Ted
1000 loops, best of 3: 5 ms per loop
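The %timeit magic is only available in IPython. Outside of it, a comparable measurement could be sketched with the standard-library timeit module as below. The wrapper sort_columns_by_two is a hypothetical name, not something from the answers; it packages the ravel/offset steps in a function so they can be timed, and the timing it prints will depend on the machine and library versions.

```python
import string
import timeit

import numpy as np
import pandas as pd


def sort_columns_by_two(df):
    """Hypothetical wrapper: sort each group's columns by its "two" row."""
    rows, cols = df.shape
    two = df.xs("two", level=1)
    order = pd.DataFrame(np.argsort(two.values, axis=1), index=two.index)
    order = order.reindex(df.index.get_level_values(0)).values
    flat_pos = order + (np.arange(rows) * cols)[:, np.newaxis]
    return pd.DataFrame(df.values.ravel()[flat_pos.ravel()].reshape(rows, cols),
                        index=df.index, columns=df.columns)


# Same construction as the larger test frame above.
letters = list(string.ascii_letters)
idx = pd.MultiIndex.from_product((letters, letters + ["two"]))
df1 = pd.DataFrame(np.random.rand(len(idx), 3), index=idx, columns=["C0", "C1", "C2"])

sorted_df1 = sort_columns_by_two(df1)

# Average wall-clock time over 100 calls.
seconds = timeit.timeit(lambda: sort_columns_by_two(df1), number=100) / 100
print("%.2f ms per call" % (seconds * 1e3))
```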
Here's a solution, though kludgy:
Input data frame:
         C0  C1  C2
A   B
bar one   4   2   4
    two   1   3   2
foo one   9   7   1
    two   2   1   3
Custom sort function:
def sortit(x):
    xcolumns = x.columns.values
    x.index = x.index.droplevel()
    x.sort_values(by='two', axis=1, inplace=True)
    x.columns = xcolumns
    return x

df.groupby(level=0).apply(sortit)
Output:
         C0  C1  C2
A   B
bar one   4   4   2
    two   1   2   3
foo one   7   9   1
    two   1   2   3
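The function above modifies each group in place, which may behave differently across pandas versions. A possible non-mutating variant of the same groupby idea is sketched below. It is an assumption, not the answer author's code: sort_group is a hypothetical name, the index levels are assumed to be named A and B, and the group's values are reordered with NumPy indexing instead of sort_values.

```python
import numpy as np
import pandas as pd

# Example frame from the question (level names A and B are assumed).
idx = pd.MultiIndex.from_product([["bar", "foo"], ["one", "two"]], names=["A", "B"])
df = pd.DataFrame([[4, 2, 4], [1, 3, 2], [9, 7, 1], [2, 1, 3]],
                  index=idx, columns=["C0", "C1", "C2"])


def sort_group(group):
    # Column positions that sort this group's "two" row.
    two_row = group.xs("two", level="B").values[0]
    order = np.argsort(two_row, kind="stable")
    # Reorder the values but keep the original index and column labels.
    return pd.DataFrame(group.values[:, order],
                        index=group.index, columns=group.columns)


# group_keys=False keeps the original (A, B) index instead of prepending the key.
result = df.groupby(level="A", group_keys=False).apply(sort_group)
print(result)
```

This still calls a Python function once per group, so it should not be expected to match the speed of the raveled NumPy approach on large frames.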