How to sort a pivot table in Pandas

Here is the code:

import pandas as pd

test = pd.DataFrame({'country':['us','ca','ru','cn','ru','cn','us','ca','ru','cn','us','ca','ru','cn','us','ca'], 'month':[5,6,7,5,6,7,5,5,6,7,5,6,6,5,5,6], 'id': list(range(16))})
p = test.pivot_table(index=['month', 'country'], aggfunc='count')[['id']]

      

The result looks like this:

               id
month country    
5     ca        1
      cn        2
      us        4
6     ca        3
      ru        3
7     cn        2
      ru        1

I would like to sort the table by the id column within each month, so that the largest count appears at the top, for example:

               id
month country    
5     us        4
      cn        2
      ca        1

      



2 answers


You need DataFrame.reset_index, DataFrame.sort_values and DataFrame.set_index:

p1 = (p.reset_index()
       .sort_values(['month','id'], ascending=[True, False])
       .set_index(['month','country']))
print(p1)
               id
month country    
5     us        4
      cn        2
      ca        1
6     ca        3
      ru        3
7     cn        2
      ru        1

      



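This alternative, on the other hand, does not work, because sort_values('id') reorders the whole table and the month groups are no longer kept together: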

p1 = p.sort_index(level='month', sort_remaining=True) \
      .sort_values('id', ascending=False)
print (p1)
               id
month country    
5     us        4
6     ca        3
      ru        3
5     cn        2
7     cn        2
5     ca        1
7     ru        1
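
As a possible shortcut, recent pandas versions (0.23 and later, as far as I know) let sort_values take index level names alongside column labels in `by`, so the reset_index / set_index round trip may not be needed. A minimal sketch under that assumption, rebuilding the question's data so it runs on its own (the name p2 is only for illustration):

```python
import pandas as pd

test = pd.DataFrame({
    'country': ['us', 'ca', 'ru', 'cn', 'ru', 'cn', 'us', 'ca',
                'ru', 'cn', 'us', 'ca', 'ru', 'cn', 'us', 'ca'],
    'month': [5, 6, 7, 5, 6, 7, 5, 5, 6, 7, 5, 6, 6, 5, 5, 6],
    'id': list(range(16)),
})
p = test.pivot_table(index=['month', 'country'], aggfunc='count')[['id']]

# 'month' is an index level and 'id' is a column;
# sort_values looks both up by name (assumes pandas >= 0.23)
p2 = p.sort_values(['month', 'id'], ascending=[True, False])
print(p2)
```

The index stays a MultiIndex throughout, so nothing has to be restored afterwards.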

      



Option 1
This sorts by id within the groups defined by the month level of the index.

p.groupby(
    level='month', group_keys=False
).apply(pd.DataFrame.sort_values, by='id', ascending=False)

               id
month country    
5     us        4
      cn        2
      ca        1
6     ca        3
      ru        3
7     cn        2
      ru        1

      


Option 2
This first sorts the entire dataframe by id, then sorts again by the month level of the index. However, I had to use sort_remaining=False, so that the country level is not sorted as well, and kind='mergesort', because mergesort is a stable sort and will not disturb the pre-existing order within the groups defined by the month level.



p.sort_values('id', ascending=False) \
 .sort_index(level='month', sort_remaining=False, kind='mergesort')

               id
month country    
5     us        4
      cn        2
      ca        1
6     ca        3
      ru        3
7     cn        2
      ru        1
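
To check that the two-step sort really keeps the id order inside each month, something like the following could be used. It is only a sketch: it rebuilds the question's data so it is self-contained, and the names p2, ids_descending and months_ascending are made up for the example.

```python
import pandas as pd

test = pd.DataFrame({
    'country': ['us', 'ca', 'ru', 'cn', 'ru', 'cn', 'us', 'ca',
                'ru', 'cn', 'us', 'ca', 'ru', 'cn', 'us', 'ca'],
    'month': [5, 6, 7, 5, 6, 7, 5, 5, 6, 7, 5, 6, 6, 5, 5, 6],
    'id': list(range(16)),
})
p = test.pivot_table(index=['month', 'country'], aggfunc='count')[['id']]

# Sort by id first, then reorder by the month level only
p2 = (
    p.sort_values('id', ascending=False)
     .sort_index(level='month', sort_remaining=False, kind='mergesort')
)

# Within every month the ids should be non-increasing ...
ids_descending = p2.groupby(level='month')['id'].apply(
    lambda s: s.is_monotonic_decreasing
)
# ... and the months themselves should appear in ascending order
months_ascending = p2.index.get_level_values('month').is_monotonic_increasing

print(ids_descending.all(), months_ascending)
```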

      


Option 3
This uses numpy's lexsort. It works, but I don't like it because it depends on id being numeric, so that I can put a minus sign in front of it to get descending order.

import numpy as np
p.iloc[np.lexsort([-p['id'].values, p.index.get_level_values('month')])]

               id
month country    
5     us        4
      cn        2
      ca        1
6     ca        3
      ru        3
7     cn        2
      ru        1
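
The key order in np.lexsort is easy to get backwards: the last key in the list is the primary one. A small standalone illustration with made-up arrays (month and count are not from the question's data):

```python
import numpy as np

month = np.array([6, 5, 5, 6, 5])
count = np.array([3, 1, 4, 2, 2])

# The LAST key is the primary sort key: month ascending first,
# then -count ascending, which means count descending within a month
order = np.lexsort([-count, month])

print(month[order])  # [5 5 5 6 6]
print(count[order])  # [4 2 1 3 2]
```

Negating the secondary key only works because it is numeric, which is the limitation mentioned above.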

      
