Pandas: creating MultiIndex / groupby using existing index and column headers

I am trying to take a 2d dataframe and turn it into a (more or less) 1d dataframe where the existing column is used as the first level index and the column headers are used as the second level index.

I tried to .melt()

, .groupby()

, .transpose()

, .MultiIndex()

, .pivot()

... all with no luck - I think it is mainly due to the fact that they should be combined in some way it eludes me.

Setting:

import pandas as pd
from io import StringIO

csv = StringIO(u'''
AXIS    A       B       C       D
X       100     101     102     103 
Y       200     201     202     203
Z       300     301     302     303
''')

df = pd.read_csv(csv, delim_whitespace = True)

      

Desired output:

                Num
One     Two     
  X       A     100
  X       B     101
  X       C     102
  X       D     103
  Y       A     200
  Y       B     201
  Y       C     202
  Y       D     203
  Z       A     300
  Z       B     301
  Z       C     302
  Z       D     303

      

Thanks in advance.

+3


source to share


4 answers


Use set_index

, stack

and reset_index

:

df.set_index('AXIS').stack().reset_index()

      

Output:

   AXIS level_1    0
0     X       A  100
1     X       B  101
2     X       C  102
3     X       D  103
4     Y       A  200
5     Y       B  201
6     Y       C  202
7     Y       D  203
8     Z       A  300
9     Z       B  301
10    Z       C  302
11    Z       D  303

      



And you can do some cleanup with the column renaming index, etc ...

df.set_index('AXIS').stack().reset_index().rename(columns={'AXIS':'one','level_1':'two',0:'num'}).set_index(['one','two'])

      

Output:

         num
one two     
X   A    100
    B    101
    C    102
    D    103
Y   A    200
    B    201
    C    202
    D    203
Z   A    300
    B    301
    C    302
    D    303

      

+2


source


I like it for speed



i = df.AXIS.values
c = np.array(list('ABCD'))
v = np.column_stack([df[col].values for col in c])
idx = pd.MultiIndex.from_arrays(
    [i.repeat(c.size), np.tile(c, i.size)],
    names=['One', 'Two']
)
# Or this for brevity
# idx = pd.MultiIndex.from_product([i, c], names=['One', 'Two'])
pd.DataFrame(v.ravel(), idx, ['Num'])

         Num
One Two     
X   A    100
    B    101
    C    102
    D    103
Y   A    200
    B    201
    C    202
    D    203
Z   A    300
    B    301
    C    302
    D    303

      

+3


source


#Using pd.melt to convert columns to rows.
pd.melt(df.rename(columns={'AXIS':'ONE'}),id_vars='ONE', var_name='TWO', value_name='Num').set_index(['ONE','TWO']).sort_index()
Out[28]: 
         Num
ONE TWO     
X   A    100
    B    101
    C    102
    D    103
Y   A    200
    B    201
    C    202
    D    203
Z   A    300
    B    301
    C    302
    D    303

      

+2


source


As you suspected, the trick is to combine the correct id and value variables

 pd.melt(df, id_vars=['AXIS'], value_vars=['A', 'B', 'C', 'D']).sort_values(['AXIS'])

      

+1


source







All Articles