Pandas: creating MultiIndex / groupby using existing index and column headers
I am trying to take a 2d dataframe and turn it into a (more or less) 1d dataframe where the existing column is used as the first level index and the column headers are used as the second level index.
I tried to .melt()
, .groupby()
, .transpose()
, .MultiIndex()
, .pivot()
... all with no luck - I think it is mainly due to the fact that they should be combined in some way it eludes me.
Setting:
import pandas as pd
from io import StringIO
csv = StringIO(u'''
AXIS A B C D
X 100 101 102 103
Y 200 201 202 203
Z 300 301 302 303
''')
df = pd.read_csv(csv, delim_whitespace = True)
Desired output:
Num
One Two
X A 100
X B 101
X C 102
X D 103
Y A 200
Y B 201
Y C 202
Y D 203
Z A 300
Z B 301
Z C 302
Z D 303
Thanks in advance.
source to share
Use set_index
, stack
and reset_index
:
df.set_index('AXIS').stack().reset_index()
Output:
AXIS level_1 0
0 X A 100
1 X B 101
2 X C 102
3 X D 103
4 Y A 200
5 Y B 201
6 Y C 202
7 Y D 203
8 Z A 300
9 Z B 301
10 Z C 302
11 Z D 303
And you can do some cleanup with the column renaming index, etc ...
df.set_index('AXIS').stack().reset_index().rename(columns={'AXIS':'one','level_1':'two',0:'num'}).set_index(['one','two'])
Output:
num
one two
X A 100
B 101
C 102
D 103
Y A 200
B 201
C 202
D 203
Z A 300
B 301
C 302
D 303
source to share
I like it for speed
i = df.AXIS.values
c = np.array(list('ABCD'))
v = np.column_stack([df[col].values for col in c])
idx = pd.MultiIndex.from_arrays(
[i.repeat(c.size), np.tile(c, i.size)],
names=['One', 'Two']
)
# Or this for brevity
# idx = pd.MultiIndex.from_product([i, c], names=['One', 'Two'])
pd.DataFrame(v.ravel(), idx, ['Num'])
Num
One Two
X A 100
B 101
C 102
D 103
Y A 200
B 201
C 202
D 203
Z A 300
B 301
C 302
D 303
source to share