Pandas: split categorical column into multiple columns
You can use set_index to move the type and id columns into the index, and then unstack to move the type index level into the column index. You don't need to worry about the v1 and v2 values: where the index labels go dictates where the values end up.
The result is a DataFrame with a MultiIndex for the column index:
In [181]: df.set_index(['type', 'id']).unstack(['type'])
Out[181]:
v1 v2
type A B A B
id
1 6 4 9 2
2 3 3 7 6
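A self-contained sketch of the reshape above. The question's input frame is not shown here, so the long-format df below is an assumption reconstructed from the output (one row per id/type pair, with value columns v1 and v2):

```python
import pandas as pd

# Hypothetical long-format input, reconstructed from the output shown above
df = pd.DataFrame({
    "id":   [1, 1, 2, 2],
    "type": ["A", "B", "A", "B"],
    "v1":   [6, 4, 3, 3],
    "v2":   [9, 2, 7, 6],
})

# Move 'type' and 'id' into the row index, then pivot the 'type' level into the columns
wide = df.set_index(["type", "id"]).unstack("type")
print(wide)
```

The columns of wide are (value, type) tuples such as ('v1', 'A'), and the row index is id.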
In general, a MultiIndex column index is preferable to a flattened one, because it gives you better ways to select or manipulate your data by type or by value column (v1, v2).
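As an illustration of that kind of selection, here is a minimal sketch using the same hypothetical input as above (reconstructed from the shown output, not taken from the question). Indexing by the top level picks one value column; xs with level="type" picks one type across all value columns:

```python
import pandas as pd

# Hypothetical input, reconstructed from the output shown above
df = pd.DataFrame({
    "id":   [1, 1, 2, 2],
    "type": ["A", "B", "A", "B"],
    "v1":   [6, 4, 3, 3],
    "v2":   [9, 2, 7, 6],
})
wide = df.set_index(["type", "id"]).unstack("type")

# Top-level label: all types for value column v1 (columns A, B)
v1 = wide["v1"]

# Cross-section on the 'type' level: all value columns for type A (columns v1, v2)
type_a = wide.xs("A", axis=1, level="type")
print(v1)
print(type_a)
```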
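If you want to change the order of the columns to exactly match the order shown in the desired output, assign the unstacked result back to df and then use df.reindex:
df = df.set_index(['type', 'id']).unstack(['type'])
df = df.reindex(columns=sorted(df.columns, key=lambda x: x[::-1]))  # sort by (type, value) by reversing each column tuple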
gives
v1 v2 v1 v2
type A A B B
id
1 6 9 4 2
2 3 7 3 6
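An alternative sketch (a swapped-in approach, not the one used above) is DataFrame.sort_index on the column axis, sorting by the 'type' level first; with the default sort_remaining=True the value level is sorted within each type. The input frame is again a hypothetical reconstruction from the shown output:

```python
import pandas as pd

# Hypothetical input, reconstructed from the output shown above
df = pd.DataFrame({
    "id":   [1, 1, 2, 2],
    "type": ["A", "B", "A", "B"],
    "v1":   [6, 4, 3, 3],
    "v2":   [9, 2, 7, 6],
})
wide = df.set_index(["type", "id"]).unstack("type")

# Approach from the answer: sort the column tuples by their reversed form
by_reindex = wide.reindex(columns=sorted(wide.columns, key=lambda x: x[::-1]))

# Alternative: sort columns by the 'type' level, then by the remaining (value) level
by_type = wide.sort_index(axis=1, level="type")
print(by_type)
```

Both produce the column order (v1, A), (v2, A), (v1, B), (v2, B).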
And if you want to flatten the column index to a single level, then
df.columns = ['{}_{}'.format(t, v) for v,t in df.columns]
gives
A_v1 A_v2 B_v1 B_v2
id
1 6 9 4 2
2 3 7 3 6
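Putting the steps together, a minimal end-to-end sketch from the hypothetical long-format input (reconstructed from the shown output) to flat "type_value" column names. The f-string is equivalent to the '{}_{}'.format(t, v) call above; each column label is a (value, type) tuple, so it is unpacked as v, t:

```python
import pandas as pd

# Hypothetical input, reconstructed from the output shown above
df = pd.DataFrame({
    "id":   [1, 1, 2, 2],
    "type": ["A", "B", "A", "B"],
    "v1":   [6, 4, 3, 3],
    "v2":   [9, 2, 7, 6],
})

# Reshape, then order columns by (type, value)
wide = df.set_index(["type", "id"]).unstack("type")
wide = wide.reindex(columns=sorted(wide.columns, key=lambda x: x[::-1]))

# Flatten each (value, type) tuple into a "type_value" string
wide.columns = [f"{t}_{v}" for v, t in wide.columns]
print(wide)
```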