Pandas: split categorical column into multiple columns

Imagine a pandas DataFrame in the following format:

id  type  v1  v2
1   A     6   9
1   B     4   2
2   A     3   7
2   B     3   6

      

I would like to convert this DataFrame to the following format:

id  A_v1  A_v2  B_v1  B_v2
1   6     9     4     2
2   3     7     3     6

      

Is there an elegant way to do this?


1 answer


You can use set_index to move the type and id columns into the index, and then unstack to move the type index level into the column index. You don't need to worry about the v values: where the index labels go dictates where the values go.

The result is a DataFrame with a MultiIndex for the column index:

In [181]: df.set_index(['type', 'id']).unstack(['type'])
Out[181]: 
     v1    v2   
type  A  B  A  B
id              
1     6  4  9  2
2     3  3  7  6
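
For anyone who wants to reproduce this outside an interactive session, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand; the variable name wide is only an illustrative choice and does not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "type": ["A", "B", "A", "B"],
    "v1": [6, 4, 3, 3],
    "v2": [9, 2, 7, 6],
})

# Move type and id into the index, then pivot the type level
# into the columns. The result has two column levels: the
# original value column name (v1/v2) and the type (A/B).
wide = df.set_index(["type", "id"]).unstack("type")
print(wide)
```

Note that the call returns a new DataFrame and leaves df unchanged, so the result has to be assigned to a name if you want to keep working with it.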

      

In general, a MultiIndex is preferable to a flattened column index. It gives you better ways to select or manipulate your data based on type or on the v values.
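
As a rough illustration of that point, the sketch below selects by value column and by type on the MultiIndex result. The names wide, v1_only and type_a are hypothetical and chosen just for this example.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "type": ["A", "B", "A", "B"],
    "v1": [6, 4, 3, 3],
    "v2": [9, 2, 7, 6],
})
wide = df.set_index(["type", "id"]).unstack("type")

# Select on the outer column level: all types for v1.
v1_only = wide["v1"]

# Select on the inner column level: all value columns for type A.
# xs drops the selected level by default, leaving v1 and v2.
type_a = wide.xs("A", axis=1, level="type")

print(v1_only)
print(type_a)
```

With flattened names such as A_v1, the same selections would need string matching on the column labels.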

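If you want the column order to match the desired output exactly, you can use df.reindex (here df is assumed to hold the unstacked result from above). The sort key reverses each (v, type) column tuple, so the columns are ordered by type first and then by v: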

df = df.reindex(columns=sorted(df.columns, key=lambda x: x[::-1]))

      



gives

     v1 v2 v1 v2
type  A  A  B  B
id              
1     6  9  4  2
2     3  7  3  6

      

And if you want to flatten the column index into a single level, then

df.columns = ['{}_{}'.format(t, v) for v,t in df.columns]

      

gives

    A_v1  A_v2  B_v1  B_v2
id                        
1      6     9     4     2
2      3     7     3     6
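
Putting the steps together, a self-contained sketch of the whole transformation might look like the following. The final reset_index call is an optional addition that is not part of the answer above; it is included on the assumption that you want id back as an ordinary column, as in the question's desired output. The names wide and flat are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "type": ["A", "B", "A", "B"],
    "v1": [6, 4, 3, 3],
    "v2": [9, 2, 7, 6],
})

# 1. Pivot type into the columns (two-level column index).
wide = df.set_index(["type", "id"]).unstack("type")

# 2. Reorder columns by type first, then by value column name.
wide = wide.reindex(columns=sorted(wide.columns, key=lambda col: col[::-1]))

# 3. Flatten each (value, type) tuple into a "type_value" label.
wide.columns = [f"{t}_{v}" for v, t in wide.columns]

# 4. Optional: turn the id index back into a regular column.
flat = wide.reset_index()
print(flat)
```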

      
