Pandas frame reshaping (tricky case!)

I want to reshape the following DataFrame:

index id numbers
1111 5 58.99
2222 5 75.65
1000 4 66.54 
11 4 60.33
143 4 62.31
145 51 30.2
1 7 61.28

The modified data frame should look like this:

id 1 2 3 
5 58.99 75.65 nan
4 66.54 60.33 62.31
51 30.2 nan nan
7 61.28 nan nan

For this I am using the following code.

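import pandas as pd

dtFrame = pd.read_csv("data.csv")
ids = dtFrame['id'].unique()
# group by the column name (not ['id']) so get_group(i) accepts a plain id
temp = dtFrame.groupby('id')
temp2 = {}
for i in ids:
    # this id's numbers, re-labelled by position 0, 1, 2, ...
    temp2[i] = temp.get_group(i)['numbers'].reset_index(drop=True)
# one column per id, rows aligned by position (shorter groups padded with NaN)
dtFrame = pd.DataFrame.from_dict(temp2)
# transpose so that each id becomes a row
dtFrame = dtFrame.T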


Although the above code solves my problem, is there an easier way to achieve this? I tried a pivot table, but that didn't work; maybe it requires every group to have the same number of items. Or maybe there is another way that I am not aware of. Please share your thoughts on this.



1 answer


In [69]: df.groupby(df['id'])['numbers'].apply(lambda x: pd.Series(x.values)).unstack()
Out[69]: 
        0      1      2
id                     
4   66.54  60.33  62.31
5   58.99  75.65    NaN
7   61.28    NaN    NaN
51  30.20    NaN    NaN
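To run this end to end, here is a minimal self-contained sketch. It assumes the frame is rebuilt inline from the table in the question (instead of being read from data.csv), and it adds two optional steps that the one-liner above skips: reindex puts the ids back in their first-appearance order (5, 4, 51, 7), and the columns are shifted to 1, 2, 3 to match the desired output.

```python
import pandas as pd

# Example data from the question, rebuilt inline (assumption: no data.csv needed).
df = pd.DataFrame(
    {"id": [5, 5, 4, 4, 4, 51, 7],
     "numbers": [58.99, 75.65, 66.54, 60.33, 62.31, 30.2, 61.28]},
    index=[1111, 2222, 1000, 11, 143, 145, 1],
)

# Same idea as the answer: one row per id, one column per position in the group.
wide = (
    df.groupby("id")["numbers"]
      .apply(lambda x: pd.Series(x.values))
      .unstack()
)

# Optional: restore the order in which the ids first appear in df.
wide = wide.reindex(df["id"].unique())

# Optional: label the columns 1, 2, 3 instead of 0, 1, 2.
wide.columns = wide.columns + 1
print(wide)
```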


This is actually very similar to what you are doing, except that the loop is replaced with apply. pd.Series(x.values) has a default integer index starting at 0, so each group's numbers are labelled by their position within the group. apply combines the per-group Series into a single Series indexed by (id, position), and unstack turns the position level into the column names (see above). It doesn't matter that different groups have different lengths: unstack aligns the positions for you and fills in the missing values with NaN. What a convenience!
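The two steps described above can be made visible by stopping before unstack. This is a small sketch on the question's data (rebuilt inline); the name long for the intermediate Series is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"id": [5, 5, 4, 4, 4, 51, 7],
                   "numbers": [58.99, 75.65, 66.54, 60.33, 62.31, 30.2, 61.28]})

# Step 1: apply returns one Series per group; pandas concatenates them
# into one Series whose index is (id, position within the group).
long = df.groupby("id")["numbers"].apply(lambda x: pd.Series(x.values))
print(long)

# Step 2: unstack moves the inner "position" level into the columns;
# positions that a group does not have are filled with NaN.
wide = long.unstack()
print(wide)
```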



I learned this trick here.
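Regarding the pivot attempt in the question: pivot does not need equal group sizes, it needs each (row, column) pair to be unique. A different technique from the one in this answer, sketched here as an alternative, is to number the rows inside each group with groupby().cumcount() and use that number as the pivot column. Data is again rebuilt inline, and pos is a hypothetical helper column name.

```python
import pandas as pd

df = pd.DataFrame({"id": [5, 5, 4, 4, 4, 51, 7],
                   "numbers": [58.99, 75.65, 66.54, 60.33, 62.31, 30.2, 61.28]})

# cumcount numbers the rows inside each id group as 0, 1, 2, ...;
# +1 makes the labels 1, 2, 3 like the desired output.
# Each (id, pos) pair is now unique, which is what pivot requires.
wide = (
    df.assign(pos=df.groupby("id").cumcount() + 1)
      .pivot(index="id", columns="pos", values="numbers")
)
print(wide)
```

As with the apply/unstack version, the ids come out sorted; the same reindex step can restore their original order.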
