Pandas frame reshaping (tricky case!)
I want to change the following dataframe:
index    id    numbers
1111     5     58.99
2222     5     75.65
1000     4     66.54
11       4     60.33
143      4     62.31
145      51    30.2
1        7     61.28
The modified data frame should look like this:
id    1        2        3
5     58.99    75.65    nan
4     66.54    60.33    62.31
51    30.2     nan      nan
7     61.28    nan      nan
For this I am using the following code.
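import pandas as pd

dtFrame = pd.read_csv("data.csv")
ids = dtFrame['id'].unique()
temp = dtFrame.groupby('id')
temp2 = {}
for i in ids:
    # one column per id, re-indexed from 0 so the groups line up
    temp2[i] = temp.get_group(i).reset_index()['numbers']
# columns of unequal length are padded with NaN
dtFrame = pd.DataFrame.from_dict(temp2)
# transpose so that each id becomes a row
dtFrame = dtFrame.T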
Although the above code solves my problem, is there an easier way to achieve this? I tried a pivot table, but that doesn't solve the problem; maybe it requires the same number of items in each group. Or maybe there is another way that I am not aware of. Please share your thoughts on this.
In [69]: df.groupby(df['id'])['numbers'].apply(lambda x: pd.Series(x.values)).unstack()
Out[69]:
        0      1      2
id
4   66.54  60.33  62.31
5   58.99  75.65    NaN
7   61.28    NaN    NaN
51  30.20    NaN    NaN
This is actually very similar to what you are doing, except that the loop is replaced with apply. pd.Series(x.values) has an index that by default spans integers starting with 0. After unstack, these index values become the column names (see above). It doesn't matter that different groups may have different lengths. The apply/unstack combination aligns the various indices for you (and fills in the missing values with NaN). What a convenience!
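To make the two steps visible, here is a minimal sketch that builds the sample frame from the question inline (instead of reading data.csv) and keeps the intermediate result in its own variable. The names stacked and wide are only illustrative.

```python
import pandas as pd

# Sample data copied from the question (the original reads it from data.csv).
df = pd.DataFrame(
    {
        "id": [5, 5, 4, 4, 4, 51, 7],
        "numbers": [58.99, 75.65, 66.54, 60.33, 62.31, 30.2, 61.28],
    },
    index=[1111, 2222, 1000, 11, 143, 145, 1],
)

# Step 1: each group becomes a Series re-indexed 0, 1, 2, ...
# The result is one long Series with a two-level index:
# (id, position within the group).
stacked = df.groupby("id")["numbers"].apply(lambda x: pd.Series(x.values))

# Step 2: move the inner level (the position) into the columns.
# Positions that a group does not have are filled with NaN.
wide = stacked.unstack()

print(stacked)
print(wide)
```

Printing stacked before unstacking shows where the column labels 0, 1, 2 come from: they are the inner index level created by pd.Series(x.values).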
I learned this trick here.
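As for the pivot attempt mentioned in the question: a plain pivot needs something to use as column labels, and the original frame has none. One possible alternative (not part of the answer above, just a sketch) is to number the rows inside each group with cumcount and pivot on that helper column. The column name pos is an assumption made for this example.

```python
import pandas as pd

# Sample data copied from the question (the original reads it from data.csv).
df = pd.DataFrame(
    {
        "id": [5, 5, 4, 4, 4, 51, 7],
        "numbers": [58.99, 75.65, 66.54, 60.33, 62.31, 30.2, 61.28],
    },
    index=[1111, 2222, 1000, 11, 143, 145, 1],
)

# Number the rows inside each id group: 0, 1, 2, ...
# "pos" is a hypothetical helper column, not present in the original data.
numbered = df.assign(pos=df.groupby("id").cumcount())

# Every (id, pos) pair is now unique, so pivot has exactly one value per
# cell; combinations that do not occur are left as NaN.
wide = numbered.pivot(index="id", columns="pos", values="numbers")

print(wide)
```

With the helper column in place, groups of different sizes are not a problem for pivot; the shorter groups simply have NaN in the trailing columns.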