Calculate weights for grouped data in pandas
I would like to calculate portfolio weights using pandas. Here is some dummy data as an example:
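import numpy as np
import pandas as pd

# DataFrame.sort was removed from pandas; sort_values is its replacement
df1 = pd.DataFrame({'name': ['ann', 'bob'] * 3}).sort_values('name').reset_index(drop=True)
df2 = pd.DataFrame({'stock': list('ABC') * 2})
df3 = pd.DataFrame({'val': np.random.randint(10, 100, 6)})
df = pd.concat([df1, df2, df3], axis=1)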
Each person holds 3 stocks, each with a value val. We can calculate the portfolio weights as follows:
df.groupby('name').apply(lambda x: x.val/(x.val).sum())
This returns the weights for each name. If I want to add a column wgt to df, I need to merge that result back into df on name and index, which seems pretty awkward.
Is there a way to do this in one step, or at least a way that makes better use of pandas features?
Use transform. It returns a Series whose index is aligned with your original df, so the result can be assigned directly as a new column:
In [114]:
df['wgt'] = df.groupby('name')['val'].transform(lambda x: x/x.sum())
df
Out[114]:
  name stock  val       wgt
0  ann     A   18  0.131387
1  ann     B   43  0.313869
2  ann     C   76  0.554745
3  bob     A   16  0.142857
4  bob     B   44  0.392857
5  bob     C   52  0.464286
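As a possible variation, the lambda can be avoided by letting transform('sum') broadcast each group's total back onto its rows and dividing the column by it. The sketch below is only an illustration: it hard-codes the val figures from the output above instead of drawing random numbers, and the names group_total and weight_sums are made up for this example.

```python
import pandas as pd

# Fixed values copied from the output above so the result is reproducible
df = pd.DataFrame({
    'name': ['ann'] * 3 + ['bob'] * 3,
    'stock': list('ABC') * 2,
    'val': [18, 43, 76, 16, 44, 52],
})

# Broadcast each group's total back onto its rows (same index as df)
group_total = df.groupby('name')['val'].transform('sum')

# Plain column division; no Python-level lambda per group
df['wgt'] = df['val'] / group_total

# Sanity check: the weights within each name should sum to 1
weight_sums = df.groupby('name')['wgt'].sum()
print(df)
print(weight_sums)
```

Both forms should produce the same wgt column. Passing the string 'sum' lets pandas use its built-in group sum, which tends to matter only on larger frames.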