Normalize DataFrame by Groups

Let's say that I have some data generated like this:

import numpy as np

N = 20
m = 3
data = np.random.normal(size=(N,m)) + np.random.normal(size=(N,m))**3

      

and then I create a categorization variable:

indx = np.random.randint(0,3,size=N).astype(np.int32)

      

and create a DataFrame:

import pandas as pd
df = pd.DataFrame(np.hstack((data, indx[:,None])), 
             columns=['a%s' % k for k in range(m)] + [ 'indx'])

      

I can get the average for each group as:

df.groupby('indx').mean()

      

What I'm not sure how to do is then subtract the average for each group, for each column in the original data, so that the data in each column is normalized to the average within the group. Any suggestions would be appreciated.

+3




3 answers


In [10]: df.groupby('indx').transform(lambda x: (x - x.mean()) / x.std())

      



should do it.
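
A minimal sketch to check this, assuming a fixed seed and deterministic group labels (both are illustrative choices, not part of the question). Dropping the division by `x.std()` gives plain mean subtraction, which is what the question literally asks for; keeping it gives a full z-score within each group:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility only
N, m = 20, 3
data = rng.normal(size=(N, m)) + rng.normal(size=(N, m)) ** 3
indx = np.arange(N) % 3  # deterministic labels so every group has several rows

df = pd.DataFrame(data, columns=['a%s' % k for k in range(m)])
df['indx'] = indx

# Subtract the per-group mean from every value column.
centered = df.groupby('indx').transform(lambda x: x - x.mean())

# Full z-score within each group, as in the answer above.
zscored = df.groupby('indx').transform(lambda x: (x - x.mean()) / x.std())

# Sanity check: after centering, every group mean should be (numerically) zero.
group_means = centered.groupby(df['indx']).mean()
```

The result of `transform` keeps the original row index and leaves out the grouping column, so it can be assigned back to `df` column by column if needed.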

+15




While not the nicest solution, you can do something like this:



indx = df['indx'].copy()
for indices in df.groupby('indx').groups.values():
    df.loc[indices] -= df.loc[indices].mean()
df['indx'] = indx

      

+1




The accepted answer works and is elegant. Unfortunately, for large datasets, I think using .transform() with a lambda is much slower than this less elegant approach (illustrated with the single column "a0"):

means_stds = df.groupby('indx')['a0'].agg(['mean','std']).reset_index()
df = df.merge(means_stds,on='indx')
df['a0_normalized'] = (df['a0'] - df['mean']) / df['std']

      

To do this for multiple columns, you have to work out the merge. My suggestion is to flatten the MultiIndex columns produced by the aggregation, as in this answer, and then merge and normalize each column separately:

means_stds = df.groupby('indx')[['a0','a1']].agg(['mean','std']).reset_index()
means_stds.columns = ['%s%s' % (a, '|%s' % b if b else '') for a, b in means_stds.columns]
df = df.merge(means_stds,on='indx')
for col in ['a0','a1']:
    df[col+'_normalized'] = ( df[col] - df[col+'|mean'] ) / df[col+'|std']
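
A possible middle ground, not taken from the answer above: passing the built-in names 'mean' and 'std' to transform returns frames already aligned to the original index, so neither a lambda nor a merge is needed. This is only a sketch under assumptions (the seed, the deterministic labels, and the names `cols`, `grouped`, and `result` are illustrative), and I have not benchmarked it against the merge version:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # fixed seed: an assumption, for reproducibility only
N = 20
df = pd.DataFrame(rng.normal(size=(N, 2)), columns=['a0', 'a1'])
df['indx'] = np.arange(N) % 3  # deterministic group labels for the example

cols = ['a0', 'a1']
grouped = df.groupby('indx')[cols]

# transform('mean') and transform('std') broadcast each group's statistic
# back onto the rows of that group, keeping df's index.
normalized = (df[cols] - grouped.transform('mean')) / grouped.transform('std')

# Attach the results next to the original columns.
result = df.join(normalized.add_suffix('_normalized'))
```

Unlike the merge version, this does not add the intermediate mean and std columns to the frame.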

      

0

