Python Pandas ForLoop to create and join database pivot tables
Beginner Pandas / For-Loop Question Here
How do I create a For-Loop to create a pivot table and merge for all columns?
Consider this df (a real dataset is like 50 columns of an estimate, I've simplified it to 4 here):
import numpy as np
import pandas as pd
raw_data = {'Year': [2009, 2009, 2010, 2010, 2010, 2010],
'Quarter': [4, 4, 1, 1, 2, 2],
'Sector': ['GPU', 'GPU', 'Gaming', 'Gaming', 'Gaming', 'Gaming'],
'Ticker': ['NVID', 'NVID', 'ATVI', 'ATVI', 'ATVI', 'ATVI'],
'Metric': ['EPS', 'REV', 'EPS', 'REV', 'EPS', 'REV'],
'Estimate 1': [1.4, 350, 0.2, 500, 0.9, 120],
'Estimate 2': [1.2, 375, 0.22, 505, 1.0, 120],
'Estimate 3': [2.1, 250, 0.2, 510, 0.8, 120],
'Estimate 4': [1.4, 360, 0, 400, 1.9, 125],}
df = pd.DataFrame(raw_data, columns = ['Year','Quarter', 'Sector','Ticker', 'Metric','Estimate 1','Estimate 2','Estimate 3', 'Estimate 4'])
print(df)
Desired output - I am looking for DF like this:
Year Quarter Sector Ticker EPS_1 REV_1 EPS_2 REV_2 EPS_3 REV_3 EPS_4 REV_4
0 2009 4 GPU NVID 1.4 350 1.2 375 2.1 250 1.4 360
1 2010 1 Gaming ATVI 0.2 500 0.22 505 0.2 510 0.0 510
2 2010 2 Gaming ATVI 0.9 120 1.0 120 0.8 120 1.9 120
I can do this individually using pd.pivot()
and pd.merge
, but I'm not sure how to structure this for a loop.
feature_names=('Year','Quarter','Sector','Ticker')
not_feature_names=['Metric','Estimate 1','Estimate 2','Estimate 3', 'Estimate 4']
df_pivot=df.drop(not_feature_names, axis=1)
df_pivot1 = df.pivot_table(index=feature_names,
columns='Metric',
values='Estimate 1',)
df_pivot1 = df_pivot1.reset_index().rename_axis(None, axis=1)
df_pivot1.rename(columns={'EPS': 'EPS_1', 'REV':'REV_1'}, inplace=True)
df_Full=df_pivot1.merge(df_pivot, on=(feature_names), suffixes=('_l', '_r'))
print(df_Full)
Here's where I am with the for loop:
for (name, i) in zip(not_feature_names, range(1, 4)):
df_pivot1 = df.pivot_table(index=feature_names,
columns='Metric',
values=name,)
df_pivot1 = df_pivot1.reset_index().rename_axis(None, axis=1)
df_pivot1.rename(columns={'EPS': ('EPS_'+i), 'REV':('REV_'+i)}, inplace=True)
df_Full=df_pivot1.merge(df_pivot, on=(feature_names), suffixes=('_l', '_r')
+3
source to share
2 answers
A verbatim method would be to melt the score columns and then do some string substitution and concatenation. And finally, bring them back.
df1 = df.melt(id_vars=['Year', 'Quarter', 'Ticker','Metric'],
value_vars=['Estimate 1', 'Estimate 2', 'Estimate 3', 'Estimate 4'])
df1['variable'] = df1.variable.str.replace('Estimate ', '')
df1['Metric'] = df1['Metric'] + '_' + df1['variable']
df1.pivot_table(index=['Year', 'Quarter', 'Ticker'], columns='Metric', values='value').reset_index()
Output
Metric Year Quarter Ticker EPS_1 EPS_2 EPS_3 EPS_4 REV_1 REV_2 REV_3 \
0 2009 4 NVID 1.4 1.20 2.1 1.4 350.0 375.0 250.0
1 2010 1 ATVI 0.2 0.22 0.2 0.0 500.0 505.0 510.0
2 2010 2 ATVI 0.9 1.00 0.8 1.9 120.0 120.0 120.0
Metric REV_4
0 360.0
1 400.0
2 125.0
+1
source to share
I don't think you need to use for-loops, you can use Pandas reshaping:
df_out = df.set_index(['Year','Quarter','Sector','Ticker','Metric']).unstack()
df_out.columns = df_out.columns.get_level_values(1)+'_'+df_out.columns.get_level_values(0).str.split(' ').str[1]
df_out.reset_index()
Output:
Year Quarter Sector Ticker EPS_1 REV_1 EPS_2 REV_2 EPS_3 REV_3 \
0 2009 4 GPU NVID 1.4 350.0 1.20 375.0 2.1 250.0
1 2010 1 Gaming ATVI 0.2 500.0 0.22 505.0 0.2 510.0
2 2010 2 Gaming ATVI 0.9 120.0 1.00 120.0 0.8 120.0
EPS_4 REV_4
0 1.4 360.0
1 0.0 400.0
2 1.9 125.0
+1
source to share