Python Pandas ForLoop to create and join database pivot tables

Question

Python Pandas ForLoop to create and join database pivot tables

Beginner Pandas / For-Loop Question Here

How do I create a For-Loop to create a pivot table and merge for all columns?

Consider this df (a real dataset is like 50 columns of an estimate, I've simplified it to 4 here):

import numpy as np
import pandas as pd
raw_data = {'Year': [2009, 2009, 2010, 2010, 2010, 2010],
    'Quarter': [4, 4, 1, 1, 2, 2],       
    'Sector': ['GPU', 'GPU', 'Gaming', 'Gaming', 'Gaming', 'Gaming'],
    'Ticker': ['NVID', 'NVID', 'ATVI', 'ATVI', 'ATVI', 'ATVI'],
    'Metric': ['EPS', 'REV', 'EPS', 'REV', 'EPS', 'REV'],
    'Estimate 1': [1.4, 350, 0.2, 500, 0.9, 120],
    'Estimate 2': [1.2, 375, 0.22, 505, 1.0, 120],
    'Estimate 3': [2.1, 250, 0.2, 510, 0.8, 120],
    'Estimate 4': [1.4, 360, 0, 400, 1.9, 125],} 
df = pd.DataFrame(raw_data, columns = ['Year','Quarter', 'Sector','Ticker', 'Metric','Estimate 1','Estimate 2','Estimate 3', 'Estimate 4'])
print(df)

Desired output - I am looking for DF like this:

    Year  Quarter  Sector  Ticker   EPS_1   REV_1   EPS_2   REV_2   EPS_3   REV_3    EPS_4   REV_4
0  2009      4       GPU   NVID      1.4     350     1.2     375     2.1     250      1.4     360
1  2010      1    Gaming   ATVI      0.2     500    0.22     505     0.2     510      0.0     510
2  2010      2    Gaming   ATVI      0.9     120     1.0     120     0.8     120      1.9     120

I can do this individually using pd.pivot()

and pd.merge

, but I'm not sure how to structure this for a loop.

feature_names=('Year','Quarter','Sector','Ticker')
not_feature_names=['Metric','Estimate 1','Estimate 2','Estimate 3', 'Estimate 4']
df_pivot=df.drop(not_feature_names, axis=1)

df_pivot1 = df.pivot_table(index=feature_names,
                         columns='Metric', 
                         values='Estimate 1',)
df_pivot1 = df_pivot1.reset_index().rename_axis(None, axis=1)
df_pivot1.rename(columns={'EPS': 'EPS_1', 'REV':'REV_1'}, inplace=True)
df_Full=df_pivot1.merge(df_pivot, on=(feature_names), suffixes=('_l', '_r'))
print(df_Full)

Here's where I am with the for loop:

for (name, i) in zip(not_feature_names, range(1, 4)):
    df_pivot1 = df.pivot_table(index=feature_names,
                         columns='Metric', 
                         values=name,)
    df_pivot1 = df_pivot1.reset_index().rename_axis(None, axis=1)
    df_pivot1.rename(columns={'EPS': ('EPS_'+i), 'REV':('REV_'+i)}, inplace=True)
    df_Full=df_pivot1.merge(df_pivot, on=(feature_names), suffixes=('_l', '_r')

+3

python pandas for-loop dataframe pivot

whs2k June 19 17 at 18:40

source to share

2 answers

I don't think you need to use for-loops, you can use Pandas reshaping:

df_out = df.set_index(['Year','Quarter','Sector','Ticker','Metric']).unstack()
df_out.columns = df_out.columns.get_level_values(1)+'_'+df_out.columns.get_level_values(0).str.split(' ').str[1]
df_out.reset_index()

Output:

   Year  Quarter  Sector Ticker  EPS_1  REV_1  EPS_2  REV_2  EPS_3  REV_3  \
0  2009        4     GPU   NVID    1.4  350.0   1.20  375.0    2.1  250.0   
1  2010        1  Gaming   ATVI    0.2  500.0   0.22  505.0    0.2  510.0   
2  2010        2  Gaming   ATVI    0.9  120.0   1.00  120.0    0.8  120.0   

   EPS_4  REV_4  
0    1.4  360.0  
1    0.0  400.0  
2    1.9  125.0

+1

Scott boston June 19 17 at 18:46

source to share

Ted petrou · Accepted Answer · 2017-06-19T18:53:10+0000

A verbatim method would be to melt the score columns and then do some string substitution and concatenation. And finally, bring them back.

df1 = df.melt(id_vars=['Year', 'Quarter', 'Ticker','Metric'],
        value_vars=['Estimate 1', 'Estimate 2', 'Estimate 3', 'Estimate 4'])

df1['variable'] = df1.variable.str.replace('Estimate ', '')
df1['Metric'] = df1['Metric'] + '_' + df1['variable']
df1.pivot_table(index=['Year', 'Quarter', 'Ticker'], columns='Metric', values='value').reset_index()

Output

Metric  Year  Quarter Ticker  EPS_1  EPS_2  EPS_3  EPS_4  REV_1  REV_2  REV_3  \
0       2009        4   NVID    1.4   1.20    2.1    1.4  350.0  375.0  250.0   
1       2010        1   ATVI    0.2   0.22    0.2    0.0  500.0  505.0  510.0   
2       2010        2   ATVI    0.9   1.00    0.8    1.9  120.0  120.0  120.0   

Metric  REV_4  
0       360.0  
1       400.0  
2       125.0

Python Pandas ForLoop to create and join database pivot tables

How do I create a For-Loop to create a pivot table and merge for all columns?

More articles: