Pandas outputting aggregate function in xlsx

Question

I have SQLite queries that I turned into pandas dataframes, and I passed these dataframes to functions to get aggregated information. How can I populate an Excel spreadsheet with the results of such a function? In other words, how can I turn the output of a function into a dataframe? (Note: I am using openpyxl to create the workbook.)

Here is the code for df and function:

import pandas as pd

# Nationwide measure statistics (conn is an already-open sqlite3 connection)
nationwide_measures = pd.read_sql_query("""select state,
          measure_id,
          measure_name,
          score
from timely_and_effective_care___hospital;""", conn)

# Remove the non-numeric string values from 'score'
# (.copy() avoids a SettingWithCopyWarning when 'score' is reassigned below)
nationwide_measures1 = nationwide_measures[nationwide_measures['score'].astype(str).str.isdigit()].copy()

# Change score to numeric
nationwide_measures1['score'] = pd.to_numeric(nationwide_measures1['score'])

# Function to grab measure values
def get_stats(group):
    return {'Minimum': group.min(), 'Maximum': group.max(), 'Average': group.mean(), 'Standard Deviation': group.std()}

# Function output
nationwide_measures1['score'].groupby(nationwide_measures1['measure_id']).apply(get_stats).unstack()

I tried:

# Function to grab measure values
def get_stats(group):
    return pd.DataFrame({'Minimum': group.min(), 'Maximum': group.max(), 'Average': group.mean(), 'Standard Deviation': group.std()})

but this raises "ValueError: If using all scalar values, you must pass an index"

I've also tried:

# Function to grab measure values
def get_stats(group):
    df = pd.DataFrame({'Measure Name': group.columns['measure_name'],'Minimum': group.min(), 'Maximum': group.max(), 'Average': group.mean(), 'Standard Deviation': group.std()}, index = [0])
    return df

But this gives an error: "AttributeError: 'Series' object has no attribute 'columns'"

function python pandas dataframe

zsad512 Jul 17 '17 at 19:14

1 answer

Scott Boston · Accepted Answer · 2017-07-17T20:19:40+0000

In your dataframe creation statement, the pd.DataFrame call receives only scalar values and no iterables, so pandas cannot infer an index. If you add index=[0], you get a single-row dataframe:

pd.DataFrame({'Minimum': group.min(), 'Maximum': group.max(), 'Average': group.mean(), 'Standard Deviation': group.std()},index=[0])
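
As a rough sketch of the remaining step (getting the per-measure statistics into a spreadsheet), one option is to skip the custom function and let groupby(...).agg(...) build the dataframe directly, then hand it to DataFrame.to_excel. This is not part of the original answer. The sample rows, the helper name write_stats and the sheet name "Nationwide" are made up for illustration, and writing the file assumes openpyxl is installed.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned query result (nationwide_measures1);
# the measure ids and scores here are invented for illustration.
nationwide_measures1 = pd.DataFrame({
    "measure_id": ["ED_1b", "ED_1b", "OP_18b", "OP_18b"],
    "score": [250, 310, 120, 140],
})

# agg with a list of function names returns a DataFrame:
# one row per measure_id, one column per statistic.
stats = (
    nationwide_measures1.groupby("measure_id")["score"]
    .agg(["min", "max", "mean", "std"])
    .rename(columns={
        "min": "Minimum",
        "max": "Maximum",
        "mean": "Average",
        "std": "Standard Deviation",
    })
)


def write_stats(frame, path, sheet_name="Nationwide"):
    """Write the statistics dataframe to an .xlsx file (hypothetical helper).

    Assumes the openpyxl package is available as the Excel engine.
    """
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        # The measure_id index is written as the first column.
        frame.to_excel(writer, sheet_name=sheet_name)
```

Calling write_stats(stats, "measure_stats.xlsx") should then produce a workbook with one row per measure. If the workbook is already being built by hand with openpyxl, appending the dataframe's rows to a worksheet is another route, but the pandas writer is usually the shorter path.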
