Finding the median of a whole pandas DataFrame

I am trying to find the median value of an entire data frame. The first part of this is to select only certain elements in the data frame.

There were two problems with this: it included parts of the data frame that are not in "states", and the median was not a single value but was computed per row. How do I get a single median of all the data in a data frame?


2 answers


Two options:

1) The pandas option:

df.stack().median()

2) The NumPy option:

import numpy as np
np.median(df.to_numpy())
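A small sketch of why a plain df.median() does not give one number, and how the two options behave when the frame has a missing value. The data is made up, and np.nanmedian is a NumPy function swapped in for comparison (it is not part of the answer above).

```python
import numpy as np
import pandas as pd

# Hypothetical example data: three rows, two numeric columns, one missing value
df = pd.DataFrame({"a": [1.0, 2.0, np.nan], "b": [4.0, 5.0, 6.0]})

# DataFrame.median() works column by column, so it returns one value per column
per_column = df.median()  # a -> 1.5, b -> 5.0

# Option 1 (pandas): stack the frame into a single Series, then take its median.
# Series.median() skips NaN by default.
pandas_median = df.stack().median()

# Option 2 (NumPy): np.median propagates NaN, so any missing value gives nan ...
numpy_median = np.median(df.to_numpy())

# ... while np.nanmedian ignores the missing values
numpy_nanmedian = np.nanmedian(df.to_numpy())

print(per_column)
print(pandas_median, numpy_median, numpy_nanmedian)  # 4.0 nan 4.0
```

So if your frame can contain NaN, the pandas option (or np.nanmedian) is the safer choice. Both options also assume every column is numeric; text columns such as state names need to be dropped or set as the index first.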



The DataFrame you posted is a little confusing because of some whitespace, but what you want is to melt the DataFrame and then call median() on the new, melted DataFrame:

import pandas as pd

df2 = pd.melt(df, id_vars=['U.S.'])
print(df2['value'].median())


Your DataFrame may be slightly different, but the concept is the same. Check out the comment I left to understand pd.melt(), especially the value_vars and id_vars arguments.
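To illustrate what id_vars and value_vars do, here is a small sketch on made-up data. The year columns, the row values and the `states` list are all hypothetical; only the 'U.S.' identifier column mirrors the melt call above. It also filters the rows with isin() before melting, which is one way to deal with the question's first problem (rows that are not in "states").

```python
import pandas as pd

# Hypothetical wide-format data: one row per place, one column per year
df = pd.DataFrame({
    "U.S.": ["Alabama", "Alaska", "Arizona", "Puerto Rico"],
    "2013": [10, 20, 30, 1000],
    "2014": [40, 50, 60, 2000],
})

# Hypothetical list of the rows to keep (the "states" in the question)
states = ["Alabama", "Alaska", "Arizona"]

# Keep only the rows whose 'U.S.' entry is in `states`
subset = df[df["U.S."].isin(states)]

# id_vars: columns kept as identifiers; every other column is unpivoted into
# a 'variable' column (the old column name) and a 'value' column
long_all = pd.melt(subset, id_vars=["U.S."])
print(long_all["value"].median())  # median of 10, 20, 30, 40, 50, 60 -> 35.0

# value_vars: unpivot only the listed columns
long_2014 = pd.melt(subset, id_vars=["U.S."], value_vars=["2014"])
print(long_2014["value"].median())  # median of 40, 50, 60 -> 50.0
```

Because every numeric cell ends up in the single 'value' column, one median() call on that column gives one number for the whole (filtered) table.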



Here is a very verbose version of how I cleaned up the data and got the correct answer:

import pandas as pd

# reading in the data from the clipboard
df = pd.read_clipboard()

# printing it out to see it and also the column names
print(df)
print(df.columns)

# melting the DF and then printing the result
df2 = pd.melt(df, id_vars=['U.S.'])
print(df2)

# Creating a new DF with no nulls in it, for ease of code readability
# using .copy() to avoid the pandas SettingWithCopyWarning
df3 = df2.dropna().copy()

# there were some funky values in the 'value' column (e.g. 'Columbia', 'of').
# Converting the column to numbers turns those into NaN, and they are dropped
df3['value'] = pd.to_numeric(df3['value'], errors='coerce')
df3 = df3.dropna(subset=['value'])

# printing out the cleaned version and getting the median
print(df3)
print(df3['value'].median())
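Stray words like 'of' and 'Columbia' ending up in a numeric column is a common symptom of whitespace splitting a name such as "District of Columbia" when the data is read in. A minimal sketch, on made-up values, of why coercing them to NaN with pd.to_numeric(..., errors="coerce") is safer than overwriting them with a placeholder number:

```python
import pandas as pd

# Hypothetical melted 'value' column: numbers read in as text, plus stray
# words such as 'of' and 'Columbia' that ended up in the wrong column
values = pd.Series(["10", "of", "30", "Columbia", "20"])

# errors="coerce" turns anything that cannot be parsed as a number into NaN
numeric = pd.to_numeric(values, errors="coerce")

# Series.median() skips NaN by default, so only 10, 30 and 20 are used
print(numeric.median())  # 20.0

# For comparison: replacing the stray entries with a placeholder such as 99
# adds made-up numbers to the data and shifts the median
placeholder = pd.to_numeric(values.replace(["of", "Columbia"], 99))
print(placeholder.median())  # median of 10, 99, 30, 99, 20 -> 30.0
```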
