Getting the row-wise mean of a DataFrame using only values greater than or equal to zero

I would like to get the average of each row in a DataFrame, using only the values that are greater than or equal to zero.

For example, if my DataFrame looks like this:

df = pd.DataFrame([[3,4,5], [4,5,6],[4,-10,6]])
    3   4   5
    4   5   6
    4   -10 6


Currently, to get the average of each row, I write:

df['mean'] = df.mean(axis = 1)


and get:

3   4   5   4
4   5   6   5
4   -10 6   0


I would like only values greater than zero to be used when computing the average, so that the DataFrame looks like this:

3   4   5   4
4   5   6   5
4   -10 6   5


In the example above, -10 is excluded from the average. Is there a command that excludes -10?



2 answers


You can use df[df > 0] to mask the DataFrame before calculating the average; df[df > 0] returns a DataFrame in which cells less than or equal to zero are replaced with NaN, and those NaN values are ignored when mean is computed:



df[df > 0].mean(1)

#0    4.0
#1    5.0
#2    5.0
#dtype: float64
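
To attach the result as a new column, as in the question, the masked mean can be assigned back to the DataFrame (a minimal sketch of the same idea; the column name 'mean' just follows the question's example):

df['mean'] = df[df > 0].mean(axis=1)  # non-positive cells become NaN and are skipped
df
#   0   1  2  mean
#0  3   4  5   4.0
#1  4   5  6   5.0
#2  4 -10  6   5.0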




Not as concise as @Psidom's answer, but if you want to use numpy and get some added speed:

import numpy as np

v0 = df.values                     # underlying numpy array
v1 = np.where(v0 > 0, v0, np.nan)  # replace non-positive values with NaN
v2 = np.nanmean(v1, axis=1)        # row means, ignoring the NaNs
df.assign(Mean=v2)

   0   1  2  Mean
0  3   4  5   4.0
1  4   5  6   5.0
2  4 -10  6   5.0
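
One caveat (my note, not part of the original answer): if a row contains no positive values at all, np.nanmean sees an all-NaN slice, emits a RuntimeWarning and returns NaN for that row; the pandas version df[df > 0].mean(1) also returns NaN for such a row, but silently. A small sketch:

import numpy as np
import pandas as pd

df2 = pd.DataFrame([[3, 4, 5], [-1, -2, -3]])  # second row has no positive values

np.nanmean(np.where(df2.values > 0, df2.values, np.nan), axis=1)
# array([ 4., nan])   (with a "Mean of empty slice" RuntimeWarning)

df2[df2 > 0].mean(1)
#0    4.0
#1    NaN
#dtype: float64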




Timing on the small example data:

%timeit df.assign(Mean=df[df > 0].mean(1))
1000 loops, best of 3: 1.71 ms per loop

%%timeit
v0 = df.values
v1 = np.where(v0 > 0, v0, np.nan)
v2 = np.nanmean(v1, axis=1)
df.assign(Mean=v2)
1000 loops, best of 3: 407 µs per loop








