Getting the row-wise average of a dataframe using only values greater than or equal to zero
I would like to get the average of each row in a dataframe, using only values greater than or equal to zero.
For example, if my dataframe looks like this:
df = pd.DataFrame([[3,4,5], [4,5,6],[4,-10,6]])
3 4 5
4 5 6
4 -10 6
Currently, to get the average of each row, I write:
df['mean'] = df.mean(axis = 1)
and get:
3 4 5 4
4 5 6 5
4 -10 6 0
I would like to get a dataframe that uses only values greater than zero to compute the average. I would like the dataframe to look like this:
3 4 5 4
4 5 6 5
4 -10 6 5
In the example above, -10 is excluded from the average. Is there a command that excludes -10?
2 answers
Not as simple as @Psidom's answer, but you can use numpy if you want some added speed.
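import numpy as np

v0 = df.values                     # underlying NumPy array
v1 = np.where(v0 > 0, v0, np.nan)  # replace non-positive values with NaN (use >= to keep zeros)
v2 = np.nanmean(v1, axis=1)        # row-wise mean, ignoring NaN
df.assign(Mean=v2)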
0 1 2 Mean
0 3 4 5 4.0
1 4 5 6 5.0
2 4 -10 6 5.0
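For comparison, here is a minimal pandas-only sketch of the masking idea, which is presumably what the simpler approach amounts to. It assumes the `>= 0` condition from the question (the numpy code above uses `> 0`), and the names `masked` and `result` are only illustrative. It relies on `DataFrame.where` turning excluded values into NaN and on `mean` skipping NaN by default.

```python
import pandas as pd

df = pd.DataFrame([[3, 4, 5], [4, 5, 6], [4, -10, 6]])

# Keep values >= 0; everything else becomes NaN.
masked = df.where(df >= 0)

# mean() skips NaN by default (skipna=True), so -10 is left out of the last row.
result = df.assign(Mean=masked.mean(axis=1))
print(result)
```

A row with no qualifying values at all would get NaN as its mean, so that case may need separate handling.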
Timing
small data
%timeit df.assign(Mean=df[df > 0].mean(1))
1000 loops, best of 3: 1.71 ms per loop

%%timeit
v0 = df.values
v1 = np.where(v0 > 0, v0, np.nan)
v2 = np.nanmean(v1, axis=1)
df.assign(Mean=v2)
1000 loops, best of 3: 407 µs per loop