Python: replacing outlier values with median values

I have a pandas DataFrame that has some outlier values in it. I would like to replace them with the median of the data, computed as if those values weren't there.

id         Age
10236    766105
11993       288
9337        205
38189        88
35555        82
39443        75
10762        74
33847        72
21194        70
39450        70

      

So I want to replace all values > 75 with the median of the remaining values, i.e. the median of 70, 70, 72, 74, 75.

I am trying to do the following:

  • Replace all values greater than 75 with 0.
  • Replace the 0s with the median value.

But for some reason the code below doesn't work:

df['age'].replace(df.age>75,0,inplace=True)

      



1 answer


I think this is what you are looking for: you can use loc to set the outliers to NaN, and then fill the NaN values with the median.

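import numpy as np

# Median of the values to keep (<= 75, so that 75 itself is included, as in the question)
median = df.loc[df['Age'] <= 75, 'Age'].median()
df.loc[df['Age'] > 75, 'Age'] = np.nan
df['Age'] = df['Age'].fillna(median)  # fill only the Age column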
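For anyone who wants to try this end to end, here is a minimal sketch using the sample data from the question. It assumes `id` is the index, and it casts `Age` to float up front (my addition, not part of the original snippet) so the column can hold NaN without an implicit dtype change. The name `keep` is just illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; treating 'id' as the index is an assumption.
df = pd.DataFrame(
    {"Age": [766105, 288, 205, 88, 82, 75, 74, 72, 70, 70]},
    index=pd.Index(
        [10236, 11993, 9337, 38189, 35555, 39443, 10762, 33847, 21194, 39450],
        name="id",
    ),
)

# Cast to float first so NaN can be stored in the column (assumption, see above).
df["Age"] = df["Age"].astype(float)

# Rows to keep as they are: everything at or below the threshold of 75.
keep = df["Age"] <= 75

# Median of the kept values only: 70, 70, 72, 74, 75.
median = df.loc[keep, "Age"].median()

# Blank out the outliers, then fill the gaps with the median.
df.loc[~keep, "Age"] = np.nan
df["Age"] = df["Age"].fillna(median)

print(median)
print(df)
```

With this data the median of the kept values is 72, so the five rows above 75 should all end up as 72.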

      

You can also use np.where in one line:

df["Age"] = np.where(df["Age"] > 75, median, df["Age"])

      

You can also use .mask, i.e.:

df["Age"] = df["Age"].mask(df["Age"] > 75, median)
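As a quick sanity check that the np.where and .mask variants agree, here is a small sketch on the ages from the question. The names `ages`, `via_where` and `via_mask` are only illustrative, and the median is taken over values `<= 75` so that 75 is included, as the question describes.

```python
import numpy as np
import pandas as pd

# Ages copied from the question, as a standalone Series for brevity.
ages = pd.Series([766105, 288, 205, 88, 82, 75, 74, 72, 70, 70], name="Age")

# Median of the non-outlier values (70, 70, 72, 74, 75).
median = ages[ages <= 75].median()

# np.where returns a plain NumPy array, so wrap it back into a Series.
via_where = pd.Series(
    np.where(ages > 75, median, ages), index=ages.index, name="Age"
)

# Series.mask replaces values where the condition is True and keeps the rest.
via_mask = ages.mask(ages > 75, median)

print(via_where.tolist())
print(via_mask.tolist())
```

Both should give the same values. The practical difference is that .mask keeps the index and stays inside pandas, while np.where hands back a bare array that you assign into the column.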

      
