Python: replacing outlier values with median values
I have a pandas DataFrame that has some outlier values in it. I would like to replace them with the median of the data, computed as if those outliers weren't there.
id Age
10236 766105
11993 288
9337 205
38189 88
35555 82
39443 75
10762 74
33847 72
21194 70
39450 70
So, I want to replace all values > 75 with the median of the remaining dataset, i.e. the median of 70, 70, 72, 74, 75.
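To make the target value concrete, here is a minimal sketch that rebuilds the sample data above and computes the median of the values that are not above 75. The variable names `remaining` and `target_median` are only for illustration.

```python
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame({
    "id": [10236, 11993, 9337, 38189, 35555, 39443, 10762, 33847, 21194, 39450],
    "Age": [766105, 288, 205, 88, 82, 75, 74, 72, 70, 70],
})

# Keep only the values that are not outliers (75 or below)
remaining = df.loc[df["Age"] <= 75, "Age"]

# Median of 70, 70, 72, 74, 75
target_median = remaining.median()
print(target_median)  # 72.0
```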
I am trying to do the following:
- Replace all values greater than 75 with 0.
- Replace the 0s with the median value.
But for some reason the code below doesn't work:
df['age'].replace(df.age>75,0,inplace=True)
1 answer
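I think this is what you are looking for: you can use loc to assign a value, then fill the NaN values with the median. Note that the column is named Age (capital A) in your data.
import numpy as np
# Median of the values that are not outliers (75 and below)
median = df.loc[df['Age'] <= 75, 'Age'].median()
# Mark the outliers as missing, then fill them with the median
df.loc[df['Age'] > 75, 'Age'] = np.nan
df['Age'] = df['Age'].fillna(median)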
You can also use np.where in one line:
df['Age'] = np.where(df['Age'] > 75, median, df['Age'])
You can also use .mask, i.e.:
df['Age'] = df['Age'].mask(df['Age'] > 75, median)
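Putting the pieces together, here is a self-contained sketch of the .mask approach on the sample data from the question. The helper name `replace_outliers_with_median` and its `threshold` parameter are hypothetical, introduced only for this example; the median is taken over the values at or below the threshold, which matches the 70, 70, 72, 74, 75 set described in the question.

```python
import pandas as pd


def replace_outliers_with_median(s: pd.Series, threshold: float) -> pd.Series:
    """Replace values above `threshold` with the median of the other values."""
    is_outlier = s > threshold
    # Median computed only from the values that are kept
    median = s[~is_outlier].median()
    # mask() swaps in `median` wherever the condition is True
    return s.mask(is_outlier, median)


# Sample data copied from the question
df = pd.DataFrame({
    "id": [10236, 11993, 9337, 38189, 35555, 39443, 10762, 33847, 21194, 39450],
    "Age": [766105, 288, 205, 88, 82, 75, 74, 72, 70, 70],
})

df["Age"] = replace_outliers_with_median(df["Age"], threshold=75)
print(df)
```

With this data the five values above 75 all become 72, and the remaining rows are left unchanged. Because .mask returns a new Series instead of writing NaN into the existing column, it sidesteps the intermediate NaN step entirely.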