Replacing missing value in R with mean
I have a data frame whose columns contain missing values, and I would like to replace each missing value with the average of the cells directly above and below it.
df1<-c(2,2,NA,10, 20, NA,3)
if(df1[i]== NA){
df1[i]= mean(df1[i+1],df1[i-1])
}
However, I am getting this error:
Error in if (df1[i] == NA) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In if (df1[i] == NA) { :
the condition has length > 1 and only the first element will be used
Any advice would be appreciated to resolve this issue.
If you are sure that there are no consecutive NA values and that the first and last elements are never NA, you can do:
df1<-c(2,2,NA,10, 20, NA,3)
idx<-which(is.na(df1))
df1[idx] <- (df1[idx-1] + df1[idx+1])/2
df1
# [1] 2.0 2.0 6.0 10.0 20.0 11.5 3.0
Because this is vectorized, it should be more efficient than a loop.
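If you cannot guarantee those assumptions, one option is to check them before filling. The following is only a sketch: `fill_isolated_na` is a hypothetical helper name, not something from base R or a package, and it simply leaves alone any NA that sits at either end or next to another NA.

```r
# Hypothetical helper: fill only NAs whose two neighbours both exist and are not NA.
fill_isolated_na <- function(x) {
  idx <- which(is.na(x))
  # Drop positions at either end, which have a neighbour on one side only.
  idx <- idx[idx > 1 & idx < length(x)]
  # Keep only positions whose neighbours on both sides are observed.
  idx <- idx[!is.na(x[idx - 1]) & !is.na(x[idx + 1])]
  x[idx] <- (x[idx - 1] + x[idx + 1]) / 2
  x
}

fill_isolated_na(c(2, 2, NA, 10, 20, NA, 3))
# [1]  2.0  2.0  6.0 10.0 20.0 11.5  3.0
fill_isolated_na(c(NA, 1, NA, NA, 4, NA, 6))
# [1] NA  1 NA NA  4  5  6
```

The second call shows that the leading NA and the two consecutive NAs are left untouched, so they still need a separate decision.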
You can use na.approx() from the zoo package to replace NA with interpolated values:
library(zoo)
na.approx(df1)
# [1] 2.0 2.0 6.0 10.0 20.0 11.5 3.0
As @G. Grothendieck mentioned, this will also fill in the NAs when there are several in a row. Also, if there may be NAs at the ends, then adding the argument na.rm = FALSE will keep them, or adding rule = 2 will replace them with the first or last non-NA value.
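To make the difference between these options concrete, here is a small sketch on a made-up vector `x` (not from the question) that has NAs at both ends and two consecutive NAs in the middle. It assumes the zoo package is installed.

```r
library(zoo)

x <- c(NA, 2, NA, NA, 8, NA)

# Default (na.rm = TRUE): interior NAs are interpolated, end NAs are dropped.
trimmed <- na.approx(x)
trimmed
# [1] 2 4 6 8

# na.rm = FALSE: same interpolation, but the leading and trailing NA are kept.
kept <- na.approx(x, na.rm = FALSE)
kept
# [1] NA  2  4  6  8 NA

# rule = 2 (passed on to approx()): end NAs take the nearest non-NA value.
extended <- na.approx(x, rule = 2)
extended
# [1] 2 2 4 6 8 8
```

Note that the default call returns a shorter vector, which matters if the result is assigned back into a data frame column.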
To check whether a value is NA, use is.na(). Put the check inside a loop, and give mean() a single vector as its argument, otherwise it will only see the first value. This should work if you don't have consecutive NAs and the first and last entries are not NA:
df1 <- c(2, 2, NA, 10, 20, NA, 3)
for (i in 2:(length(df1) - 1)) {
  if (is.na(df1[i])) {
    df1[i] <- mean(c(df1[i + 1], df1[i - 1]))
  }
}
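The two pitfalls in the original attempt can be seen in isolation. This is a small illustrative sketch; the variable names are made up for the example.

```r
x <- c(2, NA, 10)

# Comparing anything with NA yields NA rather than TRUE or FALSE,
# which is why if () fails with "missing value where TRUE/FALSE needed".
cmp <- x[2] == NA
cmp
# [1] NA

# is.na() is the test that returns a proper logical value.
is.na(x[2])
# [1] TRUE

# mean() averages its first argument only; a second positional
# argument is matched to `trim`, not treated as more data.
wrong <- mean(10, 2)
wrong
# [1] 10

# Wrapping the values in c() gives the intended average.
right <- mean(c(10, 2))
right
# [1] 6
```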