Average previous and next row for missing value
I am relatively new to R and am facing some problems. I am working with a data frame that is missing values in certain years. For example:
year var1 var2
1972 1.3 1.4
1973 1.6 2.8
1974 2.0 1.5
1975 NA NA
1976 1.5 2.1
1977 NA NA
1978 1.9 1.1
For each NA, I want to take the average of the previous and next lines. So var1 and var2 in 1975 should be 1.75 and 1.8 respectively. In 1977 they should be 1.7 and 1.6. Any ideas?
1 answer
You can use na.approx from the zoo package:
library(zoo)
df$var1 <- na.approx(df$var1)
df$var2 <- na.approx(df$var2)
##
> df
year var1 var2
1 1972 1.30 1.4
2 1973 1.60 2.8
3 1974 2.00 1.5
4 1975 1.75 1.8
5 1976 1.50 2.1
6 1977 1.70 1.6
7 1978 1.90 1.1
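na.approx interpolates linearly, which for a single isolated NA is the same as the mean of the rows before and after it. If you would rather not depend on zoo, the same calculation can be sketched in base R. The helper name fill_neighbor_mean below is hypothetical (it is not part of base R or zoo), and the data frame simply repeats the example from the question:

```r
# Hypothetical helper (not part of base R or zoo): replace each NA with the
# mean of the values directly before and after it.
fill_neighbor_mean <- function(x) {
  n <- length(x)
  prev_val <- c(NA, x[-n])   # value one row earlier
  next_val <- c(x[-1], NA)   # value one row later
  gaps <- is.na(x)
  x[gaps] <- (prev_val[gaps] + next_val[gaps]) / 2
  x
}

# Same data as in the question
df <- data.frame(
  year = 1972:1978,
  var1 = c(1.3, 1.6, 2.0, NA, 1.5, NA, 1.9),
  var2 = c(1.4, 2.8, 1.5, NA, 2.1, NA, 1.1)
)

# Apply to every column except year; the data.frame class is kept
df[-1] <- lapply(df[-1], fill_neighbor_mean)
```

With this sketch, an NA in the first or last row, or one directly next to another NA, stays NA because one of its neighbours is missing. na.approx behaves differently there: it interpolates across a run of consecutive NAs.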
-
As @Jilber pointed out, this can be done more succinctly with the following, although the result is a matrix rather than a data.frame:
df <- sapply(df, na.approx)
-
As @Richard Scriven noted in a comment, you can keep the data.frame class with
df[-1] <- lapply(df[-1], na.approx)
or
df[-1] <- vapply(df[-1], na.approx, numeric(nrow(df)))
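One caveat with these assignments: by default na.approx uses na.rm = TRUE, which drops leading and trailing NAs that it cannot interpolate. The result is then shorter than the column and the assignment fails. The sketch below assumes a hypothetical data frame called edge whose first value is missing, and passes na.rm = FALSE so the length is preserved:

```r
library(zoo)

# Hypothetical data: var1 starts with an NA, so there is no earlier value
# to interpolate from.
edge <- data.frame(
  year = 1972:1976,
  var1 = c(NA, 1.6, NA, 2.0, 1.5)
)

# With the default na.rm = TRUE the leading NA is dropped,
# so the result is shorter than the column.
shorter <- na.approx(edge$var1)
length(shorter) < nrow(edge)

# na.rm = FALSE keeps the original length: the interior NA is interpolated,
# the leading NA is left in place.
edge[-1] <- lapply(edge[-1], na.approx, na.rm = FALSE)
```

How to handle the remaining edge NAs (leave them, or carry the nearest value over) depends on your data; the question only covers gaps that have a row on both sides.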
Data:
df <- read.table(
text="year var1 var2
1972 1.3 1.4
1973 1.6 2.8
1974 2.0 1.5
1975 NA NA
1976 1.5 2.1
1977 NA NA
1978 1.9 1.1",
header=TRUE)