Average previous and next row for missing value

I am relatively new to R and have run into a problem. I am working with a data frame that is missing values in certain years. For example:

year var1 var2
1972 1.3  1.4
1973 1.6  2.8
1974 2.0  1.5
1975 NA   NA
1976 1.5  2.1
1977 NA   NA
1978 1.9  1.1

      

For each NA, I want to take the average of the previous and next rows. So var1 and var2 in 1975 should be 1.75 and 1.8, respectively, and in 1977 they should be 1.7 and 1.6. Any ideas?



1 answer


You can use na.approx from the zoo package:

library(zoo)
df$var1 <- na.approx(df$var1)
df$var2 <- na.approx(df$var2)
##
> df
  year var1 var2
1 1972 1.30  1.4
2 1973 1.60  2.8
3 1974 2.00  1.5
4 1975 1.75  1.8
5 1976 1.50  2.1
6 1977 1.70  1.6
7 1978 1.90  1.1
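
One caveat worth keeping in mind: na.approx does linear interpolation, which matches the previous/next average only for isolated one-row gaps over evenly spaced rows, as in this data. By default it also drops leading and trailing NAs (na.rm = TRUE), so the result can be shorter than the column it is assigned to. A minimal sketch of that behaviour, using a made-up vector x that is not part of the question:

```r
library(zoo)

# Hypothetical vector with NAs at both ends and one interior gap
x <- c(NA, 1, NA, 3, NA)

na.approx(x)                           # leading/trailing NAs dropped: length 3
na.approx(x, na.rm = FALSE)            # same length as x; the end NAs are kept
na.approx(x, na.rm = FALSE, rule = 2)  # ends filled with the nearest observed value
```

If a column might start or end with NA, passing na.rm = FALSE keeps the length equal to nrow(df), so the assignments above still work.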

      

  • As @Jilber pointed out, the two assignments can be written more succinctly as

    df <- sapply(df, na.approx)

    Note that sapply returns a matrix here, not a data.frame.

  • As @Richard Scriven noted in a comment, you can keep the data.frame class with

    df[-1] <- lapply(df[-1], na.approx)

    or

    df[-1] <- vapply(df[-1], na.approx, numeric(nrow(df)))
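
If you would rather not depend on zoo, the literal "average of the previous and next row" can be sketched in base R. This is only an illustration, not part of the original answer: fill_with_neighbour_mean is a hypothetical helper name, and it assumes each NA has non-missing values directly above and below it (consecutive, leading, or trailing NAs are left as NA).

```r
# Hypothetical helper: replace each NA with the mean of its immediate neighbours.
fill_with_neighbour_mean <- function(x) {
  n <- length(x)
  prev <- c(NA, x[-n])   # value from the previous row
  nxt  <- c(x[-1], NA)   # value from the next row
  gaps <- is.na(x)
  x[gaps] <- (prev[gaps] + nxt[gaps]) / 2
  x
}

# Same data as in the question
df <- data.frame(
  year = 1972:1978,
  var1 = c(1.3, 1.6, 2.0, NA, 1.5, NA, 1.9),
  var2 = c(1.4, 2.8, 1.5, NA, 2.1, NA, 1.1)
)

# Apply to every column except year, keeping the data.frame class
df[-1] <- lapply(df[-1], fill_with_neighbour_mean)
```

For this data the result agrees with na.approx, because every gap is a single row.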
    
          



Data:

df <- read.table(
  text="year var1 var2
1972 1.3  1.4
1973 1.6  2.8
1974 2.0  1.5
1975 NA   NA
1976 1.5  2.1
1977 NA   NA
1978 1.9  1.1",
  header=TRUE)

      
