Average previous and next row for missing value

Question


I am relatively new to R and am facing some problems. I am working with a dataframe that is missing certain values in certain years. For example:

year var1 var2
1972 1.3  1.4
1973 1.6  2.8
1974 2.0  1.5
1975 NA   NA
1976 1.5  2.1
1977 NA   NA
1978 1.9  1.1

For each NA, I want to take the average of the previous and next lines. So var1 and var2 in 1975 should be 1.75 and 1.8 respectively. In 1977 they should be 1.7 and 1.6. Any ideas?


r missing-data mean

Alex, Oct 28 '14 at 15:19


1 answer

nrussell · Accepted Answer · 2014-10-28T15:25:35+0000

You can use na.approx from the zoo package:

library(zoo)
df$var1 <- na.approx(df$var1)
df$var2 <- na.approx(df$var2)
##
> df
  year var1 var2
1 1972 1.30  1.4
2 1973 1.60  2.8
3 1974 2.00  1.5
4 1975 1.75  1.8
5 1976 1.50  2.1
6 1977 1.70  1.6
7 1978 1.90  1.1
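
Note that na.approx performs linear interpolation, which matches the average of the previous and next rows only when each NA is isolated, as it is in this data. If you want the literal "average of the neighbours" rule, here is a minimal base R sketch. The helper name fill_neighbor_mean is made up for illustration and is not part of zoo or base R:

```r
# Hypothetical helper: replace each NA with the mean of the values
# immediately before and after it. An NA at either end of the vector,
# or next to another NA, is left as NA.
fill_neighbor_mean <- function(x) {
  n <- length(x)
  prev_val <- c(NA, x[-n])   # value in the previous row
  next_val <- c(x[-1], NA)   # value in the next row
  idx <- is.na(x)
  x[idx] <- (prev_val[idx] + next_val[idx]) / 2
  x
}

# Same data as in the question
df <- data.frame(
  year = 1972:1978,
  var1 = c(1.3, 1.6, 2.0, NA, 1.5, NA, 1.9),
  var2 = c(1.4, 2.8, 1.5, NA, 2.1, NA, 1.1)
)

# Apply to every column except year, keeping the data.frame class
filled <- df
filled[-1] <- lapply(df[-1], fill_neighbor_mean)
filled
```

For runs of two or more consecutive NAs this sketch leaves the values missing, whereas na.approx would interpolate across the whole gap.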

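As @Jilber pointed out, this can be done more succinctly with sapply, though the result is a matrix rather than a data.frame and the year column is passed through na.approx as well: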

df <- sapply(df, na.approx)

As @Richard Scriven noted in a comment, you can keep the data.frame class (and leave the year column untouched) with

df[-1] <- lapply(df[-1], na.approx)

or

df[-1] <- vapply(df[-1], na.approx, numeric(nrow(df)))
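
One caveat with the vapply form: by default na.approx drops leading and trailing NAs (na.rm = TRUE), so a column that starts or ends with NA would come back shorter than nrow(df) and fail the length check. Passing na.rm = FALSE to na.approx keeps them as NA. The same behaviour can be sketched in base R with stats::approx; the helper name interp_linear below is hypothetical:

```r
# Hypothetical helper: linearly interpolate interior NAs with
# stats::approx() from base R. Positions outside the range of the
# observed points (leading/trailing NAs) stay NA, because approx()
# uses rule = 1 by default. Requires at least two non-NA values.
interp_linear <- function(x) {
  i <- seq_along(x)
  ok <- !is.na(x)
  approx(i[ok], x[ok], xout = i)$y
}

# The output always has the same length as the input
res <- interp_linear(c(NA, 1, NA, NA, 4, NA))
res
```

Because the result always has the same length as the input, this version is safe to use inside vapply(..., numeric(nrow(df))).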

Data:

df <- read.table(
  text="year var1 var2
1972 1.3  1.4
1973 1.6  2.8
1974 2.0  1.5
1975 NA   NA
1976 1.5  2.1
1977 NA   NA
1978 1.9  1.1",
  header=TRUE)
