Reverse na.approx for leading NAs

I have a date vector with leading NAs, and I would like to generate an approximate sequence for these NAs using na.approx from the zoo package. na.approx doesn't work for leading NAs:

library(zoo)
x <- as.Date(c(rep(NA,3),"1992-01-16","1992-04-16","1992-07-16",
"1992-10-16","1993-01-15","1993-04-16","1993-07-17"))
as.Date(na.approx(x,na.rm=FALSE))

 [1] NA           NA           NA           "1992-01-16" "1992-04-16"
 [6] "1992-07-16" "1992-10-16" "1993-01-15" "1993-04-16" "1993-07-17"

      

I thought I could reverse the vector with rev, but I still get NAs:

as.Date(na.approx(rev(x),na.rm=FALSE))

 [1] "1993-07-17" "1993-04-16" "1993-01-15" "1992-10-16" "1992-07-16" 
"1992-04-16" "1992-01-16" NA           NA           NA   

      

Any ideas?



2 answers


Found my answer. na.spline does a good job when there is a lot of data. In the example above there are only a few dates, which causes the extrapolated values to drift. However, there is no drift in my real-life data.



as.Date(na.spline(rev(x),na.rm=FALSE))
 [1] "1993-07-17" "1993-04-16" "1993-01-15" "1992-10-16" "1992-07-16" 
"1992-04-16" "1992-01-16" "1991-10-15" "1991-07-13" "1991-04-06"

      



na.approx needs to be passed a rule argument to handle positions that fall outside the range of the non-missing values in your vector. If you use rule=2, leading and trailing missing values are filled with the nearest non-missing value.

as.Date(na.approx(x,na.rm=FALSE, rule=2))
# [1] "1992-01-16" "1992-01-16" "1992-01-16" "1992-01-16" "1992-04-16" "1992-07-16" "1992-10-16" "1993-01-15"
#  [9] "1993-04-16" "1993-07-17"
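
To see what rule changes, here is a minimal sketch using base R's approx, which na.approx relies on for the interpolation. The positions and values below are made up for illustration only.

```r
# Known values sit at positions 4..6; positions 1..3 lie outside that range,
# just like the leading NAs in the question.
known_pos <- 4:6
known_val <- c(10, 20, 30)

# rule = 1 (the default): anything outside the known range comes back as NA.
y_rule1 <- approx(known_pos, known_val, xout = 1:6, rule = 1)$y

# rule = 2: anything outside the known range takes the value at the nearest end.
y_rule2 <- approx(known_pos, known_val, xout = 1:6, rule = 2)$y

print(y_rule1)
print(y_rule2)
```

With rule = 2 the leading positions are held constant at the first known value, so this does not extrapolate a trend.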

      



Alternatively, you can use na.spline (as in your answer). You mentioned that it can get a little wild, so you could instead write a function that assigns values based on the time difference between your measurements. Here I use the first non-missing difference.

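add_leading_seq_dates <- function(x) {
        # Index of the first non-missing date
        first_non_missing <- which.min(is.na(x))
        # First non-missing gap between consecutive dates, in days
        first_day_diff <- as.numeric(na.omit(diff(x))[1])
        no_of_leading_missing <- first_non_missing - 1
        # Step backwards from the first known date, one gap at a time
        input_dates <- x[first_non_missing] - cumsum(rep(first_day_diff, no_of_leading_missing))
        # Fill only the leading positions, in chronological order
        x[seq_len(no_of_leading_missing)] <- rev(input_dates)
        x
}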

add_leading_seq_dates(x)

# [1] "1991-04-18" "1991-07-18" "1991-10-17" "1992-01-16" "1992-04-16"
# [6] "1992-07-16" "1992-10-16" "1993-01-15" "1993-04-16" "1993-07-17"

diff(add_leading_seq_dates(x))
# Time differences in days
# [1] 91 91 91 91 91 92 91 91 92
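
If the dates are known to be quarterly, a calendar-aware variant may be a better fit than a fixed number of days. The sketch below is only an illustration under that assumption: fill_leading_quarterly is a hypothetical helper name, and the "-3 months" step is assumed from the example data rather than detected from it.

```r
# Hypothetical helper: fills leading NAs by stepping back three calendar
# months at a time from the first known date (assumes quarterly data).
fill_leading_quarterly <- function(x) {
  first_known <- which(!is.na(x))[1]
  n_leading <- first_known - 1
  if (n_leading > 0) {
    # seq() on Dates accepts a negative step such as "-3 months"
    back <- seq(x[first_known], by = "-3 months", length.out = n_leading + 1)
    # Drop the known date itself and restore chronological order
    x[seq_len(n_leading)] <- rev(back[-1])
  }
  x
}

x_short <- as.Date(c(rep(NA, 3), "1992-01-16", "1992-04-16", "1992-07-16"))
filled <- fill_leading_quarterly(x_short)
print(filled)
```

This keeps the day of the month fixed at the 16th instead of repeating a 91-day gap, so the filled dates differ slightly from those produced by add_leading_seq_dates.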

      
