Return the closest date at a given date in R

My data frame consists of individual observations of individual animals. Each animal has a date of birth, which I would like to associate with the closest field date of the season from the date vector.

Here's a very simple reproducible example:

ID <- c("a", "b", "c", "d", "a") # individual "a" is measured twice here
birthdate <- as.Date(c("2012-06-12", "2014-06-14", "2015-11-11", "2016-09-30", "2012-06-12"))    
df <- data.frame(ID, birthdate)

# This is the date vector
season_enddates <- as.Date(c("2011-11-10", "2012-11-28", "2013-11-29", "2014-11-26", "2015-11-16", "2016-11-22", "2012-06-21", "2013-06-23", "2014-06-25", "2015-06-08", "2016-06-14"))

      

With the following code, I can get the difference between the date of birth and the closest season.

for(i in 1:length(df$birthdate)){
  df$birthseason[i] <- which(abs(season_enddates-df$birthdate[i]) == min(abs(season_enddates-df$birthdate[i])))
}

      

However, I want it to be the actual date, not the difference. For example, the first birthseason should be 2012-06-21.

+3


source to share


5 answers


This is a little confusing as you are using variables that you did not include in your examples.

But I think this is what you want:

for (ii in 1:nrow(df))  df$birthseason[ii] <-as.character(season_enddates[which.min(abs(df$birthdate[ii] - season_enddates))])

      

Alternatively, using lapply

:



df$birthseason <- unlist(lapply(df$birthdate,function(x) as.character(season_enddates[which.min(abs(x - season_enddates))])))

      

Result:

> df
  ID  birthdate birthseason
1  a 2012-06-12  2012-06-21
2  b 2014-06-14  2014-06-25
3  c 2015-11-11  2015-11-16
4  d 2016-09-30  2016-11-22
5  a 2012-06-12  2012-06-21

      

+2


source


You are looking for, which season_enddate

is closest to birthdate[1]

, and birthdate[2]

etc.

To get the data straight, I'll create a real reproducible example:

birthdate <- as.Date(c("2012-06-12", "2014-06-14", 
                       "2015-11-11", "2016-09-30", 
                       "2012-06-12"))

season_enddates <- as.Date(c("2011-11-10", "2012-11-28", 
                             "2013-11-29", "2014-11-26",
                             "2015-11-16", "2016-11-22", 
                             "2012-06-21", "2013-06-23", 
                             "2014-06-25", "2015-06-08", 
                             "2016-06-14"))

      



I mostly use the function you used, except I decided to break it down a bit, so it's easier for you to follow what you are trying to do:

new.vector <- rep(0, length(birthdate))
for(i in 1:length(birthdate)){
    diffs <- abs(birthdate[i] - season_enddates)
    inds  <- which.min(diffs)
    new.vector[i] <- season_enddates[inds]
}

# new.vector now contains some dates that have been converted to numbers:
as.Date(new.vector, origin = "1970-01-01")
# [1] "2012-06-21" "2014-06-25" "2015-11-16" "2016-11-22"
# [5] "2012-06-21"

      

+2


source


I suggested some changes for your question so that your example code creates all the variables needed to reproduce your problem. Please take a look and check that I understand your problem.

To solve this problem, I suggest using which.min

(your code is a little simpler and faster) in combination with a subset of your vector season_enddates

as shown below:

for(i in 1:length(younger$HatchCalendarYear)){
  df$birthseasonDate[i] <- season_enddates[which.min(abs(season_enddates - df$birthdate[i]))]
}

      

+1


source


All solutions here are essentially the same. If you want you to have an optimized function for this operation, I would do this:

match_season <- function(x,y){
  nx <- length(x)
  ind <- numeric(nx)
  for(i in seq_len(nx)){
    ind[i] <- which.min(abs(x[i] - y))
  }
  y[ind]
}

      

Then you can simply do:

younger$birthseason <- match_season(younger$HatchDate, season_enddates)

      

Looks cleaner and gives the desired result in the correct format Date

.

Benchmarking:

start <- as.Date("1990-07-01")
end <- as.Date("2017-06-30")

birthdate <- sample(seq(start, end, by = "1 day"), 1000)

season_enddates <- seq(as.Date("1990-12-21"),
                       as.Date("2017-6-21"),
                       by = "3 months")

library(rbenchmark)

benchmark(match_season(birthdate, season_enddates),
          columns = c("test","elapsed"))

      

gives a time of 7.62 seconds for 100 reps.

+1


source


findInterval

useful in such cases. Location of the nearest one season_enddates

for each df$birthdate

:

vec = sort(season_enddates)
int = findInterval(df$birthdate, vec, all.inside = TRUE)
int
#[1]  1  5  8 10  1

      

we compare the distance from each of the closest dates of the interval and select the minimum:

ans = vec[int]
i = abs(df$birthdate - vec[int]) > abs(df$birthdate - vec[int + 1])
ans[i] = vec[int[i] + 1]
ans
#[1] "2012-06-21" "2014-06-25" "2015-11-16" "2016-11-22" "2012-06-21"

      

0


source







All Articles