How to lag values ​​by the number of days in a data table

Let's say I have a data table:

id    dt          val1   val2
1   '2014-01-01'  100     200 
2   '2014-01-01'  100     200
3   '2014-01-01'  100     200
4   '2014-01-01'  100     200
...
9   '2014-01-14'  1000   1100
10  '2014-01-14'  1000   1100

      

There are 100 rows - val1 and val2 for each of the ten ids for ten days.

I need to prepare a set of lagging values ​​by id for a variable number of days. For example, if I wanted two days, I would see the following:

id    dt          val1   val2  val1_lag  val2_lag
1   '2014-01-01'  100     200    NA         NA
2   '2014-01-01'  100     200    NA         NA
...
1   '2014-01-03'  300     400    100       200
2   '2014-01-03'  300     400    100       200
...
1   '2014-01-09'  500     600    300       400
...
9   '2014-01-14'  1000   1100    800       900

      

Note that January 9 is the previous value from January 4. There is no date exactly two days before, so it returns a value that is closest to January 7 (January 9 minus two days), no greater than January 7. It happens mechanically every day. For example, January 3 is used for values ​​from January 1 because they are closest to January 3 minus two days.

There is one complication. There is no guarantee that the deferred date will be accurately priced on weekends and holidays. I want the latest value, which is at least x days earlier.

Is there a way to do this using the power of data tables? You can create a test table using:

library(data.table)
a <- rep(c(1:10), 10)
b <- rep(c(seq(as.Date("2014-01-01"), by = 1, len = 4), seq(as.Date("2014-01-09"), by = 1, len = 6)), 10)
c <- rep(seq(100, by = 100, len = 10), 10)
d <- rep(seq(200, by = 100, len = 10), 10)
e <- data.frame(a, b, c, d)
colnames(e) <- c("id", "dt", "val1", "val2")
setDT(e)
setkeyv(e, c("id", "dt"))

      

My instinct says to use something like this formula to generate lagging values, I'm just trying to do it automatically for multiple columns:

setDT(e)[, ??][by = id]

EDIT: I modified my code to reflect the potential gap in dates caused by weekends, holidays, etc.

+3


source to share


1 answer


Here's a solution using another minimal example that might be easier to understand (and covers more cases).

library(data.table)
lag <- 2;
dt <- data.table(id = c(1,1,1,1,2,2), 
                 dt = as.Date(c("2014-1-1", "2014-1-4", "2014-1-6", "2014-1-7", "2014-1-1", "2014-1-10")),
                 val1 = sample(10, 6),
                 val2 = sample(10, 6))
dt
   id         dt val1 val2
1:  1 2014-01-01    8    7
2:  1 2014-01-04   10    5
3:  1 2014-01-06    3    3
4:  1 2014-01-07    7    2
5:  2 2014-01-01    2    1
6:  2 2014-01-10    5    6

dt1 <- copy(dt)
dt1[, laggedDate:=dt]
setkey(dt1, "id", "laggedDate")

dt2 <- copy(dt)
dt2[, laggedDate:=dt-2]
setkey(dt2, "id", "laggedDate")

setnames(dt1, c("dt","val1", "val2"), c("dt.lagged", "val1.lagged", "val2.lagged"))
dt1[dt2, roll=TRUE]
   id laggedDate  dt.lagged val1.lagged val2.lagged         dt val1 val2
1:  1 2013-12-30       <NA>          NA          NA 2014-01-01    8    7
2:  1 2014-01-02 2014-01-01           8           7 2014-01-04   10    5
3:  1 2014-01-04 2014-01-04          10           5 2014-01-06    3    3
4:  1 2014-01-05 2014-01-04          10           5 2014-01-07    7    2
5:  2 2013-12-30       <NA>          NA          NA 2014-01-01    2    1
6:  2 2014-01-08 2014-01-01           2           1 2014-01-10    5    6

      



The key point here is the use of the circular data join function. If you're uncomfortable check out this blog post .

+2


source







All Articles