Calculate statistics based on dynamic window using dplyr

I am trying to use dplyr in R to calculate rental statistics (average, sd, etc.) based on a dynamic window based on dates and for specific models. For example, as part of groupings of items, I would like to calculate the average of all data 10 days before. The data dates are not sequential or complete, so I cannot use a fixed window.

One way to do this is to use rollapply referring to the window width as shown below. However, I am having trouble calculating the dynamic width. I would prefer a method that omits the intermediate step of calculating the window and just calculates based on the date. Here's an example of a toy.

I've used for loops to do this, but they are very slow.

    library(dplyr)
library(zoo)

date_lookback <- 10 #days to look back for rolling calcs

df <- data.frame(label = c(rep("a",5),rep("b",5)),
                 date = as.Date(c("2017-01-02","2017-01-20",
                                  "2017-01-21","2017-01-30","2017-01-31","2017-01-05",
                                  "2017-01-08","2017-01-09","2017-01-10","2017-01-11")),
                data = c(790,493,718,483,825,186,599,408,108,666),stringsAsFactors = FALSE) %>%
  mutate(.,
         cut_date = date - date_lookback, #calcs based on sample since this date
         dyn_win = c(1,1,2,3,3,1,2,3,4,5), ##!! need to calculate this vector??
         roll_mean = rollapply(data, align = "right", width = dyn_win, mean),
         roll_sd = rollapply(data, align = "right", width = dyn_win, sd))

      

These are the roll_mean and roll_sd results I'm looking for:

> df
   label       date data   cut_date dyn_win roll_mean  roll_sd
1      a 2017-01-02  790 2016-12-23       1  790.0000       NA
2      a 2017-01-20  493 2017-01-10       1  493.0000       NA
3      a 2017-01-21  718 2017-01-11       2  605.5000 159.0990
4      a 2017-01-30  483 2017-01-20       3  564.6667 132.8847
5      a 2017-01-31  825 2017-01-21       3  675.3333 174.9467
6      b 2017-01-05  186 2016-12-26       1  186.0000       NA
7      b 2017-01-08  599 2016-12-29       2  392.5000 292.0351
8      b 2017-01-09  408 2016-12-30       3  397.6667 206.6938
9      b 2017-01-10  108 2016-12-31       4  325.2500 222.3921
10     b 2017-01-11  666 2017-01-01       5  393.4000 245.5928

      

Thanks in advance.

+3


source to share


1 answer


You can try to explicitly reference your dataset inside the dplyr call:



date_lookback <- 10 #days to look back for rolling calcs

df <- data.frame(label = c(rep("a",5),rep("b",5)),
                 date = as.Date(c("2017-01-02","2017-01-20",
                                  "2017-01-21","2017-01-30","2017-01-31","2017-01-05",
                                  "2017-01-08","2017-01-09","2017-01-10","2017-01-11")),
                 data = c(790,493,718,483,825,186,599,408,108,666),stringsAsFactors = FALSE)

df %>%
  group_by(date,label) %>%
  mutate(.,
         roll_mean = mean(ifelse(df$date >= date-date_lookback & df$date <= date & df$label == label,
                                 df$data,NA),na.rm=TRUE),
         roll_sd = sd(ifelse(df$date >= date-date_lookback & df$date <= date & df$label == label,
                             df$data,NA),na.rm=TRUE))

      

0


source







All Articles