Aggregation based on split groups and previous time periods in R

I have a list containing Dates ("anchor_dates") and a dataframe containing results for Group and Test Date ("groups").

anchor_dates <- as.Date(c("2015-07-20","2015-07-21","2015-07-22"))
set.seed(3)
groups <- data.frame(Test.Date = as.Date(c(rep("2015-07-18", 3), rep("2015-07-19", 3), rep("2015-07-20", 3), rep("2015-07-21", 3))), 
              Group = rep(c("AAA","BBB","CCC"), 4), Var1 = round(runif(12,0,10), ), Var2 = round(runif(12,0,7)))

> head(groups)
    Test.Date Group Var1 Var2
1  2015-07-18   AAA    2    4
2  2015-07-18   BBB    8    4
3  2015-07-18   CCC    4    6
4  2015-07-19   AAA    3    6
5  2015-07-19   BBB    6    1
6  2015-07-19   CCC    6    5

      

I need to use the Dates in the "anchor_dates" list as anchor points in the sets of "groups" and aggregate the variables by the Group from the two previous test dates before the anchor date. Each test date for a given group may not always have a result, so I cannot use subset () subtracting the anchor date by 1 and 2. I need to be able to pull the last two test cases for each group before the anchor date, regardless of whether how far they are and are not consistent.

The following makes me close, however, when I try

unsplit(temp, groups$Group)

      

after aggregation, the return is a flattened set that has something wrong with repeating the same Var sums and does not allow me to use Map () on the set, then adds the anchor date from the "anchor_dates" list.

f <- lapply(anchor_dates, function(x) {
    lapply(split(groups, groups$Group), function(y) {
        temp <- tail(y[order(y$Date == x), ], 2)
        temp <- aggregate(cbind(Var1, Var2) ~ Group, data = temp, FUN = sum)
     })
})

[[1]]
[[1]]$AAA
  Group Var1 Var2
1   AAA    7    6

[[1]]$BBB
  Group Var1 Var2
1   BBB    8    3

[[1]]$CCC
  Group Var1 Var2
1   CCC   11    3
..............

      

End result should be returned like below (or comparable solution)

[[1]]
  Group Var1 Var2
1   AAA    5   10
2   BBB   14    5
3   CCC   10   11

[[2]]
  Group Var1 Var2
1   AAA    4   12
2   BBB    9    3
3   CCC   12    7

[[3]]
  Group Var1 Var2
1   AAA    7    6
2   BBB    8    3
3   CCC   11    3

      

Which lets me finish with the next

f1 <- Map(cbind, f, anchor_dates) 
do.call(rbind, f1)

  Group Var1 Var2 Anchor.Date
1   AAA    5   10  2015-07-20
2   BBB   14    5  2015-07-20
3   CCC   10   11  2015-07-20
4   AAA    4   12  2015-07-21
5   BBB    9    3  2015-07-21
6   CCC   12    7  2015-07-21
7   AAA    7    6  2015-07-22
8   BBB    8    3  2015-07-22
9   CCC   11    3  2015-07-22

      

+3


source to share


2 answers


I did it using a function with another function inside it. The outer function is suitable for calling using by()

a subset of data frames, while the inner function is suitable for viewing multiple anchor dates.



func.get_agg_values <- function(df.groupdata,list_of_anchor_dates) {

    df.returndata <- lapply(X = list_of_anchor_dates,
                            active.group.df = df.groupdata,
                            FUN = function(anchor.date,active.group.df) {

                                # Get order of the data frame in a proper order
                                active.group.df <- active.group.df[order(active.group.df$Test.Date,decreasing = TRUE),]

                                # Next, we subset active.group.df to those rows that are before the anchor date
                                # Since it was ordered, we can just take 1 and 2 as the last two dates before the anchor date
                                active.group.df <- active.group.df[as.numeric(active.group.df$Test.Date - anchor.date) < 0,][1:2,]

                                # Finally, get the sums and return a data frame
                                returned.row.df <- data.frame(Group = unique(active.group.df$Group),
                                                          Var1 = sum(active.group.df$Var1),
                                                          Var2 = sum(active.group.df$Var2),
                                                          Anchor.Date = anchor.date)
                                return(returned.row.df)
                            })
    return(do.call(what = rbind.data.frame,
                   args = df.returndata))
}
f1 <- do.call(what = rbind.data.frame,
              args = by(data = groups,
                        INDICES = groups$Group,
                        FUN = func.get_agg_values,
                        list_of_anchor_dates = anchor_dates))

> f1
      Group Var1 Var2 Anchor.Date
AAA.1   AAA    5   10  2015-07-20
AAA.2   AAA    4   12  2015-07-21
AAA.3   AAA    7    6  2015-07-22
BBB.1   BBB   14    5  2015-07-20
BBB.2   BBB    9    3  2015-07-21
BBB.3   BBB    8    3  2015-07-22
CCC.1   CCC   10   11  2015-07-20
CCC.2   CCC   12    7  2015-07-21
CCC.3   CCC   11    3  2015-07-22

      

+2


source


`rownames<-`(do.call(rbind,by(groups,groups$Group,function(g)
    do.call(rbind,lapply(anchor_dates,function(anc) {
        befores <- which(g$Test.Date<anc);
        twobefore <- befores[order(anc-g$Test.Date[befores])[1:2]];
        cbind(aggregate(.~Group,g[twobefore,names(g)!='Test.Date'],sum),Anchor.Date=anc);
    }))
)),NULL);
##   Group Var1 Var2 Anchor.Date
## 1   AAA    5   10  2015-07-20
## 2   AAA    4   12  2015-07-21
## 3   AAA    7    6  2015-07-22
## 4   BBB   14    5  2015-07-20
## 5   BBB    9    3  2015-07-21
## 6   BBB    8    3  2015-07-22
## 7   CCC   10   11  2015-07-20
## 8   CCC   12    7  2015-07-21
## 9   CCC   11    3  2015-07-22

      



+1


source







All Articles