Aggregation based on split groups and previous time periods in R
I have a list containing Dates ("anchor_dates") and a dataframe containing results for Group and Test Date ("groups").
anchor_dates <- as.Date(c("2015-07-20","2015-07-21","2015-07-22"))
set.seed(3)
groups <- data.frame(Test.Date = as.Date(c(rep("2015-07-18", 3), rep("2015-07-19", 3), rep("2015-07-20", 3), rep("2015-07-21", 3))),
Group = rep(c("AAA","BBB","CCC"), 4), Var1 = round(runif(12,0,10), ), Var2 = round(runif(12,0,7)))
> head(groups)
Test.Date Group Var1 Var2
1 2015-07-18 AAA 2 4
2 2015-07-18 BBB 8 4
3 2015-07-18 CCC 4 6
4 2015-07-19 AAA 3 6
5 2015-07-19 BBB 6 1
6 2015-07-19 CCC 6 5
I need to use the Dates in the "anchor_dates" list as anchor points in the sets of "groups" and aggregate the variables by the Group from the two previous test dates before the anchor date. Each test date for a given group may not always have a result, so I cannot use subset () subtracting the anchor date by 1 and 2. I need to be able to pull the last two test cases for each group before the anchor date, regardless of whether how far they are and are not consistent.
The following makes me close, however, when I try
unsplit(temp, groups$Group)
after aggregation, the return is a flattened set that has something wrong with repeating the same Var sums and does not allow me to use Map () on the set, then adds the anchor date from the "anchor_dates" list.
f <- lapply(anchor_dates, function(x) {
lapply(split(groups, groups$Group), function(y) {
temp <- tail(y[order(y$Date == x), ], 2)
temp <- aggregate(cbind(Var1, Var2) ~ Group, data = temp, FUN = sum)
})
})
[[1]]
[[1]]$AAA
Group Var1 Var2
1 AAA 7 6
[[1]]$BBB
Group Var1 Var2
1 BBB 8 3
[[1]]$CCC
Group Var1 Var2
1 CCC 11 3
..............
End result should be returned like below (or comparable solution)
[[1]]
Group Var1 Var2
1 AAA 5 10
2 BBB 14 5
3 CCC 10 11
[[2]]
Group Var1 Var2
1 AAA 4 12
2 BBB 9 3
3 CCC 12 7
[[3]]
Group Var1 Var2
1 AAA 7 6
2 BBB 8 3
3 CCC 11 3
Which lets me finish with the next
f1 <- Map(cbind, f, anchor_dates)
do.call(rbind, f1)
Group Var1 Var2 Anchor.Date
1 AAA 5 10 2015-07-20
2 BBB 14 5 2015-07-20
3 CCC 10 11 2015-07-20
4 AAA 4 12 2015-07-21
5 BBB 9 3 2015-07-21
6 CCC 12 7 2015-07-21
7 AAA 7 6 2015-07-22
8 BBB 8 3 2015-07-22
9 CCC 11 3 2015-07-22
source to share
I did it using a function with another function inside it. The outer function is suitable for calling using by()
a subset of data frames, while the inner function is suitable for viewing multiple anchor dates.
func.get_agg_values <- function(df.groupdata,list_of_anchor_dates) {
df.returndata <- lapply(X = list_of_anchor_dates,
active.group.df = df.groupdata,
FUN = function(anchor.date,active.group.df) {
# Get order of the data frame in a proper order
active.group.df <- active.group.df[order(active.group.df$Test.Date,decreasing = TRUE),]
# Next, we subset active.group.df to those rows that are before the anchor date
# Since it was ordered, we can just take 1 and 2 as the last two dates before the anchor date
active.group.df <- active.group.df[as.numeric(active.group.df$Test.Date - anchor.date) < 0,][1:2,]
# Finally, get the sums and return a data frame
returned.row.df <- data.frame(Group = unique(active.group.df$Group),
Var1 = sum(active.group.df$Var1),
Var2 = sum(active.group.df$Var2),
Anchor.Date = anchor.date)
return(returned.row.df)
})
return(do.call(what = rbind.data.frame,
args = df.returndata))
}
f1 <- do.call(what = rbind.data.frame,
args = by(data = groups,
INDICES = groups$Group,
FUN = func.get_agg_values,
list_of_anchor_dates = anchor_dates))
> f1
Group Var1 Var2 Anchor.Date
AAA.1 AAA 5 10 2015-07-20
AAA.2 AAA 4 12 2015-07-21
AAA.3 AAA 7 6 2015-07-22
BBB.1 BBB 14 5 2015-07-20
BBB.2 BBB 9 3 2015-07-21
BBB.3 BBB 8 3 2015-07-22
CCC.1 CCC 10 11 2015-07-20
CCC.2 CCC 12 7 2015-07-21
CCC.3 CCC 11 3 2015-07-22
source to share
`rownames<-`(do.call(rbind,by(groups,groups$Group,function(g)
do.call(rbind,lapply(anchor_dates,function(anc) {
befores <- which(g$Test.Date<anc);
twobefore <- befores[order(anc-g$Test.Date[befores])[1:2]];
cbind(aggregate(.~Group,g[twobefore,names(g)!='Test.Date'],sum),Anchor.Date=anc);
}))
)),NULL);
## Group Var1 Var2 Anchor.Date
## 1 AAA 5 10 2015-07-20
## 2 AAA 4 12 2015-07-21
## 3 AAA 7 6 2015-07-22
## 4 BBB 14 5 2015-07-20
## 5 BBB 9 3 2015-07-21
## 6 BBB 8 3 2015-07-22
## 7 CCC 10 11 2015-07-20
## 8 CCC 12 7 2015-07-21
## 9 CCC 11 3 2015-07-22
source to share