Apply moving average to database by index

Question

Apply moving average to database by index

I would like to calculate a moving average of data in one dataframe with multiple ids. See my example dataset below.

date <- as.Date(c("2015-02-01", "2015-02-02", "2015-02-03", "2015-02-04", 
          "2015-02-05", "2015-02-06", "2015-02-07", "2015-02-08",  
          "2015-02-09", "2015-02-10", "2015-02-01", "2015-02-02", 
          "2015-02-03", "2015-02-04", "2015-02-05", "2015-02-06", 
          "2015-02-07", "2015-02-08", "2015-02-09", "2015-02-10"))
index <- c("a","a","a","a","a","a","a","a","a","a",
           "b","b","b","b","b","b","b","b","b","b")
x <- runif(20,1,100)
y <- runif(20,50,150)
z <- runif(20,100,200)

df <- data.frame(date, index, x, y, z)

I would like to calculate the average for x, y and z through a and then through b.

I have tried the following, but I am getting an error.

test <- tapply(df, df$index, FUN = rollmean(df, 5, fill=NA))

Mistake:

Error in xu[k:n] - xu[c(1, seq_len(n - k))] : 
  non-numeric argument to binary operator

There seems to be a problem with the index being a symbol, but I need it to calculate the means ...

+3

r statistics zoo

phaser 03 Aug 17 at 22:34

source to share

2 answers

1) ave Try it ave

, not tapply

and make sure it only applies on the columns of interest i.e. columns 3, 4, 5.

roll <- function(x) rollmean(x, 5, fill = NA)
cbind(df[1:2], lapply(df[3:5], function(x) ave(x, df$index, FUN = roll)))

giving:

         date index        x         y        z
1  2015-02-01     a       NA        NA       NA
2  2015-02-02     a       NA        NA       NA
3  2015-02-03     a 66.50522 127.45650 129.8472
4  2015-02-04     a 61.71320 123.83633 129.7673
5  2015-02-05     a 56.56125 120.86158 126.1371
6  2015-02-06     a 66.13340 119.93428 127.1819
7  2015-02-07     a 59.56807 105.83208 125.1244
8  2015-02-08     a 49.98779  95.66024 139.2321
9  2015-02-09     a       NA        NA       NA
10 2015-02-10     a       NA        NA       NA
11 2015-02-01     b       NA        NA       NA
12 2015-02-02     b       NA        NA       NA
13 2015-02-03     b 55.71327 117.52219 139.3961
14 2015-02-04     b 54.58450 107.81763 142.6101
15 2015-02-05     b 50.48102 104.94084 136.3167
16 2015-02-06     b 37.89790  95.45489 135.4044
17 2015-02-07     b 33.05259  85.90916 150.8673
18 2015-02-08     b 49.91385  90.04940 147.1376
19 2015-02-09     b       NA        NA       NA
20 2015-02-10     b       NA        NA       NA

2) Another way is to use by

. roll2

processes one group, by

applies it to each group producing the list by

, and do.call("rbind", ...)

puts it together.

roll2 <- function(x) cbind(x[1:2], rollmean(x[3:5], 5, fill = NA))
do.call("rbind", by(df, df$index, roll2))

giving:

           date index        x         y        z
a.1  2015-02-01     a       NA        NA       NA
a.2  2015-02-02     a       NA        NA       NA
a.3  2015-02-03     a 66.50522 127.45650 129.8472
a.4  2015-02-04     a 61.71320 123.83633 129.7673
a.5  2015-02-05     a 56.56125 120.86158 126.1371
a.6  2015-02-06     a 66.13340 119.93428 127.1819
a.7  2015-02-07     a 59.56807 105.83208 125.1244
a.8  2015-02-08     a 49.98779  95.66024 139.2321
a.9  2015-02-09     a       NA        NA       NA
a.10 2015-02-10     a       NA        NA       NA
b.11 2015-02-01     b       NA        NA       NA
b.12 2015-02-02     b       NA        NA       NA
b.13 2015-02-03     b 55.71327 117.52219 139.3961
b.14 2015-02-04     b 54.58450 107.81763 142.6101
b.15 2015-02-05     b 50.48102 104.94084 136.3167
b.16 2015-02-06     b 37.89790  95.45489 135.4044
b.17 2015-02-07     b 33.05259  85.90916 150.8673
b.18 2015-02-08     b 49.91385  90.04940 147.1376
b.19 2015-02-09     b       NA        NA       NA
b.20 2015-02-10     b       NA        NA       NA

3) Wide shape Another approach is to convert df

from long shape to wide shape, in which case it will make it simple rollmean

.

rollmean(read.zoo(df, split = 2), 5, fill = NA)

giving:

                x.a       y.a      z.a      x.b       y.b      z.b
2015-02-01       NA        NA       NA       NA        NA       NA
2015-02-02       NA        NA       NA       NA        NA       NA
2015-02-03 66.50522 127.45650 129.8472 55.71327 117.52219 139.3961
2015-02-04 61.71320 123.83633 129.7673 54.58450 107.81763 142.6101
2015-02-05 56.56125 120.86158 126.1371 50.48102 104.94084 136.3167
2015-02-06 66.13340 119.93428 127.1819 37.89790  95.45489 135.4044
2015-02-07 59.56807 105.83208 125.1244 33.05259  85.90916 150.8673
2015-02-08 49.98779  95.66024 139.2321 49.91385  90.04940 147.1376
2015-02-09       NA        NA       NA       NA        NA       NA
2015-02-10       NA        NA       NA       NA        NA       NA

This works because the dates are the same for both groups. If the dates were different, then it could have entered NA instead of rollmean

being able to process them. In this case use

rollapply(read.zoo(df, split = 2), 5, mean, fill = NA)

Note. ... Since the input uses random numbers in its definition to make it reproducible, we must issue first set.seed

. We used this:

set.seed(123)
date <- as.Date(c("2015-02-01", "2015-02-02", "2015-02-03", "2015-02-04", 
          "2015-02-05", "2015-02-06", "2015-02-07", "2015-02-08",  
          "2015-02-09", "2015-02-10", "2015-02-01", "2015-02-02", 
          "2015-02-03", "2015-02-04", "2015-02-05", "2015-02-06", 
          "2015-02-07", "2015-02-08", "2015-02-09", "2015-02-10"))
index <- c("a","a","a","a","a","a","a","a","a","a",
           "b","b","b","b","b","b","b","b","b","b")
x <- runif(20,1,100)
y <- runif(20,50,150)
z <- runif(20,100,200)

+3

G. Grothendieck 04 Aug 17 at 0:03

source to share

Dave gruenewald · Accepted Answer · 2017-08-03T22:50:44+0000

This should do the trick with a library dplyr

and zoo

:

library(dplyr)
library(zoo)

df %>% 
  group_by(index) %>% 
  mutate(x_mean = rollmean(x, 5, fill = NA),
         y_mean = rollmean(y, 5, fill = NA),
         z_mean = rollmean(z, 5, fill = NA))

You could probably remove this more by using a mutate_each

shape or another mutate

.

You can also change the arguments as rollmean

per your needs, for example align = "right"

orna.pad = TRUE

Apply moving average to database by index

More articles: