Calculating the average for every second value in the data frame

Question

I would like to replace each consecutive pair of values in a data frame column with their mean, and repeat this down the whole column. To be more precise, see the following piece of data:

    X         Y             Z
1   FRI 200101010000    -6.72
2   FRI 200101010030    -6.30
3   FRI 200101010100    -6.26
4   FRI 200101010130    -5.82
5   FRI 200101010200    -5.64
6   FRI 200101010230    -5.29
7   FRI 200101010300    -5.82
8   FRI 200101010330    -5.83
9   FRI 200101010400    -5.83
10  FRI 200101010430    -6.04
11  FRI 200101010500    -5.80
12  FRI 200101010530    -6.09

I would like to average Z over each pair of rows whose Y ends in 00 and 30, that is, the mean of rows 1 and 2, rows 3 and 4, rows 5 and 6, and so on. This is the output I expect:

    X         Y             Z
1   FRI 200101010100    -6.51
2   FRI 200101010200    -6.04
3   FRI 200101010300    -5.47
...

Explanation: Y is a timestamp in the format YYYYMMDDhhmm, and I would like to average the 30-minute measurements into hourly values.

+3

r aggregate dataframe mean

xmisx June 24. '15 at 9:37

4 answers

A base-R solution: I first split the vector into pairs and calculate the mean of each. This of course assumes the rows are always in the order you showed. Finally, I combine the means with every second row to give your result:

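# Average Z over consecutive pairs of rows (pair index: 1, 1, 2, 2, ...)
Z <- unlist(lapply(split(df$Z, ceiling(seq_along(df$Z) / 2)), mean))
# Keep X and Y from the first row of each pair and attach the means
new_df <- cbind(df[seq(1, nrow(df), 2), c("X", "Y")], Z)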

output:

     X            Y      Z
1  FRI 200101010000 -6.510
3  FRI 200101010100 -6.040
5  FRI 200101010200 -5.465
7  FRI 200101010300 -5.825
9  FRI 200101010400 -5.935
11 FRI 200101010500 -5.945
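
If you want to try this without the original data, here is a minimal self-contained sketch. The data frame is rebuilt from the sample in the question; storing Y as character is my assumption, not something the question states. It uses tapply instead of split/lapply, which should give the same pair means:

```r
# Sample data rebuilt from the question (column types are an assumption)
df <- data.frame(
  X = "FRI",
  Y = c("200101010000", "200101010030", "200101010100", "200101010130",
        "200101010200", "200101010230", "200101010300", "200101010330",
        "200101010400", "200101010430", "200101010500", "200101010530"),
  Z = c(-6.72, -6.30, -6.26, -5.82, -5.64, -5.29,
        -5.82, -5.83, -5.83, -6.04, -5.80, -6.09),
  stringsAsFactors = FALSE
)

# Pair index: 1, 1, 2, 2, 3, 3, ...
pair <- ceiling(seq_along(df$Z) / 2)

# Mean of Z within each pair
Z <- as.numeric(tapply(df$Z, pair, mean))

# Keep X and Y from the first row of each pair
new_df <- data.frame(df[seq(1, nrow(df), 2), c("X", "Y")], Z = Z)
print(new_df)
```

Like the code above, this pairs rows purely by position, so a missing or out-of-order measurement would shift every pair after it.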

+2

grrgrrbla June 24. '15 at 10:00

dplyr version

library(dplyr)

df$Y <- as.character(df$Y)

means <- df %>%
  group_by(hour = substr(Y, start = 1, stop = 10)) %>%
  summarise(Z = mean(Z))

> means
Source: local data frame [6 x 2]

        hour      Z
1 2001010100 -6.510
2 2001010101 -6.040
3 2001010102 -5.465
4 2001010103 -5.825
5 2001010104 -5.935
6 2001010105 -5.945

This groups the data by Y with the last two digits (the minutes) removed, so both measurements from the same hour share one key.
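
To see what that key looks like, here is a small base-R sketch on the first four rows of the sample data (the vectors Y and Z are typed in by hand for illustration). It also shuffles the rows to suggest that, since grouping is by key rather than by position, row order should not matter:

```r
Y <- c("200101010000", "200101010030", "200101010100", "200101010130")
Z <- c(-6.72, -6.30, -6.26, -5.82)

# Drop the trailing "mm" to get a YYYYMMDDhh key
hour <- substr(Y, 1, 10)
print(hour)

# Shuffle the rows: grouping by key does not depend on row order
idx <- c(3, 1, 4, 2)
means <- tapply(Z[idx], hour[idx], mean)
print(means)
```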

+1

JohannesNE June 24. '15 at 9:47

While this is not the OP's case, in general, to aggregate per second when you have a POSIXct column, you can:

library(lubridate)
library(tidyverse)
s <- seq(from=Sys.time(), length.out = 100, by=0.4)
df  = data.frame(time=s,v=rnorm(length(s)))
df %>% 
  group_by(time=floor_date(time, "1 second")) %>% 
  summarize(v=mean(v)) # you can put any other interval like 5 minute
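
For data like the OP's, where the timestamp is a YYYYMMDDhhmm string rather than POSIXct, you would first have to parse it. A minimal base-R sketch of that step, using the first four sample rows typed in by hand; the UTC time zone is an assumption, since the question does not say which zone the data was recorded in:

```r
Y <- c("200101010000", "200101010030", "200101010100", "200101010130")
Z <- c(-6.72, -6.30, -6.26, -5.82)

# Parse YYYYMMDDhhmm; UTC is an assumption, use the zone of your data
time <- as.POSIXct(Y, format = "%Y%m%d%H%M", tz = "UTC")

# Floor each timestamp to the start of its hour
hour <- as.POSIXct(trunc(time, units = "hours"))

# Mean of Z per hour; the result has columns "hour" and "x"
hourly <- aggregate(Z, by = list(hour = hour), FUN = mean)
print(hourly)
```

Once the column is POSIXct, floor_date(time, "1 hour") in the pipeline above should do the same job as trunc here.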

0

ecerulm May 18 '17 at 19:53

David Arenburg · Accepted Answer · 2015-06-24T09:44:29+0000

Here's a possible solution using data.table:

library(data.table)
setDT(df)[, .(Y = Y[1L], Z = mean(Z)), by = .(X, indx = cumsum(substr(Y, 11, 12) == '00'))]
#      X indx            Y      Z
# 1: FRI    1 200101010000 -6.510
# 2: FRI    2 200101010100 -6.040
# 3: FRI    3 200101010200 -5.465
# 4: FRI    4 200101010300 -5.825
# 5: FRI    5 200101010400 -5.935
# 6: FRI    6 200101010500 -5.945

Or, per @akrun's comment, using aggregate from base R (although the output will probably need some extra tweaking):

aggregate(Z ~ X + indx, transform(df, indx = cumsum(substr(Y, 11, 12) == '00')), mean)
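
In case the cumsum part is unclear, here is a small sketch of how the index is built, using the first six Y values from the question typed in by hand. Each row whose minutes are "00" bumps the running count, so it opens a new group that the following "30" row joins:

```r
Y <- c("200101010000", "200101010030", "200101010100",
       "200101010130", "200101010200", "200101010230")

# TRUE where the minutes part (characters 11-12) is "00"
starts <- substr(Y, 11, 12) == "00"
print(starts)

# Running count of TRUEs: each "00" row opens a new group
indx <- cumsum(starts)
print(indx)
```

This relies on the rows being sorted by time; a "30" row with no preceding "00" row would be folded into the previous hour's group.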
