R: How to calculate the average of every 10 rows of a variable

I have several datasets collected at 1-minute intervals, and I need to replace the data with 10-minute averages. So I have this R code:

for (k in 1:(nrow(temp) / 10)) {
  temp2[k, 1] <- temp[(k - 1) * 10 + 1, 1]    # first row of the window
  temp2[k, 2] <- temp[k * 10, 2]              # time stamp of the last row
  # note the parentheses around k*10: ":" binds tighter than "*"
  temp2[k, 3] <- mean(as.numeric(temp[((k - 1) * 10 + 1):(k * 10), 3]), na.rm = TRUE)
}


However, this code is too slow. And one more question: due to missing data, the time variable is not always continuous, and I have to average over fixed 10-minute windows (for example, from 2014-01-01 00:00 to 2014-01-01 00:10), no matter how many observations fall within each window. So the loop body becomes:

  # keep the rows whose time falls in the k-th 10-minute window (st = series start)
  tmp <- na.omit(temp[temp[, 2] > (st + 600 * (k - 1)) & temp[, 2] <= (st + 600 * k), ])
  temp2[k, 1] <- tmp[1, 1]                                 # first row of the window
  temp2[k, 2] <- st + 600 * k                              # window end time
  temp2[k, 3] <- mean(as.numeric(tmp[, 3]), na.rm = TRUE)  # window mean


which is unbearably slow, and it cannot efficiently handle cases like "some months are missing". How can I solve this in R without the poor efficiency?

Initial data:

Time  Var1
2014-01-01 00:01  10
2014-01-01 00:02  12
2014-01-01 00:03  43

...
2014-01-01 00:10  52


desired result:

Time  Var1
2014-01-01 00:10  (mean of every 10 mins)
2014-01-01 00:20  (mean of every 10 mins)
...




3 answers


Have a look at the package xts, and in particular the function period.apply with endpoints. Assuming you can get your data as an xts object (called xt.data in this case), then something like the following will work.



library(xts)

# example data: 300 randomly chosen minute stamps spanning roughly 14 hours
times <- seq(Sys.time() - 50000, Sys.time(), by = 60)
mydt <- data.frame(time = times[sample(seq_along(times), size = 300)], test = runif(300))
xt.data <- xts(mydt[, 2], order.by = mydt[['time']])

# one mean per 10-minute period
period.apply(xt.data, endpoints(xt.data, 'minutes', 10), mean)
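
Note that period.apply stamps each average with the time of the last observation in its bucket, not with the window boundary itself. If you want the stamps snapped forward to the 10-minute marks, as in the desired output, something like align.time (also from xts) should work:

means <- period.apply(xt.data, endpoints(xt.data, 'minutes', 10), mean)
align.time(means, n = 10 * 60)  # move each index forward to the next 10-minute mark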




Take a look at ?cut.POSIXt, ?seq.POSIXt and ?round.POSIXt. The functions cut and seq allow setting breaks at the interval "10 min", but unfortunately the round function doesn't seem to have such a nice feature. You could divide the times by 10, round to the nearest "min", and multiply back by 10, but I haven't tried that.
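
As a minimal sketch of the cut approach, assuming a data frame dat with a POSIXct column time and a numeric column y as in the question's example:

dat$bin <- cut(dat$time, breaks = "10 min")   # labels each row with the left edge of its bin
aggregate(y ~ bin, data = dat, FUN = mean, na.rm = TRUE)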





If you remember that there is a POSIXlt format for time that makes it trivial to manipulate each component, then this is relatively easy. What I've done here is read in the data in POSIXct format (because you can't read in POSIXlt directly), convert to POSIXlt, floor the minutes component to a multiple of 10, convert back, and then aggregate. This should be pretty fast.

dat <- read.table(text = 'time, y
                          2014-01-01 00:01, 10
                          2014-01-01 00:02, 12
                          2014-01-01 00:22, 43',
                  header = TRUE, sep = ',', colClasses = c('POSIXct', 'numeric'))
dat$time <- as.POSIXlt(dat$time)
dat$time$min <- floor(dat$time$min / 10) * 10   # floor minutes to a multiple of 10
dat$time <- as.POSIXct(dat$time)
aggregate(y ~ time, data = dat, mean)           # one mean per 10-minute bin
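
For the three example rows, the two 00:0x readings fall in the 00:00 bin and the 00:22 reading in the 00:20 bin, so the call should print something like:

                 time  y
1 2014-01-01 00:00:00 11
2 2014-01-01 00:20:00 43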


As an aside, you said you want to replace the values with the 10-minute average rather than aggregate them. In that case, the last line becomes:

dat$y <- ave(dat$y, dat$time)   # group mean, repeated for every row in the group


And if you want to keep the original times and all other data intact, but just replace y with its 10-minute average, you can replace everything after read.table with:

dat$time <- as.POSIXlt(dat$time)
g <- dat$time
g$min <- floor(g$min / 10) * 10      # grouping key: time floored to 10 minutes
dat$y <- ave(dat$y, as.POSIXct(g))   # replace y with its group mean
dat$time <- as.POSIXct(dat$time)     # original time stamps unchanged



