Cumulative weekly data in R

I'm sure this is straight forward, but I just can't get it to work. I have a data frame that represents daily totals. I just want to summarize the totals for the week, keeping zero if no week is presented. What's the best approach in R? In case it matters, I read data from CSV and convert it to date once in R.

Here is the structure of my p1 dataframe:

'data.frame':   407 obs. of  2 variables:
 $ date:Class 'Date'  num [1:407] 14335 14336 14337 14340 14341 ...
 $ amt : num  45 150 165 165 45 45 150 150 15 165 ...

      

and the first few ...

> head(p1)
        date amt
1 2009-04-01  45
2 2009-04-02 150
3 2009-04-03 165
4 2009-04-06 165
5 2009-04-07  45
6 2009-04-08  45

      

Thank you very much in advance.

One note: I saw one previous post but could not get it to work

+2


source to share


2 answers


Here is a solution that reads the data, aggregates it by week, and then fills in the missing weeks with zero in just 3 lines of code. read.zoo

reads it assuming a header and comma delimiter. It converts the first column to a class Date

and then converts the date to next Friday. The function nextfri

that does this conversion, taken from a vignette zoo-quickref

in the zoo package. (If you want the end of the week to be a different day of the week, just replace 5 with a different day number.) Commandread.zoo

also concatenates all points that have the same index (remember that we turned them into the last Friday of the week, so all points in the same week will have the same Friday as their index). The following command creates a zero-width zoo object that has weeks from the first to the last and concatenates that with the output of read, using the fill = 0

filled weeks to get that value.

Lines <- "date,amt
2009-04-01,45
2009-04-02,150
2009-04-03,165
2009-04-13,165
2009-04-14,45
2009-04-15,45"
library(zoo)
nextfri <- function(x) 7 * ceiling(as.numeric(x - 5 + 4)/7) + as.Date(5 - 4)
z <- read.zoo(textConnection(Lines), header = TRUE, sep = ",", 
    FUN = as.Date, FUN2 = nextfri, aggregate = sum)
merge(z, zoo(, seq(min(time(z)), max(time(z)), 7)), fill = 0)

      

We used the textConnection(Lines)

above to make it standalone so you can just copy it and paste it right into your session, but it textConnection(Lines)

will actually be replaced with your file name, eg. "myfile.csv"

...



For the input above, the output will be the following zoo object:

2009-04-03 2009-04-10 2009-04-17 
       360          0        255

      

There are three vignettes that come with the zoo package you might want to read.

+2


source


Solution with library lubridate

:

library(lubridate)
Lines <- "date,amt
2009-04-01,45
2009-04-02,150
2009-04-03,165
2009-04-13,165
2009-04-14,45
2009-04-15,45
2009-05-15,45"
df <- read.csv(textConnection(Lines))

      

If you don't need 0 for missed weeks, it's simple:



weeks <- week(df$date)
sums <- tapply(df$amt, weeks, sum)
# 14  15  16  20 
#360 210  45  45 

      

To put zeros for missing weeks:

span <- min(weeks):max(weeks)
out <- array(0, dim = length(span), dimnames = list(span))
out[dimnames(sums)[[1]]] <- sums
# 14  15  16  17  18  19  20 
#360 210  45   0   0   0  45 

      

+5


source







All Articles