Cumulative weekly data in R
I'm sure this is straight forward, but I just can't get it to work. I have a data frame that represents daily totals. I just want to summarize the totals for the week, keeping zero if no week is presented. What's the best approach in R? In case it matters, I read data from CSV and convert it to date once in R.
Here is the structure of my p1 dataframe:
'data.frame': 407 obs. of 2 variables:
$ date:Class 'Date' num [1:407] 14335 14336 14337 14340 14341 ...
$ amt : num 45 150 165 165 45 45 150 150 15 165 ...
and the first few ...
> head(p1)
date amt
1 2009-04-01 45
2 2009-04-02 150
3 2009-04-03 165
4 2009-04-06 165
5 2009-04-07 45
6 2009-04-08 45
Thank you very much in advance.
One note: I saw one previous post but could not get it to work
source to share
Here is a solution that reads the data, aggregates it by week, and then fills in the missing weeks with zero in just 3 lines of code. read.zoo
reads it assuming a header and comma delimiter. It converts the first column to a class Date
and then converts the date to next Friday. The function nextfri
that does this conversion, taken from a vignette zoo-quickref
in the zoo package. (If you want the end of the week to be a different day of the week, just replace 5 with a different day number.) Commandread.zoo
also concatenates all points that have the same index (remember that we turned them into the last Friday of the week, so all points in the same week will have the same Friday as their index). The following command creates a zero-width zoo object that has weeks from the first to the last and concatenates that with the output of read, using the fill = 0
filled weeks to get that value.
Lines <- "date,amt
2009-04-01,45
2009-04-02,150
2009-04-03,165
2009-04-13,165
2009-04-14,45
2009-04-15,45"
library(zoo)
nextfri <- function(x) 7 * ceiling(as.numeric(x - 5 + 4)/7) + as.Date(5 - 4)
z <- read.zoo(textConnection(Lines), header = TRUE, sep = ",",
FUN = as.Date, FUN2 = nextfri, aggregate = sum)
merge(z, zoo(, seq(min(time(z)), max(time(z)), 7)), fill = 0)
We used the textConnection(Lines)
above to make it standalone so you can just copy it and paste it right into your session, but it textConnection(Lines)
will actually be replaced with your file name, eg. "myfile.csv"
...
For the input above, the output will be the following zoo object:
2009-04-03 2009-04-10 2009-04-17
360 0 255
There are three vignettes that come with the zoo package you might want to read.
source to share
Solution with library lubridate
:
library(lubridate)
Lines <- "date,amt
2009-04-01,45
2009-04-02,150
2009-04-03,165
2009-04-13,165
2009-04-14,45
2009-04-15,45
2009-05-15,45"
df <- read.csv(textConnection(Lines))
If you don't need 0 for missed weeks, it's simple:
weeks <- week(df$date)
sums <- tapply(df$amt, weeks, sum)
# 14 15 16 20
#360 210 45 45
To put zeros for missing weeks:
span <- min(weeks):max(weeks)
out <- array(0, dim = length(span), dimnames = list(span))
out[dimnames(sums)[[1]]] <- sums
# 14 15 16 17 18 19 20
#360 210 45 0 0 0 45
source to share