R: time series with repeating time index entries
I am a n00b in R and a n00b on Stack Overflow (just joined), so forgive me if I have failed to use markup (which I don't know) or missed something in the readme.
If you don't mind, I'll describe my whole problem here, as perhaps you can be kind enough to offer some insight into how I would best do this!
Stage 1
Build separate time series objects for each TS. Below is an example of the data. Basically, I am loading a CSV file that contains several irregular time series (TS1 and TS2 in the example below), so in a perfect world I would split them into separate irregular time series objects (such as zoo objects?): TS1, TS2, and so on. This issue was discussed here ( R / zoo: handle non-standard index entries but not lose data? ), but I tried that approach repeatedly and failed.
Date TS Data
21/05/2014 TS1 0.95
17/04/2014 TS1 1.02
27/03/2014 TS1 0.90
30/01/2014 TS1 0.80
12/12/2013 TS1 0.70
18/09/2013 TS1 0.67
01/11/2012 TS1 0.71
01/11/2012 TS1 0.70
21/05/2014 TS2 0.47
20/05/2014 TS2 0.51
16/05/2014 TS2 0.49
15/05/2014 TS2 0.55
10/05/2014 TS2 0.63
07/05/2014 TS2 0.77
As you can see, the problem is the duplicate date index 01/11/2012 for TS1, which causes read.zoo to fail, so my split data objects are not created.
Stage 2
What I would like to do is sum the data from all series on every irregular date. Since all the time series are irregular, each with a different spacing, I would like to use the previous value for a TS that has no entry on a given date. For example, for 21/05/2014 the calculation is simple, since both TS1 and TS2 have an entry, so the answer is 0.47 + 0.95. But for 20/05/2014 only TS2 has an entry, so the value to use for TS1 is the most recent one before that date, that is, the 17/04/2014 value of 1.02, so the calculation for 20/05/2014 must be 0.51 + 1.02. Maybe the simplest way to achieve this would be to convert each TS to daily values, so that the previous value is used until a new data point arrives? But this seems wasteful and unnecessary for Stage 3 below.
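To make the Stage 2 arithmetic concrete, here is a minimal sketch using only the three dates from the worked example. It assumes the zoo package is available; the names ts1, ts2, m, filled and total are made up for illustration. It carries the last observation forward without expanding anything to a daily grid, so the 20/05/2014 row should come out as 0.51 + 1.02.

```r
library(zoo)

# Only the observations needed for the worked example above
ts1 <- zoo(c(1.02, 0.95), as.Date(c("2014-04-17", "2014-05-21")))
ts2 <- zoo(c(0.51, 0.47), as.Date(c("2014-05-20", "2014-05-21")))

# merge() aligns both series on the union of their dates;
# a date missing from one series becomes NA in that column
m <- merge(TS1 = ts1, TS2 = ts2)

# Carry the last observation forward within each column,
# keeping leading NAs (there is nothing earlier to carry)
filled <- na.locf(m, na.rm = FALSE)

# Sum across series for each date; rows with a leading NA stay NA
total <- zoo(rowSums(coredata(filled)), time(filled))
print(total)
```

The first date (17/04/2014) stays NA because TS2 has no earlier value to carry forward; whether to drop such rows or treat them as zero is a separate choice.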
Stage 3
Having created this combined sum of all the TSs, I want to fit a polynomial curve to it. I also want to differentiate this curve in order to find the rate of change at each date given by the curve.
Any help would be greatly appreciated! I feel like banging my head against the wall repeatedly would be much more fun than doing anything else at this stage!
Thanks
Update: I now have the following code, based on Grothendieck's answer.
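library(scales)
library(zoo)
library(ggplot2)
# z: data frame whose first column is the date and second column the series name
f <- function(z) {
  # One column per series; duplicate dates within a series are averaged
  zz <- read.zoo(z, header = TRUE, split = 2, format = "%d/%m/%Y", aggregate = mean)
  z.fill <- na.locf(zz)                # carry the last observation forward
  z.fill <- (z.fill >= 0.5) * z.fill   # set values below 0.5 to zero
  z.fill <- na.fill(z.fill, 0)         # replace any remaining NAs with 0
  zfill.mat <- matrix(z.fill, NROW(z.fill))
  z.sum <- rowSums(zfill.mat)          # sum across series for each date
  zsum <- zoo(z.sum, time(z.fill))
  zsum
}
DF <- read.csv(file.choose(), header = TRUE, as.is = TRUE)
# One data frame per distinct value of column 2, with column 2 itself dropped
DF.S <- split(DF[-2], DF[[2]])
user <- DF[1, 2]
Ret <- lapply(DF.S, f)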
One problem remains:
Ret contains a list with one element per user. I can access an element by typing Ret$user, but since the user changes, I need to make this dynamic. I tried to build a dynamic expression like:
x <- paste("Ret$'", user, "'", sep = "")
plot(x)
but could not get it to evaluate.
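A likely reason the attempt above does not work is that paste() only builds a character string, so plot() receives that string rather than the object it names. One common way around this is to index the list with [[ ]], which accepts the name as a string. The sketch below uses a made-up Ret and user purely for illustration; in the real code both come from the lines above.

```r
# Hypothetical stand-in for the list returned by lapply(DF.S, f)
Ret <- list(alice = c(1.2, 3.4), bob = c(5.6, 7.8))
user <- "alice"   # in the post this comes from DF[1, 2]

# [[ ]] looks up the list element whose name is stored in `user`
x <- Ret[[user]]
plot(x)
```

This avoids building and evaluating code as text, and it keeps working when the user name contains spaces or other characters that would need quoting after $.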
read.zoo has an argument aggregate= that takes a function used to aggregate values that occur on the same date within the same series. Here we take the mean of the duplicated days within a series, but you could use sum or any other function. (If the data came from a file, we would replace the text = Lines argument of read.zoo with something like "myfile.dat".) Then we use na.locf to fill in the NAs, sum the rows, and use na.omit to drop any leading NAs, giving zsum. We then compute a regularly spaced grid g and a spline function splfun, and evaluate this function and its derivative on the grid, which, when converted to zoo, gives zspl and zder. Finally, we plot them.
Lines <- "Date TS Data
21/05/2014 TS1 0.95
17/04/2014 TS1 1.02
27/03/2014 TS1 0.90
30/01/2014 TS1 0.80
12/12/2013 TS1 0.70
18/09/2013 TS1 0.67
01/11/2012 TS1 0.71
01/11/2012 TS1 0.70
21/05/2014 TS2 0.47
20/05/2014 TS2 0.51
16/05/2014 TS2 0.49
15/05/2014 TS2 0.55
10/05/2014 TS2 0.63
07/05/2014 TS2 0.77"
library(zoo)
z <- read.zoo(text = Lines, header = TRUE, split = 2, format = "%d/%m/%Y",
aggregate = mean)
zsum <- na.omit(zoo(rowSums(na.locf(z)), time(z)))
g <- seq(start(zsum), end(zsum), "day")
splfun <- splinefun(time(zsum), coredata(zsum))
zspl <- zoo(splfun(g), g)
zder <- zoo(splfun(g, deriv = 1), g)
plot(merge(zspl, zder))
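The code above uses an interpolating spline rather than the polynomial the question asks for. If a polynomial is specifically wanted, a minimal base R sketch might look like the following. The dates and values are hand-copied from the example as the sums zsum would be expected to hold with aggregate = mean, and the degree of 2 and the names d, y, x, fit, b, curve_fit and slope are assumptions chosen for illustration only.

```r
# Summed series for the dates on which both TS1 and TS2 have a value
# (hand-copied from the example data; an assumption, not computed here)
d <- as.Date(c("2014-05-07", "2014-05-10", "2014-05-15",
               "2014-05-16", "2014-05-20", "2014-05-21"))
y <- c(1.79, 1.65, 1.57, 1.51, 1.53, 1.42)

# Use days since the first observation as the predictor
x <- as.numeric(d) - as.numeric(d[1])

# Least-squares polynomial fit; degree 2 is an arbitrary choice
fit <- lm(y ~ poly(x, 2, raw = TRUE))
b <- unname(coef(fit))             # y is modelled as b[1] + b[2]*x + b[3]*x^2

# Evaluate the fitted curve and its analytic derivative on a daily grid
g <- seq(min(x), max(x))
curve_fit <- b[1] + b[2] * g + b[3] * g^2
slope <- b[2] + 2 * b[3] * g       # rate of change in units per day
```

Unlike the spline, a least-squares polynomial does not pass through every data point, so its derivative is smoother but also depends on the chosen degree. The last element of slope is the rate of change at the most recent date.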