R: time series with repeating time index entries

Question


I am a n00b in R and a n00b on Stack Overflow (just joined), so forgive me if I have failed to use markup (which I don't know) or missed something in the readme.

If you don't mind, I'll cover my whole problem here, as perhaps you can be kind enough to shed some light on how I would best do this!

Stage 1
Build a separate time series object for each TS. Below is an example of the data. Basically, I am loading a CSV file with several irregular time series in it (for example TS1 and TS2 below), so in a perfect world I would split them into separate irregular time series objects (like a zoo?), one each for TS1, TS2, and so on. This issue was discussed here (R / zoo: handle non-standard index entries but not lose data?), but I tried that approach repeatedly and failed.

 Date TS Data 
 21/05/2014 TS1 0.95  
 17/04/2014 TS1 1.02   
 27/03/2014 TS1 0.90   
 30/01/2014 TS1 0.80   
 12/12/2013 TS1 0.70  
 18/09/2013 TS1 0.67  
 01/11/2012 TS1 0.71  
 01/11/2012 TS1 0.70  
 21/05/2014 TS2 0.47  
 20/05/2014 TS2 0.51  
 16/05/2014 TS2 0.49  
 15/05/2014 TS2 0.55  
 10/05/2014 TS2 0.63  
 07/05/2014 TS2 0.77

As you can see, the problem is the duplicate date index 01/11/2012 for TS1, which causes read.zoo to fail, so my split data object is not created.

Stage 2
What I would like to do is add together the data from all series on every irregular date. Since all the time series are irregular, each with a different spacing, I would like to use the previous value for any TS that has no entry on a given date. For example, for 21/05/2014 the calculation is simple, since TS1 and TS2 both have an entry, so the answer is 0.47 + 0.95. But for 20/05/2014 only TS2 has an entry, so the value to use for TS1 is the most recent one before that date, that is, the 17/04/2014 value of 1.02, so the calculation for 20/05/2014 must be 0.51 + 1.02. Maybe the simplest way to achieve this would be to convert each TS to daily values, so that the previous value is used until the next data point? But this is wasteful / unnecessary for Stage 3 below.
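The Stage 2 arithmetic can be checked on a cut-down version of the sample data. This is only a minimal sketch, assuming the zoo package is available; the names ts1, ts2, m, filled and total are made up for illustration and only the most recent points of each series are included.

```r
library(zoo)

# Most recent points of each series from the sample data (illustrative subset)
ts1 <- zoo(c(1.02, 0.95), as.Date(c("2014-04-17", "2014-05-21")))
ts2 <- zoo(c(0.51, 0.47), as.Date(c("2014-05-20", "2014-05-21")))

# merge() aligns the series on the union of their dates,
# leaving NA where a series has no entry on a date
m <- merge(TS1 = ts1, TS2 = ts2)

# Carry the last observation forward within each column;
# na.rm = FALSE keeps rows that still contain a leading NA
filled <- na.locf(m, na.rm = FALSE)

# Sum across series for each date; a row with a remaining NA sums to NA
total <- zoo(rowSums(coredata(filled)), time(filled))
print(total)
```

On 20/05/2014 the sum uses the carried-forward TS1 value, 0.51 + 1.02 = 1.53, and on 21/05/2014 it is 0.47 + 0.95 = 1.42. The 17/04/2014 row stays NA because TS2 has no earlier value to carry forward.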

Stage 3
Having created this cumulative sum of all the TSs, I want to fit a polynomial curve to it. I also want to differentiate this curve in order to find the current rate of change given by the curve.

Any help would be greatly appreciated! I feel like banging my head against the wall repeatedly would be much more fun than doing anything else at this stage!

Thanks

Update: I now have the following code, based on G. Grothendieck's answer.

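library(scales)
library(zoo)
library(ggplot2)

# Build the summed series for one group of rows
f <- function(z) {
  # One column per series; duplicate dates within a series are averaged
  zz <- read.zoo(z, header = TRUE, split = 2, format = "%d/%m/%Y", aggregate = mean)
  z.fill <- na.locf(zz)                      # carry the last observation forward
  z.fill <- (z.fill >= 0.5) * z.fill         # zero out values below 0.5
  z.fill <- na.fill(z.fill, 0)               # replace remaining NAs with 0
  zfill.mat <- matrix(z.fill, NROW(z.fill))
  z.sum <- rowSums(zfill.mat)                # sum across series for each date
  zsum <- zoo(z.sum, time(z.fill))
  return(zsum)
}

DF <- read.csv(file.choose(), header = TRUE, as.is = TRUE)
DF.S <- split(DF[-2], DF[[2]])               # one data frame per value of column 2
user <- DF[1, 2]
Ret <- lapply(DF.S, f)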

One problem remains:
Ret contains a list of data frames. I can access an element by typing Ret$ followed by the user's name, but since the user changes, I need to make this dynamic. I tried to build a dynamic expression like:
x <- paste("Ret$'", user, "'", sep = "");
plot(x)

but it could not be evaluated.
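For the dynamic lookup, a minimal sketch, assuming Ret is a named list as returned by lapply(): double-bracket indexing accepts a character variable, so no expression has to be pasted together. The list contents and the names alice and bob below are made-up stand-ins, not data from the question.

```r
# Hypothetical stand-in for the list returned by lapply(DF.S, f)
Ret <- list(alice = c(1, 2, 3), bob = c(4, 5, 6))

# The element name is held in a variable and may change between runs
user <- "bob"

# [[ ]] looks the element up by the value of `user`;
# Ret$user would instead look for an element literally named "user"
x <- Ret[[user]]
print(x)
```

Calling plot(x) on the selected element then works as usual. The paste() approach only yields a character string, which plot() cannot draw as a series.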


r indexing unique

Carl 12 Sep 14 at 16:20


1 answer

G. Grothendieck · Accepted Answer · 2014-09-12T17:28:15+0000

read.zoo has an aggregate= argument that takes a function used to aggregate values that fall on the same date within a series. Here we take the mean of repeated days within a series, but you can use sum or any other function. (If the data came from a file, we would replace the text = Lines argument to read.zoo with something like "myfile.dat".) Then we use na.locf to fill in NAs, sum the rows, and use na.omit to remove any leading NAs, giving zsum. We then compute a grid g with a regular interval and a spline function splfun, and evaluate this function and its derivative on the grid, which, when converted to zoo, gives zspl and zder. Finally, we plot them.

Lines <- "Date TS Data 
 21/05/2014 TS1 0.95  
 17/04/2014 TS1 1.02   
 27/03/2014 TS1 0.90   
 30/01/2014 TS1 0.80   
 12/12/2013 TS1 0.70  
 18/09/2013 TS1 0.67  
 01/11/2012 TS1 0.71  
 01/11/2012 TS1 0.70  
 21/05/2014 TS2 0.47  
 20/05/2014 TS2 0.51  
 16/05/2014 TS2 0.49  
 15/05/2014 TS2 0.55  
 10/05/2014 TS2 0.63  
 07/05/2014 TS2 0.77"

library(zoo)

z <- read.zoo(text = Lines, header = TRUE, split = 2, format = "%d/%m/%Y",
       aggregate = mean)
zsum <- na.omit(zoo(rowSums(na.locf(z)), time(z)))

g <- seq(start(zsum), end(zsum), "day")
splfun <- splinefun(time(zsum), coredata(zsum))
zspl <- zoo(splfun(g), g)
zder <- zoo(splfun(g, deriv = 1), g)

plot(merge(zspl, zder))
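The code above interpolates with a spline. If a low-degree polynomial is wanted instead, as the question describes, one possible sketch is a least-squares quadratic fitted with lm(). This swaps in a different technique from the answer's; the degree 2 and the names tt, y, fit, b, gd, zpoly and zpder are assumptions for illustration. The zsum values are written out by hand from the sample data so that the block runs on its own.

```r
library(zoo)

# Summed series for the sample data, entered by hand (TS1 carried forward + TS2)
zsum <- zoo(c(1.79, 1.65, 1.57, 1.51, 1.53, 1.42),
            as.Date(c("2014-05-07", "2014-05-10", "2014-05-15",
                      "2014-05-16", "2014-05-20", "2014-05-21")))

# Days since the first observation, used as the numeric predictor
tt <- as.numeric(time(zsum) - start(zsum))
y <- coredata(zsum)

# Quadratic least-squares fit y ~ b0 + b1*t + b2*t^2 (degree 2 is an assumption)
fit <- lm(y ~ poly(tt, 2, raw = TRUE))
b <- unname(coef(fit))

# Evaluate the polynomial and its analytic derivative b1 + 2*b2*t on a daily grid
gd <- seq(0, max(tt))
zpoly <- zoo(b[1] + b[2] * gd + b[3] * gd^2, start(zsum) + gd)
zpder <- zoo(b[2] + 2 * b[3] * gd, start(zsum) + gd)
```

The last element of zpder is then the fitted rate of change, in units per day, on the final date. As with zspl and zder, plot(merge(zpoly, zpder)) would draw both curves.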
