R/zoo: handle non-unique index entries but not lose data?
I have a CSV file with data points (like financial ticks, experimental records, etc.) and my data has duplicate timestamps. Here's some code to demonstrate the problem:
library(zoo);library(xts)
csv="2011-11-01,50
2011-11-02,49
2011-11-02,48
2011-11-03,47
2011-11-03,46
2011-11-03,45
2011-11-04,44
2011-11-04,43
2011-11-04,42
2011-11-04,41
"
z1=read.zoo(textConnection(csv),sep=',')
w1=to.weekly(z1)
ep=endpoints(z1,"weeks",1)
w1$Volume=period.apply(z1,ep,length)
z2=read.zoo(textConnection(csv),sep=',',aggregate=T)
w2=to.weekly(z2)
ep=endpoints(z2,"weeks",1)
w2$Volume=period.apply(z2,ep,length)
Entry 1 of vignette('zoo-faq') tells me that aggregate=T gets rid of the zoo warning message. But then the results change:
> w1
           z1.Open z1.High z1.Low z1.Close Volume
2011-11-04      50      50     41       41     10
> w2
           z2.Open z2.High z2.Low z2.Close Volume
2011-11-04      50      50   42.5     42.5      4
Is there another way to get rid of the warning but still get the same results as w1? (Yes, I know about suppressWarnings(), which I used before, but I hate the idea.) (I was wondering about passing read.zoo a custom aggregate function that would return OHLCV data for each day ... but couldn't work out whether that was even possible.)
You need a function that pads the duplicate timestamps by epsilon increments so that they become distinct.
I have also written one or two Rcpp-based functions for this. The times are most often POSIXct, which is really just a floating-point number once converted to numeric, so simply loop over the timestamps and, whenever one equals the previous one, add a small delta of 1.0e-7, which is less than what POSIXct can represent. Reset the cumulative delta every time you reach an actual gap.
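The loop described above might look roughly like this in plain R. This is only a sketch, not the Rcpp version: `make_unique_times` and its `eps` argument are hypothetical names that are not part of zoo or xts, the input is assumed to be sorted already, and the result is returned in UTC for simplicity. It uses a step of 1e-6 seconds rather than 1.0e-7, because doubles near present-day epoch values are spaced about 2.4e-7 apart, so a smaller step could be lost to rounding.

```r
# Hypothetical helper (not from zoo or xts): nudge repeated timestamps apart.
# Assumes `times` is a sorted POSIXct vector; `eps` is an assumed step in seconds.
make_unique_times <- function(times, eps = 1e-6) {
  secs <- as.numeric(times)   # seconds since the epoch, stored as a double
  out  <- secs
  bump <- 0                   # cumulative offset within a run of equal stamps
  for (i in seq_along(secs)[-1]) {
    if (secs[i] == secs[i - 1]) {
      bump   <- bump + eps    # same stamp as before: push it a bit further
      out[i] <- secs[i] + bump
    } else {
      bump <- 0               # a real gap: reset the cumulative delta
    }
  }
  as.POSIXct(out, origin = "1970-01-01", tz = "UTC")
}

# Usage: three ticks share one stamp, the others are already distinct.
stamps <- as.POSIXct(c("2011-11-02 00:00:00",
                       "2011-11-03 00:00:00",
                       "2011-11-03 00:00:00",
                       "2011-11-03 00:00:00",
                       "2011-11-04 00:00:00"), tz = "UTC")
fixed <- make_unique_times(stamps)
```

Comparing against the original values (`secs`) rather than the adjusted ones keeps a whole run of duplicates in one group, so each repeat gets a strictly larger offset than the one before it.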
Edit: try the make.time.unique() function in the xts package:
sametime <- rep(Sys.time(), 3)
xts(1:3, order.by=make.time.unique(sametime))
                           [,1]
2011-12-20 06:52:37.547299    1
2011-12-20 06:52:37.547300    2
2011-12-20 06:52:37.547301    3
Edit 2: Here's another example, for Date-indexed objects:
R> samedate <- rep(Sys.Date(), 5)
R> xts(1:5, order.by=make.time.unique(as.POSIXct(samedate)))
                           [,1]
2011-12-19 18:00:00.000000    1
2011-12-19 18:00:00.000000    2
2011-12-19 18:00:00.000001    3
2011-12-19 18:00:00.000002    4
2011-12-19 18:00:00.000003    5
R> xts(1:5, order.by=as.Date(make.index.unique(as.POSIXct(samedate))))
           [,1]
2011-12-20    1
2011-12-20    2
2011-12-20    3
2011-12-20    4
2011-12-20    5
The first solution switches to POSIXct, which ends up six hours before midnight because my local timezone is GMT minus six hours. The second example uses a double conversion, to POSIXct and back to Date, with the index made unique in between.