R / zoo: handle non-unique index entries but not lose data?

Question


I have a CSV file of data points (financial ticks, experimental records, etc.), and the data contains duplicate timestamps. Here is some code that demonstrates the problem:

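library(zoo); library(xts)

# Sample data: one value per row, several rows share the same date
csv="2011-11-01,50
2011-11-02,49
2011-11-02,48
2011-11-03,47
2011-11-03,46
2011-11-03,45
2011-11-04,44
2011-11-04,43
2011-11-04,42
2011-11-04,41
"

# Version 1: keep every row (read.zoo warns that the index is not unique)
z1=read.zoo(textConnection(csv),sep=',')
w1=to.weekly(z1)                         # weekly OHLC bars
ep=endpoints(z1,"weeks",1)               # row positions where each week ends
w1$Volume=period.apply(z1,ep,length)     # number of rows in each week

# Version 2: aggregate=TRUE silences the warning (duplicate dates are averaged)
z2=read.zoo(textConnection(csv),sep=',',aggregate=TRUE)
w2=to.weekly(z2)
ep=endpoints(z2,"weeks",1)
w2$Volume=period.apply(z2,ep,length)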

Entry 1 of vignette("zoo-faq") tells me that aggregate = TRUE gets rid of the zoo warning message. But then the results change:

> w1
           z1.Open z1.High z1.Low z1.Close Volume
2011-11-04      50      50     41       41     10
> w2
           z2.Open z2.High z2.Low z2.Close Volume
2011-11-04      50      50   42.5     42.5      4

Is there another way to get rid of the warning but still get the same results as w1? (Yes, I know about suppressWarnings(), which I used before, but I hate the idea.) (I was thinking about passing read.zoo a custom aggregate function that returns OHLCV data for each day, but couldn't even work out whether that is possible.)
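The change in w2 follows from how aggregate = TRUE works: read.zoo replaces all values that share a time index by their mean, so each day contributes a single value before to.weekly() and the length() count ever see the data. The base-R sketch below uses tapply() as a stand-in for zoo's aggregation (an illustration, not zoo's internal code) and reproduces both sets of numbers:

```r
# Values from the question's CSV, with their (duplicated) dates
dates  <- as.Date(c("2011-11-01", "2011-11-02", "2011-11-02",
                    "2011-11-03", "2011-11-03", "2011-11-03",
                    rep("2011-11-04", 4)))
values <- c(50, 49, 48, 47, 46, 45, 44, 43, 42, 41)

# What aggregate = TRUE effectively keeps: one mean per distinct date
daily_mean <- tapply(values, dates, mean)   # 50, 48.5, 46, 42.5
print(daily_mean)

# Lowest value and number of rows, collapsed vs. raw
collapsed_days <- c(low = min(daily_mean), n = length(daily_mean))  # as in w2
raw_rows       <- c(low = min(values),     n = length(values))      # as in w1
print(rbind(collapsed_days, raw_rows))
```

The collapsed series has four values with a minimum of 42.5 (the mean of 44, 43, 42 and 41), which is the Low, Close and Volume shown for w2; the raw series has ten values with a minimum of 41, matching w1.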


Darren Cook, Dec 20 '11 at 4:11


2 answers

You need a function that jitters the timestamps by epsilon increments to make them distinct.

I have also written one or two Rcpp-based functions for this. Timestamps are most often POSIXct, which really is a float (after you apply as.numeric()), so just loop over the timestamps and, whenever one equals the previous one, keep adding a small delta of 1.0e-7, which is below what POSIXct can represent anyway. Reset the cumulative delta every time you hit an actual break.
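The loop just described can be sketched in plain R as follows. This is not Dirk's Rcpp code: jitter_ties() is a hypothetical helper written for illustration, and its default eps = 1e-6 (one microsecond) is an assumption. For present-day POSIXct values (about 1.3e9 seconds) adjacent doubles are roughly 2.4e-7 s apart, so a delta much smaller than that can be rounded away when it is added.

```r
# Hypothetical helper (not part of zoo or xts): make tied timestamps distinct
# by adding a growing multiple of eps to each repeat of the previous value.
jitter_ties <- function(x, eps = 1e-6) {
  stopifnot(is.numeric(x))       # pass as.numeric() of a POSIXct vector
  out <- x
  delta <- 0
  for (i in seq_along(x)[-1]) {
    if (x[i] == x[i - 1]) {
      delta <- delta + eps       # same as the previous timestamp: push it later
    } else {
      delta <- 0                 # an actual break: reset the cumulative delta
    }
    out[i] <- x[i] + delta
  }
  out
}

jitter_ties(c(1, 2, 2, 3, 3, 3), eps = 0.1)
# [1] 1.0 2.0 2.1 3.0 3.1 3.2
```

The input is assumed to be sorted, as a time index is. For real timestamps, apply it to as.numeric() of a POSIXct vector and convert the result back with as.POSIXct(u, origin = "1970-01-01", tz = "UTC").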

Edit: try the functions make.index.unique() and make.time.unique() in the xts package:

R> sametime <- rep(Sys.time(), 3)
R> xts(1:3, order.by=make.time.unique(sametime))
                           [,1]
2011-12-20 06:52:37.547299    1
2011-12-20 06:52:37.547300    2
2011-12-20 06:52:37.547301    3
R>

Edit 2: here is another example, for Date-indexed objects:

R> samedate <- rep(Sys.Date(), 5)   # identical dates
R> xts(1:5, order.by=make.time.unique(as.POSIXct(samedate)))
                           [,1]
2011-12-19 18:00:00.000000    1
2011-12-19 18:00:00.000000    2
2011-12-19 18:00:00.000001    3
2011-12-19 18:00:00.000002    4
2011-12-19 18:00:00.000003    5
R> xts(1:5, order.by=as.Date(make.index.unique(as.POSIXct(samedate))))
           [,1]
2011-12-20    1
2011-12-20    2
2011-12-20    3
2011-12-20    4
2011-12-20    5
R>

The first solution converts to POSIXct, which lands six hours before midnight because my local time zone is GMT minus six hours. The second example converts to POSIXct, makes the index unique there, and then converts back to Date.
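The six-hour shift is only a matter of display: as.POSIXct() on a Date gives midnight UTC of that day, and printing that instant in a GMT-6 zone shows 18:00 on the previous day. In the sketch below, America/Chicago is just an example of a zone that is GMT-6 in December (an assumption; the answer does not name its zone).

```r
d <- as.Date("2011-12-20")
p <- as.POSIXct(d)   # a Date converts to midnight UTC of that day

format(p, "%Y-%m-%d %H:%M", tz = "UTC")              # "2011-12-20 00:00"
format(p, "%Y-%m-%d %H:%M", tz = "America/Chicago")  # "2011-12-19 18:00"
as.Date(p)                                           # back to 2011-12-20
```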


Dirk Eddelbuettel, Dec 20 '11 at 4:32


Henry · Accepted Answer · 2011-12-20T08:26:47+0000

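As a simple variation on Dirk's suggestion, this should work:

# read the raw rows first, without building a zoo object
z0 = read.csv( textConnection(csv), sep=',', header=FALSE )
# add a tiny offset that grows with the row number (1e-10 days per row) so every index entry is unique
z1 = zoo( z0$V2, as.Date(z0$V1) + (1:nrow(z0))*10^-10 )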
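To see what this index looks like without needing zoo, the sketch below rebuilds it in base R, using read.csv(text = ...) in place of textConnection() and seq_len(nrow(z0)) in place of 1:nrow(z0), both equivalent here, and checks that the entries are unique, still in row order, and still on their original calendar days.

```r
csv <- "2011-11-01,50
2011-11-02,49
2011-11-02,48
2011-11-03,47
2011-11-03,46
2011-11-03,45
2011-11-04,44
2011-11-04,43
2011-11-04,42
2011-11-04,41
"

z0 <- read.csv(text = csv, header = FALSE)
# Henry's index: each date plus a tiny offset that grows with the row number
idx <- as.Date(z0$V1) + seq_len(nrow(z0)) * 1e-10

num <- as.numeric(idx)
anyDuplicated(num)                                # 0: no two entries are equal
!is.unsorted(num, strictly = TRUE)                # TRUE: row order is preserved
all(floor(num) == as.numeric(as.Date(z0$V1)))     # TRUE: calendar days unchanged
```

Because every offset is far smaller than one day, each entry keeps its date; because the offsets grow with the row number, the order of the rows, including which value is last on each day, is preserved. The resulting idx is what the answer passes to zoo() as the index.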
