Is there a more efficient way to do a rolling join backward rather than forward?
data.table implements asof (also known as rolling or LOCF) joins. I found this related question:
Filling in missing (gaps) in the data table, for each category - back and forth
but that question has NAs in the data. In my case, I am following the advice to keep the data irregular and join it with roll=TRUE. What I would like, instead of the last observation carried forward, is the next observation carried backward, as far as possible.
This is what I tried, first setting time:=-time to try and trick it. Can I do it better? Can I make it faster?
llorJoin <- function(A, B) {
  # Work on copies: := modifies by reference and would otherwise alter the caller's tables
  A <- copy(A)
  B <- copy(B)
  keys <- key(A)
  if (is.null(keys) || !identical(keys, key(B))) {
    stop("llorJoin::ERROR; A and B should have the same non-empty keys")
  }
  lastKey <- tail(keys, 1L)
  # Negate the last key column (the time) so that a forward roll acts as a backward roll
  myStr <- parse(text = paste0(lastKey, ":=-as.numeric(", lastKey, ")"))
  A <- A[, eval(myStr)]; setkeyv(A, keys)
  B <- B[, eval(myStr)]; setkeyv(B, keys)
  origin <- "1970-01-01 00:00.00 UTC"
  A <- B[A, roll = TRUE]
  # Undo the negation and restore POSIXct times
  myStr2 <- parse(text = paste0(lastKey, ":=as.POSIXct(-", lastKey, ",origin=origin)"))
  A <- A[, eval(myStr2)]; setkeyv(A, keys)
  return(A)
}
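Stripped to its core, the negation trick used above can be sketched on a toy example. The tables obs and quotes and their columns are hypothetical stand-ins, not the data from this question, and the sketch assumes a plain numeric time column so that no POSIXct conversion is needed.

```r
library(data.table)

# Hypothetical toy tables (not the data from the question)
obs    <- data.table(id = "a", time = c(1, 2, 4))                 # rows to look up
quotes <- data.table(id = "a", time = c(1, 3), e = c(10L, 20L))   # values to carry

# Negate time in copies so the original tables stay untouched
obs_neg    <- copy(obs)[, time := -time]
quotes_neg <- copy(quotes)[, time := -time]
setkey(obs_neg, id, time)
setkey(quotes_neg, id, time)

# A forward roll (LOCF) on the negated axis is a backward roll on the real axis
res <- quotes_neg[obs_neg, roll = TRUE]

# Undo the negation and restore the natural ordering
res[, time := -time]
setkey(res, id, time)
print(res)
```

Here the row at time 2 picks up the value observed at time 3, and the row at time 4 gets NA because nothing follows it.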
library(data.table)
A <- data.table(time=as.POSIXct(c("10:01:01","10:01:02","10:01:04","10:01:05","10:01:02","10:01:01","10:01:01"),format="%H:%M:%S"),
b=c("a","a","a","a","b","c","c"),
d=c(1,1.9,2,1.8,5,4.1,4.2));
B <- data.table(time=as.POSIXct(c("10:01:01","10:01:03","10:01:00","10:01:01"),format="%H:%M:%S"),b=c("a","a","c","d"), e=c(1L,2L,3L,4L));
setkey(A,b,time)
setkey(B,b,time)
library(rbenchmark)
benchmark(llorJoin(A,B),B[A,roll=T],replications=10)
            test replications elapsed relative user.self sys.self user.child sys.child
1 llorJoin(A, B)           10   0.045        1     0.048        0          0         0
2 B[A, roll = T]           10   0.009        1     0.008        0          0         0
   b                time  e   d
1: a 2013-01-12 09:01:01  1 1.0
2: a 2013-01-12 09:01:02  2 1.9
3: a 2013-01-12 09:01:04 NA 2.0
4: a 2013-01-12 09:01:05 NA 1.8
5: b 2013-01-12 09:01:02 NA 5.0
6: c 2013-01-12 09:01:01 NA 4.1
7: c 2013-01-12 09:01:01 NA 4.2
So, by comparison, my workaround is about 5 times slower than the plain roll=TRUE join (0.045 versus 0.009 elapsed).
roll has been able to do NOCB (next observation carried backward) for a long time now. Updating this answer so that #615 can be closed.
You don't need to set keys anymore. Instead, you can specify the columns to join on using the on= argument (implemented in v1.9.6). With these two features, the task can be accomplished as follows:
require(data.table) # v1.9.6+
B[A, on=c("b", "time"), roll=-Inf]
# time b e d
# 1: 2015-10-11 10:01:01 a 1 1.0
# 2: 2015-10-11 10:01:02 a 2 1.9
# 3: 2015-10-11 10:01:04 a NA 2.0
# 4: 2015-10-11 10:01:05 a NA 1.8
# 5: 2015-10-11 10:01:02 b NA 5.0
# 6: 2015-10-11 10:01:01 c NA 4.1
# 7: 2015-10-11 10:01:01 c NA 4.2
That's it.
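To see how the sign of roll changes the result, here is a small sketch on hypothetical toy tables (obs and quotes are illustrative names, not the question's data). It assumes data.table v1.9.6 or later for on=, and uses a numeric time column to keep the output easy to read.

```r
library(data.table)  # assumes v1.9.6+ for the on= argument

# Hypothetical toy tables (not the data from the question)
obs    <- data.table(id = "a", time = c(1, 2, 4))
quotes <- data.table(id = "a", time = c(1, 3), e = c(10L, 20L))

# LOCF: carry the last observation forward
locf <- quotes[obs, on = c("id", "time"), roll = TRUE]
# e: 10, 10, 20

# NOCB: carry the next observation backward, with no limit
nocb <- quotes[obs, on = c("id", "time"), roll = -Inf]
# e: 10, 20, NA

# NOCB limited to 0.5 time units: time 2 is a full unit before the
# next observation at time 3, so it no longer matches
nocb_limited <- quotes[obs, on = c("id", "time"), roll = -0.5]
# e: 10, NA, NA

print(locf); print(nocb); print(nocb_limited)
```

A finite negative value is useful when a "next" observation that lies too far in the future should not be treated as a match.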
You're close to the fastest way without changing data.table itself. The following feature request was filed a while ago:
FR #2300 Add backwards and firstback to roll=TRUE
I have added a link there to this question. You can search the list of feature requests on R-Forge. In this case, words like "roll", "forward" and "backwards" all find it. You may need 4 or 5 searches to confirm that a bug or feature request has not already been filed.
It will probably be quicker for me to implement that feature request (it needs only a few lines internally) than to try to work out the fastest possible workaround for you.