Alternative to slower ifelse in R data table
I am writing a function that applies multiple ifelse calls to a data.table. Although I use data.table for speed, the multiple ifelse calls make my code slow, and this function is meant for large datasets. So I was wondering whether there is an alternative to ifelse. Here is one example ifelse from the function (there are about 15 of them); in this example, flag is set to 1 if x is empty and to 0 otherwise.
dt<-dt[,flag:=ifelse(is.na(x)|!nzchar(x),1,0)]
My apologies if this is a duplicate question.
Thanks in advance.
The fastest approach will probably depend on what your data looks like. The approaches suggested in the comments are comparable for this example:
(twice was suggested by @DavidArenburg and onceadd by @akrun. I'm not sure how to benchmark them with replications > 1, since the objects are actually modified during the benchmark.)
library(data.table)
library(rbenchmark)

# 1e8 rows drawn from NA, the empty string, and the 26 lowercase letters
DT <- data.table(x=sample(c(NA,"",letters),1e8,replace=TRUE))

# := modifies in place, so each approach gets its own copy
DT0 <- copy(DT)
DT1 <- copy(DT)
DT2 <- copy(DT)
DT3 <- copy(DT)
DT4 <- copy(DT)
DT5 <- copy(DT)
DT6 <- copy(DT)
DT7 <- copy(DT)

benchmark(
  ifelse  = DT0[,flag:=ifelse(is.na(x)|!nzchar(x),1L,0L)],
  keyit   = {
    setkey(DT1,x)
    DT1[,flag:=0L]
    DT1[J(NA_character_,""),flag:=1L]
  },
  twiceby = DT2[, flag:= 0L][is.na(x)|!nzchar(x), flag:= 1L,by=x],
  twice   = DT3[, flag:= 0L][is.na(x)|!nzchar(x), flag:= 1L],
  onceby  = DT4[, flag:= +(is.na(x)|!nzchar(x)), by=x],
  once    = DT5[, flag:= +(is.na(x)|!nzchar(x))],
  onceadd = DT6[, flag:= (is.na(x)|!nzchar(x))+0L],
  oncebyk = {setkey(DT7,x); DT7[, flag:= +(is.na(x)|!nzchar(x)), by=x]},
  replications=1
)[1:5]
#      test replications elapsed relative user.self
# 1  ifelse            1   19.61   31.127     17.32
# 2   keyit            1    0.63    1.000      0.47
# 6    once            1    3.26    5.175      2.68
# 7 onceadd            1    3.24    5.143      2.88
# 5  onceby            1    1.81    2.873      1.75
# 8 oncebyk            1    0.91    1.444      0.82
# 4   twice            1    3.17    5.032      2.79
# 3 twiceby            1    3.45    5.476      3.16
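Before reading too much into the timings, it may help to confirm that the variants actually produce the same flag. The following is a minimal sketch on a small made-up table (the seven-row DT is an assumption for illustration, not the benchmark data). The fifelse line is an addition that was not part of the benchmark; it assumes data.table 1.12.4 or later, where fifelse is available, and its speed was not measured here.

```r
library(data.table)

# Small illustrative table (values made up for this example)
DT <- data.table(x = c("a", NA, "", "b", "", NA, "c"))

# Reference result from base ifelse
ref <- DT[, ifelse(is.na(x) | !nzchar(x), 1L, 0L)]

# "once": unary plus coerces the logical vector to integer
once <- copy(DT)[, flag := +(is.na(x) | !nzchar(x))]

# "twice": default every row to 0L, then overwrite only the matching rows
twice <- copy(DT)[, flag := 0L][is.na(x) | !nzchar(x), flag := 1L]

# "onceby": evaluate the condition once per distinct value of x
onceby <- copy(DT)[, flag := +(is.na(x) | !nzchar(x)), by = x]

# fifelse: data.table's own ifelse replacement (not part of the benchmark)
fif <- copy(DT)[, flag := fifelse(is.na(x) | !nzchar(x), 1L, 0L)]

# All variants should agree with the ifelse reference
stopifnot(
  identical(once$flag, ref),
  identical(twice$flag, ref),
  identical(onceby$flag, ref),
  identical(fif$flag, ref)
)
```

Each variant works on copy(DT) because := modifies its table in place, which is the same reason the benchmark uses separate copies.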
Discussion. In this example, keyit is the fastest. However, it is also the most verbose, and it changes the sort order of the table. Also, keyit is very specific to the OP's question (it takes advantage of the fact that exactly two character values match the condition is.na(x)|!nzchar(x)), so it might not be as good for other applications, where one would need to write something like
keyit = {
  setkey(DT1,x)
  flagem = DT1[,some_other_condition(x),by=x][(V1)]$x
  DT1[,flag:=0L]
  DT1[J(flagem),flag:=1L]
}
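To make the generalized keyed approach concrete, here is a small runnable sketch. The body of some_other_condition (flagging vowels), the eight-row table, and the id column are all hypothetical and exist only for illustration; id is there so that the row order changed by setkey can be restored afterwards.

```r
library(data.table)

# Hypothetical condition, for illustration only: flag vowels
some_other_condition <- function(x) x %in% c("a", "e", "i", "o", "u")

# Made-up data; id records the original row order
DT <- data.table(id = 1:8, x = c("b", "a", "e", "z", "a", "q", "u", "b"))

# Keying sorts DT by x in place, so the original row order is lost
setkey(DT, x)

# Evaluate the condition once per distinct x and keep the values that match
flagem <- DT[, some_other_condition(x), by = x][(V1)]$x

DT[, flag := 0L]           # default for every row
DT[J(flagem), flag := 1L]  # keyed join updates only the matching rows

# Restore the original order (this also drops the key)
setorder(DT, id)
```

Restoring the order is an extra sort that the benchmark did not time, so it would add to the cost of keyit if the original order matters.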