R data.table using lapply for functions defined outside
This question is related to "R-pass fixed columns for function binding in data.table" and "weighted means by group and column", but is slightly different.
I would like to have one fixed column interact with all the other columns in a data.table. A trivial example to illustrate:
library(data.table)
DT <- data.table(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10))
DT[, lapply(c('x1', 'x2'), function(x) get(x) * y)]
Now, suppose the operation is much more complex than multiplication, so I would like to define a standalone function outside the scope of the data.table call:
fun <- function(x) {
  return(get(x) * y)
}
DT[, lapply(c('x1', 'x2'), fun)]
Error in get(x) : object 'x1' not found
Obviously there is a variable-scoping problem, since a function defined outside data.table cannot see the variables inside it. Is there some clever trick to define the function outside data.table and still use it with lapply?
You will tie yourself in knots if you try to combine referencing by character string with referencing by named variable (and also if you reference "global" variables within functions).
The easiest way forward is to specify where get looks for x (and y).
Here's a function rewritten so that you can tell it where to look.
fun <- function(x, y, wherex = parent.frame(), wherey = parent.frame()) {
  return(get(x, wherex) * get(y, wherey))
}
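To see what the two "where" arguments do in isolation, here is a minimal sketch that calls the same function on a plain named list instead of a data.table. The list `cols` and its values are made up for illustration; the only assumption is that `get` accepts a list as its second argument, which it converts to an environment.

```r
# Same definition as above: look up x and y by name in the given places.
fun <- function(x, y, wherex = parent.frame(), wherey = parent.frame()) {
  get(x, wherex) * get(y, wherey)
}

# Hypothetical data: a plain named list standing in for the columns.
cols <- list(x1 = c(1, 2, 3), y = c(10, 20, 30))

# Both names are resolved inside `cols`, not in the calling scope.
res <- fun("x1", "y", wherex = cols, wherey = cols)
print(res)
```

Because the lookup location is an explicit argument, the function no longer depends on where it happens to be called from.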
data.table inspects the names present in j and loads only the columns that are needed. In your example, j refers to the columns only through character strings, never by name, so no columns are made available to fun.
If you include .SD in the expression for j, data.table will load all the columns. You can then pass .SD as the wherex / wherey arguments to the newly created fun:
DT[, lapply(c('x1', 'x2'), fun, y = 'y' , wherex=.SD, wherey=.SD)]
# V1 V2
# 1: -0.27871200 1.1943170
# 2: -0.68843421 -1.5719016
# 3: 1.06968681 2.8358612
# 4: 0.21201412 1.0127712
# 5: 0.05392450 0.2487873
# 6: 0.04473767 -0.1644542
# 7: 5.37851536 2.9710708
# 8: 0.23653388 0.9506559
# 9: 1.96364756 -1.4662968
# 10: -0.02458077 -0.1197023
Note that you don't really need to wrap this in a call to [.data.table:
results <- setDT(lapply(c('x1','x2'), fun, y='y', wherex=DT, wherey=DT))
This returns the same results.
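If the function does not have to receive column names as strings, a simpler route may be to pass the column vectors themselves. The sketch below is not part of the approach above. It assumes a hypothetical helper `fun2` that takes two vectors, and it uses `.SDcols` to choose which columns `lapply` iterates over while the fixed column `y` is named directly in j.

```r
library(data.table)

# Hypothetical helper: works on vectors, so it needs no get() or scoping tricks.
fun2 <- function(x, y) x * y

# Small deterministic table so the result is easy to check by hand.
DT <- data.table(y = c(1, 2, 3), x1 = c(4, 5, 6), x2 = c(7, 8, 9))

# .SD holds only x1 and x2; y appears by name in j, so data.table makes it available.
res <- DT[, lapply(.SD, fun2, y = y), .SDcols = c("x1", "x2")]
print(res)
```

A side effect of this form is that the result columns keep the names x1 and x2 rather than V1 and V2.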