R data.table using lapply for functions defined outside

This question is related to "R-pass fixed columns for function binding in data.table" and "weighted means by group and column", but is slightly different.

I would like to have one fixed column interact with all the other columns in a data.table. A trivial example to illustrate:

library(data.table)
DT <- data.table(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10))
DT[, lapply(c('x1', 'x2'), function(x) get(x) * y)]

      

Now, suppose the operation is much more complex than multiplication, so I would like to define a standalone function outside the scope of data.table:

fun <- function(x) {
    return(get(x) * y)
}
DT[, lapply(c('x1', 'x2'), fun)]
Error in get(x) : object 'x1' not found

      

Obviously there is a problem with variable scoping, as a function defined outside data.table cannot see the variables inside it. Is there some clever trick to define the function outside data.table and still use it with lapply?



1 answer


You will tie yourself in knots if you try to mix references by character string with references by named variable (and also if you reference "global" variables within functions).

The easiest way forward is to specify where get is looking for x (and y).

Here's a function rewritten so that you can tell it where to look.

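# x and y are column names given as character strings;
# wherex and wherey say where get() should look them up.
fun <- function(x, y, wherex = parent.frame(), wherey = parent.frame()) {
    return(get(x, wherex) * get(y, wherey))
}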
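To see what the where arguments do without involving data.table at all, here is a minimal sketch in base R. It assumes a plain list (the name `cols` and its values are made up for illustration) standing in for the table's columns; `get()` coerces its second argument with `as.environment()`, so a list works as a lookup location.

```r
# Same definition as above, repeated so the snippet runs on its own.
fun <- function(x, y, wherex = parent.frame(), wherey = parent.frame()) {
    get(x, wherex) * get(y, wherey)
}

# Hypothetical stand-in for the table's columns.
cols <- list(y = c(1, 2, 3), x1 = c(10, 20, 30))

# Both names are looked up inside `cols`, not in the global environment.
res <- fun("x1", "y", wherex = cols, wherey = cols)
print(res)  # [1] 10 40 90
```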

      

data.table inspects the names that appear in j and loads only the columns that are needed.

In your example, j does not refer to any column by name, only by character string, so nothing is available.



If you include .SD in the expression for j, all columns will be loaded. You can then pass .SD as the wherex / wherey arguments of the newly created fun:

DT[, lapply(c('x1', 'x2'), fun, y = 'y' , wherex=.SD, wherey=.SD)]
 #              V1         V2
 #  1: -0.27871200  1.1943170
 #  2: -0.68843421 -1.5719016
 #  3:  1.06968681  2.8358612
 #  4:  0.21201412  1.0127712
 #  5:  0.05392450  0.2487873
 #  6:  0.04473767 -0.1644542
 #  7:  5.37851536  2.9710708
 #  8:  0.23653388  0.9506559
 #  9:  1.96364756 -1.4662968
 # 10: -0.02458077 -0.1197023
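The result columns come back as V1 and V2 because the list returned by lapply is unnamed. As a small sketch (the seed and the variable name `cols` are assumptions added for illustration), wrapping the call in `setNames()` keeps the original column names:

```r
library(data.table)

set.seed(42)  # arbitrary seed, only so the sketch is reproducible
DT <- data.table(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10))

fun <- function(x, y, wherex = parent.frame(), wherey = parent.frame()) {
    get(x, wherex) * get(y, wherey)
}

cols <- c("x1", "x2")  # hypothetical: the columns to combine with y

# .SD appears in j, so all columns are available to get() inside fun;
# setNames() labels the resulting list so the output is named x1, x2.
res <- DT[, setNames(lapply(cols, fun, y = "y", wherex = .SD, wherey = .SD), cols)]
print(names(res))
```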

      

Note that you don't really need to wrap this in [.data.table. The following returns the same results:

results <- setDT(lapply(c('x1', 'x2'), fun, y = 'y', wherex = DT, wherey = DT))
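A different idiom, not the one used above and offered only as a sketch, is to pass the column values themselves rather than their names, so the external function needs no `get()` at all. The helper name `fun2` and the seed are made up for illustration; `.SDcols` restricts `.SD` to the columns that should be combined with the fixed column `y`.

```r
library(data.table)

set.seed(42)  # arbitrary seed, only so the sketch is reproducible
DT <- data.table(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10))

# Hypothetical helper: receives vectors, not column names.
fun2 <- function(x, y) {
    x * y
}

# Each column in .SD is passed as x; the fixed column y is passed by name.
res <- DT[, lapply(.SD, fun2, y = y), .SDcols = c("x1", "x2")]
print(names(res))
```

Because `y` appears as a symbol in j, data.table makes that column available, and the output keeps the names x1 and x2.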
