R data.table: passing parameters through function calls
Suppose I have a data.table defined as:
> dt <- data.table(x=c(1,2,3,4), y=c("y","n","y","m"), z=c("pickle",3,8,"egg"))
> dt
   x y      z
1: 1 y pickle
2: 2 n      3
3: 3 y      8
4: 4 m    egg
And the variable
fn <- "z"
I get that I can pull the column out of the data.table like this:
> dt[,fn, with=FALSE]
What I don't know how to do is write the data.table equivalent of the following:
> factorFunction <- function(df, fn) {
    df[,fn] <- as.factor(df[,fn])
    return(df)
  }
If I set fn = "x" and call factorFunction(data.frame(dt), fn), it works fine.
So I tried the same thing with a data.table, but it doesn't work:
> factorFunction <- function(dt, fn) {
    dt[,fn, with=FALSE] <- as.factor(dt[,fn, with=FALSE])
    return(dt)
  }
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
3 answers
You may try:
dt[, (fn) := factor(.SD[[1L]]), .SDcols = fn]
If fn holds several column names, use lapply(.SD, factor) on the right-hand side instead.
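As a rough sketch of the multi-column case mentioned above (the vector name `cols` is hypothetical and not part of the original answer), the same `:=` pattern can be combined with `lapply()` over `.SD`:

```r
library(data.table)

# Same example table as in the question.
dt <- data.table(x = c(1, 2, 3, 4),
                 y = c("y", "n", "y", "m"),
                 z = c("pickle", 3, 8, "egg"))

# `cols` is a hypothetical vector of column names to convert.
cols <- c("y", "z")

# lapply() over .SD returns one factor per column listed in .SDcols;
# (cols) := assigns them back by reference, in the same order.
dt[, (cols) := lapply(.SD, factor), .SDcols = cols]

sapply(dt, class)
#         x         y         z
# "numeric"  "factor"  "factor"
```

The parentheses around `cols` matter: without them, `:=` would create a column literally named `cols`.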
Wrapped in a function:
factorFunction <- function(df, fn) {
  df[, (fn) := factor(.SD[[1L]]), .SDcols = fn]
}
str(factorFunction(dt, fn))
#Classes ‘data.table’ and 'data.frame': 4 obs. of 3 variables:
#$ x: num 1 2 3 4
#$ y: chr "y" "n" "y" "m"
#$ z: Factor w/ 4 levels "3","8","egg",..: 4 1 2 3
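Note that `:=` updates a data.table by reference, so the wrapper above also changes the `dt` that was passed in. If that is not wanted, one possible variant (the name `factorFunctionCopy` is made up for this sketch) works on a copy instead:

```r
library(data.table)

dt <- data.table(x = c(1, 2, 3, 4),
                 y = c("y", "n", "y", "m"),
                 z = c("pickle", 3, 8, "egg"))

# Hypothetical variant of the wrapper that leaves its input untouched.
factorFunctionCopy <- function(df, fn) {
  out <- copy(df)                                  # deep copy, so := does not modify df
  out[, (fn) := factor(.SD[[1L]]), .SDcols = fn]   # convert the named column on the copy
  out[]                                            # trailing [] so the result prints after :=
}

res <- factorFunctionCopy(dt, "z")
class(res[["z"]])  # "factor"
class(dt[["z"]])   # "character"
```

The copy costs extra memory for large tables, so whether to use it depends on whether callers expect their input to be modified.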
I don't recommend this, as it is very unidiomatic:
factorFunction <- function(df, col) {
  df[, col] <- factor(df[[col]])
  df
}
On a positive note, it works with both base R data.frames and data.tables:
df <- setDF(copy(dt))
class(df[[fn]]) # character
df <- factorFunction(df,fn)
class(df[[fn]]) # factor
class(dt[[fn]]) # character
dt <- factorFunction(dt,fn)
class(dt[[fn]]) # factor