Use data.table set() to convert all columns from integer to numeric

I am working with a data.table with 1900 columns and approximately 280,000 rows.

Currently every column is of type "integer", but I want the data to be explicitly "numeric" so I can pass it to the bigcor() function later. Apparently bigcor() can only handle "numeric" data, not "integer".

I tried:

full.bind <- full.bind[,sapply(full.bind, as.numeric), with=FALSE]

      

Unfortunately, I am getting the error:

Error in `[.data.table`(full.bind, , sapply(full.bind, as.numeric), with = FALSE) : 
  j out of bounds

      

So, I tried using the data.table set() function, but I get the error:

Error in set(full.bind, value = as.numeric(full.bind)) : 
  (list) object cannot be coerced to type 'double'

      

I created a simple reproducible example. Note that the actual column names are NOT "a", "b", or "c"; they are extremely complex, so referring to each column by name is not an option.

dt <- data.table(a=1:10, b=1:10, c=1:10)

      

So my final questions:

1) Why is my sapply technique not working? (What is the "j out of bounds" error?)
2) Why is the set() method not working? (Why can't the data.table be coerced to numeric?)
3) Does the bigcor() function require a numeric object, or is there another problem?

1 answer


Use .SD and assignment by reference:

library(data.table)
dt <- data.table(a=1:10, b=1:10, c=1:10)
sapply(dt, class)
#        a         b         c 
#"integer" "integer" "integer"

dt[, names(dt) := lapply(.SD, as.numeric)]
sapply(dt, class)
#        a         b         c 
#"numeric" "numeric" "numeric"
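As for the first question, the likely cause can be sketched on the toy table. sapply simplifies its result to a numeric matrix, and with = FALSE tells data.table to treat j as column numbers or names, so the matrix values end up being read as column indices, most of which do not exist. This is an illustration on the example data, not a trace of the exact internals.

```r
library(data.table)

dt <- data.table(a = 1:10, b = 1:10, c = 1:10)

# sapply simplifies the per-column results into a single numeric matrix
m <- sapply(dt, as.numeric)
is.matrix(m)
dim(m)

# lapply keeps one list element per column, which is the shape := expects
l <- lapply(dt, as.numeric)
is.list(l)
length(l)
```

With only three columns in dt, any value above 3 in that matrix cannot be a valid column index, which is consistent with the "j out of bounds" message.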

      

set only works on one column at a time (note that the documentation does not say that j is optional), since each replacement column has to be created. If you want to use it, you need to iterate over the columns, for example with a for loop. This may be preferable because it needs less memory: additional memory is needed for only one column at a time, whereas the first approach needs additional memory for the whole data.table.



for (k in seq_along(dt)) set(dt, j = k, value = as.character(dt[[k]]))
sapply(dt, class)
#         a           b           c 
#"character" "character" "character"
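The loop above converts to character only to make the change visible, because dt was already numeric at that point. For the conversion asked about in the question, the same pattern with as.numeric might look like the sketch below. It starts from a fresh integer table, and the is.integer check is an added assumption so that non-integer columns are left untouched.

```r
library(data.table)

dt <- data.table(a = 1:10, b = 1:10, c = 1:10)

# Convert one column at a time, by position, so column names never matter
for (k in seq_along(dt)) {
  if (is.integer(dt[[k]])) {
    set(dt, j = k, value = as.numeric(dt[[k]]))
  }
}

sapply(dt, class)
```

Because set modifies dt by reference, no assignment back to dt is needed.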

      

However, bigcor (from package propagate) needs a matrix as input, and a data.table is not a matrix. So your problem is not the column type; you need to use as.matrix(dt).
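A minimal sketch of that conversion on the toy table follows. The call to bigcor itself is left out, since its arguments are not shown here. The storage.mode line is an optional extra, assuming a double matrix is wanted even when the columns are still integer.

```r
library(data.table)

dt <- data.table(a = 1:10, b = 1:10, c = 1:10)

# as.matrix keeps the column names and, for all-integer columns,
# yields an integer matrix
m <- as.matrix(dt)

# Optional: switch the matrix to double storage without touching dt
storage.mode(m) <- "double"

is.matrix(m)
typeof(m)
```

The resulting m is what would be passed on in place of the data.table.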
