Error when fitting a parallel binary (logistic) regression on a sparse matrix with glmnet

I want to fit a logistic regression in parallel with the glmnet package. My data is a large sparse matrix (10 million observations and about 60k columns).

I ran a small trial on a subset of the data (fewer observations and fewer columns) and it worked. The following code is equivalent to what I am doing:

library(Matrix)
library(glmnet)
library(doMC)
#for reproducibility
set.seed(18)
#initialise cores
registerDoMC(cores=2)

sparseMat<-sparseMatrix(i=rep(1:50,4),j=sample(20,200,replace=TRUE),x=rep(1,200))
y<-as.factor(sample(2,50,replace=TRUE))

cvfit<-cv.glmnet(x=sparseMat,y=y,standardize=FALSE,family="binomial",alpha=0,parallel=TRUE)

      

However, when I run the same call on the entire matrix, the process crashes with the following error message:

Error in max(sapply(outlist, function(obj) min(obj$lambda))) : 
invalid 'type' (list) of argument

      

I'm not sure what is causing the error, and I don't know what the error message indicates.

I am using R in RStudio Server on a Linux machine with 8 cores.
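
For reference, the message itself can be reproduced in base R without glmnet. The sketch below is only an illustration under an assumption I have not confirmed for my data: that the list of per-fold fits (called `outlist` in the error) comes back empty. The lists `outlist` and `ok` here are hypothetical stand-ins, not the real objects built inside cv.glmnet.

```r
# Hypothetical stand-in for the per-fold fits collected during cross-validation.
# An empty list models the (assumed) case where no fold results came back.
outlist <- list()

# sapply() on zero-length input returns an empty list, not a numeric vector
mins <- sapply(outlist, function(obj) min(obj$lambda))
print(is.list(mins))

# max() rejects list input; capture the message instead of stopping the script
msg <- tryCatch(max(mins), error = function(e) conditionMessage(e))
print(msg)

# With well-formed elements the same expression returns a single number
ok <- list(list(lambda = c(0.5, 0.1)), list(lambda = c(0.4, 0.2)))
res <- max(sapply(ok, function(obj) min(obj$lambda)))
print(res)
```

If this is the mechanism, the error would be a symptom of the fold fits not being returned at all rather than a problem with the lambda values themselves, but that remains a guess.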

sessionInfo():

R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] doMC_1.3.3      iterators_1.0.7 glmnet_2.0-2    foreach_1.4.2   Matrix_1.1-5   

      

UPDATE I:

Since I cannot share the data that generates the error (privacy issues), and my attempts at a reproducible example ran out of memory instead of producing the error shown, I will reformulate the question:

Is my error message related to memory or to something else?

Given the size of the dataset, a memory error is plausible. However, the error message seems to point to an internal problem when taking the minimum of the lambda values. If this is not a memory issue, how should I proceed? Is there a workaround?
