Error when fitting a parallel binary (logistic) regression on a sparse matrix with glmnet
I want to fit a logistic regression in parallel with the glmnet package. My data is a large sparse matrix (10 million observations and about 60k columns).
I ran a small trial on a subset of the data (a subset of both the observations and the columns) and it worked. The following code is equivalent to what I am doing:
library(Matrix)
library(glmnet)
library(doMC)
# for reproducibility
set.seed(18)
# initialise cores
registerDoMC(cores = 2)
sparseMat <- sparseMatrix(i = rep(1:50, 4),
                          j = sample(20, 200, replace = TRUE),
                          x = rep(1, 200))
y <- as.factor(sample(2, 50, replace = TRUE))
cvfit <- cv.glmnet(x = sparseMat, y = y, standardize = FALSE,
                   family = "binomial", alpha = 0, parallel = TRUE)
However, when I pass in the entire matrix, the process fails with the following error message:
Error in max(sapply(outlist, function(obj) min(obj$lambda))) :
invalid 'type' (list) of argument
I'm not sure what is causing the error and I don't know what the error message indicates.
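The only part I can reproduce in isolation is the failing expression itself. The following is a minimal base R sketch, not glmnet's actual code path: `outlist` here is a hypothetical stand-in for the list of per-fold fits, and I am assuming (without being able to verify it on my data) that it could come back empty from the workers. In that case `sapply()` returns an empty list rather than a numeric vector, and `max()` rejects it with the same message:

```r
# Hypothetical stand-in for the list of per-fold fits; assumed empty here,
# as it might be if no fold returned a fitted model.
outlist <- list()

# sapply() over a zero-length list returns list(), not a numeric vector
per_fold_min <- sapply(outlist, function(obj) min(obj$lambda))

# max() cannot handle a list; capture the error message instead of stopping
msg <- tryCatch(
  {
    max(per_fold_min)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

This only shows one way the message can arise; it does not tell me why the fold results would be empty or malformed on the full matrix.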
I am using R in RStudio on a Linux server with 8 cores.
sessionInfo():
R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] doMC_1.3.3 iterators_1.0.7 glmnet_2.0-2 foreach_1.4.2 Matrix_1.1-5
UPDATE I:
Since I cannot share the data that generates the error (privacy issues), and my attempts to reproduce it only produced memory overflows rather than the error shown, I will reformulate the question:
Is my error message related to memory or something else?
Given the size of the dataset, a memory error is a possibility. However, the error message seems to point to an internal problem when taking the minimum of the lambda values. If this is not a memory issue, how should I proceed? Is there a workaround?