R RDS file size much larger than object size
I have an object x that contains a list of matrices and lists of model objects (lm, gbm, etc.). object.size(x) reports only about 50 MB, but the file written by saveRDS is more than 5 times larger, at over 250 MB. In general, what are some common reasons why an RDS file is much larger than the size reported for the corresponding object? And what can I do to minimize the mismatch between object size and file size?
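One possible cause, sketched here as an illustration and not as a confirmed diagnosis of the object above: the documentation for object.size states that associated space such as the environment of a function is not included in its calculation, while serialization does write out the environments that formulas and closures refer to. The helper make_formula and the vector big are hypothetical names invented for this sketch.

```r
# Hypothetical helper: returns a formula created inside a function that
# also holds a large temporary vector. The formula keeps a reference to
# that function's environment, and therefore to 'big'.
make_formula <- function() {
  big <- rnorm(1e6)  # about 8 MB of doubles, never used by the formula
  y ~ x
}

f <- make_formula()

# object.size() does not follow the formula's environment
in_memory <- as.numeric(object.size(f))

# serialize() (the mechanism behind saveRDS) writes the environment too
serialized <- length(serialize(f, NULL))

print(in_memory)
print(serialized)
```

If a model object stores a formula, terms object, or closure created inside a function, the same effect could inflate the file without changing what object.size reports.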
EDIT:
I have reduced my original problem to a reproducible example (I know the code calls lapply over a single element, but this is the example as provided). There seem to be at least two problems:
1) The resulting RDS files are approximately 2-3 times their respective object size.
2) The objects returned from lapply and mclapply have almost the same object.size, but the resulting file is 1.5 times larger for the object returned from mclapply.
Since fit1 and fit2 have almost the same size, checking the size of their components within R does not seem very useful. Does anyone have any suggestions on how to debug this issue?
library(doParallel)  # attaches 'parallel', which provides mclapply()
library(data.table)
library(caret)

# Fit two caret models in parallel, save the list to <file.name>.rds, and return it.
# 'dmy' is a dummy argument that absorbs the element passed by lapply()/mclapply().
fitModels <- function(dmy, dat, file.name) {
    methods <- list(
        list(method = 'knn', tuneLength = 1),
        list(method = 'svmRadial', tuneLength = 1)
    )
    opts <- list(
        form = as.formula('X1 ~ .'),
        data = as.data.frame(dat),
        trControl = trainControl(method = 'none', returnData = FALSE)
    )
    fit <- mclapply(methods, function(x) do.call(train, c(opts, x)), mc.cores = 2)
    saveRDS(fit, paste(file.name, 'rds', sep = '.'))
    return(fit)
}

dat <- data.frame(matrix(rnorm(5e4), nrow = 1e3))
fit1 <- lapply(1, fitModels, dat, file.name = 'test1')
fit2 <- mclapply(1, fitModels, dat, file.name = 'test2', mc.cores = 1)

print(object.size(fit1))
print(object.size(fit2))
print(file.info('test1.rds')$size)
print(file.info('test2.rds')$size)
Output:
2148744 bytes
2149208 bytes
[1] 4659831
[1] 6968437
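For debugging a mismatch like the one in the output above, one approach is to compare object.size with the uncompressed serialized size of each component, since the two can diverge even when object.size looks identical. The following is a minimal sketch, not a tested fix for this case: size_report and make_closure are hypothetical helpers, and the toy list only stands in for a fitted-model list.

```r
# Hypothetical helper: for each top-level element of a list, report the
# in-memory size and the length of its uncompressed serialized form.
size_report <- function(x) {
  nms <- names(x)
  if (is.null(nms)) nms <- as.character(seq_along(x))
  data.frame(
    element    = nms,
    object     = vapply(x, function(el) as.numeric(object.size(el)), numeric(1)),
    serialized = vapply(x, function(el) as.numeric(length(serialize(el, NULL))), numeric(1)),
    row.names  = NULL
  )
}

# Toy input: a plain matrix and a closure whose environment holds data
make_closure <- function() {
  payload <- rnorm(1e5)  # captured by the returned function's environment
  function(z) z + 1
}
toy <- list(mat = matrix(0, 100, 100), fun = make_closure())

report <- size_report(toy)
report$ratio <- report$serialized / report$object
print(report)
```

Elements with a ratio far above 1 are the ones whose serialized form carries data that object.size does not count. Assuming the fitted train objects behave as ordinary lists, the same helper could be applied to something like fit1[[1]][[1]] and fit2[[1]][[1]], and then recursively to whichever component shows the largest ratio.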