Object size mismatch
I'm trying to figure out why some arrays that I save as .rda take up more disk space than others of equal size. Below are two objects, x and y, with the same dimensions, class and in-memory size. When I save each one, one file is 41MB and the other is 6MB. Can anyone think of a reason why this might be happening?
> dim(x)
[1] 71 14 10000
> dim(y)
[1] 71 14 10000
> class(x)
[1] "array"
> class(y)
[1] "array"
> object.size(y)
79520208 bytes
> object.size(x)
79520208 bytes
If you save with save or saveRDS, compression is on by default. Objects with different contents will compress differently, even if they have the same size in memory.
Try save with compress=FALSE and compare again.
In the example below, the compressed files differ in size by a factor of roughly 700:
set.seed(42)
x <- runif(1e6) # random values should not compress well...
y <- rep(0, 1e6) # zeroes should compress very well...
object.size(x) # 8000040 bytes
object.size(y) # 8000040 bytes
# save() writes the RData format, so the files are named .rda rather than .rds
save('x', file='x.rda')
save('y', file='y.rda')
file.info(c('x.rda', 'y.rda'))$size
#[1] 5316773 7838
save('x', file='x.rda', compress=FALSE)
save('y', file='y.rda', compress=FALSE)
file.info(c('x.rda', 'y.rda'))$size
#[1] 8000048 8000048
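If you would rather not write files just to compare, the same effect can be estimated in memory by serializing the object and compressing the raw bytes. This is only a rough sketch: compressed_ratio is a made-up helper name, not a base R function, and the ratio it returns approximates rather than reproduces what save() writes to disk.

```r
# Hypothetical helper (not part of base R): ratio of gzip-compressed size
# to raw serialized size, computed entirely in memory.
compressed_ratio <- function(obj) {
  raw_bytes <- serialize(obj, connection = NULL)      # raw vector
  gz_bytes  <- memCompress(raw_bytes, type = "gzip")  # compressed raw vector
  length(gz_bytes) / length(raw_bytes)
}

set.seed(42)
x <- runif(1e5)    # random values: compress only modestly
y <- rep(0, 1e5)   # constant values: compress to almost nothing

ratio_x <- compressed_ratio(x)
ratio_y <- compressed_ratio(y)
```

A ratio near 1 suggests the saved file will be close to the object's in-memory size; a ratio near 0 suggests a much smaller file.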
Both could be arrays of character, list, or numeric data. Or one could be character (one or two bytes would be the minimum element size) and the other numeric (8 bytes per element), or the larger file could hold long character elements, or any of a lot of other possibilities. Two examples that give results similar to yours:
x <- array(runif( 71* 14 *10000), dim = c(71 , 14, 10000) )
save(x, file="test.rda")
object.size(x)
# 79520208 bytes and the file is over 50 MB
x <- array(sample(letters, 71* 14 *10000, replace=TRUE), dim = c(71 , 14, 10000) )
save(x, file="test2.rda")
object.size(x)
# 79521456 bytes and the file is around 8 MB
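To narrow down which of these possibilities applies, one quick check is to compare the storage type and the number of distinct values in each array. The sketch below uses small stand-in arrays a and b and a made-up helper describe, since the original x and y are not available; substitute your own objects.

```r
# Stand-in arrays (hypothetical; replace with the real x and y).
set.seed(1)
a <- array(runif(71 * 14 * 100), dim = c(71, 14, 100))
b <- array(sample(letters, 71 * 14 * 100, replace = TRUE),
           dim = c(71, 14, 100))

# Hypothetical helper: report the properties that tend to drive
# how well an object compresses.
describe <- function(obj) {
  list(type       = typeof(obj),                      # e.g. "double", "character"
       n_distinct = length(unique(as.vector(obj))))   # number of distinct values
}

info_a <- describe(a)
info_b <- describe(b)
```

An array with few distinct values or long repeated runs will usually produce a much smaller compressed file than one filled with effectively unique numbers.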