Object size mismatch

I'm trying to figure out why some arrays that I save as .rda files take up more disk space than others of equal size. Below are two objects, x and y, with the same dimensions, class and in-memory size. When I save each one, one file is 41MB and the other is 6MB. Can anyone think of a reason why this might be happening?

> dim(x)
[1]    71    14 10000
> dim(y)
[1]    71    14 10000 
> class(x)
[1] "array"
> class(y)
[1] "array"  
> object.size(y)
79520208 bytes
> object.size(x)
79520208 bytes

      



2 answers


If you save using save() or saveRDS(), compression is on by default. If the vectors hold different content, they will compress differently.

Try save() with compress=FALSE and compare again.



In the example below, the compressed files differ in size by a factor of almost 700:

set.seed(42)
x <- runif(1e6)  # random values should not compress well...
y <- rep(0, 1e6) # zeroes should compress very well...
object.size(x) # 8000040 bytes
object.size(y) # 8000040 bytes

# save() writes the .rda format, so use that extension (.rds is for saveRDS)
save('x', file='x.rda')
save('y', file='y.rda')
file.info(c('x.rda', 'y.rda'))$size
#[1] 5316773    7838

# Same objects again, this time with compression turned off
save('x', file='x.rda', compress=FALSE)
save('y', file='y.rda', compress=FALSE)
file.info(c('x.rda', 'y.rda'))$size
#[1] 8000048 8000048
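
The same effect can be checked without writing any files. The following is a minimal sketch, assuming base R's serialize() and memCompress() give a reasonable proxy for what save() does on disk; compressed_size is a hypothetical helper name, not part of R, and the exact byte counts will differ from the file sizes above.

```r
# Sketch: estimate how well an object compresses, entirely in memory.
set.seed(42)
x <- runif(1e5)   # random doubles: little redundancy to exploit
y <- rep(0, 1e5)  # constant vector: highly redundant

# Hypothetical helper: serialize to a raw vector, gzip it, count the bytes.
compressed_size <- function(obj) {
  raw_bytes <- serialize(obj, connection = NULL)
  length(memCompress(raw_bytes, type = "gzip"))
}

size_x <- compressed_size(x)
size_y <- compressed_size(y)
c(x = size_x, y = size_y)  # y should come out far smaller than x
```

Both vectors report the same object.size(), so any gap between size_x and size_y comes from the content alone.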

      



Both could be character arrays, or lists, or data. Or one could be character (one or two bytes would be the minimum element size) and the other could be numeric (8 bytes per element), and the larger one could hold large character elements ... or a lot of other possibilities. A couple of results similar to yours:



x <- array(runif(71 * 14 * 10000), dim = c(71, 14, 10000))
save(x, file = "test.rda")
object.size(x)
# 79520208 bytes  and the file is over 50 MB

x <- array(sample(letters, 71 * 14 * 10000, replace = TRUE), dim = c(71, 14, 10000))
save(x, file = "test2.rda")
object.size(x)
# 79521456 bytes  and the file is around 8 MB
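
Since class() reports "array" for both objects in the question, it cannot tell these cases apart. A quick sketch of how one might check the underlying storage type and the variety of values instead; x_num and x_chr are illustrative names, and the arrays are smaller than in the question so this runs quickly.

```r
# Sketch: two arrays of the same shape and class but different storage types.
x_num <- array(runif(71 * 14 * 100), dim = c(71, 14, 100))
x_chr <- array(sample(letters, 71 * 14 * 100, replace = TRUE),
               dim = c(71, 14, 100))

class(x_num)   # "array"
class(x_chr)   # "array"
typeof(x_num)  # "double"
typeof(x_chr)  # "character"

# Few distinct values means lots of repetition for the compressor to exploit.
n_distinct_chr <- length(unique(as.vector(x_chr)))  # at most 26
n_distinct_num <- length(unique(as.vector(x_num)))
```

Running typeof() and a distinct-value count on the original x and y would show whether they differ in type, in content, or both.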

      
