Size of nested vs. unnested (tidy) data.frame?

This question uses a data.frame that contains list-columns (i.e. it is nested). I was wondering whether, and why, there is an advantage to this. I assumed you would want to minimize the amount of memory each table uses... But when I checked, I was surprised:

Compare table sizes in nested and unnested (tidy) format:

1. Create nested / tidy versions of a 2-col and a 5-col data.frame:

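    library(pryr)    # object_size()
    library(dplyr)   # %>%, tibble()
    library(tidyr)   # unnest(), unite()
    library(ggvis)
    n <- 1:1E6
    # nested 2-col: one row per id; `vars` is a list-column of 1 to 26 random letters
    df <- tibble(id = n, vars = lapply(n, function(x) sample(letters, sample(1:26, 1))))
    # tidy (unnested) 2-col: one row per (id, letter)
    dfu <- df %>% unnest(vars)
    # nested 5-col: three extra integer columns, all built from the same vector `n`
    df_morecols <- tibble(id = n, other1 = n, other2 = n, other3 = n,
                          vars = lapply(n, function(x) sample(letters, sample(1:26, 1))))
    dfu_morecols <- df_morecols %>% unnest(vars)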


They look like this:

    head(df)
    #> Source: local data frame [6 x 2]

    #>   id      vars
    #> 1  1 <chr[16]>
    #> 2  2  <chr[4]>
    #> 3  3 <chr[26]>
    #> 4  4  <chr[9]>
    #> 5  5 <chr[11]>
    #> 6  6 <chr[18]>

    head(dfu)
    #> Source: local data frame [6 x 2]

    #>   id vars
    #> 1  1    k
    #> 2  1    d
    #> 3  1    s
    #> 4  1    j
    #> 5  1    m
    #> 6  1    t

    head(df_morecols)
    #> Source: local data frame [6 x 5]

    #>   id other1 other2 other3      vars
    #> 1  1      1      1      1  <chr[4]>
    #> 2  2      2      2      2 <chr[22]>
    #> 3  3      3      3      3 <chr[24]>
    #> 4  4      4      4      4  <chr[6]>
    #> 5  5      5      5      5 <chr[15]>
    #> 6  6      6      6      6 <chr[11]>

    head(dfu_morecols)
    #> Source: local data frame [6 x 5]

    #>   id other1 other2 other3 vars
    #> 1  1      1      1      1    r
    #> 2  1      1      1      1    p
    #> 3  1      1      1      1    s
    #> 4  1      1      1      1    w
    #> 5  2      2      2      2    l
    #> 6  2      2      2      2    j

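Structurally, unnesting a single list-column repeats every other column once per list element and flattens the list into one long vector. A minimal base-R sketch of that on a small random sample; `unnest_by_hand()` is a hypothetical helper written for illustration, not tidyr's implementation:

```r
# Hypothetical helper: base-R equivalent of unnesting one list-column.
# Every other column is repeated lengths(x[[col]]) times; the list is flattened.
unnest_by_hand <- function(x, col) {
  reps <- lengths(x[[col]])
  out <- x[rep(seq_len(nrow(x)), reps), setdiff(names(x), col), drop = FALSE]
  out[[col]] <- unlist(x[[col]], use.names = FALSE)
  rownames(out) <- NULL
  out
}

set.seed(1)
small <- data.frame(id = 1:5)
# list-column with 1 to 26 random letters per row, as in the question
small$vars <- lapply(small$id, function(i) sample(letters, sample(1:26, 1)))
small_u <- unnest_by_hand(small, "vars")

nrow(small_u) == sum(lengths(small$vars))   # TRUE
```

Each non-list column ends up with one entry per letter, so its length grows from `nrow(x)` to `sum(lengths(x$vars))`; in the 5-col case this happens to `id`, `other1`, `other2` and `other3` alike.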

2. Calculate the size of each object, and the size of each column for the plot:

Total object sizes, from:

    lapply(list(df, dfu, df_morecols, dfu_morecols), object_size)

  • 170 MB vs. 162 MB for the nested and tidy 2-col df
  • 170 MB vs. 324 MB for the nested and tidy 5-col df
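
As a rough cross-check, the totals can be estimated by hand. This is only a back-of-the-envelope sketch under stated assumptions: 64-bit R, 4 bytes per integer, 8 bytes per element of a character vector or list (each element is a pointer, and the 26 letter strings themselves are cached once), an expected 13.5 letters per row (the mean of `sample(1:26, 1)`), and a hypothetical overhead of about 48 bytes per small character vector (the real figure depends on R's allocator).

```r
# Back-of-the-envelope size estimate. Assumptions: 64-bit R,
# hypothetical ~48-byte header per small vector, cached letter strings ignored.
n_rows    <- 1e6
avg_len   <- mean(1:26)          # expected letters per row: 13.5
n_letters <- n_rows * avg_len    # expected rows after unnest(): 13.5e6

int_bytes <- 4                   # one integer
ptr_bytes <- 8                   # one character or list element (a pointer)
vec_hdr   <- 48                  # ASSUMED overhead per small character vector

mb <- function(bytes) bytes / 1e6

# tidy: every column has one entry per letter
tidy_2col <- mb(n_letters * (int_bytes + ptr_bytes))        # id + vars
tidy_5col <- mb(n_letters * (4 * int_bytes + ptr_bytes))    # 4 integer cols + vars

# nested: one entry per row, plus one small character vector per row
nested_2col <- mb(n_rows * int_bytes +                      # id
                  n_rows * ptr_bytes +                      # the list itself
                  n_rows * vec_hdr +                        # 1e6 vector headers
                  n_letters * ptr_bytes)                    # letters inside them

# roughly 162, 324 and 168 (MB)
round(c(tidy_2col = tidy_2col, tidy_5col = tidy_5col, nested_2col = nested_2col))
```

With these assumptions the tidy estimates reproduce the reported 162 and 324 MB, and the nested 2-col estimate comes within a few MB of 170 MB; the assumed per-vector header is the least certain input.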

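    # size of every individual column (c() on data frames concatenates their columns into one list)
    col_sizes <- sapply(c(df, dfu, df_morecols, dfu_morecols), object_size)
    col_names <- names(col_sizes)
    # label each column with the object it came from (2 + 2 + 5 + 5 columns)
    parent_obj <- c(rep(c('df', 'dfu'), each = 2),
                    rep(c('df_morecols', 'dfu_morecols'), each = 5))
    res <- tibble(parent_obj, col_names, col_sizes) %>%
      unite(elementof, parent_obj, col_names, remove = FALSE)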
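
When comparing per-column sizes with the totals, note that `pryr::object_size()` is documented to account for elements shared within an object, whereas base `utils::object.size()` counts every component separately. Per-column sizes, as computed in `col_sizes`, each count a shared vector in full. A small sketch of the difference; `x` and `two_refs` are illustrative names, and the pryr part is skipped if the package is not installed:

```r
# Two list elements pointing at the same integer vector (no copy is made).
x <- sample.int(1e5)             # ordinary 100,000-element integer vector
two_refs <- list(a = x, b = x)

# base R: counts each element separately, so roughly 2 x the size of x
base_x    <- as.numeric(utils::object.size(x))
base_pair <- as.numeric(utils::object.size(two_refs))
base_pair / base_x               # a little over 2

# pryr: shared memory is counted once, so close to the size of x alone
if (requireNamespace("pryr", quietly = TRUE)) {
  print(pryr::object_size(x))
  print(pryr::object_size(two_refs))
  # summing per-element sizes (as col_sizes does) counts x twice again
  print(sum(sapply(two_refs, pryr::object_size)))
}
```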


3. Plot the size of each individual column, colored by its parent object:

    res %>% 
      ggvis(y = ~elementof, x = ~0, x2 = ~col_sizes, fill = ~parent_obj) %>% 
      layer_rects(height = band())


[Plot: column sizes, colored by parent object]

Questions:

  • What explains the smaller size of the tidy 2-col df compared with the nested one?
  • Why does this effect not carry over to the 5-col df?