Size of nested vs. unnested (tidy) data.frame?
This question uses a data.frame that contains list-columns (i.e. it is nested). I was wondering why, or whether, there is an advantage to this. I assumed you would want to minimize the amount of memory each table uses... but when I checked, I was surprised:
Compare table sizes in nested vs. tidy (unnested) format:
1. Create nested and tidy versions of a 2-col and a 5-col data.frame:
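library(pryr)   # object_size()
library(dplyr)  # %>%, tibble()
library(tidyr)  # unnest(), unite()
library(ggvis)  # plot in step 3

n <- 1:1E6

# Nested: one row per id; vars is a list-column of 1-26 random letters
df <- tibble(id = n,
             vars = lapply(n, function(x) sample(letters, sample(1:26, 1))))
# Tidy: one row per (id, letter) pair
dfu <- df %>% unnest(vars)

# Same idea with three extra integer columns
df_morecols <- tibble(id = n, other1 = n, other2 = n, other3 = n,
                      vars = lapply(n, function(x) sample(letters, sample(1:26, 1))))
dfu_morecols <- df_morecols %>% unnest(vars)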
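For experimenting without the extra packages, here is a scaled-down sketch of step 1 in base R only. The 1,000 ids (instead of 10^6) and the `set.seed()` call are assumptions added for speed and reproducibility; neither is in the code above. It builds the same two shapes: a nested frame with a list-column, and a tidy frame where `id` is repeated once per letter.

```r
set.seed(1)           # reproducibility only (assumption, not in the original)
n    <- 1:1000        # scaled down from 1:1E6 (assumption)
vars <- lapply(n, function(x) sample(letters, sample(1:26, 1)))

# Nested: one row per id, letters kept in a list-column
df_small <- data.frame(id = n)
df_small$vars <- vars

# Tidy: one row per (id, letter) pair, so each id is repeated lengths(vars) times
lens <- lengths(vars)
dfu_small <- data.frame(id = rep(n, lens), vars = unlist(vars),
                        stringsAsFactors = FALSE)

c(nested_rows = nrow(df_small), tidy_rows = nrow(dfu_small))
```

The two frames can then be compared with `object.size(df_small)` and `object.size(dfu_small)`.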
They look like this:
head(df)
head(dfu)
head(df_morecols)
head(dfu_morecols)
2. Calculate the size of each object, and of each of its columns (for plotting):
170 MB vs. 162 MB for the nested vs. tidy 2-col df
170 MB vs. 324 MB for the nested vs. tidy 5-col df
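As a rough cross-check of the two tidy sizes, here is a back-of-envelope estimate. It assumes 64-bit R, where each element of a character vector is an 8-byte pointer into R's global string cache (so the 26 letters themselves are stored only once), 4 bytes per integer element, and MB = 10^6 bytes. The mean vector length of 13.5 is the expected value of `sample(1:26, 1)`, not a measured number, and per-object header overhead is ignored.

```r
# Hypothetical cost model (assumptions, not measurements):
#   - 64-bit R: a character element is an 8-byte pointer to a shared string
#   - an integer element is 4 bytes
#   - MB = 1e6 bytes; vector headers are ignored
n_ids    <- 1e6
mean_len <- mean(1:26)          # expected letters per id: 13.5
n_rows   <- n_ids * mean_len    # expected rows after unnest(): 13.5e6

int_bytes <- 4
ptr_bytes <- 8

est_dfu          <- n_rows * (int_bytes + ptr_bytes) / 1e6      # id + vars
est_dfu_morecols <- n_rows * (4 * int_bytes + ptr_bytes) / 1e6  # 4 int cols + vars

c(dfu = est_dfu, dfu_morecols = est_dfu_morecols)
```

Under these assumptions the estimates come out at 162 and 324, in line with the tidy figures above. The nested sizes are harder to estimate by hand, because each of the 10^6 small character vectors in the list-column also carries its own vector header.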
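# c() on the four data frames gives one flat list of all their columns
col_sizes <- sapply(c(df, dfu, df_morecols, dfu_morecols), object_size)
col_names <- names(col_sizes)
# Label each column with the data frame it came from (2 + 2 + 5 + 5 columns)
parent_obj <- c(rep(c('df', 'dfu'), each = 2),
                rep(c('df_morecols', 'dfu_morecols'), each = 5))
res <- tibble(parent_obj, col_names, col_sizes) %>%
  unite(elementof, parent_obj, col_names, remove = FALSE)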
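The column sizes depend heavily on how R stores character data. A minimal base-R check is sketched below. It uses `utils::object.size` rather than `pryr::object_size` so it runs without extra packages, the vector length is an arbitrary choice, and it assumes 64-bit R. It shows that a character vector costs about twice an integer vector of the same length, and that 26 distinct letters cost barely more than a single repeated one:

```r
n <- 1e6  # arbitrary length for the comparison

size_int   <- as.numeric(object.size(rep(1L, n)))                   # 4 bytes per element
size_chr1  <- as.numeric(object.size(rep("a", n)))                  # 1 distinct string
size_chr26 <- as.numeric(object.size(rep(letters, length.out = n))) # 26 distinct strings

# On 64-bit R each character element is an 8-byte pointer; each distinct
# string is stored once in the global string cache, so the pointer array
# dominates and the number of distinct letters barely matters.
c(int = size_int, chr1 = size_chr1, chr26 = size_chr26)
size_chr1 / size_int
```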
3. Plot the size of each individual column, colored by parent object:
res %>%
  ggvis(y = ~elementof, x = ~0, x2 = ~col_sizes, fill = ~parent_obj) %>%
  layer_rects(height = band())
- What explains why the tidy 2-col df is smaller than the nested one?
- Why does this effect reverse for the 5-col df, where the tidy version is almost twice the size of the nested one?