Keep columns when using do()

Code

Suppose I have the following code (I know that instead of the second do() I could use a simple mutate() in this case and skip rowwise(), but that is not the point: in my real code the second do() is a little more complicated and fits a model):

library(dplyr)
set.seed(1)
d <- data_frame(n = c(5, 1, 3))
e <- d %>% group_by(n) %>% 
    do(data_frame(y = rnorm(.$n), dat = list(data.frame(a = 1)))) 
e %>% rowwise() %>% do(data_frame(sum = .$y + .$n))

# Source: local data frame [9 x 1]
# Groups: <by row>

# # A tibble: 9 x 1
#         sum
# *     <dbl>
# 1 0.3735462
# 2 3.1836433
# 3 2.1643714
# 4 4.5952808
# 5 5.3295078
# 6 4.1795316
# 7 5.4874291
# 8 5.7383247
# 9 5.5757814

      

Problem

As you can see, the result contains only the column sum.

Question

Is there a way to keep the original columns from e without explicitly specifying them (as in e %>% do(data_frame(n = .$n, y = .$y, dat = .$dat, sum = .$y + .$n))), or do I have to use purrrlyr::by_row()? (Not that I dislike purrrlyr*; I just wondered whether there is a way to do this directly in dplyr that I might have overlooked):

e %>% purrrlyr::by_row(function(x) x$y + x$n, .collate = "cols", .to = "sum")

      


*) Well, there is actually a catch with purrrlyr::by_row():

e %>% purrrlyr::by_row(function(x) data_frame(sum = x$y + x$n, diff = x$y - x$n), 
                       .collate ="cols")

      

This will create columns sum1 and diff1, which I then need to rename to sum and diff, which adds another line of code.


1 answer


I almost never use do(), but rather a combination of nest(), mutate() and map().

It's a little tricky to tell what this would look like in your case, as your example doesn't seem to fully capture your needs.

In the simplest case, you can map over just the variables you need (this also works if they are, for example, lists of S3 objects):

library(purrr)  # for map2_dbl()
mutate(e, sum = map2_dbl(y, n, `+`))

      

Or you can nest the data you need, map over it, and then unnest everything again. For example:



library(tidyr)   # nest(), unnest()
library(purrr)   # map_dbl()

f <- e
f$r <- 1:nrow(e) # i.e. add some other variable, not necessarily row indices

f %>%
  ungroup() %>%                               # e was still grouped
  nest(n:dat) %>%                             # specify which variables you need
  mutate(sum = map_dbl(data, ~.$y + .$n)) %>% # map over data, using the same formula as in do
  unnest()                                    # unnest to get the original columns back

      

Both leave the original columns intact.
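
The nest(n:dat) and bare unnest() calls follow the tidyr interface from before version 1.0. As a hedged sketch, assuming tidyr 1.0 or later, the same round trip can be written with a named nest column. The sample data e is rebuilt here without do() so the block runs on its own, and the helper column row and the result name res are only illustrative:

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(1)
# Stand-in for e from the question: 9 rows, including a list column dat
e <- tibble(
  n   = rep(c(1, 3, 5), times = c(1, 3, 5)),
  y   = rnorm(9),
  dat = rep(list(data.frame(a = 1)), 9)
)

res <- e %>%
  mutate(row = row_number()) %>%                  # illustrative id so each row nests on its own
  nest(data = c(n, y, dat)) %>%                   # one single-row tibble per original row
  mutate(sum = map_dbl(data, ~ .x$y + .x$n)) %>%  # same formula as in the do() call
  unnest(data) %>%                                # original columns come back
  select(-row)
```

After this, res should hold n, y and dat alongside the new sum column, without naming the original columns in the computation step.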

For a modelling example:

mtcars %>% 
  group_by(cyl) %>% 
  nest() %>% 
  mutate(model = map(data, ~lm(qsec ~ hp, .)),
         coef  = map_dbl(model, ~coef(.)[2])) %>% 
  unnest(data)

      

This will give you all the original data, plus the regression coefficient for each group. Before the unnest() call, all models sit in your data.frame as a list column.
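
To make the list-column point concrete, here is a minimal sketch that stops before unnesting and pulls one fitted model back out. The names models and slope are chosen for illustration only and are not part of the answer above:

```r
library(dplyr)
library(tidyr)
library(purrr)

# One row per cyl group; the fitted models live in a list column
models <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(model = map(data, ~ lm(qsec ~ hp, data = .x)),
         slope = map_dbl(model, ~ coef(.x)[["hp"]]))

models$model[[1]]           # the lm object fitted on the first group
summary(models$model[[1]])  # any usual model method works on it
```

Keeping the nested frame around like this lets you inspect or reuse the models later, and you can still call unnest(data) on it when you want the original rows back.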
