Keep columns when using do
Suppose I have the following code (I know that instead of the second do I could use a simple mutate in this case and skip rowwise(), but that is not the point: in my real code the second do is a bit more complicated and fits a model):
library(dplyr)
set.seed(1)
d <- data_frame(n = c(5, 1, 3))
e <- d %>% group_by(n) %>%
do(data_frame(y = rnorm(.$n), dat = list(data.frame(a = 1))))
e %>% rowwise() %>% do(data_frame(sum = .$y + .$n))
# Source: local data frame [9 x 1]
# Groups: <by row>
# # A tibble: 9 x 1
# sum
# * <dbl>
# 1 0.3735462
# 2 3.1836433
# 3 2.1643714
# 4 4.5952808
# 5 5.3295078
# 6 4.1795316
# 7 5.4874291
# 8 5.7383247
# 9 5.5757814
Problem
As you can see, the result contains only the column sum.
Question
Is there a way to keep the original columns from e without having to specify them explicitly (as in e %>% do(data_frame(n = .$n, y = .$y, dat = .$dat, sum = .$y + .$n))), or do I have to use purrrlyr::by_row? I have nothing against purrrlyr*; I just wondered whether there is a direct dplyr way to do this that I may have overlooked. The purrrlyr version:
e %>% purrrlyr::by_row(function(x) x$y + x$n, .collate = "cols", .to = "sum")
*) Well, there is actually a catch with purrrlyr::by_row:
e %>% purrrlyr::by_row(function(x) data_frame(sum = x$y + x$n, diff = x$y - x$n),
.collate ="cols")
This creates columns sum1 and diff1, which I then have to rename to get sum and diff, adding another line of code.
I almost never use do, but rather a combination of nest, mutate and map.
It's a little tricky to tell what this would look like in your case, as your example doesn't fully define your needs.
In the simplest case, you can just name the variables you need (this would also work if, for example, they were lists of S3 objects):
library(purrr); library(tidyr) # map*() and nest()/unnest() are used below
mutate(e, sum = map2_dbl(y, n, `+`))
Or you can nest the data you need and then map over the nested data. For example:
f <- e
f$r <- 1:nrow(e) # i.e. add some other variable, not necessarily row indices
f %>%
ungroup() %>% # e was still grouped
nest(n:dat) %>% # specify which variables you need
mutate(sum = map_dbl(data, ~.$y + .$n)) %>% # map to data, use the same formula as in do
unnest() # unnest to get original columns back
Both leave the original columns intact.
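When the per-row computation returns several values (the sum and diff case from the question's footnote), one option is to have map2 return a one-row tibble per row and then unnest that column. This is only a sketch: it assumes tidyr 1.0 or later (for the unnest(out) call), uses tibble() in place of the deprecated data_frame(), and the names out and res are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Rebuild e as in the question, with tibble() instead of data_frame()
set.seed(1)
d <- tibble(n = c(5, 1, 3))
e <- d %>%
  group_by(n) %>%
  do(tibble(y = rnorm(.$n), dat = list(data.frame(a = 1))))

res <- e %>%
  ungroup() %>%
  # each element of `out` is a one-row tibble with columns sum and diff
  mutate(out = map2(y, n, ~ tibble(sum = .x + .y, diff = .x - .y))) %>%
  # spread the one-row tibbles back into ordinary columns
  unnest(out)

names(res)
```

The new columns keep the names given inside tibble(), so no renaming step is needed, and n, y and dat are carried along untouched.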
For a modelling example:
mtcars %>%
group_by(cyl) %>%
nest() %>%
mutate(model = map(data, ~lm(qsec ~ hp, .)),
coef = map_dbl(model, ~coef(.)[2])) %>%
unnest(data)
This gives you all the original data, with the regression coefficient of each group added as a column. Before the unnest, all models are stored in your data frame as a list column.
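To make the list-column stage visible, the same pipeline can be split into two steps. A minimal sketch, assuming tidyr 1.0 or later; the object names nested and result are illustrative only:

```r
library(dplyr)
library(tidyr)
library(purrr)

# One row per cyl group; `data` and `model` are list columns
nested <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(model = map(data, ~ lm(qsec ~ hp, data = .x)),
         coef  = map_dbl(model, ~ coef(.x)[["hp"]]))

nested          # one row per group, each holding its data and fitted lm object

# Expanding `data` restores one row per car; coef is repeated within each group
result <- unnest(nested, data)
dim(result)
```

Keeping nested around is handy if you want to pull further quantities out of the fitted models before expanding back to the original rows.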