Concatenating strings with c() in dplyr summarize() or aggregate()
I want to concatenate some strings using c() as the aggregation function in dplyr. I tried the following first:
> InsectSprays$spray = as.character(InsectSprays$spray)
> dt = tbl_df(InsectSprays)
> dt %>% group_by(count) %>% summarize(c(spray))
Error: expecting a single value
But using the c() function in aggregate() works:
> da = aggregate(spray ~ count, InsectSprays, c)
> head(da)
count spray
1 0 C, C
2 1 C, C, C, C, E, E
3 2 C, C, D, E
A search on Stack Overflow suggested that using paste() with collapse, instead of c(), would solve the problem:
dt %>% group_by(count) %>% summarize(s=paste(spray, collapse=","))
or
dt %>% group_by(count) %>% summarize(paste( c(spray), collapse=","))
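To make the difference between the two calls concrete, here is a minimal base R sketch. The vector `spray` below is a hypothetical stand-in for the values of one group, not data taken from the output above.

```r
# Hypothetical stand-in for the spray values of a single group
spray <- c("C", "C", "D", "E")

combined  <- c(spray)                      # still four separate strings
collapsed <- paste(spray, collapse = ",")  # one string holding all four

length(combined)   # 4
length(collapsed)  # 1
collapsed          # "C,C,D,E"
```

So c() hands back as many values as the group has rows, while paste() with collapse always hands back exactly one.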
My question is, why does the c() function work in aggregate() but not in dplyr's summarize()?
1 answer
If you look closer, you will find that c() actually works (to a certain extent) when we use do(). But as far as I can tell, dplyr does not currently print list contents in this form:
> InsectSprays$spray = as.character(InsectSprays$spray)
> dt = tbl_df(InsectSprays)
> doC <- dt %>% group_by(count) %>% do(s = c(.$spray))
> head(doC)
Source: local data frame [6 x 2]
count s
1 0 <chr[2]>
2 1 <chr[6]>
3 2 <chr[4]>
4 3 <chr[8]>
5 4 <chr[4]>
6 5 <chr[7]>
> head(doC)[[2]]
[[1]]
[1] "C" "C"
[[2]]
[1] "C" "C" "C" "C" "E" "E"
[[3]]
[1] "C" "C" "D" "E"
[[4]]
[1] "C" "C" "D" "D" "E" "E" "E" "E"
[[5]]
[1] "C" "D" "D" "E"
[[6]]
[1] "D" "D" "D" "D" "D" "E" "E"
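The "expecting a single value" error suggests that summarize() wants exactly one value per group. A possible alternative to do(), sketched here under the assumption that your dplyr version accepts list results inside summarize() (I have not checked which release first allowed this), is to wrap the group's vector in list(). The names `sprays`, `byCount` and `flat` are only illustrative.

```r
library(dplyr)

# Work on a copy so the built-in data set is left untouched
sprays <- InsectSprays
sprays$spray <- as.character(sprays$spray)

# list() wraps each group's character vector in a single list element,
# so summarize() receives one value per group (assumes list-column support)
byCount <- sprays %>%
  group_by(count) %>%
  summarize(s = list(spray))

byCount$s[[1]]  # the sprays recorded for the first count value

# If plain text is wanted, collapse each list element into one string
byCount$flat <- vapply(byCount$s, paste, FUN.VALUE = character(1), collapse = ",")
```

This keeps the individual strings available as a list column, much like the do() result above, and the last line shows how to get back to the paste() style output if that is what you need.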