Collapsing rows in R

I have searched as best I can, and part of my problem is that I'm really not sure what to ask. Here is my data and how I want it to end up:

Now:

john    a Yes
john    b No
john    c No
Rebekah a Yes
Rebekah d No
Chase   c Yes
Chase   d No
Chase   e No
Chase   f No

      

How I would like:

john     a,b,c    Yes
Rebekah  a,d      Yes
Chase    c,d,e,f  Yes

      

Note that the third column says Yes when the row is the first one with that particular value in the 1st column. The third column is not essential; I only added it because I was thinking of doing all of this with if and for, but that seemed very inefficient. Is there a way to do this job efficiently?



2 answers


Another option could be (using the data mentioned by @bgoldst)

library('dplyr')

out = df %>% 
      group_by(a) %>% 
      summarize(b = paste(unique(b), collapse=","), c = "yes")

#> out
#Source: local data frame [3 x 3]

#        a       b   c
#1   Chase c,d,e,f yes
#2 Rebekah     a,d yes
#3    john   a,b,c yes

      



Or with data.table (note that setDT() converts df to a data.table by reference):

library('data.table')

out = setDT(df)[, .(b = paste(unique(b),  collapse=','), c = "yes"), by = .(a)]

#> out
#         a       b   c
#1:    john   a,b,c yes
#2: Rebekah     a,d yes
#3:   Chase c,d,e,f yes
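
Hard-coding c = "yes" in both snippets relies on the question's note that the third column is Yes exactly on the first row of each name. A quick base R check of that assumption on the sample data, as a sketch (flag is just a name introduced here for illustration):

```r
# Sample data from the question
df <- data.frame(
  a = c('john', 'john', 'john', 'Rebekah', 'Rebekah', 'Chase', 'Chase', 'Chase', 'Chase'),
  b = c('a', 'b', 'c', 'a', 'd', 'c', 'd', 'e', 'f'),
  c = c('Yes', 'No', 'No', 'Yes', 'No', 'Yes', 'No', 'No', 'No'),
  stringsAsFactors = FALSE
)

# duplicated() is FALSE on the first occurrence of each name, TRUE afterwards
flag <- ifelse(duplicated(df$a), "No", "Yes")

# Does the recomputed flag match the third column?
identical(flag, df$c)
# TRUE
```

If this ever returns FALSE for your real data, take the value from the data instead of hard-coding it.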

      



You can use by() to do this:

df <- data.frame(
  a = c('john', 'john', 'john', 'Rebekah', 'Rebekah', 'Chase', 'Chase', 'Chase', 'Chase'),
  b = c('a', 'b', 'c', 'a', 'd', 'c', 'd', 'e', 'f'),
  c = c('Yes', 'No', 'No', 'Yes', 'No', 'Yes', 'No', 'No', 'No'),
  stringsAsFactors = FALSE
)
# one single-row data.frame per group, then stack them
do.call(rbind, by(df, df$a, function(x)
  data.frame(a = x$a[1], b = paste0(x$b, collapse = ','), c = x$c[1], stringsAsFactors = FALSE)))
##               a       b   c
## Chase     Chase c,d,e,f Yes
## john       john   a,b,c Yes
## Rebekah Rebekah     a,d Yes

      

Edit: Here's a different approach using independent aggregations with tapply():

key <- unique(df$a)
data.frame(a = key, b = tapply(df$b, df$a, paste, collapse = ',')[key], c = tapply(df$c, df$a, `[`, 1)[key])
##               a       b   c
## john       john   a,b,c Yes
## Rebekah Rebekah     a,d Yes
## Chase     Chase c,d,e,f Yes

      

Edit: And another approach, calling merge() on the results of multiple aggregate() calls:

merge(aggregate(b ~ a, df, paste, collapse = ','), aggregate(c ~ a, df, `[`, 1))
##         a       b   c
## 1   Chase c,d,e,f Yes
## 2    john   a,b,c Yes
## 3 Rebekah     a,d Yes
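
The by() and merge() results above come back sorted by name, while the tapply() version keeps the order of first appearance. If that original order matters, one possible base R sketch is to split on a factor whose levels follow unique(df$a) (groups and out are names introduced here for illustration only):

```r
# Sample data from the question
df <- data.frame(
  a = c('john', 'john', 'john', 'Rebekah', 'Rebekah', 'Chase', 'Chase', 'Chase', 'Chase'),
  b = c('a', 'b', 'c', 'a', 'd', 'c', 'd', 'e', 'f'),
  c = c('Yes', 'No', 'No', 'Yes', 'No', 'Yes', 'No', 'No', 'No'),
  stringsAsFactors = FALSE
)

# Factor levels in order of first appearance, so split() keeps that order
groups <- split(df, factor(df$a, levels = unique(df$a)))

# Collapse each group to one row: first name, joined b values, first c value
out <- do.call(rbind, lapply(groups, function(g) {
  data.frame(a = g$a[1], b = paste(g$b, collapse = ','), c = g$c[1],
             stringsAsFactors = FALSE)
}))
rownames(out) <- NULL  # drop the group names that rbind() adds as row names
out
```

This is the same idea as the by() call, only with the group order fixed explicitly.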

      
