Concatenate data.table rows for all .SD columns by group

I have a large dataset with many variables that looks something like this:

 > data.table(a=letters[1:10],b=LETTERS[1:10],ID=c(1,1,1,2,2,2,2,3,3,3))
     a b ID
  1: a A  1
  2: b B  1
  3: c C  1
  4: d D  2
  5: e E  2
  6: f F  2
  7: g G  2
  8: h H  3
  9: i I  3
 10: j J  3

      

I want to concatenate all the column values except ID, separated by a newline character, for each ID value, so the result should look like this:

     a b ID
  1: a A  1
     b B   
     c C   
  2: d D  2
     e E   
     f F   
     g G   
  3: h H  3
     i I   
     j J   

      

I found a related question, "R Dataframe: Concatenate Rows Within Column, Row-by-Row, by Group", which shows how to do this for one column. How can I extend it to all columns in .SD?

To make it clearer, I changed the separator from "\n" to ",", and the result should look like this:

   a       b       ID
1: a,b,c   A,B,C   1
2: d,e,f,g D,E,F,G 2
3: h,i,j   H,I,J   3

      

1 answer


You can collapse all columns at once by using lapply over .SD:

dt[, lapply(.SD, paste0, collapse=" "), by = ID]
##    ID       a       b
## 1:  1   a b c   A B C
## 2:  2 d e f g D E F G
## 3:  3   h i j   H I J
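For reference, here is a minimal sketch of the same idea restricted to a subset of columns. It assumes the question's table is stored in a variable named dt (the answer does not show that assignment) and uses the .SDcols argument of data.table to pick the columns that end up in .SD.

```r
library(data.table)

# Sample data from the question, assumed to be stored as `dt`
dt <- data.table(a = letters[1:10], b = LETTERS[1:10],
                 ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3))

# By default .SD holds every column except the grouping column;
# .SDcols narrows it to the columns named here
res <- dt[, lapply(.SD, paste0, collapse = " "), by = ID, .SDcols = "a"]
print(res)
```

Leaving .SDcols out collapses every non-grouping column, which is what the answer's one-liner does.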

      

Using a newline as the collapse argument instead of " " works, but does not print the way you seem to expect in your desired output.



dt[, lapply(.SD, paste0, collapse="\n"), by = ID]
##    ID          a          b
## 1:  1    a\nb\nc    A\nB\nC
## 2:  2 d\ne\nf\ng D\nE\nF\nG
## 3:  3    h\ni\nj    H\nI\nJ
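The escaped \n in the printed table is only how the string is displayed; the cells do contain real newline characters. A small sketch to check this, again assuming the question's data is stored as dt:

```r
library(data.table)

# Sample data from the question, assumed to be stored as `dt`
dt <- data.table(a = letters[1:10], b = LETTERS[1:10],
                 ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3))

res <- dt[, lapply(.SD, paste0, collapse = "\n"), by = ID]

# print() shows the newline as an escape sequence;
# writeLines() writes the raw string, so each value lands on its own line
writeLines(res$a[1])
```

So the multi-line layout in the desired output is a display question rather than a data question.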

      

As @Frank pointed out in the comments, the question was changed to use "," as the separator instead of "\n". Of course, you can just change the collapse argument to ",". If you want ", " as the separator, then @DavidArenburg's solution is preferred.

dt[, lapply(.SD, paste0, collapse=","), by = ID]
dt[, lapply(.SD, toString), by = ID]
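The two lines differ only in the separator: toString is base R shorthand for pasting with collapse = ", ". A short sketch comparing them, assuming the question's data is stored as dt (the names comma and comma_space are just illustrative):

```r
library(data.table)

# Sample data from the question, assumed to be stored as `dt`
dt <- data.table(a = letters[1:10], b = LETTERS[1:10],
                 ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3))

# Comma only
comma <- dt[, lapply(.SD, paste0, collapse = ","), by = ID]

# Comma followed by a space, via toString()
comma_space <- dt[, lapply(.SD, toString), by = ID]

print(comma$a[1])        # "a,b,c"
print(comma_space$a[1])  # "a, b, c"
```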

      
