Concatenate data.table rows across all .SD columns by group
I have a large dataset with many variables that looks something like this:
> data.table(a=letters[1:10],b=LETTERS[1:10],ID=c(1,1,1,2,2,2,2,3,3,3))
a b ID
1: a A 1
2: b B 1
3: c C 1
4: d D 2
5: e E 2
6: f F 2
7: g G 2
8: h H 3
9: i I 3
10: j J 3
For each ID value, I want to concatenate the values of every column except ID, with a newline character in between, so the result should look like this:
a b ID
1: a A 1
b B
c C
2: d D 2
e E
f F
g G
3: h H 3
i I
j J
I found a related question, "R Dataframe: Concatenate Rows Within Column, Row-by-Row, by Group", which shows how to do this for one column. How can I extend it to all columns in .SD?
To make this clearer, I changed the separator from \n to , and the result should look like this:
a b ID
1: a,b,c A,B,C 1
2: d,e,f,g D,E,F,G 2
3: h,i,j H,I,J 3
You can collapse every column at once by applying paste0 to .SD with lapply (assuming the table from the question is stored in dt).
dt[, lapply(.SD, paste0, collapse=" "), by = ID]
## ID a b
## 1: 1 a b c A B C
## 2: 2 d e f g D E F G
## 3: 3 h i j H I J
Using a newline as the collapse argument instead of " " works, but the result does not print the way your desired output suggests: the line breaks are stored in the strings and displayed as \n.
dt[, lapply(.SD, paste0, collapse="\n"), by = ID]
## ID a b
## 1: 1 a\nb\nc A\nB\nC
## 2: 2 d\ne\nf\ng D\nE\nF\nG
## 3: 3 h\ni\nj H\nI\nJ
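To see that the newlines really are in the collapsed strings, here is a minimal sketch. It assumes the question's table is assigned to dt; the name res is only illustrative. Printing the data.table shows the escape sequence, while cat() writes the actual line breaks.

```r
library(data.table)

# Same example data as in the question, assigned to `dt` (assumed name)
dt <- data.table(a = letters[1:10], b = LETTERS[1:10],
                 ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3))

# Collapse every non-grouping column with a newline separator
res <- dt[, lapply(.SD, paste0, collapse = "\n"), by = ID]

# cat() writes the string itself, so each value lands on its own line
cat(res$a[1], "\n")
```

If the goal is only a multi-line display, writing the values out with cat() or writeLines() may be enough; the printed table itself will keep showing \n.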
As @Frank pointed out in the comments, the question was changed to use , as the separator instead of \n. In that case you can simply change the collapse argument to ",". If you also want a space after each comma (", "), then @DavidArenburg's toString solution is preferable.
dt[, lapply(.SD, paste0, collapse=","), by = ID]
dt[, lapply(.SD, toString), by = ID]
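For reference, a self-contained sketch of both variants on the question's data. The names cols, comma and spaced are illustrative, and the .SDcols argument is an optional addition here (not part of the original answer) that restricts the collapse to a chosen set of columns.

```r
library(data.table)

# Example data from the question, assigned to `dt` (assumed name)
dt <- data.table(a = letters[1:10], b = LETTERS[1:10],
                 ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3))

# Illustrative choice: collapse only these columns
cols <- c("a", "b")

# Comma without a space between values
comma <- dt[, lapply(.SD, paste0, collapse = ","), by = ID, .SDcols = cols]

# toString() joins with ", " (comma followed by a space)
spaced <- dt[, lapply(.SD, toString), by = ID, .SDcols = cols]

print(comma)
print(spaced)
```

Without .SDcols, .SD already contains every column except the grouping column, so the two one-liners above give the same result on this table.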