Balancing (creating the same number of lines for each individual user)

Question

Balancing (creating the same number of lines for each individual user)

Given the data table as follows, id1

is a subject-level identifier, id2

is an identifier of a repeating measure within an object, X

are data variables, of which there are many. I want to balance the data in such a way that each person has the same number of rows (repeated measurements), which is max(DT[,.N,by=id1][,N])

, but where id1

and is id2

adjusted as needed, and X

the data values are replaced with NA

for these new rows.

Following:

DT = data.table(
id1 = c(1,1,2,2,2,3,3,3,3),
id2 = c(1,2,1,2,3,1,2,3,4),
X1 = letters[1:9],
X2 = LETTERS[1:9]
)
setkey(DT,id1)

It should look like:

DT = data.table(
id1 = c(1,1,1,1,2,2,2,2,3,3,3,3),
id2 = c(1,2,3,4,1,2,3,4,1,2,3,4),
X1 = c(letters[1:2],NA,NA,letters[3:5],NA,letters[6:9]),
X2 = c(LETTERS[1:2],NA,NA,LETTERS[3:5],NA,LETTERS[6:9])
)

How do you do it using data.table

? To do this, you must avoid loops as this dataset is huge. Is this a job for reshape2

?

+3

r data.table reshape2

FBC 12 Sep 14 at 16:25

source to share

2 answers

Here's a slight deviation from akrun's answer that is commonly used for this problem:

setkey(DT, id1, id2)
DT[CJ(unique(id1), unique(id2))]

+2

eddi 15 Sep 14 at 23:08

source to share

akrun · Accepted Answer · 2014-09-12T16:33:53+0000

You may try:

 DT2 <- CJ(id1=1:3, id2=1:4)
 merge(DT,DT2, by=c('id1', 'id2'), all=TRUE)

Balancing (creating the same number of lines for each individual user)

More articles: