Balancing (creating the same number of lines for each individual user)
Given the data table as follows, id1
is a subject-level identifier, id2
is an identifier of a repeating measure within an object, X
are data variables, of which there are many. I want to balance the data in such a way that each person has the same number of rows (repeated measurements), which is max(DT[,.N,by=id1][,N])
, but where id1
and is id2
adjusted as needed, and X
the data values ββare replaced with NA
for these new rows.
Following:
DT = data.table( id1 = c(1,1,2,2,2,3,3,3,3), id2 = c(1,2,1,2,3,1,2,3,4), X1 = letters[1:9], X2 = LETTERS[1:9] ) setkey(DT,id1)
It should look like:
DT = data.table( id1 = c(1,1,1,1,2,2,2,2,3,3,3,3), id2 = c(1,2,3,4,1,2,3,4,1,2,3,4), X1 = c(letters[1:2],NA,NA,letters[3:5],NA,letters[6:9]), X2 = c(LETTERS[1:2],NA,NA,LETTERS[3:5],NA,LETTERS[6:9]) )
How do you do it using data.table
? To do this, you must avoid loops as this dataset is huge. Is this a job for reshape2
?
source to share