# Balancing (creating the same number of lines for each individual user)

Given the data table as follows, `id1`

is a subject-level identifier, `id2`

is an identifier of a repeating measure within an object, `X`

are data variables, of which there are many. I want to balance the data in such a way that each person has the same number of rows (repeated measurements), which is `max(DT[,.N,by=id1][,N])`

, but where `id1`

and is `id2`

adjusted as needed, and `X`

the data values ​​are replaced with `NA`

for these new rows.

Following:

``````DT = data.table(
id1 = c(1,1,2,2,2,3,3,3,3),
id2 = c(1,2,1,2,3,1,2,3,4),
X1 = letters[1:9],
X2 = LETTERS[1:9]
)
setkey(DT,id1)
```

```

It should look like:

``````DT = data.table(
id1 = c(1,1,1,1,2,2,2,2,3,3,3,3),
id2 = c(1,2,3,4,1,2,3,4,1,2,3,4),
X1 = c(letters[1:2],NA,NA,letters[3:5],NA,letters[6:9]),
X2 = c(LETTERS[1:2],NA,NA,LETTERS[3:5],NA,LETTERS[6:9])
)
```

```

How do you do it using `data.table`

? To do this, you must avoid loops as this dataset is huge. Is this a job for `reshape2`

?

+3

source to share

You may try:

`````` DT2 <- CJ(id1=1:3, id2=1:4)
merge(DT,DT2, by=c('id1', 'id2'), all=TRUE)
```

```
+4

source

Here's a slight deviation from akrun's answer that is commonly used for this problem:

``````setkey(DT, id1, id2)
DT[CJ(unique(id1), unique(id2))]
```

```
+2

source

All Articles