Data.table group based outer join in R

Question

Data.table group based outer join in R

I have data with the following columns:

CaseID, time, value.

Time columns are not equal at equal intervals 1. I am trying to add missing times with NA for other columns except CaseID.

Case Value  Time
1    100    07:52:00
1    110    07:53:00
1    120    07:55:00
2    10     08:35:00
2    11     08:36:00
2    12     08:38:00

Desired output:

Case Value  Time
1    100    07:52:00
1    110    07:53:00
1    NA     07:54:00
1    120    07:55:00
2    10     08:35:00
2    11     08:36:00
2    NA     08:37:00
2    12     08:38:00

I tried dt[CJ(unique(CaseID),seq(min(Time),max(Time),"min"))]

it but it gave the following error:

Error in vecseq(f__, len__, if (allow.cartesian || notjoin) NULL else as.integer(max(nrow(x),  : 
  Join results in 9827315 rows; more than 9620640 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. If that ok, try including `j` and dropping `by` (by-without-by) so that j runs for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Qaru and datatable-help for advice.

I cannot get it to work. Any help would be appreciated.

+3

r data.table

UoU Dec 18 14 at 20:25

source to share

1 answer

jlhoward · Answer 1 · 2014-12-18T21:15:19+0000

Like this??

dt[,Time:=as.POSIXct(Time,format="%H:%M:%S")]
result <- dt[,list(Time=seq(min(Time),max(Time),by="1 min")),by=Case]
setkey(result,Case,Time)
setkey(dt,Case,Time)
result <- dt[result][,Time:=format(Time,"%H:%M:%S")]
result
#    Case Value     Time
# 1:    1   100 07:52:00
# 2:    1   110 07:53:00
# 3:    1    NA 07:54:00
# 4:    1   120 07:55:00
# 5:    2    10 08:35:00
# 6:    2    11 08:36:00
# 7:    2    NA 08:37:00
# 8:    2    12 08:38:00

Another way:

dt[, Time := as.POSIXct(Time, format = "%H:%M:%S")]
setkey(dt, Time)
dt[, .SD[J(seq(min(Time), max(Time), by='1 min'))], by=Case]

We group by occasion and join Time

each group using .SD

(hence setting the key to Time

). From here you can use format()

as shown above.

Data.table group based outer join in R

More articles: