Data.table group based outer join in R

I have data with the following columns:

CaseID, time, value.

Time columns are not equal at equal intervals 1. I am trying to add missing times with NA for other columns except CaseID.

Case Value  Time
1    100    07:52:00
1    110    07:53:00
1    120    07:55:00
2    10     08:35:00
2    11     08:36:00
2    12     08:38:00

      

Desired output:

Case Value  Time
1    100    07:52:00
1    110    07:53:00
1    NA     07:54:00
1    120    07:55:00
2    10     08:35:00
2    11     08:36:00
2    NA     08:37:00
2    12     08:38:00

      

I tried dt[CJ(unique(CaseID),seq(min(Time),max(Time),"min"))]

it but it gave the following error:

Error in vecseq(f__, len__, if (allow.cartesian || notjoin) NULL else as.integer(max(nrow(x),  : 
  Join results in 9827315 rows; more than 9620640 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. If that ok, try including `j` and dropping `by` (by-without-by) so that j runs for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Qaru and datatable-help for advice.

      

I cannot get it to work. Any help would be appreciated.

+3


source to share


1 answer


Like this??

dt[,Time:=as.POSIXct(Time,format="%H:%M:%S")]
result <- dt[,list(Time=seq(min(Time),max(Time),by="1 min")),by=Case]
setkey(result,Case,Time)
setkey(dt,Case,Time)
result <- dt[result][,Time:=format(Time,"%H:%M:%S")]
result
#    Case Value     Time
# 1:    1   100 07:52:00
# 2:    1   110 07:53:00
# 3:    1    NA 07:54:00
# 4:    1   120 07:55:00
# 5:    2    10 08:35:00
# 6:    2    11 08:36:00
# 7:    2    NA 08:37:00
# 8:    2    12 08:38:00

      




Another way:

dt[, Time := as.POSIXct(Time, format = "%H:%M:%S")]
setkey(dt, Time)
dt[, .SD[J(seq(min(Time), max(Time), by='1 min'))], by=Case]

      

We group by occasion and join Time

each group using .SD

(hence setting the key to Time

). From here you can use format()

as shown above.

+3


source







All Articles