Merging large data.tables on character columns causes segfault

I am using R version 3.3.3 (though I have reproduced this issue on 3.4.0) and data.table

version 1.10.4 on Cygwin. (Edit: the comments below suggest this may be Cygwin-specific.) I need to merge two data.tables (on the order of one to two million rows) on an alphanumeric ID column. About three times out of four, I get a segfault, either in the call to merge() itself or in the first subsequent call that modifies or prints the merged table. (I understand the latter as a result of lazy evaluation.)

The problem is specific to character columns; merging on integer columns works fine. See this terminal session:

> library(data.table)
data.table 1.10.4 #[snipping rest of startup message]
> n <- 2e6 # Make this higher if you can't trigger a segfault yourself.
> a <- data.table(a=1:n, b=runif(n), c=runif(n))
> b <- data.table(a=1:n, x=runif(n), y=runif(n))
> head(merge(a, b)) # This works fine.
   a         b          c         x          y
1: 1 0.6753597 0.08822928 0.7204507 0.71065772
2: 2 0.1898733 0.11883707 0.9820610 0.74329076
3: 3 0.3941039 0.57053921 0.3346781 0.22707652
4: 4 0.4564642 0.77429123 0.4924871 0.07743992
5: 5 0.9109421 0.79464586 0.2588091 0.82185820
6: 6 0.1805926 0.94213717 0.7426924 0.52522687
> a <- data.table(a=as.character(1:n), b=runif(n), c=runif(n))
> b <- data.table(a=as.character(1:n), x=runif(n), y=runif(n))
> head(merge(a, b))

 *** caught segfault ***
address 0xffffffffffffffff, cause 'unknown'

Traceback:
 1: `[.data.table`(x, i, , )
 2: x[i, , ]
 3: head.data.table(merge(a, b))
 4: head(merge(a, b))


If a and b are data.frames, then merge() does not segfault on character columns. Questions:

  • Is this behavior documented or otherwise known?
  • Is there a workaround better than creating a new integer id column, or casting back and forth to data.frame whenever I need to use merge()?
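For reference, the two workarounds mentioned in the second question could be sketched roughly as below. This is only a sketch under the assumption that integer keys merge safely (as the session above suggests); using match() to build a surrogate integer key is just one possible approach, and the column names are illustrative.

```r
library(data.table)

n <- 1e4
a <- data.table(id = as.character(seq_len(n)), b = runif(n))
b <- data.table(id = as.character(seq_len(n)), x = runif(n))

## Workaround 1: merge on a surrogate integer key instead of the
## character column. match() assigns each unique id a stable integer.
keys <- unique(c(a$id, b$id))
a[, id_int := match(id, keys)]
b[, id_int := match(id, keys)]
m1 <- merge(a, b[, !"id"], by = "id_int")

## Workaround 2: round-trip through data.frame just for the merge,
## then convert back to data.table.
m2 <- as.data.table(merge(as.data.frame(a[, !"id_int"]),
                          as.data.frame(b[, !"id_int"]),
                          by = "id"))
```

Workaround 1 keeps the merge inside data.table at the cost of an extra column; workaround 2 is simpler but loses data.table's keyed-join performance for that one call.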