Merging large data.tables on character columns causes segfault
I am using R version 3.3.3 (although I have replicated this issue on 3.4.0) and data.table version 1.10.4 on Cygwin. (Edit: the comments below suggest this may be Cygwin-specific.) I need to merge two data tables (about 1 megabyte and 2000 rows each) on an alphanumeric ID column. About three times out of four, I get a segfault either in the call to merge() itself or in a later call that modifies or prints the merged table. (I understand the latter as a result of lazy evaluation.)
The problem is specific to character columns; merging on integer columns works fine. See this terminal session:
> library(data.table)
data.table 1.10.4 #[snipping rest of startup message]
> n <- 2e6 # Make this higher if you can't trigger a segfault yourself.
> a <- data.table(a=1:n, b=runif(n), c=runif(n))
> b <- data.table(a=1:n, x=runif(n), y=runif(n))
> head(merge(a, b)) # This works fine.
a b c x y
1: 1 0.6753597 0.08822928 0.7204507 0.71065772
2: 2 0.1898733 0.11883707 0.9820610 0.74329076
3: 3 0.3941039 0.57053921 0.3346781 0.22707652
4: 4 0.4564642 0.77429123 0.4924871 0.07743992
5: 5 0.9109421 0.79464586 0.2588091 0.82185820
6: 6 0.1805926 0.94213717 0.7426924 0.52522687
> a <- data.table(a=as.character(1:n), b=runif(n), c=runif(n))
> b <- data.table(a=as.character(1:n), x=runif(n), y=runif(n))
> head(merge(a, b))
*** caught segfault ***
address 0xffffffffffffffff, cause 'unknown'
Traceback:
1: `[.data.table`(x, i, , )
2: x[i, , ]
3: head.data.table(merge(a, b))
4: head(merge(a, b))
If a and b are data.frames, then merge() does not segfault on character columns. Questions:
- Is this behavior documented or otherwise known?
- Is there a workaround other than creating a new ID column, or casting back and forth to data.frame whenever I need to use merge()?
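For reference, both workarounds mentioned in the second question can be sketched in a few lines. These helper names are my own, and I have only reasoned about the code, not run it on Cygwin:

```r
library(data.table)

## Workaround 1: round-trip through data.frame, where the segfault
## does not occur for character join columns.
merge_via_df <- function(a, b, ...) {
  as.data.table(merge(as.data.frame(a), as.data.frame(b), ...))
}

## Workaround 2: replace the character key with an integer surrogate
## before merging, then restore the original IDs afterwards.
merge_on_int <- function(a, b, id, ...) {
  levs <- union(a[[id]], b[[id]])            # shared ID universe
  a2 <- copy(a); b2 <- copy(b)
  set(a2, j = id, value = match(a2[[id]], levs))
  set(b2, j = id, value = match(b2[[id]], levs))
  out <- merge(a2, b2, by = id, ...)
  set(out, j = id, value = levs[out[[id]]])  # restore character IDs
  out
}
```

Workaround 2 avoids the data.frame round trip, at the cost of copying both tables so the inputs are not modified by reference.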