Unexpected behavior with foverlaps with floating point intervals

Here is an example where foverlaps(...) appears to return matches that do not overlap. Can anyone help me understand what I am doing wrong?

The problem in this post seemed like a great opportunity to use foverlaps(...) from the data.table package. Below is the data from that post.

dinosaurs <- structure(list(GENUS = structure(1:3, .Label = c("Abydosaurus", "Achelousaurus", "Acheroraptor"), class = "factor"), ma_max = c(109, 84.9, 70.6), ma_min = c(94.3, 70.6, 66.043), ma_mid = c(101.65, 77.75, 68.3215)), .Names = c("GENUS", "ma_max", "ma_min", "ma_mid"), class = "data.frame", row.names = c(NA, -3L))
stages    <- structure(list(Stage = structure(c(13L, 19L, 17L, 21L, 1L, 4L, 6L, 8L, 16L, 14L, 20L, 7L, 23L, 12L, 5L, 3L, 2L, 10L, 22L, 11L, 18L, 9L, 15L), .Label = c("Aalenian", "Albian", "Aptian", "Bajocian", "Barremian", "Bathonian", "Berriasian", "Callovian", "Campanian", "Cenomanian", "Coniacian", "Hauterivian", "Hettangian", "Kimmeridgian", "Maastrichtian", "Oxfordian", "Pliensbachian", "Santonian", "Sinemurian", "Tithonian", "Toarcian", "Turonian", "Valanginian"), class = "factor"),ma_max = c(201.6, 197, 190, 183, 176, 172, 168, 165, 161, 156, 151, 145.5, 140, 136, 130, 125, 112, 99.6, 93.5, 89.3, 85.8, 83.5, 70.6), ma_min = c(197, 190, 183, 176, 172, 168, 165, 161, 156, 151, 145.5, 140, 136, 130, 125, 112, 99.6, 93.5, 89.3, 85.8, 83.5, 70.6, 66.5), ma_mid = c(199.3, 193.5, 186.5, 179.5, 174, 170, 166.5, 163, 158.5, 153.5, 148.25, 142.75, 138, 133, 127.5, 118.5, 105.8, 96.55, 91.4, 87.55, 84.65, 77.05, 68.05)), .Names = c("Stage", "ma_max", "ma_min", "ma_mid"), class = "data.frame", row.names = c(NA, -23L))
dinosaurs
#           GENUS ma_max ma_min   ma_mid
# 1   Abydosaurus  109.0 94.300 101.6500
# 2 Achelousaurus   84.9 70.600  77.7500
# 3  Acheroraptor   70.6 66.043  68.3215
head(stages)
#           Stage ma_max ma_min ma_mid
# 1    Hettangian  201.6    197  199.3
# 2    Sinemurian  197.0    190  193.5
# 3 Pliensbachian  190.0    183  186.5
# 4      Toarcian  183.0    176  179.5
# 5      Aalenian  176.0    172  174.0
# 6      Bajocian  172.0    168  170.0

The goal is to find the number of dinosaur genera that were present at each geological stage.

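library(data.table)   # 1.9.4
# Convert both tables to data.table by reference and drop the unused midpoint column
setDT(dinosaurs)[, ma_mid := NULL]
setDT(stages)[, ma_mid := NULL]
# foverlaps() requires the second table to be keyed on its (start, end) columns
setkey(dinosaurs, ma_min, ma_max)
foverlaps(stages, dinosaurs, type = "any", nomatch = 0)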
#            GENUS ma_max ma_min         Stage i.ma_max i.ma_min
# 1:   Abydosaurus  109.0 94.300        Albian    112.0     99.6
# 2:   Abydosaurus  109.0 94.300    Cenomanian     99.6     93.5
# 3: Achelousaurus   84.9 70.600     Coniacian     89.3     85.8
# 4: Achelousaurus   84.9 70.600     Santonian     85.8     83.5
# 5:  Acheroraptor   70.6 66.043     Campanian     83.5     70.6
# 6: Achelousaurus   84.9 70.600     Campanian     83.5     70.6
# 7:  Acheroraptor   70.6 66.043 Maastrichtian     70.6     66.5
# 8: Achelousaurus   84.9 70.600 Maastrichtian     70.6     66.5

This is mostly correct, but look at row 3 of the output. It claims that the Coniacian, 85.8 to 89.3 million years ago, overlaps with Achelousaurus, which lived 70.6 to 84.9 million years ago. What am I missing?
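For reference, here is a minimal base-R sketch of the overlap test I would expect for closed intervals, applied to the third and fourth rows of the output. The helper name `overlaps_closed` is made up for this illustration and is not part of data.table.

```r
# Two closed intervals [a_min, a_max] and [b_min, b_max] overlap
# exactly when each one starts no later than the other one ends.
overlaps_closed <- function(a_min, a_max, b_min, b_max) {
  a_min <= b_max & b_min <= a_max
}

# Third row: Achelousaurus (70.6 to 84.9) vs. the 85.8 to 89.3 stage
overlaps_closed(70.6, 84.9, 85.8, 89.3)   # FALSE

# Fourth row: Achelousaurus (70.6 to 84.9) vs. Santonian (83.5 to 85.8)
overlaps_closed(70.6, 84.9, 83.5, 85.8)   # TRUE
```

By this test the third row should not be in the result at all, while the fourth row is a genuine overlap.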


1 answer


In 1.9.5, I get the following:

#            GENUS ma_max ma_min         Stage i.ma_max i.ma_min
# 1:   Abydosaurus  109.0 94.300        Albian    112.0     99.6
# 2:   Abydosaurus  109.0 94.300    Cenomanian     99.6     93.5
# 3: Achelousaurus   84.9 70.600     Santonian     85.8     83.5
# 4:  Acheroraptor   70.6 66.043     Campanian     83.5     70.6
# 5: Achelousaurus   84.9 70.600     Campanian     83.5     70.6
# 6:  Acheroraptor   70.6 66.043 Maastrichtian     70.6     66.5
# 7: Achelousaurus   84.9 70.600 Maastrichtian     70.6     66.5
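From a result like this, the count the question is after (genera per stage) is a small aggregation. A minimal base-R sketch, with the seven Stage/GENUS pairs above copied by hand into a data frame named `matches` (a name chosen only for this example):

```r
# The seven matched pairs from the 1.9.5 output, copied by hand.
matches <- data.frame(
  Stage = c("Albian", "Cenomanian", "Santonian", "Campanian",
            "Campanian", "Maastrichtian", "Maastrichtian"),
  GENUS = c("Abydosaurus", "Abydosaurus", "Achelousaurus", "Acheroraptor",
            "Achelousaurus", "Acheroraptor", "Achelousaurus"),
  stringsAsFactors = FALSE
)

# Number of distinct genera matched to each stage.
counts <- tapply(matches$GENUS, matches$Stage,
                 function(g) length(unique(g)))
counts
```

Stages with no matching genus do not appear here, because `nomatch=0` already dropped them from the join.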

Most likely this floating point bug was fixed in 1.9.5 in this commit. It would be great if you could verify this on your side as well.
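One way to verify is to check the installed version and re-run only the suspect case. This is a sketch that assumes data.table is installed; `dino_sub` and `stages_sub` are cut-down copies of the question's tables, named here only for illustration.

```r
library(data.table)

# Which version is installed? The question used 1.9.4.
packageVersion("data.table")

# Cut-down copies of the question's data: one genus, two stages.
dino_sub <- data.table(GENUS  = "Achelousaurus",
                       ma_min = 70.6, ma_max = 84.9)
stages_sub <- data.table(Stage  = c("Coniacian", "Santonian"),
                         ma_min = c(85.8, 83.5),
                         ma_max = c(89.3, 85.8))

# foverlaps() needs the second table keyed on its (start, end) columns.
setkey(dino_sub, ma_min, ma_max)

res <- foverlaps(stages_sub, dino_sub, type = "any", nomatch = 0L)
res$Stage
```

On a version that contains the fix, I would expect only Santonian to come back; if Coniacian is also listed, the installed version still has the problem.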
