Unexpected behavior with foverlaps with floating point intervals
Here's an example where foverlaps(...) appears to return matches that don't actually overlap. Can anyone help me understand what I am doing wrong?
The problem in this post seems like a great opportunity to use foverlaps(...) from the data.table package. Below is the data from that post.
dinosaurs <- structure(list(GENUS = structure(1:3, .Label = c("Abydosaurus", "Achelousaurus", "Acheroraptor"), class = "factor"), ma_max = c(109, 84.9, 70.6), ma_min = c(94.3, 70.6, 66.043), ma_mid = c(101.65, 77.75, 68.3215)), .Names = c("GENUS", "ma_max", "ma_min", "ma_mid"), class = "data.frame", row.names = c(NA, -3L))
stages <- structure(list(Stage = structure(c(13L, 19L, 17L, 21L, 1L, 4L, 6L, 8L, 16L, 14L, 20L, 7L, 23L, 12L, 5L, 3L, 2L, 10L, 22L, 11L, 18L, 9L, 15L), .Label = c("Aalenian", "Albian", "Aptian", "Bajocian", "Barremian", "Bathonian", "Berriasian", "Callovian", "Campanian", "Cenomanian", "Coniacian", "Hauterivian", "Hettangian", "Kimmeridgian", "Maastrichtian", "Oxfordian", "Pliensbachian", "Santonian", "Sinemurian", "Tithonian", "Toarcian", "Turonian", "Valanginian"), class = "factor"),ma_max = c(201.6, 197, 190, 183, 176, 172, 168, 165, 161, 156, 151, 145.5, 140, 136, 130, 125, 112, 99.6, 93.5, 89.3, 85.8, 83.5, 70.6), ma_min = c(197, 190, 183, 176, 172, 168, 165, 161, 156, 151, 145.5, 140, 136, 130, 125, 112, 99.6, 93.5, 89.3, 85.8, 83.5, 70.6, 66.5), ma_mid = c(199.3, 193.5, 186.5, 179.5, 174, 170, 166.5, 163, 158.5, 153.5, 148.25, 142.75, 138, 133, 127.5, 118.5, 105.8, 96.55, 91.4, 87.55, 84.65, 77.05, 68.05)), .Names = c("Stage", "ma_max", "ma_min", "ma_mid"), class = "data.frame", row.names = c(NA, -23L))
dinosaurs
# GENUS ma_max ma_min ma_mid
# 1 Abydosaurus 109.0 94.300 101.6500
# 2 Achelousaurus 84.9 70.600 77.7500
# 3 Acheroraptor 70.6 66.043 68.3215
head(stages)
# Stage ma_max ma_min ma_mid
# 1 Hettangian 201.6 197 199.3
# 2 Sinemurian 197.0 190 193.5
# 3 Pliensbachian 190.0 183 186.5
# 4 Toarcian 183.0 176 179.5
# 5 Aalenian 176.0 172 174.0
# 6 Bajocian 172.0 168 170.0
The goal is to find the number of dinosaur genera that were present at each geological stage.
library(data.table) # 1.9.4
setDT(dinosaurs)[,ma_mid:=NULL]
setDT(stages)[,ma_mid:=NULL]
setkey(dinosaurs,ma_min,ma_max)
foverlaps(stages,dinosaurs,type="any",nomatch=0)
# GENUS ma_max ma_min Stage i.ma_max i.ma_min
# 1: Abydosaurus 109.0 94.300 Albian 112.0 99.6
# 2: Abydosaurus 109.0 94.300 Cenomanian 99.6 93.5
# 3: Achelousaurus 84.9 70.600 Coniacian 89.3 85.8
# 4: Achelousaurus 84.9 70.600 Santonian 85.8 83.5
# 5: Acheroraptor 70.6 66.043 Campanian 83.5 70.6
# 6: Achelousaurus 84.9 70.600 Campanian 83.5 70.6
# 7: Acheroraptor 70.6 66.043 Maastrichtian 70.6 66.5
# 8: Achelousaurus 84.9 70.600 Maastrichtian 70.6 66.5
This is mostly correct, but look at row 3. It claims that the Coniacian (85.8 to 89.3 million years ago) overlaps with Achelousaurus, which lived 70.6 to 84.9 million years ago. ... What am I missing?
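To make the complaint about row 3 concrete, here is a minimal base R sketch of a closed-interval overlap test. The helper name `overlaps` is hypothetical (it is not part of data.table), and the bounds are copied from rows 3 and 4 of the output above.

```r
# Closed intervals [a_min, a_max] and [b_min, b_max] overlap exactly when
# each interval starts no later than the other one ends.
# `overlaps` is a hypothetical helper, not a data.table function.
overlaps <- function(a_min, a_max, b_min, b_max) {
  a_min <= b_max & b_min <= a_max
}

# Row 3: Achelousaurus (70.6 to 84.9) vs. Coniacian (85.8 to 89.3)
row3 <- overlaps(70.6, 84.9, 85.8, 89.3)

# Row 4: Achelousaurus (70.6 to 84.9) vs. Santonian (83.5 to 85.8)
row4 <- overlaps(70.6, 84.9, 83.5, 85.8)

print(c(row3 = row3, row4 = row4))
```

Under this test row 4 is a genuine overlap and row 3 is not, which is why row 3 looks like a spurious match.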
In 1.9.5, I get the following:
# GENUS ma_max ma_min Stage i.ma_max i.ma_min
# 1: Abydosaurus 109.0 94.300 Albian 112.0 99.6
# 2: Abydosaurus 109.0 94.300 Cenomanian 99.6 93.5
# 3: Achelousaurus 84.9 70.600 Santonian 85.8 83.5
# 4: Acheroraptor 70.6 66.043 Campanian 83.5 70.6
# 5: Achelousaurus 84.9 70.600 Campanian 83.5 70.6
# 6: Acheroraptor 70.6 66.043 Maastrichtian 70.6 66.5
# 7: Achelousaurus 84.9 70.600 Maastrichtian 70.6 66.5
Most likely this was a floating point bug that was fixed in 1.9.5 in this commit. It would be great if you could check this as well.
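If upgrading is not an option, one possible workaround is to sidestep floating point comparisons by converting the bounds to integers before calling foverlaps. This is only a sketch: it assumes the problem really is floating point related, the scale factor of 1000 is an assumption based on the three decimal places in this data, and I have not tested it against 1.9.4. The tables below are a small subset of the data from the question.

```r
library(data.table)

# Small subset of the question's data (ages in millions of years)
dinosaurs <- data.table(
  GENUS  = c("Abydosaurus", "Achelousaurus", "Acheroraptor"),
  ma_min = c(94.3, 70.6, 66.043),
  ma_max = c(109, 84.9, 70.6)
)
stages <- data.table(
  Stage  = c("Coniacian", "Santonian", "Campanian"),
  ma_min = c(85.8, 83.5, 70.6),
  ma_max = c(89.3, 85.8, 83.5)
)

# Convert both bounds to integer thousandths of a million years.
# The factor 1000 is an assumption: it suits data with at most 3 decimals.
cols <- c("ma_min", "ma_max")
to_int <- function(v) as.integer(round(v * 1000))
dinosaurs[, (cols) := lapply(.SD, to_int), .SDcols = cols]
stages[, (cols) := lapply(.SD, to_int), .SDcols = cols]

# foverlaps requires y (here dinosaurs) to be keyed on (start, end)
setkeyv(dinosaurs, cols)
res <- foverlaps(stages, dinosaurs, type = "any", nomatch = 0L)
print(res[, .(GENUS, Stage)])
```

With integer bounds the comparison between 84900 and 85800 is exact, so the Coniacian should not be matched to Achelousaurus.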