Search for a value from one data frame within a range in another and return a different column
I have two data frames and want to use the value in one (DF1$pos) to search across two columns in DF2 (DF2$start, DF2$end). If the value falls within that range, I want the matching DF2$annot returned as DF1$name.
DF1
ID pos name
chr 12
chr 542
chr 674
DF2
ID start end annot
chr 1 200 a1
chr 201 432 a2
chr 540 1002 a3
chr 2000 2004 a4
So in this example I would like DF1 to become:
ID pos name
chr 12 a1
chr 542 a3
chr 674 a3
I've tried using merge and intersection, but I don't know how to apply a conditional (if) test on the range within them.
Data frames should be encoded as follows:
DF1 <- data.frame(ID=c("chr","chr","chr"),
                  pos=c(12,542,674),
                  name=c(NA,NA,NA))
DF2 <- data.frame(ID=c("chr","chr","chr","chr"),
                  start=c(1,201,540,2000),
                  end=c(200,432,1002,2004),
                  annot=c("a1","a2","a3","a4"))
You could use foverlaps() from the data.table package.
library(data.table)
DT1 <- data.table(DF1)
DT2 <- data.table(DF2)
setkey(DT2, ID, start, end)
DT1[, c("start", "end") := pos] ## foverlaps() needs an interval on both sides, so pos is copied into start and end; I don't know if there is a way around this step...
foverlaps(DT1, DT2)
# ID start end annot pos i.start i.end
# 1: chr 1 200 a1 12 12 12
# 2: chr 540 1002 a3 542 542 542
# 3: chr 540 1002 a3 674 674 674
foverlaps(DT1, DT2)[, c("ID", "pos", "annot"), with = FALSE]
# ID pos annot
# 1: chr 12 a1
# 2: chr 542 a3
# 3: chr 674 a3
As @Arun mentioned in the comments, you can also use which = TRUE in foverlaps() to get the matching row indices, and then use them to retrieve the corresponding values:
foverlaps(DT1, DT2, which = TRUE)
# xid yid
# 1: 1 1
# 2: 2 3
# 3: 3 3
DT2$annot[foverlaps(DT1, DT2, which = TRUE)$yid]
# [1] "a1" "a3" "a3"
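On the step of copying pos into start and end: a non-equi join might avoid it. This is only a sketch, assuming a data.table version that supports non-equi joins in `on` (1.9.8 or later) and assuming the intervals in DT2 do not overlap, so each position matches at most one row. The tables are rebuilt here so the snippet runs on its own, and `matched` is just an illustrative name.

```r
library(data.table)

# Same data as in the question, rebuilt so the snippet is self-contained
DT1 <- data.table(ID = c("chr", "chr", "chr"),
                  pos = c(12, 542, 674))
DT2 <- data.table(ID = c("chr", "chr", "chr", "chr"),
                  start = c(1, 201, 540, 2000),
                  end = c(200, 432, 1002, 2004),
                  annot = c("a1", "a2", "a3", "a4"))

# For each row of DT1, find the DT2 row with the same ID and
# start <= pos <= end. Positions inside no interval come back as NA.
# Assumption: intervals do not overlap, so there is one result per DT1 row.
matched <- DT2[DT1, on = .(ID, start <= pos, end >= pos), annot]

DT1[, name := matched]
print(DT1)
```

If the intervals can overlap, a position may match several rows and the result would no longer line up one to one with DT1, so the foverlaps() version above may be easier to reason about in that case.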
You can also use the IRanges package from Bioconductor:
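# biocLite() is no longer supported; current Bioconductor releases install via BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("IRanges")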
library(IRanges)
DF1N <- with(DF1, IRanges(pos, pos))
DF2N <- with(DF2, IRanges(start, end))
DF1$name <- DF2$annot[subjectHits(findOverlaps(DF1N, DF2N))]
DF1
# ID pos name
#1 chr 12 a1
#2 chr 542 a3
#3 chr 674 a3
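One caveat with indexing by subjectHits(): if some position falls inside no interval, the hits vector is shorter than DF1 and the assignment no longer lines up. A possible way around that is `select = "first"`, which returns one index per query and NA where nothing overlaps. The sketch below adds a made-up position (1500) that hits no range to show this. Like the code above, it ignores the ID column, so it assumes everything is on the same chromosome.

```r
library(IRanges)

# Data from the question plus one hypothetical position (1500) with no match
DF1 <- data.frame(ID = c("chr", "chr", "chr", "chr"),
                  pos = c(12, 542, 674, 1500),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(ID = c("chr", "chr", "chr", "chr"),
                  start = c(1, 201, 540, 2000),
                  end = c(200, 432, 1002, 2004),
                  annot = c("a1", "a2", "a3", "a4"),
                  stringsAsFactors = FALSE)

# Each position becomes a width-1 range; DF2 rows become the subject ranges
query   <- IRanges(start = DF1$pos, end = DF1$pos)
subject <- IRanges(start = DF2$start, end = DF2$end)

# select = "first" gives an integer vector as long as the query,
# with NA for positions that overlap no subject range
hit <- findOverlaps(query, subject, select = "first")

DF1$name <- DF2$annot[hit]
print(DF1)
```

Positions without a match end up with NA in name instead of shifting the later annotations up by one row.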