R: Keep rows from one data frame based on values in a second

I have two data frames. The first has four columns; its fourth column (V4) holds a number giving a physical position.

The second data frame also has four columns. Here, columns 2 and 3 (V2 and V3) are the lower and upper boundaries of a range.

I am trying to keep every row of the first data frame whose V4 value falls between V2 and V3 in any row of the second data frame. So if 62765 (V4 in data frame one) lies within 20140803-20223538, 63549983-63556677, or 52236330-52315441, the whole row should be kept; otherwise it should be dropped.

I would also like to be able to do the opposite: keep every row whose V4 is not between V2 and V3 in any row of the second data frame. Any help would be greatly appreciated.

Data frame one

V1 V2         V3 V4
10 rs11511647  0 62765
10 rs12218882  0 84172
10 rs10904045  0 84426
10 rs11252127  0 88087

      

Data frame two

V1 V2       V3       V4
 7 20140803 20223538 7A5
19 63549983 63556677 A1BG
10 52236330 52315441 A1CF

      



3 answers


Here's a simple approach:



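# TRUE where a value of df1$V4 lies between df2$V2 and df2$V3 (inclusive)
# in at least one row of df2
idx <- sapply(df1$V4, function(x) any(x >= df2$V2 & x <= df2$V3))

# keep rows whose V4 falls inside some interval
df1[idx, ]

# keep rows whose V4 falls outside every interval
df1[!idx, ]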
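
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the two sample tables from the question and applies the same test. The names `inside` and `outside` are only illustrative, and the bounds are assumed to be inclusive, as in the code above.

```r
# Sample data copied from the question
df1 <- data.frame(
  V1 = c(10, 10, 10, 10),
  V2 = c("rs11511647", "rs12218882", "rs10904045", "rs11252127"),
  V3 = c(0, 0, 0, 0),
  V4 = c(62765, 84172, 84426, 88087),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  V1 = c(7, 19, 10),
  V2 = c(20140803, 63549983, 52236330),
  V3 = c(20223538, 63556677, 52315441),
  V4 = c("7A5", "A1BG", "A1CF"),
  stringsAsFactors = FALSE
)

# TRUE where V4 lies inside at least one [V2, V3] interval (bounds inclusive)
idx <- sapply(df1$V4, function(x) any(x >= df2$V2 & x <= df2$V3))

inside  <- df1[idx, ]   # rows whose V4 falls inside some interval
outside <- df1[!idx, ]  # rows whose V4 falls outside every interval

nrow(inside)   # 0: every sample V4 is below the smallest lower bound
nrow(outside)  # 4
```

With the question's sample values nothing falls inside an interval, so `inside` comes back empty and `outside` holds all four rows.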

      



REVISED

Using @akrun's data and taking inspiration from @Sven Hohenstein's code, here's another approach.



df1 <- data.frame(
       V1 = c(10,10,10,10),
       V2 = c("rs11511647","rs12218882","rs10904045", "rs11252127"),
       V3 = c(0,0,0,0),
       V4 = c(62765, 63549985, 84426, 88087),
       stringsAsFactors=FALSE)

df2 <- data.frame(
       V1 = c(7, 19, 10),
       V2 = c(20140803, 63549983, 52236330),
       V3 = c(20223538, 63556677, 52315441),
       V4 = c("7A5", "A1BG", "A1CF"),
       stringsAsFactors=FALSE)

library(dplyr)

df1 %>%
    rowwise %>%
    mutate(test = ifelse(any(V4 >= df2$V2 & V4 <= df2$V3), 1, 0)) %>%
    filter(test == 1)

#  V1         V2 V3       V4 test
#1 10 rs12218882  0 63549985    1
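
If the row-wise step becomes slow on larger tables, the same test can probably be done in one vectorised pass in base R. The sketch below is one possible way, not part of the answer above: `inside_any` is a hypothetical helper name, and it builds a full positions-by-intervals logical matrix with `outer()`, so it assumes that matrix fits in memory.

```r
# Same data as in the answer above
df1 <- data.frame(
  V1 = c(10, 10, 10, 10),
  V2 = c("rs11511647", "rs12218882", "rs10904045", "rs11252127"),
  V3 = c(0, 0, 0, 0),
  V4 = c(62765, 63549985, 84426, 88087),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  V1 = c(7, 19, 10),
  V2 = c(20140803, 63549983, 52236330),
  V3 = c(20223538, 63556677, 52315441),
  V4 = c("7A5", "A1BG", "A1CF"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for each position that lies inside at least one
# [lower, upper] interval (bounds inclusive)
inside_any <- function(pos, lower, upper) {
  # hits[i, j] is TRUE when pos[i] lies within interval j
  hits <- outer(pos, lower, ">=") & outer(pos, upper, "<=")
  rowSums(hits) > 0
}

keep <- inside_any(df1$V4, df2$V2, df2$V3)

df1[keep, ]   # rows inside some interval
df1[!keep, ]  # rows outside every interval
```

Here `keep` is `FALSE TRUE FALSE FALSE`, so `df1[keep, ]` returns the same single row (rs12218882) as the dplyr pipeline, without the extra `test` column.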

      



Here's another possibility

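# Compare each V4 against every interval in df2, not only the row with the
# same index (note: the bounds are exclusive here)
idx <- sapply(seq_len(nrow(df1)), function(y) {
    any(df1$V4[y] > df2[, 2] & df1$V4[y] < df2[, 3])
})
# Logical indexing returns all matching rows, not just the first one
df1[idx, ]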
#   V1         V2 V3       V4
# 2 10 rs12218882  0 63549985

      
