Finding values in one vector that are between values in another vector

I need help finding the values in one vector that fall strictly between the key values of another vector, with the key values themselves not included.

For example, with the following vectors x and y:

x <- c(2, 6, 10)
y <- c(7, 1, 9, 12, 4, 6, 3)

      

I would like to find all the values in y that are between consecutive values of x but not equal to any value of x, so that the result would be

list(y[y > 2 & y < 6], y[y > 6 & y < 10])
# [[1]]
# [1] 4 3
#
# [[2]]
# [1] 7 9

      

So in the above result:

  • 3 and 4 are between 2 and 6
  • 7 and 9 are between 6 and 10
  • 12 is not in between anything, therefore excluded
  • 6 is equal to 6 so it is also excluded

I've been working on this for a while now and I'm stumped. I would show you the code, but it's just ugly.

How can I quickly find the values in one vector that lie between values in another vector?

+3




4 answers


Maybe this will work for you:

lapply(split(y[y > min(x) & y < max(x)], 
             findInterval(y[y > min(x) & y < max(x)], x)), 
       function(z) z[!z %in% x]) 
# $`1`
# [1] 4 3
# 
# $`2`
# [1] 7 9

      

Of course, it would be better to stay DRY and subset "y" before splitting, for example using between (or %between%) from "data.table":

library(data.table)
Z <- y[y %between% range(x) & !y %in% x]
split(Z, findInterval(Z, x))
# $`1`
# [1] 4 3
#
# $`2`
# [1] 7 9
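
If loading data.table only for %between% is not desirable, the same idea can be sketched in base R. This is a minimal sketch rather than the answerer's code: the names xs, keep and res are made up here, and x is sorted first because findInterval() expects a non-decreasing vector of breakpoints.

```r
x <- c(2, 6, 10)
y <- c(7, 1, 9, 12, 4, 6, 3)

xs <- sort(x)  # findInterval() needs sorted breakpoints

# Keep values strictly inside the overall range and drop exact matches
keep <- y > min(xs) & y < max(xs) & !(y %in% xs)
Z <- y[keep]

# Group the remaining values by the index of the interval they fall into
res <- split(Z, findInterval(Z, xs))
res
```

Intervals that contain no values do not appear in the result at all, because split() only creates groups for the interval indices that actually occur.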

      




Update

For reference, all three options are still pretty fast:

set.seed(1)
x <- sort(sample(100000, 20, FALSE))
y <- sample(100000, 100000, TRUE)

AM <- function(x, y) {
  Z <- y[y %between% range(x) & !y %in% x]
  split(Z, findInterval(Z, x))
}

DA <- function(x, y) {
  indx <- Map(function(x, z) x + seq_len(z), x[-length(x)], diff(x) - 1)
  lapply(indx, function(x) y[y %in% x])
}

user <- function(x, y) {
  m <- t(diff(sign(outer(x, y, "-"))) == 2)
  split((m*y)[m], col(m)[m])
}

library(microbenchmark)
microbenchmark(AM(x, y), DA(x, y), user(x, y))
# Unit: milliseconds
#        expr       min        lq      mean    median        uq      max neval
#    AM(x, y)  22.58939  23.24731  26.29092  23.79639  25.64548 140.5610   100
#    DA(x, y) 149.46997 157.48534 162.47526 160.01823 164.74851 287.0808   100
#  user(x, y) 327.38835 437.44064 445.71955 446.65938 467.97784 637.3121   100

      

+7




Here's a different approach



indx <- Map(function(x, z) x + seq_len(z), x[-length(x)], diff(x) - 1)
lapply(indx, function(x) y[y %in% x])
# [[1]]
# [1] 4 3
# 
# [[2]]
# [1] 7 9 
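
The Map() call enumerates the whole numbers strictly between each pair of neighbouring values in x, so this approach appears to rely on integer-valued data. The sketch below shows what indx contains and what happens to a non-integer; the vector y2 and its extra value 4.5 are hypothetical and added only for illustration.

```r
x <- c(2, 6, 10)
y2 <- c(7, 1, 9, 12, 4, 6, 3, 4.5)  # 4.5 is a hypothetical extra value

# For each pair of neighbours in x, list the whole numbers strictly between them
indx <- Map(function(x, z) x + seq_len(z), x[-length(x)], diff(x) - 1)
# indx[[1]] holds 3, 4, 5 and indx[[2]] holds 7, 8, 9

res <- lapply(indx, function(i) y2[y2 %in% i])
# 4.5 lies between 2 and 6 but is not one of 3, 4, 5, so it is not returned
res
```

For data that may contain non-integers, one of the comparison-based answers is likely the safer choice.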

      

+5




Another variant. I think you could use outer and sign; going across the columns, if there is a change from 1 to -1, then the y value lies within that x range (that is, where consecutive columns sum to zero). However, the loop for retrieving the values is a little messy.

Edit: @flodel offered a nice alternative in the comments:

m <- t(diff(sign(outer(x, y, "-"))) == 2)
split((m*y)[m], col(m)[m])
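
To see why this one-liner works, it can help to unpack it into intermediate steps. The following is only an illustrative breakdown using the question's x and y; the names s, d and res are introduced here and are not part of the original comment, and x is assumed to be sorted in increasing order.

```r
x <- c(2, 6, 10)
y <- c(7, 1, 9, 12, 4, 6, 3)

# Rows correspond to x, columns to y: -1 where x < y, 0 where equal, 1 where x > y
s <- sign(outer(x, y, "-"))

# diff() on a matrix subtracts consecutive rows, so d[i, j] is s[i + 1, j] - s[i, j].
# A value of 2 means a jump from -1 to 1, i.e. x[i] < y[j] < x[i + 1].
d <- diff(s)

# Transpose so that rows correspond to y and columns to the intervals of x
m <- t(d == 2)

# m * y recycles y down each column; indexing by m keeps only the matching values
res <- split((m * y)[m], col(m)[m])
res
```

A value of y equal to an element of x yields a 0 in s, so the difference can never reach 2 and the endpoint is excluded automatically.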

      


Original

(o <- sign(outer(y, x, "-")))
#       [,1] [,2] [,3]
# [1,]    1    1   -1
# [2,]   -1   -1   -1
# [3,]    1    1   -1
# [4,]    1    1    1
# [5,]    1   -1   -1
# [6,]    1    0   -1
# [7,]    1   -1   -1

lapply(1:(length(x)-1), function(i) y[o[,i] + o[,i+1]==0])
# [[1]]
# [1] 4 3
# 
# [[2]]
# [1] 7 9

      

+4




Try:

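z <- vector("list", length(x) - 1)
for (j in seq_len(length(x) - 1)) {
  # Start from an empty numeric vector rather than NULL, so that an
  # interval with no matches still keeps its own slot in the result
  v <- numeric(0)
  for (i in seq_along(y)) {
    if (y[i] > x[j] && y[i] < x[j + 1]) {
      v[length(v) + 1] <- y[i]
    }
  }
  z[[j]] <- v
}
z
# [[1]]
# [1] 4 3
#
# [[2]]
# [1] 7 9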

      

0








