Finding values in one vector that are between values in another vector
I need help finding the values in one vector that fall between the key values of another vector, with the key values themselves not included.
For example, with the following vectors x and y
x <- c(2, 6, 10)
y <- c(7, 1, 9, 12, 4, 6, 3)
I would like to find all the values in y that are between the values in x but not equal to any of them, so that the result would be
list(y[y > 2 & y < 6], y[y > 6 & y < 10])
# [[1]]
# [1] 4 3
#
# [[2]]
# [1] 7 9
So in the above result
- 3 and 4 are between 2 and 6
- 7 and 9 are between 6 and 10
- 12 is not in between anything, therefore excluded
- 6 is equal to 6 so it is also excluded
I've been working on this for a while now and I'm stumped. I would show you the code, but it's just ugly.
How can I quickly find the values in one vector that lie between the values in another vector?
Maybe this will work for you:
lapply(split(y[y > min(x) & y < max(x)],
             findInterval(y[y > min(x) & y < max(x)], x)),
       function(z) z[!z %in% x])
# $`1`
# [1] 4 3
#
# $`2`
# [1] 7 9
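To see why the answer above needs both the range filter and the `%in%` filter, here is a minimal sketch of what `findInterval` returns on the example data. The variable names `idx` and `groups` are only illustrative, and the sketch assumes x is sorted in non-decreasing order, which `findInterval` requires.

```r
x <- c(2, 6, 10)
y <- c(7, 1, 9, 12, 4, 6, 3)

# For each element of y, count how many breakpoints in x are <= that element.
# 0 means "below min(x)", length(x) means "at or above max(x)".
idx <- findInterval(y, x)
idx
# [1] 2 0 2 3 1 2 1

# Splitting without any filtering keeps 1 (group 0), 12 (group 3),
# and 6 (a value of x itself, which lands in group 2).
groups <- split(y, idx)
groups[["2"]]
# [1] 7 9 6
```

Groups `0` and `3` are the values outside the range of x, and the 6 in group `2` is the boundary value that has to be dropped separately.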
Of course, it would be better to keep this DRY and subset "y" before splitting, for example using between (or %between%) from "data.table":
library(data.table)
Z <- y[y %between% range(x) & !y %in% x]
split(Z, findInterval(Z, x))
# $`1`
# [1] 4 3
#
# $`2`
# [1] 7 9
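If loading "data.table" only for %between% is not wanted, the same idea can be sketched in base R. The function name `between_breaks` is hypothetical, and two choices here are assumptions rather than part of the answer: x is sorted and de-duplicated inside the function, and a factor is used for grouping so that an interval with no values comes back as an empty vector instead of disappearing. It assumes x has at least two distinct values.

```r
between_breaks <- function(x, y) {
  x <- sort(unique(x))
  # Keep values strictly inside the overall range and not equal to any breakpoint.
  Z <- y[y > x[1] & y < x[length(x)] & !(y %in% x)]
  # One factor level per interval, so empty intervals are kept by split().
  grp <- factor(findInterval(Z, x), levels = seq_len(length(x) - 1))
  split(Z, grp)
}

out <- between_breaks(c(2, 6, 10), c(7, 1, 9, 12, 4, 6, 3))
out
# $`1`
# [1] 4 3
#
# $`2`
# [1] 7 9

# With an extra breakpoint at 11, the interval (10, 11) is empty but still listed.
out2 <- between_breaks(c(10, 2, 6, 11), c(7, 1, 9, 12, 4, 6, 3))
length(out2)
# [1] 3
```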
Update
For reference, all three options are still pretty fast:
set.seed(1)
x <- sort(sample(100000, 20, FALSE))
y <- sample(100000, 100000, TRUE)
AM <- function(x, y) {
  Z <- y[y %between% range(x) & !y %in% x]
  split(Z, findInterval(Z, x))
}
DA <- function(x, y) {
  indx <- Map(function(x, z) x + seq_len(z), x[-length(x)], diff(x) - 1)
  lapply(indx, function(x) y[y %in% x])
}
user <- function(x, y) {
  m <- t(diff(sign(outer(x, y, "-"))) == 2)
  split((m*y)[m], col(m)[m])
}
library(microbenchmark)
microbenchmark(AM(x, y), DA(x, y), user(x, y))
# Unit: milliseconds
#        expr       min        lq      mean    median        uq      max neval
#    AM(x, y)  22.58939  23.24731  26.29092  23.79639  25.64548 140.5610   100
#    DA(x, y) 149.46997 157.48534 162.47526 160.01823 164.74851 287.0808   100
#  user(x, y) 327.38835 437.44064 445.71955 446.65938 467.97784 637.3121   100
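Another variant. I think you could use outer and sign: going across the columns, if the sign changes from 1 to -1, then that value of y lies within the corresponding interval of x (that is, where two consecutive columns sum to zero). A value of y equal to a value in x gives 0 in that column, so (for distinct x values) the sum is not zero and the value is excluded. However, the loop for retrieving the values is a little clunky.
Edit: @flodel offered a nice alternative in the comments: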
m <- t(diff(sign(outer(x, y, "-"))) == 2)
split((m*y)[m], col(m)[m])
Original
(o <- sign(outer(y, x, "-")))
#      [,1] [,2] [,3]
# [1,]    1    1   -1
# [2,]   -1   -1   -1
# [3,]    1    1   -1
# [4,]    1    1    1
# [5,]    1   -1   -1
# [6,]    1    0   -1
# [7,]    1   -1   -1
lapply(1:(length(x)-1), function(i) y[o[,i] + o[,i+1]==0])
# [[1]]
# [1] 4 3
#
# [[2]]
# [1] 7 9
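The one-liner from the edit can be unpacked step by step. The following is only a sketch of how it appears to work on the example data: the intermediate names `s`, `d`, `m` and `res` are illustrative, and `y[row(m)[m]]` is swapped in for `(m*y)[m]` because it picks the same elements and may be easier to read.

```r
x <- c(2, 6, 10)
y <- c(7, 1, 9, 12, 4, 6, 3)

# 3 x 7 matrix: rows follow x, columns follow y; entry is sign(x[i] - y[j]).
s <- sign(outer(x, y, "-"))

# 2 x 7 matrix: change in sign between consecutive x values for each y.
# A jump from -1 to 1 (difference 2) means y[j] is strictly between x[i] and x[i + 1].
d <- diff(s)

# 7 x 2 logical matrix: rows follow y, columns follow the intervals of x.
m <- t(d == 2)

# Pick the matching y values and group them by interval number.
res <- split(y[row(m)[m]], col(m)[m])
res
# $`1`
# [1] 4 3
#
# $`2`
# [1] 7 9
```

For y = 6 the signs are -1, 0, 1, so both differences are 1 rather than 2, which is why the boundary value is left out.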