Effectively create a breaking vector in R
I am looking into a way to efficiently create a mess (and conversely specific permutations) of a vector in R. As far as I have seen, there is no basic function that does this, and also not much here on SO.
The obvious start is sample
which creates a permutation of the vector. But I need this permutation to have no fixed points, hence be a vector violation. For a nice explanation of this topic see this Cross validated report .
This is my first approach:
derangr <- function(x){
while(TRUE){
xp <- sample(x)
if(sum(xp == x) == 0) break
}
return(xp)
}
So, inside the loop, while
I check if there is a fixed point between the vector x
and the given permutation x
called xp
. If not, I break the loop and return the vector.
As the results show, it works great:
> derangr(1:10)
[1] 4 5 6 10 7 2 1 9 3 8
> derangr(LETTERS)
[1] "C" "O" "L" "J" "A" "I" "Y" "M" "G" "T" "S" "R" "Z" "V" "N" "K" "D" "Q" "B" "H" "F" "E" "X" "W" "U" "P"
So, I'm wondering if there is a better way to do this, perhaps with while
some kind of vectorization substituted . I also want to keep an eye on scalability.
Here's microbenchmark
for both examples:
library(microbenchmark)
> microbenchmark(derangr(1:10),times = 10000)
Unit: microseconds
expr min lq mean median uq max neval
derangr(1:10) 8.359 15.492 40.1807 28.3195 49.4435 6866.453 10000
> microbenchmark(derangr(LETTERS),times = 10000)
Unit: microseconds
expr min lq mean median uq max neval
derangr(LETTERS) 24.385 31.123 34.75819 32.4475 34.3225 10200.17 10000
The same question applies to the converse, making permutations with a given number of fixed points n
:
arrangr <- function(x,n){
while(TRUE){
xp <- sample(x)
if(sum(xp == x) == n) break
}
return(xp)
}
source to share
If you don't have only unique values, you can rebuild the index as and use it for a subset of the input vector in the new order. In this case, if you have, for example rep(LETTERS, 2)
, the first A
and the second A
will be interchangeable. The function derangr()
suggested in Q will change them as well.
derangr2 <- function(x){
ind <- seq_along(x)
while(TRUE){
indp <- sample(ind)
if(sum(indp == ind) == 0) break
}
return(x[indp])
}
Some test results:
microbenchmark(derangr(rep(LETTERS, 4)),
derangr2(rep(LETTERS, 4)), times = 1000)
# Unit: microseconds
# expr min lq mean median uq max neval
# derangr(rep(LETTERS, 4)) 6.258 113.4895 441.831094 251.724 549.384 5837.143 1000
# derangr2(rep(LETTERS, 4)) 6.542 7.3960 23.173800 12.800 22.755 4645.936 1000
However, if you only come across unique values, this approach will not do well.
microbenchmark(derangr(1:1000), derangr2(1:1000), times = 1000)
# Unit: microseconds
# expr min lq mean median uq max neval
# derangr(1:1000) 19.341 21.333 61.55154 40.959 78.0775 2770.382 1000
# derangr2(1:1000) 23.608 25.884 72.76647 46.079 84.1930 2674.243 1000
source to share