Vectorizing an R loop for better performance
I'm having trouble finding a vectorized equivalent of a particular loop in R. My goal is to improve the loop's performance, since it has to be executed thousands of times in the algorithm.
For each row of a matrix, I want to find the position of the lowest value within each section of the row, where the sections are defined by a vector Level.
Example:
Level = c(2,3)
Let the first row of the matrix X be c(2, -1, 3, 0.5, 4).
Searching for the position of the lowest value in the range 1:Level[1] (that is, (2, -1)), I get 2, because -1 < 2 and -1 is at the second position of the row. Then, looking for the position of the lowest value in the second range (Level[1]+1):(Level[1]+Level[2]) (that is, (3, 0.5, 4)), I get 4, since 0.5 < 3 < 4 and 0.5 is at the fourth position of the row.
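The two lookups in this example can be reproduced with base R's which.min(), which returns the index of the first minimum of a vector. Adding the offset Level[1] converts the position inside the second section back into a column index of the whole row. The variable names pos1 and pos2 are only for illustration.

```r
# First row of X and the section sizes from the example above
x <- c(2, -1, 3, 0.5, 4)
Level <- c(2, 3)

# Section 1 covers columns 1:Level[1], i.e. (2, -1)
pos1 <- which.min(x[1:Level[1]])

# Section 2 covers columns (Level[1]+1):(Level[1]+Level[2]), i.e. (3, 0.5, 4);
# which.min gives the position inside the section, so add the offset Level[1]
pos2 <- Level[1] + which.min(x[(Level[1] + 1):(Level[1] + Level[2])])

c(pos1, pos2)
```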
I need to do this for every row of the matrix.
My solution to the problem works like this:
Level = c(2,3,3) #elements per section, here: 3 sections with 2,3 and 3 levels
rows = 10 #number of rows in array X
X = matrix(runif(rows*sum(Level),-5,5),rows,sum(Level)) #array with 10 rows and sum(Level) columns, here: 8
Position_min = matrix(0,rows,length(Level)) #array in which the position of minimum values for each section and row are stored
for(i in 1:rows){
  for(j in 1:length(Level)){ #length(Level) is number of intervals, here: 3
    if(j == 1){coeff=0}else{coeff=1}
    Position_min[i,j] = coeff*sum(Level[1:(j-1)]) + which(X[i,(coeff*sum(Level[1:(j-1)])+1):sum(Level[1:j])] == min(X[i,(coeff*sum(Level[1:(j-1)])+1):sum(Level[1:j])]))
  }
}
It works, but I would prefer a faster solution. Any ideas?
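One improvement that keeps the original double loop is to compute the section boundaries once with cumsum() instead of calling sum(Level[1:(j-1)]) on every iteration, and to use which.min(), which returns only the first minimum. With which(... == min(...)), a tie would produce a length-2 result and make the assignment fail. This is a sketch that has not been benchmarked here; the seed, starts, ends and Position_min2 are added only for illustration.

```r
set.seed(1)  # seed only so the example is reproducible
Level <- c(2, 3, 3)
rows <- 10
X <- matrix(runif(rows * sum(Level), -5, 5), rows, sum(Level))

# Section boundaries computed once, outside the loop
ends   <- cumsum(Level)      # last column of each section: 2 5 8
starts <- ends - Level + 1   # first column of each section: 1 3 6

Position_min2 <- matrix(0, rows, length(Level))
for (i in seq_len(rows)) {
  for (j in seq_along(Level)) {
    # which.min returns the first minimum, so ties cannot yield a length > 1 result
    Position_min2[i, j] <- starts[j] - 1 + which.min(X[i, starts[j]:ends[j]])
  }
}
```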
Here is a solution without explicit for loops (apply() and sapply() still iterate internally, so it is "vectorized" only in that sense):
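findmins <- function(x, level) {
  # label each column with its section, e.g. 1 1 2 2 2 3 3 3
  # (use the argument `level`, not the global `Level`)
  series <- rep(seq_along(level), level)
  x <- split(x, factor(series))
  # position of the minimum within each section
  minsSplit <- as.numeric(sapply(x, which.min))
  # shift by the number of columns in all preceding sections
  minsSplit + c(0, cumsum(level[-length(level)]))
}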
Position_min_vectorized <- t(apply(X, 1, findmins, Level))
identical(Position_min, Position_min_vectorized)
## [1] TRUE
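Because apply() still calls findmins() once per row, an alternative is to loop over the sections instead (length(Level) iterations rather than nrow(X)) and process all rows of a section at once. The sketch below uses base R's max.col() on the negated submatrix; with ties.method = "first" it picks the first minimum, like which.min(). The function name findmins_cols is hypothetical, and this approach has not been benchmarked against the answer above.

```r
# Hypothetical helper: column index of the row-wise minimum in each section
findmins_cols <- function(X, level) {
  ends   <- cumsum(level)
  starts <- ends - level + 1
  out <- matrix(0, nrow(X), length(level))
  for (j in seq_along(level)) {
    # all rows, columns of section j only; drop = FALSE keeps a matrix for 1-column sections
    block <- X[, starts[j]:ends[j], drop = FALSE]
    # the maximum of -block is the minimum of block; "first" picks the first of any ties
    out[, j] <- starts[j] - 1 + max.col(-block, ties.method = "first")
  }
  out
}
```

With the question's X and Level, findmins_cols(X, Level) should agree with Position_min whenever no section contains tied minima.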
You may be able to improve performance further by turning your matrix into a list of rows and then using parallel::mclapply():
X_list <- split(X, factor(1:nrow(X)))
do.call(rbind, parallel::mclapply(X_list, findmins, Level))
## [,1] [,2] [,3]
## 1 1 5 6
## 2 2 3 6
## 3 1 4 7
## 4 1 5 6
## 5 2 5 7
## 6 2 4 6
## 7 1 5 8
## 8 1 5 8
## 9 1 3 8
## 10 1 3 8
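Note that mclapply() relies on forking, which is not available on Windows, and starting workers has an overhead that may well outweigh the gain on a 10-row matrix. A minimal sketch, assuming the corrected findmins() from this answer is passed in as f, that chooses the core count with detectCores() and falls back to a single core on Windows. The wrapper name findmins_parallel is hypothetical.

```r
library(parallel)

# Hypothetical wrapper: apply a per-row function f (e.g. findmins) to every row of X
findmins_parallel <- function(X, level, f) {
  cores <- detectCores()
  # detectCores() can return NA; forking is unavailable on Windows
  if (is.na(cores) || .Platform$OS.type == "windows") cores <- 1L
  # one list element per row; values keep their column order
  X_list <- split(X, row(X))
  # leave one core free; mclapply passes `level` on to f via `...`
  do.call(rbind, mclapply(X_list, f, level, mc.cores = max(1L, cores - 1L)))
}
```

Usage, with the question's data: findmins_parallel(X, Level, findmins).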