Vectorizing R-loop for better performance

I am struggling to find a vectorized form of a particular loop in R. I want to speed the loop up because the algorithm executes it thousands of times.

For each row of an array, I want to find the position of the lowest value within each section of the row, where the section sizes are given by a vector Level.

Example:

Level = c(2,3)

Let the first row of the array X be c(2, -1, 3, 0.5, 4).

Searching for the position of the lowest value in the range 1:Level[1] (that is, (2, -1)), I get 2, because -1 < 2 and -1 is at the second position of the row. Then, looking for the position of the lowest value in the second range (Level[1]+1):(Level[1]+Level[2]) (that is, (3, 0.5, 4)), I get 4, since 0.5 < 3 < 4 and 0.5 is in the fourth position of the row.
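The worked example can be reproduced in a few lines of base R. This is only an illustrative sketch: the helper names `starts`, `ends` and `pos` are not from the question, and `which.min()` is used in place of the `which(... == min(...))` idiom.

```r
# Worked example from the question: one row, two sections of sizes 2 and 3.
Level <- c(2, 3)
x <- c(2, -1, 3, 0.5, 4)

ends   <- cumsum(Level)        # last column of each section: 2, 5
starts <- ends - Level + 1     # first column of each section: 1, 3

# which.min() gives the position inside the section; adding (start - 1)
# converts it back to a position in the full row.
pos <- mapply(function(s, e) which.min(x[s:e]) + s - 1, starts, ends)
print(pos)  # [1] 2 4
```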

I need to perform this over every row in the array.

My solution to the problem works like this:

Level = c(2,3,3)  #elements per section, here: 3 sections with 2,3 and 3 levels
rows = 10  #number of rows in array X
X = matrix(runif(rows*sum(Level),-5,5),rows,sum(Level))  #array with 10 rows and sum(Level) columns, here: 8
Position_min = matrix(0,rows,length(Level))  #array in which the position of minimum values for each section and row are stored
for(i in 1:rows){
 for(j in 1:length(Level)){            #length(Level) is number of intervals, here: 3
  if(j == 1){coeff=0}else{coeff=1}
  Position_min[i,j] = coeff*sum(Level[1:(j-1)]) + which(X[i,(coeff*sum(Level[1:(j-1)])+1):sum(Level[1:j])] == min(X[i,(coeff*sum(Level[1:(j-1)])+1):sum(Level[1:j])]))
  }
}

      

It works, but I would prefer a faster solution. Any ideas?


2 answers


This will remove the outer layer of the loop:



Level1=c(0,cumsum(Level))
for(j in 1:(length(Level1)-1)){
    Position_min[,j]=max.col(-X[,(Level1[j]+1):Level1[j+1]])+(Level1[j])
}
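Two details of this approach may be worth guarding against. By default `max.col()` breaks ties at random, whereas `which.min()` returns the first minimum. Subsetting a single column (or a one-row matrix) also drops the matrix structure unless `drop = FALSE` is given. The following is a sketch of the same loop with both points handled. It is self-contained, so it rebuilds `X` itself, and the seed and the names `bounds`, `cols` and `block` are arbitrary choices rather than part of the answer.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
Level <- c(2, 3, 3)
rows  <- 10
X <- matrix(runif(rows * sum(Level), -5, 5), rows, sum(Level))

bounds <- c(0, cumsum(Level))    # section j spans columns bounds[j]+1 .. bounds[j+1]
Position_min <- matrix(0, rows, length(Level))

for (j in seq_along(Level)) {
  cols <- (bounds[j] + 1):bounds[j + 1]
  # drop = FALSE keeps a matrix even for a one-column section or a one-row X
  block <- X[, cols, drop = FALSE]
  # ties.method = "first" mirrors which.min(); the default is "random"
  Position_min[, j] <- max.col(-block, ties.method = "first") + bounds[j]
}
```

Negating the block turns the row-wise maximum search of `max.col()` into a row-wise minimum search, and adding `bounds[j]` shifts the within-section position back to a column index of `X`.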

      



Here is a "fully vectorized" solution without explicit loops:

findmins <- function(x, level) {
    series <- rep(seq_along(level), level)  # section index of each element; uses the argument, not the global Level
    x <- split(x, factor(series))
    minsSplit <- as.numeric(sapply(x, which.min))
    minsSplit + c(0, cumsum(level[-length(level)]))
}

Position_min_vectorized <- t(apply(X, 1, findmins, Level))
identical(Position_min, Position_min_vectorized)
## [1] TRUE

      



You can improve performance by turning your matrix into a list of rows and then using parallel::mclapply():

X_list <- split(X, factor(1:nrow(X)))
do.call(rbind, parallel::mclapply(X_list, findmins, Level))
##    [,1] [,2] [,3]
## 1     1    5    6
## 2     2    3    6
## 3     1    4    7
## 4     1    5    6
## 5     2    5    7
## 6     2    4    6
## 7     1    5    8
## 8     1    5    8
## 9     1    3    8
## 10    1    3    8
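`mclapply()` parallelizes by forking, which is not available on Windows, where only `mc.cores = 1` is accepted. Below is a self-contained sketch of the parallel call with an explicit core count. The choice of two cores, the seed, and the use of `row(X)` as the splitting factor are assumptions for illustration, and `findmins` is redefined here so that the block runs on its own.

```r
library(parallel)

# Position of the minimum within each section of one row.
findmins <- function(x, level) {
  series <- rep(seq_along(level), level)           # section index per element
  pos <- vapply(split(x, series), which.min, integer(1))
  unname(pos) + c(0, cumsum(level[-length(level)]))  # shift to row positions
}

set.seed(1)                                        # arbitrary seed
Level <- c(2, 3, 3)
X <- matrix(runif(10 * sum(Level), -5, 5), 10, sum(Level))

# Forking is unavailable on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

X_list <- split(X, row(X))                         # one list element per row
res <- do.call(rbind, mclapply(X_list, findmins, Level, mc.cores = n_cores))
```

Whether this beats the serial version depends on the matrix size, since starting worker processes has a fixed cost; timing both with `system.time()` on realistic data is the safest way to decide.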

      
