Alternative to loops in R

Possible duplicate:
Speed up looping in R

I have a few questions regarding loops. I know R is faster with vectorized computation, and I would like to modify the code below to take advantage of that. Looking at other answers on the forum, the sapply function seems able to replace the inner for loop, but my attempt produces a vector of zeros without raising any error. tao remains 1000 afterwards, and I think this is part of the problem.

My main concern is speed, since I need to wrap a loop around the whole algorithm and plot the results for different values of V and n for further analysis.

Thanks for the help.

Alternative loop

tao = 1000
L = (tao - 1)   
n = 10      
V = 5               
I = 10000                       
V_s = matrix(rnorm(I), I, 1)
V_b = matrix(rnorm(I), I, 1)

signal <- matrix(0, L, 1)  

for( j in (n:L)){

    sapply(((j-n+1):j),function (tao) signal[j] = signal[j] + abs(V_s[tao] - V_b[tao]))

    signal[j] = (signal[j] / (n * V) )

} 

      

Original loop

tao = 1000
L = (tao - 1)   
n = 10      
V = 5               
I = 10000                       
V_s = matrix(rnorm(I), I, 1)
V_b = matrix(rnorm(I), I, 1)

signal <- matrix(0, L, 1)  

for( j in (n:L)){

    for( tao in ((j-n+1):j))    {

        signal[j] = (signal[j] + abs(V_s[tao] - V_b[tao]))

    }
        signal[j] = (signal[j] / (n * V) )

}

      

+3




3 answers


Using filters, you can do the calculation without any loop at all (and sapply is nothing more than a hidden loop).

absdif <- abs(V_s - V_b)
signal <- filter(absdif[1:L], rep(1/(n*V), n), sides=1)
signal[is.na(signal)] <- 0

      

Understanding what happens on the second line is not trivial unless you are familiar with filters. Let's take a closer look:

First, we compute the absolute differences between V_s and V_b, which you use repeatedly in your loop. Then comes the filter. Your calculation is nothing more than a sum of the past n values at each time step j. So we have something like

signal[j] <- sum(absdif[(j-n+1):j])

      

This is exactly what convolution filters do: they sum a window of values, generally after multiplying each value by some weight. We choose 1/(n*V) as the weight for all values, which corresponds to the normalization you perform in your outer loop. The last argument, sides=1, simply tells the filter to use only values from the past (sides=2 would centre the window on j, roughly sum(absdif[(j-n/2):(j+n/2)])).
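To make the effect of the sides argument concrete, here is a minimal sketch on a toy vector. The names x, w, past and centred are made up for this illustration; an unweighted window of length 3 is assumed so the sums are easy to check by hand.

```r
x <- 1:5
w <- rep(1, 3)  # unweighted window of length 3 (illustrative choice)

# sides = 1: each output sums the current value and the two before it
past <- stats::filter(x, w, sides = 1)

# sides = 2: the window is centred on the current value
centred <- stats::filter(x, w, sides = 2)

print(as.numeric(past))     # NA NA 6 9 12
print(as.numeric(centred))  # NA 6 9 12 NA
```

With sides = 1 the missing values appear only at the start, which is why a single replacement of NA by 0 is enough in the solution above.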

The last line simply replaces the NA values at the beginning with 0 (where the filter does not have enough data to compute the sum; this is equivalent to skipping the first n - 1 values, as your loop does by starting at j = n).
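If you want to convince yourself that the filter reproduces the loop, a check along the following lines should work. This is only a sketch: the seed is arbitrary, the inner loop index is renamed to k, and signal_loop and signal_filter are names introduced here for the comparison.

```r
set.seed(1)  # arbitrary seed, only so the check is reproducible
tao <- 1000; L <- tao - 1; n <- 10; V <- 5; I <- 10000
V_s <- matrix(rnorm(I), I, 1)
V_b <- matrix(rnorm(I), I, 1)

# Reference: the double loop from the question (inner index renamed to k)
signal_loop <- numeric(L)
for (j in n:L) {
  for (k in (j - n + 1):j) {
    signal_loop[j] <- signal_loop[j] + abs(V_s[k] - V_b[k])
  }
  signal_loop[j] <- signal_loop[j] / (n * V)
}

# Filter version, as described above
absdif <- abs(V_s - V_b)
signal_filter <- stats::filter(absdif[1:L], rep(1 / (n * V), n), sides = 1)
signal_filter[is.na(signal_filter)] <- 0

# Compare up to floating-point tolerance
print(all.equal(signal_loop, as.numeric(signal_filter)))
```

Calling stats::filter explicitly avoids surprises if another package that also exports a filter function (dplyr, for instance) is attached.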

Finally, the timings:

Your solution with both loops:

   user  system elapsed 
  0.037   0.000   0.037 

      



Juba's solution:

   user  system elapsed 
  0.007   0.000   0.008 

      

Solution using filters:

   user  system elapsed 
  0.000   0.000   0.001 

      

Note that filters are a well-understood concept and can be computed extremely quickly.

Edit: As noted in ?filter, R's standard filter function does not use the Fast Fourier Transform (FFT). The FFT is usually the most efficient way to implement convolutions. However, even this can be done by replacing the filter command with

signal <- convolve(absdif[1:L], rep(1/(n*V), n), type='filter')

      

Note that the first n - 1 entries are now dropped rather than set to NA; the remaining values are the same. Timing it is pointless here: the total time is below the three-digit resolution of system.time's output. However, keep in mind the following note from the R help page for filter:

convolve(, type = "filter") uses the FFT for computations and so may be faster for long filters on univariate series, but it does not return a time series (and so the time alignment is unclear), nor does it handle missing values. filter is faster for a filter of length 100 on a series of length 1000, for example.
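The relation between the two calls can be checked with a small sketch like the one below. The data are random stand-ins for absdif, the seed is arbitrary, and w, sig_filter and sig_conv are names chosen for this example. Because all weights are equal, the order in which convolve applies them does not matter here; with unequal weights you would need to check the orientation.

```r
set.seed(1)  # arbitrary seed for reproducibility
L <- 999; n <- 10; V <- 5
absdif <- abs(rnorm(L) - rnorm(L))  # stand-in for abs(V_s - V_b)[1:L]
w <- rep(1 / (n * V), n)            # constant weights

sig_filter <- stats::filter(absdif, w, sides = 1)  # leading NAs
sig_conv <- convolve(absdif, w, type = "filter")   # leading entries dropped

print(length(sig_conv))  # L - n + 1 values
print(all.equal(as.numeric(sig_filter[n:L]), sig_conv))
```

So if you switch to convolve, the j-th loop value ends up at position j - n + 1 of the result.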

+12




Vectorizing calculations does not always mean using the *apply functions.

For example, you can simplify and speed things up by replacing your inner loop with vector indexing:

for(j in (n:L)){
  sel <- (j-n+1):j
  signal[j] <- sum(abs(V_s[sel] - V_b[sel])) / (n*V)
}

      

For this solution, the runtime on my system is:



   user  system elapsed 
  0.008   0.004   0.009 

      

Whereas for your two for loops it is:

   user  system elapsed 
   0.06    0.00    0.06 
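Timings like these can be reproduced roughly as follows. This is a sketch under a few assumptions: double_loop and indexed_loop are wrapper names invented here, the seed and the repetition count of 50 are arbitrary, and the numbers will differ from machine to machine.

```r
set.seed(1)  # arbitrary seed
tao <- 1000; L <- tao - 1; n <- 10; V <- 5; I <- 10000
V_s <- matrix(rnorm(I), I, 1)
V_b <- matrix(rnorm(I), I, 1)

# The two nested for loops from the question (inner index renamed to k)
double_loop <- function() {
  signal <- numeric(L)
  for (j in n:L) {
    for (k in (j - n + 1):j) {
      signal[j] <- signal[j] + abs(V_s[k] - V_b[k])
    }
    signal[j] <- signal[j] / (n * V)
  }
  signal
}

# The version with vector indexing instead of the inner loop
indexed_loop <- function() {
  signal <- numeric(L)
  for (j in n:L) {
    sel <- (j - n + 1):j
    signal[j] <- sum(abs(V_s[sel] - V_b[sel])) / (n * V)
  }
  signal
}

# Repeat each version so the elapsed time is large enough to read
print(system.time(for (r in 1:50) double_loop()))
print(system.time(for (r in 1:50) indexed_loop()))
```

Repeating the call inside system.time helps because a single run is close to the resolution of the timer.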

      

By the way, you shouldn't use the name tao for two different things.

+3




Assuming your explicit loop is correct, try this:

 signal[j] <- signal[j] +
              sum(sapply((j-n+1):j,
                   FUN = function(iter){
                           abs(V_s[iter] - V_b[iter])
                   }))

Note that sapply returns the absolute differences between V_s and V_b at the iterated indices. Their sum is then added to signal[j].
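As for why the sapply attempt in the question ends up with a vector of zeros: an assignment made inside the anonymous function changes a local copy of signal, not the one outside. The following toy sketch shows the effect; the variable names unchanged and filled are invented for this illustration.

```r
signal <- numeric(3)

# The assignment inside the function modifies a local copy of `signal`;
# the outer vector is left untouched
invisible(sapply(1:3, function(i) signal[i] <- i))
unchanged <- signal
print(unchanged)  # 0 0 0

# Using the values that sapply returns avoids the problem
filled <- sapply(1:3, function(i) i)
print(filled)     # 1 2 3
```

This is why the version above uses the return value of sapply rather than assigning to signal inside the function.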

0

