Performance: Repeated Calls with Global vs. Local Declarations

Question


Why doesn't it matter where I declare a function in R? Both versions perform almost the same:

library(microbenchmark)

f1 <- function() {
    lapply(1:100000, function(x) {
        # fun is redefined on every iteration
        fun <- function() 1:10000
        fun()
    })
}

f2 <- function() {
    # fun is defined once, outside the loop
    fun <- function() 1:10000
    lapply(1:100000, function(x) {
        fun()
    })
}

microbenchmark(f1(), f2(), times = 10)

# Unit: milliseconds
# expr      min       lq     mean   median       uq       max neval
# f1() 456.6720 459.2856 563.0407 507.1933 629.0231  922.8278    10
# f2() 438.5753 445.2491 616.4615 548.6700 615.3313 1048.7325    10

Then why does it matter where I declare a variable? Declaring it once in the outer function (the "global" version) is much faster:

library(microbenchmark)

f1 <- function() {
    lapply(1:100000, function(x) {
        # var is created on every iteration
        var <- 1:10000
        var
    })
}

f2 <- function() {
    # var is created once, outside the loop
    var <- 1:10000
    lapply(1:100000, function(x) {
        var
    })
}

microbenchmark(f1(), f2(), times = 10)

# Unit: milliseconds
# expr       min        lq      mean    median        uq      max neval
# f1() 516.07492 567.71822 611.44760 630.57550 642.47586 701.3975    10
# f2()  49.30975  50.12807  72.44492  52.53448  58.85256 159.2140    10

Why am I getting these results? Does this mean it is best to avoid declaring variables inside a function that is called many times?


performance r

Eldar Agalarov 24 May '15 at 16:00


1 answer

Roland · Accepted Answer · 2015-05-24T17:15:36+0000

Defining a function costs almost nothing. The function body is only evaluated when the function is called.
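A minimal sketch of that point, assuming nothing beyond base R: the body below has a side effect (the counter `calls`, a name chosen only for illustration), so you can see that it runs only when the function is called, not when it is defined.

```r
# Counts how many times the body of `fun` has been evaluated.
calls <- 0
fun <- function() {
  calls <<- calls + 1   # side effect marks each evaluation of the body
  1:10000
}
calls        # 0: defining `fun` did not evaluate its body
x <- fun()
calls        # 1: the body ran only when `fun` was called
```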

microbenchmark(fun <- function() 1:10000, 
               fun <- function() 1:100000, times = 1000)

#Unit: nanoseconds
#                      expr min  lq    mean median  uq   max neval cld
# fun <- function() 1:10000 198 506 568.462  511.5 548 54620  1000   a
# fun <- function() 1:1e+05 199 504 570.826  511.0 551 18620  1000   a

If you repeat this definition 1e5 times, it takes about 50 ms. That is small compared with the total runtime, which is why your first benchmark shows almost no difference.
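The 50 ms figure is just the per-definition cost multiplied by the number of iterations. A small sketch, assuming the median of about 511 ns from the benchmark output above (`def_ns` and `iterations` are illustrative names):

```r
# Median cost of one function definition, from the benchmark above (ns)
def_ns     <- 511
iterations <- 1e5

# Total extra time f1() spends redefining `fun`, converted to milliseconds
total_ms <- def_ns * iterations / 1e6
total_ms   # about 51 ms
```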

Creating and filling a large vector is far more expensive:

microbenchmark(var <- 1:10000, times = 100)
#Unit: microseconds
#           expr   min     lq    mean median    uq    max neval
# var <- 1:10000 4.183 4.3305 4.92081 4.4135 4.538 15.283   100

Doing this 1e5 times takes about 0.5 s, which accounts for the difference you measured in your second benchmark.
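The same back-of-the-envelope check can be compared with the question's own numbers. This sketch assumes the 4.4135 µs median from the allocation benchmark and the f1()/f2() medians from the second benchmark in the question; the estimate and the observed gap are of the same order of magnitude.

```r
# Median cost of `var <- 1:10000`, from the benchmark above (microseconds)
alloc_us   <- 4.4135
iterations <- 1e5

# Estimated extra time f1() spends allocating the vector, in seconds
estimated_s <- alloc_us * iterations / 1e6

# Observed gap between the medians of f1() and f2() in the question (ms -> s)
observed_s <- (630.57550 - 52.53448) / 1000

round(c(estimated = estimated_s, observed = observed_s), 2)  # about 0.44 vs 0.58
```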

Regarding your last question: yes, at least when the variables are large.
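In practice this means moving values that do not change between calls out of the inner function. A minimal sketch with hypothetical functions `slow()` and `fast()`; the move is only safe when the value does not depend on the inner function's argument (here `i`):

```r
# Hypothetical example: each inner call rebuilds the same vector
slow <- function(n) {
  lapply(seq_len(n), function(i) {
    v <- seq_len(10000) * 2   # allocated again on every call
    sum(v)
  })
}

# Same computation with the vector created once in the enclosing function
fast <- function(n) {
  v <- seq_len(10000) * 2     # allocated once, reused by every call
  lapply(seq_len(n), function(i) sum(v))
}

res_slow <- slow(100)
res_fast <- fast(100)
identical(res_slow, res_fast)  # TRUE: moving the vector out does not change the result
```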

