Performance: global vs. local declarations in a function that is called many times
Why does it hardly matter where I declare a function in R? Both versions perform almost the same:
library(microbenchmark)
f1 <- function() {
  lapply(1:100000, function(x) {
    fun <- function() 1:10000
    fun()
  })
}
f2 <- function() {
  fun <- function() 1:10000
  lapply(1:100000, function(x) {
    fun()
  })
}
microbenchmark(f1(), f2(), times = 10)
# Unit: milliseconds
#  expr      min       lq     mean   median       uq       max neval
#  f1() 456.6720 459.2856 563.0407 507.1933 629.0231  922.8278    10
#  f2() 438.5753 445.2491 616.4615 548.6700 615.3313 1048.7325    10
So why does it matter where I declare a variable? Declaring it once in the outer function, rather than inside the inner one, is much faster:
library(microbenchmark)
f1 <- function() {
  lapply(1:100000, function(x) {
    var <- 1:10000
    var
  })
}
f2 <- function() {
  var <- 1:10000
  lapply(1:100000, function(x) {
    var
  })
}
microbenchmark(f1(), f2(), times = 10)
# Unit: milliseconds
#  expr       min        lq      mean    median        uq      max neval
#  f1() 516.07492 567.71822 611.44760 630.57550 642.47586 701.3975    10
#  f2()  49.30975  50.12807  72.44492  52.53448  58.85256 159.2140    10
Why am I getting these results? Does this mean it is best to avoid declaring variables inside a function that will be called many times?
Defining a function costs almost nothing: the function body is only evaluated when the function is called, so at definition time it does not matter how large a vector the body would create.
microbenchmark(fun <- function() 1:10000,
               fun <- function() 1:100000, times = 1000)
# Unit: nanoseconds
#                      expr min  lq    mean median  uq   max neval cld
# fun <- function() 1:10000 198 506 568.462  511.5 548 54620  1000   a
# fun <- function() 1:1e+05 199 504 570.826  511.0 551 18620  1000   a
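Repeating this definition 1e5 times therefore costs only about 1e5 × 0.5 µs ≈ 50 ms, which is small compared with the run-to-run variation in your timings. The expensive part, evaluating 1:10000, happens on every call to fun() in both f1() and f2(), so both versions pay it 1e5 times.
Creating a large vector, on the other hand, is far more expensive: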
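To make the point about when the body runs visible without timing anything, here is a small sketch that counts how often the vector is actually built. The counter `builds` and the helper `make_vec()` are hypothetical names introduced only for this illustration, and the iteration count is reduced to 100.

```r
# Hypothetical counter and helper, used only for this illustration
builds <- 0
make_vec <- function() {
  builds <<- builds + 1   # record one construction of the vector
  1:10000
}

# Defining a function does not evaluate its body
fun <- function() make_vec()
stopifnot(builds == 0)

# Same structure as f1()/f2() in the question, with fewer iterations
f1 <- function(n) {
  lapply(seq_len(n), function(x) {
    fun <- function() make_vec()   # redefined on every iteration (cheap)
    fun()                          # body evaluated here, every iteration
  })
}
f2 <- function(n) {
  fun <- function() make_vec()     # defined once
  lapply(seq_len(n), function(x) fun())  # body still evaluated every iteration
}

builds <- 0; invisible(f1(100)); builds_f1 <- builds
builds <- 0; invisible(f2(100)); builds_f2 <- builds
builds_f1  # 100
builds_f2  # 100
```

Both versions build the vector once per iteration, so the only extra work in f1() is the cheap function definition.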
microbenchmark(var <- 1:10000, times = 100)
# Unit: microseconds
#           expr   min     lq    mean median    uq    max neval
# var <- 1:10000 4.183 4.3305 4.92081 4.4135 4.538 15.283   100
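Doing this 1e5 times takes about 1e5 × 4.4 µs ≈ 0.44 s, which accounts for most of the difference you measured (median 631 ms vs. 53 ms). In your second f2() the vector is created only once, and the inner function just returns the existing object.
Regarding your last question: yes, at least when the values are large or expensive to create and do not depend on the argument of the repeatedly called function. Create such values once, outside that function, and let the inner function use them.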
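The same counting trick applied to the variable version shows the difference directly: one construction per call versus one in total. `builds` and `make_vec()` are again hypothetical helpers. The rough base-R timing at the end is an assumption-laden sketch: it uses `numeric(1e4)` because that always allocates and zero-fills a full vector, whereas R 3.5.0 and later may store `1:10000` as a compact sequence, which can make creating it much cheaper than in the benchmark above. Absolute times will vary by machine and R version.

```r
# Hypothetical counter and helper, used only for this illustration
builds <- 0
make_vec <- function() {
  builds <<- builds + 1
  1:10000
}

f1 <- function(n) {
  lapply(seq_len(n), function(x) {
    var <- make_vec()   # vector built on every call of the inner function
    var
  })
}
f2 <- function(n) {
  var <- make_vec()     # vector built once, in the enclosing function
  lapply(seq_len(n), function(x) var)
}

builds <- 0; r1 <- f1(100); builds_f1 <- builds
builds <- 0; r2 <- f2(100); builds_f2 <- builds
builds_f1   # 100
builds_f2   # 1
stopifnot(identical(r1, r2))  # same result either way

# Rough timing sketch with base R (fewer iterations than the original 1e5)
inside  <- function(n) lapply(seq_len(n), function(x) { v <- numeric(1e4); length(v) })
outside <- function(n) { v <- numeric(1e4); lapply(seq_len(n), function(x) length(v)) }
system.time(inside(1e4))
system.time(outside(1e4))
```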
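A minimal sketch of that advice, assuming a hypothetical lookup table that does not depend on `x`: the value is hoisted into the enclosing function, and the inner function finds it through its enclosing environment. The names `slow`, `fast` and `lookup` are illustrative only.

```r
# 'lookup' does not depend on x, so it can be hoisted out of the inner function
slow <- function(xs) {
  lapply(xs, function(x) {
    lookup <- sqrt(seq_len(1e4))  # recomputed on every call
    lookup[x]
  })
}
fast <- function(xs) {
  lookup <- sqrt(seq_len(1e4))    # computed once, found by the closure
  lapply(xs, function(x) lookup[x])
}

stopifnot(identical(slow(1:100), fast(1:100)))
```

If the value did depend on `x`, it would have to stay inside the inner function; hoisting only pays off for loop-invariant values.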