Difference between runif and sample in R?
Consider the following code and output:
> set.seed(1)
> round(runif(10,1,100))
[1] 27 38 58 91 21 90 95 66 63 7
> set.seed(1)
> sample(1:100, 10, replace=TRUE)
[1] 27 38 58 91 21 90 95 67 63 7
This suggests that when asked to do the same thing, the two functions give nearly the same result (though, interestingly, it is round that gives the matching result here rather than floor or ceiling). The main differences are in the defaults; if you leave those unchanged, both still produce something called uniform (though sample gives a discrete uniform and, by default, samples without replacement).
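To make the point about defaults concrete, here is a small sketch (the seed and variable names are only for illustration): with no extra arguments, sample permutes its input without repeats, while runif draws continuous values between 0 and 1.

```r
set.seed(42)  # arbitrary seed, only so the run is repeatable

perm <- sample(1:10)   # defaults: size = length(x), replace = FALSE
u    <- runif(10)      # defaults: min = 0, max = 1

# sample() without replacement returns each value exactly once
stopifnot(anyDuplicated(perm) == 0, setequal(perm, 1:10))

# runif() returns continuous values strictly inside (0, 1)
stopifnot(all(u > 0 & u < 1))
```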
Edit
A more correct comparison:
> ceiling(runif(10,0,100))
[1] 27 38 58 91 21 90 95 67 63 7
instead of round.
We can even go a step further:
> set.seed(1)
> tmp1 <- sample(1:100, 1000, replace=TRUE)
> set.seed(1)
> tmp2 <- ceiling(runif(1000,0,100))
> all.equal(tmp1,tmp2)
[1] TRUE
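If you run this on R 3.6.0 or later, the two vectors will most likely not match, because the default algorithm behind sample changed to rejection sampling in that release. As far as I know, the older behaviour can be requested through the sample.kind argument of set.seed; the sketch below assumes R >= 3.6.0 and restores the default sampler afterwards.

```r
# Ask for the pre-3.6.0 sampler (needs R >= 3.6.0);
# R warns that this sampler is slightly non-uniform, so silence the warning
suppressWarnings(set.seed(1, sample.kind = "Rounding"))
tmp1 <- sample(1:100, 1000, replace = TRUE)

set.seed(1)  # sample.kind stays "Rounding" until it is changed back
tmp2 <- ceiling(runif(1000, 0, 100))

same <- isTRUE(all.equal(tmp1, tmp2))
print(same)

# Restore the current default sampler
RNGkind(sample.kind = "Rejection")
```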
Of course, if the prob argument of sample is used (and not all values are equal), then it will no longer be uniform.
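A minimal illustration of that last point, using made-up weights: once unequal probabilities are passed through prob, the observed frequencies follow the weights rather than a uniform distribution.

```r
set.seed(1)

# Hypothetical weights: the value 1 is nine times as likely as the value 2
x <- sample(1:2, 10000, replace = TRUE, prob = c(0.9, 0.1))

# Observed proportions should be close to the weights, not to 0.5 each
freq <- table(x) / length(x)
print(freq)
```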
sample() runs faster than ceiling(runif()); this is useful to know if you are running many simulations or bootstraps.
A crude timing script that times 4 equivalent approaches:
n <- 100    # sample size
m <- 10000  # number of simulations
system.time(sample(n, size = n * m, replace = TRUE))  # faster than ceiling/runif
system.time(ceiling(runif(n * m, 0, n)))
system.time(ceiling(n * runif(n * m)))
system.time(floor(runif(n * m, 1, n + 1)))
The proportional time advantage increases with n and m, but watch out that you don't fill up memory!
BTW, do not use round() to convert uniformly distributed continuous values to uniformly distributed integers, since the terminal values are selected only half as often as they should be.
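A quick way to see the problem with round(), as a sketch with an arbitrary range of 1 to 5: the end values 1 and 5 each collect only a half-width interval, so they turn up about half as often as 2, 3 and 4, whereas ceiling() over (0, 5) gives all five values the same share.

```r
set.seed(1)
n <- 100000

rounded <- round(runif(n, 1, 5))     # 1 and 5 each cover only half an interval
ceiled  <- ceiling(runif(n, 0, 5))   # 1..5 each cover a full unit interval

prop_rounded <- table(rounded) / n   # roughly 1/8, 1/4, 1/4, 1/4, 1/8
prop_ceiled  <- table(ceiled) / n    # roughly 1/5 each

print(prop_rounded)
print(prop_ceiled)
```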