Difference between runif and sample in R?
In terms of probability distributions, what does each of them use? I know runif gives fractional numbers and sample gives whole numbers, but I'm wondering whether sample also uses a uniform probability distribution.
Consider the following code and output:
> set.seed(1)
> round(runif(10,1,100))
[1] 27 38 58 91 21 90 95 66 63 7
> set.seed(1)
> sample(1:100, 10, replace=TRUE)
[1] 27 38 58 91 21 90 95 67 63 7
This shows that, when asked to do the same thing, the two functions give almost the same result (interestingly, it is round that comes closest here, rather than floor or ceiling). The main differences are in the defaults, and if you don't change those defaults, both give something called a uniform distribution (although sample gives a discrete uniform, and by default it samples without replacement).
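To make the "without replacement by default" point concrete, here is a small sketch (the variable names are only for illustration). With the defaults, sample returns a random permutation and cannot draw more values than the population holds; replace = TRUE lifts that restriction and allows repeats.

```r
set.seed(1)
perm <- sample(1:10)                          # default replace = FALSE: a permutation of 1..10
with_repl <- sample(1:10, 20, replace = TRUE) # with replacement: values may repeat

# Asking for more values than the population without replacement is an error
too_many <- tryCatch(sample(1:10, 20), error = function(e) conditionMessage(e))
```

Here too_many holds the error message as a character string instead of a result.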
Edit
A better comparison uses ceiling instead of round:
> set.seed(1)
> ceiling(runif(10,0,100))
[1] 27 38 58 91 21 90 95 67 63 7
We can go a step further:
> set.seed(1)
> tmp1 <- sample(1:100, 1000, replace=TRUE)
> set.seed(1)
> tmp2 <- ceiling(runif(1000,0,100))
> all.equal(tmp1,tmp2)
[1] TRUE
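One caveat when reproducing this: since R 3.6.0 the default sample.kind is "Rejection", so on current R sample() generally no longer matches ceiling(runif()). The sketch below, assuming R 3.6.0 or later, temporarily switches to the old "Rounding" sampler (R warns that it is non-uniform), repeats the comparison, and then restores the previous setting.

```r
old_kind <- RNGkind()[3]                          # current sample.kind, e.g. "Rejection"
suppressWarnings(RNGkind(sample.kind = "Rounding"))  # pre-3.6.0 behaviour

set.seed(1)
tmp1 <- sample(1:100, 1000, replace = TRUE)
set.seed(1)
tmp2 <- ceiling(runif(1000, 0, 100))
same_rounding <- isTRUE(all.equal(tmp1, tmp2))   # TRUE under "Rounding"

RNGkind(sample.kind = old_kind)                   # restore the previous sampler

set.seed(1)
tmp3 <- sample(1:100, 1000, replace = TRUE)       # restored sampler
same_restored <- isTRUE(all.equal(tmp3, tmp2))    # FALSE when the restored kind is "Rejection"
```

Under "Rounding", sample computes floor(n * u) + 1 from each uniform draw u, which equals ceiling(n * u) except when n * u is an exact integer, so the two vectors agree.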
Of course, if the prob argument of sample is used (and not all values are equal), the result will no longer be uniform.
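As a quick sketch of the non-uniform case, with weights chosen purely for illustration, the empirical proportions track prob rather than 1/3 each:

```r
set.seed(42)
p <- c(0.5, 0.3, 0.2)                          # illustrative unequal weights for 1, 2, 3
x <- sample(1:3, 1e5, replace = TRUE, prob = p)
freq <- as.vector(table(x)) / length(x)        # empirical proportions of 1, 2, 3
```

With 100,000 draws, freq should land within about a percentage point of p.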
sample draws from a fixed set of inputs, and if a length-1 numeric input is passed as the first argument, it samples from 1:x and returns integer output(s).
On the other hand, runif returns a sample from a real-valued range.
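The length-1 rule is easy to trip over, so here is a short sketch of it. sample(x, ...) with a single number x >= 1 samples from 1:x rather than returning x itself. The resample helper is the idiom given in the examples section of R's ?sample help page, which always samples the elements of x.

```r
set.seed(1)
a <- sample(10, 3)                       # three distinct integers from 1..10, not the value 10
b <- sample(c(10), 3, replace = TRUE)    # c(10) is still length 1, so also drawn from 1..10

# sample keeps the type of x (c(1, 2, 3) is double); the 1:x shortcut returns integers
classes <- c(class(sample(3, 1)), class(sample(c(1, 2, 3), 1)), class(runif(1, 1, 3)))

# Safe alternative: index into x instead of passing x directly
resample <- function(x, ...) x[sample.int(length(x), ...)]
c1 <- resample(c(10), 3, replace = TRUE) # always 10 10 10
```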
> sample(c(1,2,3), 1)
[1] 2
> runif(1, 1, 3)
[1] 1.448551
sample() runs faster than ceiling(runif()). This is useful to know if you are running many simulations or bootstraps.
Here is a crude timing script that compares four equivalent approaches:
n <- 100    # sample size
m <- 10000  # simulations
system.time(sample(n, size = n*m, replace = TRUE))  # faster than ceiling/runif
system.time(ceiling(runif(n*m, 0, n)))
system.time(ceiling(n * runif(n*m)))
system.time(floor(runif(n*m, 1, n+1)))
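To check that the four timed expressions really are equivalent in distribution, a sketch like the following (list names are only labels) confirms each produces only 1..n, with every value appearing roughly m times:

```r
n <- 100
m <- 10000
set.seed(1)
draws <- list(
  sample     = sample(n, size = n * m, replace = TRUE),
  ceil_unif  = ceiling(runif(n * m, 0, n)),
  ceil_mult  = ceiling(n * runif(n * m)),
  floor_unif = floor(runif(n * m, 1, n + 1))
)
# Every method should produce only the integers 1..n ...
in_range <- sapply(draws, function(d) all(d >= 1 & d <= n))
# ... and each value about m times (n * m draws spread over n values)
max_rel_dev <- sapply(draws, function(d) max(abs(tabulate(d, nbins = n) / m - 1)))
```

This relies on runif not returning its endpoints (documented in ?Uniform), which is why ceiling never yields 0 and floor never yields n + 1.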
The relative time advantage grows with n and m, but be careful not to run out of memory!
BTW, do not use round() to convert uniformly distributed continuous values to uniformly distributed integers, because the end values are sampled only half as often as they should be.
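A small sketch of the round() problem, using an arbitrary range of 1 to 5: each interior integer collects a width-1 interval of the continuous values, but 1 and 5 only collect a width-1/2 interval, so they come up about 1/8 of the time instead of 1/5. Using floor on runif(…, 1, k + 1) gives each integer the same width.

```r
set.seed(1)
k <- 5
x <- round(runif(1e5, 1, k))                 # biased: ends get half-width intervals
freq_round <- tabulate(x, nbins = k) / length(x)
# Expected roughly 1/8, 1/4, 1/4, 1/4, 1/8

y <- floor(runif(1e5, 1, k + 1))             # uniform over 1..k
freq_floor <- tabulate(y, nbins = k) / length(y)
# Expected roughly 1/5 each
```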