Difference between runif and sample in R?

Which probability distribution do they use? I know runif gives fractional numbers and sample gives whole numbers, but I'm wondering whether sample also uses a uniform probability distribution.



3 answers


Consider the following code and output:

> set.seed(1)
> round(runif(10,1,100))
 [1] 27 38 58 91 21 90 95 66 63  7
> set.seed(1)
> sample(1:100, 10, replace=TRUE)
 [1] 27 38 58 91 21 90 95 67 63  7

      

This suggests that, when asked to do the same thing, the two functions give almost the same result (although, interestingly, it is round that gives nearly the same result here, rather than floor or ceiling). The main differences are in the defaults; if you don't change them, both give something that can be called uniform (although sample draws from a discrete uniform and, by default, without replacement).

Edit

A more correct comparison:



> set.seed(1)
> ceiling(runif(10,0,100))
 [1] 27 38 58 91 21 90 95 67 63  7

      

This uses ceiling instead of round.

We can even go a step further:

> set.seed(1)
> tmp1 <- sample(1:100, 1000, replace=TRUE)
> set.seed(1)
> tmp2 <- ceiling(runif(1000,0,100))
> all.equal(tmp1,tmp2)
[1] TRUE
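
One caveat, offered as background rather than as part of the original answer: the element-by-element match depends on how sample() turns random numbers into indices, and the default method changed in R 3.6.0 from "Rounding" to "Rejection". The sketch below assumes R >= 3.6.0, where set.seed() accepts a sample.kind argument; the names same_old and same_new are only illustrative. Under "Rounding" the two vectors should agree, while under the default "Rejection" they generally should not, even though both are still discrete uniform on 1..100.

```r
# Assumes R >= 3.6.0 (set.seed() gained the sample.kind argument there).
# "Rounding" is the pre-3.6.0 sampler; R warns when it is selected.
suppressWarnings(set.seed(1, sample.kind = "Rounding"))
tmp1 <- sample(1:100, 1000, replace = TRUE)
suppressWarnings(set.seed(1, sample.kind = "Rounding"))
tmp2 <- ceiling(runif(1000, 0, 100))
same_old <- isTRUE(all.equal(tmp1, tmp2))

# "Rejection" is the default sampler since R 3.6.0.
set.seed(1, sample.kind = "Rejection")
tmp3 <- sample(1:100, 1000, replace = TRUE)
set.seed(1, sample.kind = "Rejection")
tmp4 <- ceiling(runif(1000, 0, 100))
same_new <- isTRUE(all.equal(tmp3, tmp4))

print(c(rounding = same_old, rejection = same_new))
```

The last set.seed() call leaves the session on the default sampler, so later code is unaffected.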

      

Of course, if the prob argument of sample is used (and the weights are not all equal), then the result will no longer be uniform.
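
As a rough illustration of the effect of unequal weights, the sketch below compares observed frequencies with and without prob. The weights in w, the seed, and the number of draws are arbitrary choices made for this example.

```r
set.seed(42)
n_draws <- 100000

# Default (equal weights): each of 1, 2, 3 should appear about 1/3 of the time
unif_draws <- sample(1:3, n_draws, replace = TRUE)
unif_freq <- tabulate(unif_draws, nbins = 3) / n_draws

# Unequal weights via prob: the draws are no longer uniform
w <- c(0.7, 0.2, 0.1)  # hypothetical weights for illustration
skew_draws <- sample(1:3, n_draws, replace = TRUE, prob = w)
skew_freq <- tabulate(skew_draws, nbins = 3) / n_draws

print(round(unif_freq, 2))
print(round(skew_freq, 2))
```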



sample draws from a fixed set of inputs, and if a single number n (n >= 1) is passed as the first argument, it draws from the integers 1:n, so the output is integer.

On the other hand, runif returns a sample from a real-valued range.



 > sample(c(1,2,3), 1)
 [1] 2
 > runif(1, 1, 3)
 [1] 1.448551
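
A slightly fuller sketch of the same contrast follows. The colour names, seed, and sizes are made up for the example; the point is only that sample picks from a given set (or from 1:n for a single number n) while runif returns real numbers in an interval.

```r
set.seed(1)

# A single number n >= 1 as the first argument means "sample from 1:n"
a <- sample(5, 3)                        # three distinct integers from 1..5

# A vector is treated as the set to draw from, whatever its type
b <- sample(c("red", "green", "blue"), 2)

# runif draws real numbers from the interval between min and max
u <- runif(3, min = 1, max = 3)

print(a)
print(b)
print(u)
```

Note that the 1:n behaviour also applies to a length-one numeric vector such as c(10), which can be surprising when the set to draw from happens to contain a single number.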

      



sample() runs faster than ceiling(runif()). This is useful to know if you are running many simulations or bootstraps.

A crude timing script that times 4 equivalent scenarios:

n <- 100                    # sample size
m <- 10000                  # simulations
system.time(sample(n, size = n*m, replace = TRUE))  # faster than ceiling/runif
system.time(ceiling(runif(n*m, 0, n)))
system.time(ceiling(n * runif(n*m)))
system.time(floor(runif(n*m, 1, n+1)))

      

The proportional time advantage increases with n and m, but watch out that you don't fill up memory!

BTW, do not use round() to convert uniformly distributed continuous values to uniformly distributed integers, because the terminal values are sampled only half as often as they should be.
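
The effect is easy to check empirically. In the sketch below (range 1..10, seed, and number of draws are arbitrary choices for illustration), round() gives the two end values bins of half the width of the interior ones, whereas ceiling() on (0, 10) gives ten equal-width bins.

```r
set.seed(123)
n_draws <- 1000000

# round() maps [1, 1.5) to 1 and (9.5, 10] to 10: half-width bins at the ends
# (exact .5 ties have essentially zero probability for runif draws)
r <- round(runif(n_draws, 1, 10))
round_freq <- tabulate(r, nbins = 10) / n_draws

# ceiling() on (0, 10) gives ten bins of equal width
cl <- ceiling(runif(n_draws, 0, 10))
ceil_freq <- tabulate(cl, nbins = 10) / n_draws

print(round(round_freq, 3))  # ends should be roughly half the interior values
print(round(ceil_freq, 3))   # all should be close to 0.1
```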
