Randomly select numbers from a sequence and store them as observations based on a predefined frequency distribution
I would like to randomly select numbers from the sequence 1:8 and store the selected numbers as observations of a new variable in a SAS dataset. Each number from 1 through 8 should have the same chance of being chosen (0.125). So, once the new variable is generated and I run proc freq on it, I should get a frequency of roughly 12.5% for each number in the sequence.
The R equivalent uses the sample() function:
x <- sample(1:8, 1000, replace=T,
prob=c(.125, .125, .125, .125, .125, .125, .125, .125))
But how can I do this in SAS? Many thanks!
SAS has a rand function that can draw from any of several distributions. The Uniform distribution is close to what you want. It produces values between 0 and 1, so you just rescale them to 1:8.
data want;
  call streaminit(7); *initialize random stream, pick whatever positive seed you want;
  do _n_ = 1 to 1000; *do 1000 times;
    x = ceil(rand('Uniform')*8);
    output;
  end;
run;
Another method is the Table distribution, which is more directly analogous to the R function.
data want;
  call streaminit(7);
  do _n_ = 1 to 1000;
    x = rand('Table',.125,.125,.125,.125,.125,.125,.125,.125);
    output;
  end;
run;
proc freq data=want;
  table x;
run;
However, in this case, the Uniform approach should give the same result.
Note that the Uniform method is very slightly biased at the top end: since rand('Uniform') cannot produce exactly 1, the value 8 will occur very slightly less often than 1 through 7. (Writing x for rand('Uniform')*8: 1 is 0 < x <= 1, 5 is 4 < x <= 5, but 8 is 7 < x < 8.) If you are producing 1000 numbers, this has no significant impact (we're looking at a range of about 2^63 numbers, so missing exactly 8 is extremely rare). If you are producing a very large quantity of numbers (on the order of 1e15 or so), it will start to be noticeable, and the Table method is the better choice. Alternatively, use 9 instead of 8 and discard the 9s.
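A minimal sketch of the "use 9 instead of 8 and discard the 9s" idea, assuming it means scaling by 9 and redrawing whenever a 9 comes up. The dataset name want and the seed 7 are simply carried over from the earlier examples. The short top interval then belongs to 9, which is thrown away, so 1 through 8 each cover a full-width interval.

```sas
data want;
  call streaminit(7);            * same arbitrary seed as before;
  do _n_ = 1 to 1000;
    do until (x <= 8);           * redraw whenever the top value 9 comes up;
      x = ceil(rand('Uniform')*9);
    end;
    output;
  end;
run;

proc freq data=want;
  table x;
run;
```

About one draw in nine is rejected and redrawn, so this costs a little extra work per observation compared with the plain Uniform version.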