Randomly select numbers in sequence and store as observations based on predefined frequency distributions

Question

I would like to randomly select numbers from the sequence 1:8 and store the selected numbers as observations of a new variable in a SAS dataset. Each number from 1 through 8 should have the same chance of being chosen (0.125), so that when I run proc freq on the new variable, each number shows a frequency of about 12.5%.

The R equivalent uses the sample() function:

x <- sample(1:8, 1000, replace=T, 
                       prob=c(.125, .125, .125, .125, .125, .125, .125, .125))

But how can I do this in SAS? Many thanks!

sas

QY Luo 13 Aug 14 at 18:59

1 answer

Joe · Accepted Answer · 2014-08-13T21:52:29+0000

SAS has a rand function that can generate numbers from any of several distributions. The Uniform distribution sounds like what you want. It produces values between 0 and 1, so you just rescale them to 1:8.

data want;
 call streaminit(7);  *initialize random stream, pick whatever positive seed you want;
 do _n_=1 to 1000; *do 1000 times;
   x = ceil(rand('Uniform')*8);
   output;
 end;
run;
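If your SAS release is recent enough, there may be a more direct route. The following is a minimal sketch, assuming SAS 9.4M5 or later, where rand accepts an 'Integer' distribution with a lower and an upper bound. The dataset name want_int is only illustrative.

```sas
data want_int;
 call streaminit(7);  *same seed as above, any positive seed works;
 do _n_ = 1 to 1000;
   x = rand('Integer', 1, 8);  *assumes SAS 9.4M5+: integer from 1 to 8, each equally likely;
   output;
 end;
run;
```

On older releases this call is not available, so the ceil(rand('Uniform')*8) form is the one to use.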

Another option is the Table distribution, which is more directly like what the R function does: you pass it the probability of each value.

data want;
 call streaminit(7);
 do _n_ = 1 to 1000;
  x = rand('Table',.125,.125,.125,.125,.125,.125,.125,.125);
  output;
 end;
run;

proc freq data=want;
table x;
run;
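The Table distribution becomes more useful when the probabilities are not all equal. The following is a minimal sketch, assuming a hypothetical set of four unequal probabilities held in a temporary array p. The array name, the values, and the dataset name want_unequal are made up for illustration.

```sas
data want_unequal;
 call streaminit(7);
 /* hypothetical probabilities for the values 1 to 4; they should sum to 1 */
 array p[4] _temporary_ (0.1 0.2 0.3 0.4);
 do _n_ = 1 to 1000;
   x = rand('Table', of p[*]);  /* returns 1, 2, 3 or 4 with the probabilities in p */
   output;
 end;
run;

proc freq data=want_unequal;
 table x;
run;
```

Keeping the probabilities in an array avoids typing a long argument list and makes it easier to change the distribution later.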

However, in this case Uniform should do the same thing.

Note that the Uniform method is very slightly biased at the top end: since rand('Uniform') cannot return exactly 1, 8 will occur very slightly less often than 1 through 7. (Writing u for rand('Uniform')*8: 1 is 0 < u <= 1, 5 is 4 < u <= 5, but 8 is 7 < u < 8.) If you generate 1000 numbers, that has no significant impact (we are talking about a range on the order of 2^63 numbers, so the missing 8 will be extremely rare), but if you generate a great many numbers (on the order of 1e15 or so), it starts to become noticeable, and the Table method is the better choice; alternatively, use 9 instead of 8 and discard the 9s.
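The "use 9 and discard" idea can be sketched as a small rejection loop. This is only an illustration of that suggestion, not a tuned implementation. The inner loop redraws whenever a 9 comes up, so the shortened top interval lands on the discarded value rather than on 8. The dataset name want_reject is made up.

```sas
data want_reject;
 call streaminit(7);
 do _n_ = 1 to 1000;
   /* draw from 1 to 9 and redraw while the result is 9 */
   do until (x <= 8);
     x = ceil(rand('Uniform')*9);
   end;
   output;
 end;
run;
```

On average about one draw in nine is thrown away, which is a small price for 1000 observations.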
