Generate random samples from arbitrary discrete probability density function in Matlab

I have an arbitrary probability density function sampled as a matrix in Matlab, this means that for each x, y pair, the probability is stored in the matrix: A (x, y) = probability

This is a 100x100 matrix and I would like to be able to generate random samples of two dimensions (x, y) from this matrix, and also, if possible, to compute the mean and other moments of the PDF. I want to do this because after resampling, I want to fit the samples to an approximated Gaussian mixture model.

I've searched everywhere but I haven't found anything as specific as this. Hope you can help me.

Thank.

+3


source to share


2 answers


If you do have a discrete density function defined A

(as opposed to a continuous probability density function, which is simply described A

), you can "cheat" by turning your 2D problem into a 1D problem.

%define the possible values for the (x,y) pair
row_vals = [1:size(A,1)]'*ones(1,size(A,2));  %all x values
col_vals = ones(size(A,1),1)*[1:size(A,2)];  %all y values

%convert your 2D problem into a 1D problem
A = A(:);
row_vals = row_vals(:);
col_vals = col_vals(:);

%calculate your fake 1D CDF, assumes sum(A(:))==1
CDF = cumsum(A); %remember, first term out of of cumsum is not zero

%because of the operation we're doing below (interp1 followed by ceil)
%we need the CDF to start at zero
CDF = [0; CDF(:)];

%generate random values
N_vals = 1000;  %give me 1000 values
rand_vals = rand(N_vals,1);  %spans zero to one

%look into CDF to see which index the rand val corresponds to
out_val = interp1(CDF,[0:1/(length(CDF)-1):1],rand_vals); %spans zero to one
ind = ceil(out_val*length(A));

%using the inds, you can lookup each pair of values
xy_values = [row_vals(ind) col_vals(ind)];

      



I hope this helps!

Chip

+4


source


I don't believe Matlab has built-in functionality for generating multivariate random variables with arbitrary distribution. In fact, the same is true for one-dimensional random numbers. But while the latter can be easily generated based on the cumulative distribution function, the CDF does not exist for multivariate distributions, so generating such numbers is much more messy (the main problem is that 2 or more variables are correlated). So this part of your question goes far beyond the scope of this site.

Since half the answer is better than no answer, here's how you can calculate the mean and the highest moments numerically using matlab:

%generate some dummy input
xv=linspace(-50,50,101);
yv=linspace(-30,30,100);
[x y]=meshgrid(xv,yv);

%define a discretized two-hump Gaussian distribution
A=floor(15*exp(-((x-10).^2+y.^2)/100)+15*exp(-((x+25).^2+y.^2)/100));
A=A/sum(A(:)); %normalized to sum to 1

%plot it if you like
%figure;
%surf(x,y,A)

%actual half-answer starts here    

%get normalized pdf
weight=trapz(xv,trapz(yv,A));
A=A/weight; %A normalized to 1 according to trapz^2

%mean
mean_x=trapz(xv,trapz(yv,A.*x));
mean_y=trapz(xv,trapz(yv,A.*y));

      

So the point is, you can do a double integral on a rectangular grid using two successive calls trapz

. This allows you to compute the integral of any quantity that has the same shape as your mesh, but the disadvantage is that the vector components must be computed independently. If you only want to compute things that can be parameterized with x

and y

(which are naturally the same size as you grid), then you can do without the need for additional thinking.

You can also define a function to integrate:



function res=trapz2(xv,yv,A,arg)

if ~isscalar(arg) && any(size(arg)~=size(A))
    error('Size of A and var must be the same!')
end

res=trapz(xv,trapz(yv,A.*arg));

end

      

This way you can calculate things like

weight=trapz2(xv,yv,A,1);
mean_x=trapz2(xv,yv,A,x);

      

NOTE. The reason I used a 101x100 grid in the example is because the double call trapz

must be done in the correct order. If you rearrange xv

and yv

in calls, you get the wrong answer due to inconsistency with the definition A

, but it won't be obvious if it A

is square. I suggest avoiding symmetric values ​​during development.

+1


source







All Articles