Gmdistribution for classification in Matlab

Question

Suppose I have two gmdistribution models, which I obtained with

modeldata1=gmdistribution.fit(data1,1);
modeldata2=gmdistribution.fit(data2,1);

Now I have an unknown observation 'data' and I want to see whether it belongs to data1 or data2.

Based on my understanding of these functions, the nlogl (negative log-likelihood) output obtained using the posterior, cluster, or pdf commands would not be a good measure, as I am comparing 'data' against two different distributions.

What measure or inference should I use to find p(data|modeldata1) and p(data|modeldata2)?

Many thanks,

+3

matlab

Louis Mar 30 12 at 22:26

1 answer

Vidar · Accepted Answer · 2012-03-30T23:45:58+0000

If I understand you correctly, you want to assign a new, unknown data point to class 1 or class 2, using the descriptors for each class (in this case the mean vector and covariance matrix) found by gmdistribution.fit.

Having seen this new data point, call it x, you have to ask yourself what p(modeldata1 | x) and p(modeldata2 | x) are; whichever of them is higher is the class you assign x to.

So how do you find them? You just apply Bayes' rule and choose which one is the biggest:

p(modeldata1 | x) = p(x|modeldata1)p(modeldata1)/p(x)
p(modeldata2 | x) = p(x|modeldata2)p(modeldata2)/p(x)

You don't need to compute p(x) here, as it is the same in both equations.
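To see why dropping p(x) does not change the decision, here is a small sketch with made-up numbers (the likelihood and prior values are purely hypothetical, not taken from any fitted model). Dividing both products by the same p(x) rescales them but leaves the larger one larger:

```matlab
% Hypothetical values, for illustration only.
lik   = [0.02 0.005];   % p(x|modeldata1), p(x|modeldata2)
prior = [0.3  0.7];     % p(modeldata1),   p(modeldata2)

joint = lik .* prior;          % numerators of Bayes' rule
px    = sum(joint);            % p(x), the shared denominator
post  = joint / px;            % proper posteriors, sum to 1

[~, kJoint] = max(joint);      % class chosen without normalizing
[~, kPost]  = max(post);       % class chosen from the posteriors
```

Both kJoint and kPost pick the same class, so the normalization is only needed if you want the actual posterior probabilities.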

So now you estimate the priors p(modeldata1) and p(modeldata2) from the number of training points in each class (or use some given information) and then compute

p(x|modeldata1)=1/((2*pi)^(d/2) * sqrt(det(Sigma1)))*exp(-0.5*(x-mu1)/Sigma1*(x-mu1)')

where d is the dimension of your data, Sigma1 is the covariance matrix, and mu1 is the mean vector (with x and mu1 taken as row vectors). This gives you the p(data | modeldata1) you asked for. (Remember to also use p(modeldata1) and p(modeldata2) when you do the classification.)
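Putting the steps together, a minimal sketch might look like the following. The training matrices and the test point are invented for illustration, and mean and cov stand in for the single-component fit (with real fitted models you would presumably read the parameters from modeldata1.mu and modeldata1.Sigma, or evaluate the density directly with pdf(modeldata1, x)). It works with log densities, which is an assumption of convenience to avoid underflow, not something the formula above requires:

```matlab
% Hypothetical training data, one observation per row.
data1 = [0 0; 1 0; 0 1; 1 1; 0.5 0.5; 0.2 0.8];
data2 = [4 4; 5 4; 4 5; 5 5; 4.5 4.5; 4.2 4.8; 5.2 4.1; 3.9 5.0];

% Class descriptors: mean vector and covariance matrix.
mu1 = mean(data1, 1);  Sigma1 = cov(data1);
mu2 = mean(data2, 1);  Sigma2 = cov(data2);

% Priors estimated from the number of training points per class.
n1 = size(data1, 1);   n2 = size(data2, 1);
prior1 = n1 / (n1 + n2);
prior2 = n2 / (n1 + n2);

% Log of the Gaussian density for a row vector x.
logGauss = @(x, mu, Sigma) -0.5 * ((x - mu) / Sigma * (x - mu)') ...
    - 0.5 * log(det(Sigma)) - 0.5 * numel(mu) * log(2*pi);

% New, unknown observation (hypothetical).
x = [0.4 0.6];

% log p(x|model) + log p(model); p(x) is omitted because it is shared.
score1 = logGauss(x, mu1, Sigma1) + log(prior1);
score2 = logGauss(x, mu2, Sigma2) + log(prior2);

if score1 > score2
    label = 1;
else
    label = 2;
end
```

Here x lies close to the first cloud of points, so label comes out as 1. Note that cov normalizes by N-1, so the covariance may differ slightly from what the fitted model stores.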

I know this was a little unclear, but hopefully this can help you step in the right direction.

EDIT: Personally, I find a visualization like the one below (taken from Pattern Recognition by Theodoridis and Koutroumbas) helpful. Here you have two Gaussian mixtures with different covariance matrices. The blue area is where you would select one class, while the gray area is where you would select the other.

[image: decision regions for the two classes]
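If you want to produce a picture of this kind yourself, one rough way is to classify every point of a grid and look at the resulting label map. The parameters below are hypothetical and are not the ones behind the book's figure; the grid limits and resolution are arbitrary choices as well:

```matlab
% Hypothetical class parameters with different covariance matrices.
mu1 = [0 0];  Sigma1 = [1 0; 0 1];
mu2 = [3 0];  Sigma2 = [4 0; 0 0.25];
prior1 = 0.5; prior2 = 0.5;

% Log of the Gaussian density for a row vector x.
logGauss = @(x, mu, Sigma) -0.5 * ((x - mu) / Sigma * (x - mu)') ...
    - 0.5 * log(det(Sigma)) - 0.5 * numel(mu) * log(2*pi);

% Evaluation grid.
gx = linspace(-4, 8, 61);
gy = linspace(-4, 4, 41);

% region(i,j) is 1 where class 1 wins and 2 where class 2 wins.
region = zeros(numel(gy), numel(gx));
for i = 1:numel(gy)
    for j = 1:numel(gx)
        x  = [gx(j) gy(i)];
        s1 = logGauss(x, mu1, Sigma1) + log(prior1);
        s2 = logGauss(x, mu2, Sigma2) + log(prior2);
        region(i, j) = 1 + (s2 > s1);
    end
end

% imagesc(gx, gy, region); axis xy;   % uncomment to display the regions
```

Because the covariance matrices differ, the boundary between the two regions is curved rather than a straight line.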
