Fit Gaussians (or other distributions) to my data using Python

I have a dataset of features, a 2D np.array (2000 samples, each containing 100 features, 2000 x 100). I want to fit Gaussian distributions to this dataset using Python. My code is as follows:

from sklearn import mixture

data = load_my_data() # loads a np.array with size 2000x200
clf = mixture.GaussianMixture(n_components=50, covariance_type='full')
clf.fit(data)

      

I'm not sure about the parameters, for example covariance_type, or how to check whether the fit was appropriate.

EDIT: I am debugging the code to see what is going on with clf.means_, and it has produced a matrix of size n_components x size_of_features (50 x 20). Is there a way to check that the fit was successful, or to plot the data? What are the alternatives to Gaussian mixtures (e.g. mixtures of exponentials; I can't find any available implementation)?

+3




4 answers


I assume you are using the sklearn package.

Once you have fitted the model, run

print(clf.means_)

      



If it prints output, the model has been fitted; if it raises an error, it has not.
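Checking that means_ exists only shows that fitting ran. To judge the fit itself, a fitted GaussianMixture also exposes the converged_ attribute and the bic method, which may help with choosing covariance_type and n_components. Here is a minimal sketch, assuming synthetic stand-in data (the real array from load_my_data() would go in its place) and an arbitrary small grid of candidate settings:

```python
import numpy as np
from sklearn import mixture

# Synthetic stand-in for the real data (an assumption for illustration):
# two well-separated blobs in 5 dimensions.
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(200, 5)),
    rng.normal(loc=6.0, scale=1.0, size=(200, 5)),
])

scores = {}
for covariance_type in ("full", "diag", "spherical"):
    for n_components in (1, 2, 3):
        clf = mixture.GaussianMixture(
            n_components=n_components,
            covariance_type=covariance_type,
            random_state=0,
        )
        clf.fit(data)
        # converged_ is False if EM stopped at max_iter without converging.
        # bic() penalises model complexity; lower values are better.
        scores[(covariance_type, n_components)] = (clf.converged_, clf.bic(data))

# Pick the setting with the lowest BIC.
best_setting = min(scores, key=lambda k: scores[k][1])
print("Lowest BIC:", best_setting, scores[best_setting])
```

The grid here is deliberately tiny; for real data the range of n_components would need to be wider, and the 'full' covariance type can be slow or unstable when there are many features relative to samples.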

Hope this helps you.

+3




You can reduce the dimensionality with PCA to (say) 3D space and then plot the means together with the data.
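A rough sketch of that idea, assuming synthetic stand-in data and a two-component mixture chosen only for illustration. The helper plot_projection is a hypothetical name, and matplotlib is imported inside it so the projection itself does not depend on it:

```python
import numpy as np
from sklearn import mixture
from sklearn.decomposition import PCA

# Synthetic stand-in for the real data (an assumption for illustration).
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(200, 10)),
    rng.normal(loc=5.0, scale=1.0, size=(200, 10)),
])

clf = mixture.GaussianMixture(n_components=2, covariance_type="full", random_state=0)
clf.fit(data)

# Fit PCA on the data, then project both the data and the component means
# with the same transform so they share one coordinate system.
pca = PCA(n_components=3)
data_3d = pca.fit_transform(data)
means_3d = pca.transform(clf.means_)


def plot_projection(points, centers):
    """Scatter the projected data and overlay the projected component means."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.scatter(points[:, 0], points[:, 1], points[:, 2], s=5, alpha=0.3)
    ax.scatter(centers[:, 0], centers[:, 1], centers[:, 2], c="red", marker="x", s=100)
    plt.show()
```

Calling plot_projection(data_3d, means_3d) then shows whether the means sit inside the visible clusters. A 3D projection discards variance, so it is a visual sanity check rather than a measure of fit quality.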



+1




It is usually preferable to select a reduced set of candidates before trying to identify the distribution (in other words, use a Cullen and Frey graph to reject unlikely candidates) and then use a goodness-of-fit measure to pick the best result.

You can simply create a list of all available distributions in scipy. Example with two distributions and random data:

import numpy as np
import scipy.stats as st

data = np.random.random(10000)
# Specify all candidate distributions here
distributions = [st.laplace, st.norm]
nlls = []

for distribution in distributions:
    # Maximum-likelihood estimate of the distribution's parameters
    pars = distribution.fit(data)
    # Negative log-likelihood of the data under those parameters (lower is better)
    nll = distribution.nnlf(pars, data)
    nlls.append(nll)

results = [(distribution.name, nll) for distribution, nll in zip(distributions, nlls)]
best_fit = sorted(results, key=lambda d: d[1])[0]
print('Best fit reached using {}, negative log-likelihood: {}'.format(best_fit[0], best_fit[1]))
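The goodness-of-fit step mentioned above could be sketched with a Kolmogorov-Smirnov test via scipy.stats.kstest. This is only one possible choice of test, and the synthetic data and candidate list are assumptions for illustration:

```python
import numpy as np
import scipy.stats as st

# Synthetic stand-in data drawn from a normal distribution (an assumption).
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=2000)

candidates = [st.norm, st.laplace, st.expon]
ks_results = {}
for distribution in candidates:
    pars = distribution.fit(data)
    # kstest compares the empirical CDF with the fitted CDF;
    # a smaller statistic means a closer match.
    statistic, pvalue = st.kstest(data, distribution.cdf, args=pars)
    ks_results[distribution.name] = (statistic, pvalue)

# Rank candidates by the KS statistic rather than by the p-value.
best_name = min(ks_results, key=lambda name: ks_results[name][0])
print(best_name, ks_results[best_name])
```

Because the parameters are estimated from the same data being tested, the p-values are optimistic; the statistic is better treated as a relative ranking than as a formal test.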

      

+1




As I understand it, you may want to regress one distribution against another rather than fit them to an arithmetic curve. If so, you could plot one against the other and run a linear (or polynomial) regression, then check the coefficients. The regression can tell you whether there is a linear relationship between the two. See the SciPy documentation on linear regression.
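A minimal sketch of such a regression using scipy.stats.linregress, assuming two hypothetical samples x and y generated here with a known linear relationship purely for illustration:

```python
import numpy as np
import scipy.stats as st

# Hypothetical paired samples: y depends linearly on x, plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=500)

# linregress returns the slope, intercept, correlation coefficient,
# and a p-value for the null hypothesis that the slope is zero.
result = st.linregress(x, y)
print("slope:", result.slope, "intercept:", result.intercept)
print("r:", result.rvalue, "p-value:", result.pvalue)
```

An rvalue close to 1 or -1 suggests a strong linear relationship, while a value near 0 suggests none; it says nothing about non-linear dependence.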

0

