Matlab: Chi-squared fit (chi2gof) to check if data is exponentially distributed

I think this is a simple question, but I cannot figure it out. I have a vector whose first elements look like this:

V = [31 52 38 29 29 34 29 24 25 25 32 28 24 28 29 ...];

      

and I want to use chi2gof in Matlab to test whether V is exponentially distributed. I did:

[h,p] = chi2gof(V,'cdf',@expcdf);

      

but I get a warning message:

Warning: After pooling, some bins still have low expected counts.
The chi-square approximation may not be accurate

      

Am I calling chi2gof incorrectly?



1 answer


With only 36 values, you have a very small sample. From the second sentence of the Wikipedia article on the chi-squared test (emphasis mine):

It is suitable for unpaired data from large samples.

"Large" in this case usually means around 100. It is worth reading up on the other assumptions of this test as well.


Alternatives

You can try kstest in Matlab, which is based on the Kolmogorov-Smirnov test:

[h,p] = kstest(V,'cdf',[V(:) expcdf(V(:),expfit(V))])

      

Or try lillietest, which is based on the Lilliefors test and has an option specifically for exponentially distributed data:



[h,p] = lillietest(V,'Distribution','exp')
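
To try both alternatives without the full vector from the question, here is a minimal sketch on synthetic data. The name Vsim, the mean of 30, and the seed are assumptions chosen for illustration; they are not the asker's data.

```matlab
% Synthetic stand-in for V; the real vector is not fully shown in the question.
rng(0);                         % arbitrary seed, for reproducibility only
Vsim = exprnd(30,1,36);         % 36 draws with mean 30 (assumed values)

% Kolmogorov-Smirnov test against an exponential CDF with the fitted mean
muHat = expfit(Vsim);
[hKS,pKS] = kstest(Vsim,'cdf',[Vsim(:) expcdf(Vsim(:),muHat)]);

% Lilliefors test, which accounts for the mean being estimated from the data
[hL,pL] = lillietest(Vsim,'Distribution','exp');

fprintf('kstest:     h = %d, p = %.3f\n',hKS,pKS);
fprintf('lillietest: h = %d, p = %.3f\n',hL,pL);
```

In both cases h = 0 means the test does not reject the exponential hypothesis at the default 5% level. lillietest may warn when p falls outside its tabulated range.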

      

In case you can increase the sample size: you are indeed doing something wrong with chi2gof. From the help for the 'cdf' parameter:

A fully specified cumulative distribution function. This can be a ProbabilityDistribution object, a function handle, or a function name. The function must take X values as its only argument. Alternatively, you can provide a cell array whose first element is a function name or handle and whose later elements are parameter values, one per cell. The function must take X values as its first argument and other parameters as later arguments.

You don't supply any additional parameters, so expcdf uses its default mean parameter mu = 1. Your data values are much larger than that and do not follow an exponential distribution with this mean at all. You need to estimate the parameter as well. Using the expfit function, which is based on maximum likelihood estimation, you can try something like this:

% Fit the mean to the data V itself, not to the argument x that chi2gof passes to the CDF
[h,p] = chi2gof(V,'cdf',@(x)expcdf(x,expfit(V)),'nparams',1)
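
The help text quoted above also allows a cell array for 'cdf'. A minimal sketch of that form follows, again on synthetic data (Vsim, the mean of 30, and the seed are assumptions). It also requests the third output of chi2gof so that the observed and expected bin counts behind the warning can be inspected.

```matlab
% Synthetic stand-in for V (assumed values, not the asker's data)
rng(1);
Vsim = exprnd(30,1,36);

% Estimate the mean once, then pass it as an extra CDF parameter
muHat = expfit(Vsim);
[h,p,stats] = chi2gof(Vsim,'cdf',{@expcdf,muHat},'nparams',1);

% Observed (left column) and expected (right column) counts per pooled bin
disp([stats.O(:) stats.E(:)]);
fprintf('h = %d, p = %.3f, bins = %d, df = %d\n',h,p,numel(stats.O),stats.df);
```

The low-count warning refers to the expected counts in stats.E. With a sample this small, few bins survive pooling, so the degrees of freedom in stats.df can be very low.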

      

However, with only 36 samples you cannot get a very good estimate for such a distribution, and you may still not get the expected result even for data drawn from a known exponential distribution, for example:

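V = exprnd(10,1,36);   % 36 samples from an exponential distribution with mean 10
% As above, fit the mean to V rather than to the argument x passed by chi2gof
[h,p] = chi2gof(V,'cdf',@(x)expcdf(x,expfit(V)),'nparams',1)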
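
A single random draw says little, so one way to get a feel for the effect of sample size is to repeat the test many times on data known to be exponential. This is only a sketch: the trial count, the sample sizes, and the seed are arbitrary assumptions, and all warnings are silenced during the loop.

```matlab
rng(2);                               % arbitrary seed
mu = 10;                              % true mean, as in the example above
nTrials = 200;                        % assumed number of repetitions
sizes = [36 100 1000];                % assumed sample sizes to compare
rejectRate = zeros(size(sizes));

warnState = warning('off','all');     % silence low-count warnings in the loop
for i = 1:numel(sizes)
    rejected = false(nTrials,1);
    for k = 1:nTrials
        x = exprnd(mu,1,sizes(i));    % data that really is exponential
        h = chi2gof(x,'cdf',{@expcdf,expfit(x)},'nparams',1);
        rejected(k) = (h == 1);       % treats an undefined result as "not rejected"
    end
    rejectRate(i) = mean(rejected);
    fprintf('n = %4d: rejected in %.1f%% of trials\n',sizes(i),100*rejectRate(i));
end
warning(warnState);                   % restore the previous warning state
```

If the chi-square approximation holds, the rejection rate should be close to the significance level (5% by default), since the null hypothesis is true for every trial.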

      
