Matlab: Chi-squared fit (chi2gof) to check if data is exponentially distributed
I think this is a simple question, but I cannot figure it out. I have a vector whose first elements look like this:
V = [31 52 38 29 29 34 29 24 25 25 32 28 24 28 29 ...];
and I want to run a chi2gof test in Matlab to check whether V is exponentially distributed. I did:
[h,p] = chi2gof(V,'cdf',@expcdf);
but I get a warning message:
Warning: After pooling, some bins still have low expected counts.
The chi-square approximation may not be accurate
Am I calling chi2gof incorrectly?
With only 36 values, you have a very small sample. From the second sentence of the Wikipedia article on the chi-squared test (emphasis mine):
It is suitable for unpaired data from large samples.
"Large" in this case usually means around 100. Read more about the assumptions of this test here.
Alternatives
You can try kstest in Matlab, which is based on the Kolmogorov-Smirnov test:
[h,p] = kstest(V,'cdf',[V(:) expcdf(V(:),expfit(V))])
Or try lillietest, which is based on the Lilliefors test and has an option specifically for exponentially distributed data:
[h,p] = lillietest(V,'Distribution','exp')
In case you can increase the sample size: you are also doing one thing wrong with chi2gof. From the help for the 'cdf' parameter:
Fully specified cumulative distribution function. This can be a ProbabilityDistribution object, a function handle, or a function name. The function must accept X as its only argument. Alternatively, you can provide a cell array whose first element is a function name or handle and whose subsequent elements are parameter values, one per cell. The function must take X values as the first argument and other parameters as later arguments.
You don't supply any additional parameters, so expcdf uses its default mean parameter, mu = 1. Your data values are very large and don't follow an exponential distribution with this mean at all. You need to estimate the parameter as well. You can use the expfit function, which is based on maximum likelihood estimation. Try something like this:
[h,p] = chi2gof(V,'cdf',@(x)expcdf(x,expfit(V)),'nparams',1)
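To see why the default mu = 1 fits so poorly, here is a minimal sketch that uses only the 15 values shown in the question (the rest of V is elided there, so the numbers are illustrative only). The variable names Vshown, muHat, tailDefault and tailFitted are made up for this example.

```matlab
% Only the first 15 values from the question; the full vector is not shown there.
Vshown = [31 52 38 29 29 34 29 24 25 25 32 28 24 28 29];

% Maximum likelihood estimate of the exponential mean (equals the sample mean).
muHat = expfit(Vshown);

% Probability that an exponential variable exceeds the smallest observed value,
% first under the default mean mu = 1, then under the fitted mean.
tailDefault = 1 - expcdf(min(Vshown), 1);
tailFitted  = 1 - expcdf(min(Vshown), muHat);

fprintf('fitted mean: %.4f\n', muHat);
fprintf('P(X > min) with mu = 1:     %.3g\n', tailDefault);
fprintf('P(X > min) with mu = muHat: %.3g\n', tailFitted);
```

With mu = 1, essentially all of the probability mass lies below the smallest data value, so nearly every bin that contains data has an expected count close to zero, which is consistent with the low expected counts warning.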
However, with only 36 samples you may not get a very good estimate for a distribution like this, and you may not get the results you expect even for data drawn from a known distribution, for example:
V = exprnd(10,1,36);
[h,p] = chi2gof(V,'cdf',@(x)expcdf(x,expfit(V)),'nparams',1)
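A single run like this says little, because the result changes with every random draw. One way to get a feel for the behaviour is to repeat the experiment many times and count how often the test rejects. The following is a rough sketch, not a rigorous power study: the trial count, the second sample size (500) and the fixed seed are arbitrary choices for this example, and it passes the fitted mean using the cell-array form of 'cdf' described in the help text.

```matlab
rng(0);                      % fixed seed, only so the run is repeatable
nTrials = 200;               % arbitrary number of repetitions
sizes   = [36 500];          % sample sizes to compare (500 is an arbitrary choice)
rejectRate = zeros(size(sizes));

warnState = warning('off', 'all');   % silence low-count warnings during the loop
for k = 1:numel(sizes)
    nReject = 0;
    for t = 1:nTrials
        X = exprnd(10, 1, sizes(k));             % truly exponential data, mean 10
        [~, p] = chi2gof(X, 'cdf', {@expcdf, expfit(X)}, 'nparams', 1);
        nReject = nReject + (p <= 0.05);         % NaN p-values count as no rejection
    end
    rejectRate(k) = nReject / nTrials;
end
warning(warnState);                  % restore the previous warning settings

disp(table(sizes(:), rejectRate(:), 'VariableNames', {'n', 'rejectRate'}));
```

Since the data really are exponential, a well-behaved test should reject in roughly 5% of trials; a rate far from that at n = 36 would suggest the chi-square approximation is unreliable at that sample size.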