Is there a better way to fit a beta prime distribution to data than using SciPy?

Question

I am trying to fit a beta prime distribution to my data using Python. Since SciPy provides scipy.stats.betaprime.fit, I tried this:

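import scipy.stats as sts
import matplotlib.pyplot as plt

N = 5000       # sample size
nb_bin = 100   # number of histogram bins

# True parameters; loc = -a/(b-1)*scale shifts the mean of the distribution to zero
a, b, scale = 12, 106, 36
loc = -a / (b - 1) * scale

# Draw N samples and fit all four parameters by maximum likelihood
y = sts.betaprime.rvs(a, b, loc=loc, scale=scale, size=N)
a_hat, b_hat, loc_hat, scale_hat = sts.betaprime.fit(y)
print('Estimated parameters: \n a=%.2f, b=%.2f, loc=%.2f, scale=%.2f' % (a_hat, b_hat, loc_hat, scale_hat))

# Histogram of the samples with the true and estimated PDFs on top
plt.figure()
# density=True replaces the old normed=True, which newer Matplotlib releases no longer accept
count, bins, ignored = plt.hist(y, nb_bin, density=True)
pdf_ini = sts.betaprime.pdf(bins, a, b, loc=loc, scale=scale)
pdf_est = sts.betaprime.pdf(bins, a_hat, b_hat, loc=loc_hat, scale=scale_hat)
plt.plot(bins, pdf_ini, 'g', linewidth=2.0, label='ini')
plt.plot(bins, pdf_est, 'y', linewidth=2.0, label='est')
plt.grid()
plt.legend()
plt.show()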

It gives me this result:

Estimated parameters: a = 9935.34, b = 10846.64, loc = -90.63, scale = 98.93

These are very different from the original values (a = 12, b = 106, loc ≈ -4.11, scale = 36).

[Figure: histogram of the samples with the original ('ini') and estimated ('est') PDFs]

If I pass the true values of loc and scale to the fit function, the estimate is better. Has anyone already worked on this and found a better solution?
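For reference, here is a minimal sketch of what passing loc and scale to the fit can look like. It assumes they are held fixed through the floc and fscale keyword arguments of fit, so only a and b are estimated. Passing them as loc= and scale= instead would only set the optimizer's starting guesses. The seed and the use of random_state are illustrative additions, not part of the code above.

```python
import scipy.stats as sts

# Same true parameters as in the question
a, b, scale = 12, 106, 36
loc = -a / (b - 1) * scale

# Fixed seed (an assumption, added only to make the sketch reproducible)
y = sts.betaprime.rvs(a, b, loc=loc, scale=scale, size=5000, random_state=0)

# floc / fscale hold loc and scale fixed at the given values,
# so the optimizer only searches over the two shape parameters
a_hat, b_hat, loc_hat, scale_hat = sts.betaprime.fit(y, floc=loc, fscale=scale)

print('a=%.2f, b=%.2f, loc=%.2f, scale=%.2f' % (a_hat, b_hat, loc_hat, scale_hat))
```

Fixing loc and scale is only possible when they are known in advance, which is usually not the case with real data.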

distribution estimation goodness-of-fit

Fay Apr 22. 17 at 11:36