I'm trying to match the generalized extreme value (GEV) distribution's probability density function (pdf) to the data' pdf. This histogram is function of bin. As adjust this bin, the result of the function fitting also changes. And curve_fit(func, x, y)
is playing this role properly. but this function uses a "least squares estimation". What I want is to use maximum likelihood estimation (MLE). And it has good results with the stats.genextreme.fit(data)
function. However, this function does not represent histogram shape changes according to bin. Just use the original data.
I'm trying to use MLE. And I succeeded in estimating the parameters of the standard normal distribution using MLE. However, it is based on the original data and does not change according to the bin. Even the parameters of the GEV could not be estimated with the original data.
I checked the source code of genextreme_gen
, rv_continuous
, etc. But, this code is too complicated. I couldn't accept the source code with my Python skills.
I would like to estimate the parameters of the GEV distribution through MLE. And I want to get the result that the estimate changes according to bin.
What should I do?
I am sorry for my poor English, and thank you for your help.
+)
h = 0.5 # bin width
dat = h105[1] # data
b = np.arange(min(dat)-h/2, max(dat), h) # bin range
n, bins = np.histogram(dat, bins=b, density=True) # histogram
x = 0.5*(bins[1:]+bins[:-1]) # x-value of histogram
popt,_ = curve_fit(fg, x, n) # curve_fit(GEV's pdf, x-value of histogram, pdf value)
popt = -popt[0], popt[1], popt[2] # estimated paramter (Least squares estimation, LSE)
x1 = np.linspace((popt[1]-popt[2])/popt[0], dat.max(), 1000)
a1 = stats.genextreme.pdf(x1, *popt) # pdf
popt = stats.genextreme.fit(dat) # estimated parameter (Maximum likelihood estimation, MLE)
x2 = np.linspace((popt[1]-popt[2])/popt[0], dat.max(), 1000)
a2 = stats.genextreme.pdf(x2, *popt)
bin width = 2
bin width = 0.5
One way to do this is to convert bins to data. You can do so by counting number of data points in each bin and then repeating center of the bin this number of times.
I have also tried to sample uniform values from each bin, but using center of the bin and then repeating it seems to provide parameters with higher likelihood.
This plot shows PDFs that are estimated using different methods along with ground truth PDF
And the next plot shows of likelihoods of estimated parameters given original data.
PDF that is estimated by MLE on original data has the maximum value as expected. Then follow PDFs that are estimated using histogram bin (centered and uniform). After them there is ground truth PDF. And finally comes PDF with the lowest likelihood, which is estimated using least squares.