How to estimate GEV distribution parameters by maximum likelihood in Python


I'm trying to fit the generalized extreme value (GEV) distribution's probability density function (pdf) to the pdf of my data (a histogram). The histogram depends on the bins: when I adjust the bin width, the result of the curve fitting changes as well. curve_fit(func, x, y) handles this properly, but it uses least squares estimation. What I want is maximum likelihood estimation (MLE). stats.genextreme.fit(data) gives good results with MLE, but its estimate does not reflect how the histogram changes with the bins, because it only uses the original data.

I tried to implement MLE myself, and I succeeded in estimating the parameters of a standard normal distribution that way. However, it is based on the original data and does not change with the bins. And I could not even estimate the parameters of the GEV from the original data with it.

I checked the source code of genextreme_gen, rv_continuous, etc., but it is too complicated; I couldn't follow it with my Python skills.

I would like to estimate the parameters of the GEV distribution by MLE, and I want the estimate to change according to the bins.

What should I do?

I am sorry for my poor English, and thank you for your help.

Edit: here is the code I am using.

h = 0.5  # bin width
dat = h105[1]  # data
b = np.arange(min(dat)-h/2, max(dat), h)  # bin range
n, bins = np.histogram(dat, bins=b, density=True)  # histogram
x = 0.5*(bins[1:]+bins[:-1])  # x-value of histogram

popt,_ = curve_fit(fg, x, n)  # curve_fit(GEV's pdf, x-value of histogram, pdf value)
popt = -popt[0], popt[1], popt[2]  # estimated parameters (least squares estimation, LSE)
x1 = np.linspace((popt[1]-popt[2])/popt[0], dat.max(), 1000)
a1 = stats.genextreme.pdf(x1, *popt)  # pdf

popt = stats.genextreme.fit(dat)  # estimated parameters (maximum likelihood estimation, MLE)
x2 = np.linspace((popt[1]-popt[2])/popt[0], dat.max(), 1000)
a2 = stats.genextreme.pdf(x2, *popt)

[image: fit result with bin width = 2]

[image: fit result with bin width = 0.5]


BEST ANSWER

One way to do this is to convert the bins back into data. You can do so by counting the number of data points in each bin and then repeating the center of the bin that many times.

I also tried sampling uniform values from each bin, but using the bin center and repeating it seems to give parameters with a higher likelihood.
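
In code, the bin-center conversion is essentially the following (a minimal sketch with synthetic stand-in data; np.repeat does the "repeat each center count times" step in one call):

import numpy as np
import scipy.stats as stats

h = 0.5  # bin width
dat = stats.genextreme.rvs(0.001, loc=0.5, scale=1.0, size=500)  # stand-in for the real data
edges = np.arange(dat.min() - h/2, dat.max() + h, h)  # bin edges
counts, edges = np.histogram(dat, bins=edges)  # raw count per bin
centers = 0.5 * (edges[1:] + edges[:-1])  # bin centers

binned_dat = np.repeat(centers, counts)  # each center repeated by its count
c, loc, scale = stats.genextreme.fit(binned_dat)  # MLE that now depends on the bin width h

The full script below does the same thing and also compares the result against the other estimates.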

import scipy.stats as stats
from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt

ground_truth_params = (0.001, 0.5, 0.999)
count = 50

h = 0.2  # bin width
dat = stats.genextreme.rvs(*ground_truth_params, count)  # data
b = np.arange(np.min(dat)-h/2, np.max(dat), h)  # bin range
n, bins = np.histogram(dat, bins=b, density=True)  # histogram
bin_counts, _ = np.histogram(dat, bins=b, density=False)  # raw count per bin
x = 0.5*(bins[1:]+bins[:-1])  # x-value of histogram
    
def flatten(l):
    return [item for sublist in l for item in sublist]

popt,_ = curve_fit(stats.genextreme.pdf, x, n, p0=[0,1,1])  # curve_fit(GEV's pdf, x-value of histogram, pdf value)
popt_lse = tuple(popt)  # estimated parameters (least squares estimation, LSE); curve_fit already returns scipy's (c, loc, scale), so no sign flip is needed
popt_mle = stats.genextreme.fit(dat) # estimated parameter (Maximum likelihood estimation, MLE)

uniform_dat_from_bins = flatten(np.linspace(ctr - h/2, ctr + h/2, cnt) for cnt, ctr in zip(bin_counts, x))  # spread each bin's count uniformly across the bin
popt_uniform_mle = stats.genextreme.fit(uniform_dat_from_bins) # estimated parameter (Maximum likelihood estimation, MLE)

centered_dat_from_bins = flatten([ctr] * cnt for cnt, ctr in zip(bin_counts, x))  # repeat each bin center by its count
popt_centered_mle = stats.genextreme.fit(centered_dat_from_bins) # estimated parameter (Maximum likelihood estimation, MLE)

plot_params = {
    ground_truth_params: 'tab:green',
    popt_lse: 'tab:red',
    popt_mle: 'tab:orange',
    popt_centered_mle: 'tab:blue',
    popt_uniform_mle: 'tab:purple'
}
param_names = ['GT', 'LSE', 'MLE', 'bin centered MLE', 'bin uniform MLE']

plt.figure(figsize=(10,5))

plt.bar(x, n, width=h, color='lightgrey')
plt.ylim(0, 0.5)
plt.xlim(-2,10)

for params, color in plot_params.items():
    x_pdf = np.linspace(-2, 10, 1000)
    y_pdf = stats.genextreme.pdf(x_pdf, *params)  # the GEV pdf for this parameter set
    plt.plot(x_pdf, y_pdf, label='pdf', color=color)
plt.legend(param_names)

plt.figure(figsize=(10,5))
for params, color in plot_params.items():
    plt.plot(np.sum(stats.genextreme.logpdf(dat, *params)), 'o', color=color)  # total log-likelihood of the original data under each parameter set

This plot shows the PDFs estimated by the different methods, along with the ground-truth PDF:

[image: estimated PDFs]

The next plot shows the likelihoods of the estimated parameters given the original data.

The PDF estimated by MLE on the original data has the highest likelihood, as expected. It is followed by the PDFs estimated from the histogram bins (centered and uniform), then by the ground-truth PDF. The PDF estimated by least squares has the lowest likelihood.

[image: likelihoods]
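
For a quick numeric check of this ordering, a short sketch like the one below (reusing dat, plot_params, and param_names from the script above) prints the total log-likelihood of the original data for each parameter set; the exact values will vary from run to run because the data are random:

for name, params in zip(param_names, plot_params):
    # plot_params keeps insertion order, so names and parameter tuples line up
    loglik = np.sum(stats.genextreme.logpdf(dat, *params))
    print(f'{name}: total log-likelihood = {loglik:.2f}')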