Anomalous Parameter Estimation Following Gamma Distribution Fitting with scipy.stats

Question

Anomalous Parameter Estimation Following Gamma Distribution Fitting with scipy.stats

40 Views Asked by Alok At 17 August 2025 at 02:39

I have two sets of datasets (https://github.com/badal01/precipi_data/blob/main/Data1.xlsx). The range of the two datasets is more or less similar. The alpha value for the first dataset (first row in Data1.xlsx) is 5.002. However, the alpha value for the second row is 396111157.0657891, which is abnormal. I attempted to check if the data contains any absurd values, very high values, or negative values. However, the data looks completely normal and very similar to the first row.

import scipy.stats as stats
import pandas as pd
io = pd.read_excel('Data1.xlsx',header=None).to_numpy()
fit_alpha, fit_loc, fit_beta=stats.gamma.fit(io[0,:])
fit_alpha1, fit_loc1, fit_beta1=stats.gamma.fit(io[1,:])

I tried it with MATLAB, and it is working quite well. However, I do not understand what's happening in Python. Any help would be appreciated.

I tried with different library. I did not observe any different results.

Original Q&A

There are 1 best solutions below

**Matt Haberland** · Answer 1

Here is complete code to go along with my comment above. (In the future, please include any data in plain text.)

Here's the fit you were happy with:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x1 = np.asarray([4.43662e-05,2.43413e-05,2.73413e-05,2.9961e-05,4.06934e-05,2.56346e-05,4.02114e-05,1.94386e-05,2.18386e-05,3.1264e-05,3.31055e-05,3.00496e-05,3.00685e-05,3.31455e-05,3.98705e-05,6.30289e-05,2.39244e-05,5.56489e-05,1.90789e-05,2.67739e-05,3.46394e-05,3.67507e-05,3.12904e-05,3.55849e-05,3.91662e-05,7.19415e-05,2.21005e-05,2.9677e-05,3.56736e-05,3.29453e-05,4.79659e-05,4.09342e-05,2.72911e-05,3.61119e-05,5.24038e-05,3.84542e-05,4.64852e-05,1.71126e-05,4.47858e-05,5.89909e-05,2.81931e-05,4.28806e-05,4.32288e-05,3.52785e-05,5.44103e-05,2.42965e-05,1.87145e-05,5.13848e-05,2.6523e-05,3.37923e-05,2.43713e-05,3.43567e-05,3.67469e-05,3.86891e-05,3.59647e-05,3.80963e-05,2.43904e-05,3.79046e-05,2.07185e-05,3.1492e-05,2.84438e-05,5.70566e-05,3.68865e-05,3.39524e-05,3.34151e-05,4.95275e-05,4.34074e-05,2.86148e-05,3.05382e-05,2.9697e-05,2.44529e-05,4.17229e-05,5.39806e-05,6.02317e-05,3.82269e-05,3.63218e-05,4.14949e-05,3.10743e-05,2.43437e-05,3.16157e-05,4.81429e-05,2.66055e-05,3.00394e-05,4.37599e-05,4.01972e-05,4.15613e-05])
x2 = np.asarray([5.48193e-05,4.79855e-05,7.32296e-05,6.02249e-05,5.51259e-05,4.70833e-05,4.3032e-05,5.08287e-05,4.90293e-05,5.65956e-05,2.10931e-05,5.3334e-05,4.45536e-05,6.39792e-05,6.37097e-05,6.71664e-05,6.67199e-05,6.53865e-05,4.05624e-05,2.86803e-05,5.09496e-05,5.64781e-05,5.00183e-05,5.86148e-05,5.84764e-05,4.27342e-05,8.01059e-05,3.5541e-05,7.83815e-05,7.64419e-05,6.71847e-05,4.83841e-05,5.31507e-05,5.55475e-05,4.5599e-05,8.18473e-05,7.35816e-05,5.1769e-05,5.91082e-05,6.43396e-05,4.32755e-05,7.08638e-05,6.88461e-05,6.31059e-05,4.04879e-05,6.10569e-05,6.51778e-05,3.58981e-05,7.56359e-05,7.48164e-05,5.36863e-05,5.94128e-05,3.60576e-05,4.93196e-05,4.86909e-05,4.92113e-05,2.95355e-05,4.34835e-05,5.83857e-05,7.798e-05,3.49355e-05,5.67644e-05,5.92727e-05,3.66655e-05,4.39566e-05,4.53161e-05,5.67604e-05,3.63818e-05,5.8882e-05,3.98616e-05,4.59374e-05,5.21106e-05,5.67607e-05,5.92866e-05,5.00237e-05,6.78354e-05,7.96478e-05,3.85134e-05,6.38701e-05,5.60584e-05,7.24924e-05,5.33234e-05,4.41117e-05,4.6092e-05,6.4302e-05,5.448e-05])

ptp= np.ptp(x1)
x = np.linspace(x1.min() - ptp/10, x1.max() + ptp/10, 300)

params1 = stats.gamma.fit(x1)
dist1 = stats.gamma(*params1)
plt.hist(x1, density=True)
plt.plot(x, dist1.pdf(x))

params1  # (5.005034504766963, 1.1531618825829435e-05, 4.9139272044517935e-06)

For the other data, you get an unexpected large shape parameter.

ptp= np.ptp(x2)
x = np.linspace(x2.min() - ptp/10, x2.max() + ptp/10, 300)

params2 = stats.gamma.fit(x2)
dist2 = stats.gamma(*params2)
plt.hist(x2, density=True)
plt.plot(x, dist2.pdf(x))
params2  # (398115981.5241561, -0.25897428431767466, 6.506374646288466e-10)

Keep in mind that fit provides the unconstrained maximum likelihood estimate of the three parameter gamma distribution; it does not consider any other constraints or desires you might have if you don't specify them.

If you are happy with a two-parameter gamma distribution, you can fix the location to zero with floc=0.

params2 = stats.gamma.fit(x2, floc=0)
dist2 = stats.gamma(*params2)
plt.hist(x2, density=True)
plt.plot(x, dist2.pdf(x))
params2  # (16.548028224665476, 0, 3.3166055685313535e-06)

If you want to put bounds on the fitted parameters, try scipy.stats.fit.

bounds = {'a': (1e-3, 100), 'loc': (0, 0), 'scale': (1e-12, 1)}
res = stats.fit(stats.gamma, x2, bounds=bounds)
print(res)
#  params: FitParams(a=13.076023225160911, loc=0.0, scale=4.194486773039213e-06)
# success: True
# message: 'Optimization terminated successfully.'
res.plot(plot_type='cdf')

That said, you might also consider whether your data is more consistent with a different distribution.

Anomalous Parameter Estimation Following Gamma Distribution Fitting with scipy.stats

There are 1 best solutions below

Related Questions in PYTHON

Related Questions in NUMPY

Related Questions in SCIPY

Related Questions in SCIPY.STATS

Related Questions in GAMMA-DISTRIBUTION

Trending Questions

Popular # Hahtags

Popular Questions