How to determine whether a fit is reasonable in Python


I am fitting a function to data in Python using lmfit. I want to tell whether the fit is good or not. Consider this example (which is actually my data):

[Plot: the data with the fitted curve, a fit that looks reasonable]

Most humans will agree that the fit in the plot is reasonable. On the other hand, a 'bad fit example' would show a case in which most humans would agree that the fit is not good. As a human, I am capable of performing this 'statistical eye test' to tell whether a fit is good just by looking at the plot.

Now I want to automate this process, because I have many data sets and fits and simply cannot look at each of them individually. I am using a chi-squared test in the following way:

import scipy.stats

result = model.fit(y_values, params, x=x_values)  # `model` was previously created using lmfit.
degrees_of_freedom = result.nfree  # nfree = number of data points minus number of varied parameters.
significance_alpha = 0.05
critical_value = scipy.stats.chi2.ppf(1 - significance_alpha, degrees_of_freedom)
print('Is fit good?', result.chisqr < critical_value)

No matter what significance_alpha I choose, it rejects all the fits, even though the fits are 'not that bad'. For example, setting significance_alpha=1e-10 rejected the fit from the plot above, which looks 'reasonably good' to me and which I don't want to reject.

So my specific question is: What am I doing wrong? Or, what other kind of test or procedure is usually used to separate 'decent fits' from 'bad fits'?


2 Answers

M Newville

lmfit provides a number of statistics to evaluate the goodness of a fit. The following statistics will be printed in the fit report, as from

result = model.fit(y_values, params, x=x_values) 
print(result.fit_report())

The report includes chi-square (chisqr), reduced chi-square (redchi), the Akaike Information Criterion (aic), the Bayesian Information Criterion (bic), and R-squared (rsquared).

Each of these statistics can be used to compare two fits. Reduced chi-square and the two information criteria are designed to allow comparing fits with different numbers of variables (or degrees of freedom). All of them except R-squared should decrease for a better fit, while R-squared should increase.
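For instance, to pick between two candidate models on the same data, you can compare these attributes directly. This is a minimal sketch: the two peak models are illustrative, and x_values/y_values are the variables from your question:

import lmfit

model_a = lmfit.models.GaussianModel()
model_b = lmfit.models.LorentzianModel()

fit_a = model_a.fit(y_values, model_a.guess(y_values, x=x_values), x=x_values)
fit_b = model_b.fit(y_values, model_b.guess(y_values, x=x_values), x=x_values)

# Lower redchi/aic/bic indicates the better of the two fits.
for name, res in (('Gaussian', fit_a), ('Lorentzian', fit_b)):
    print(name, 'redchi =', res.redchi, 'aic =', res.aic, 'bic =', res.bic)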

In order to change that interpretation of a statistic from meaning "the better of two or more fits" into meaning "an objectively good fit", one normally needs to make sure that the fit residual at the heart of chi-square is properly scaled to a good estimate of the uncertainties in the data, usually replacing

residual = data - fit

with

residual = (data - fit)/epsilon

where epsilon is the 1-sigma uncertainty in the data. If this is the case, then a "good fit" should have chi-square of approximately N_free, so that reduced chi-square is approximately 1.
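In lmfit, this scaling is done by passing weights to Model.fit: the weights multiply the residual, so weights = 1/epsilon gives a properly scaled chi-square. A minimal sketch, where epsilon is a hypothetical array of 1-sigma uncertainties and model, params, x_values, and y_values are from your question:

result = model.fit(y_values, params, x=x_values, weights=1.0/epsilon)

# With properly scaled residuals, chi-square should be close to nfree
# and reduced chi-square close to 1 for a good fit.
print(result.chisqr, result.nfree)
print(result.redchi)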

Note that of these statistics, R-squared is by far the worst to use: the data - data.mean() term in its denominator means that the mis-fit is compared not to the uncertainties in the data (as it should be) but to the variation of the data itself. As an example, if your data contains a step function or a single prominent peak (as yours does), most data points may be far from the mean value, so that denominator will be rather large, making R-squared appear very close to 1 even for a mediocre fit.
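To see this numerically, here is a small self-contained demonstration with made-up numbers: a sharp peak fitted by a curve whose width is 20% too large still scores an R-squared near 1, while the reduced chi-square (assuming unit uncertainties) correctly flags the fit as poor:

import numpy as np

x = np.linspace(-5, 5, 201)
data = 100 * np.exp(-x**2 / 2)             # a single prominent peak, sigma = 1
fit = 100 * np.exp(-x**2 / (2 * 1.2**2))   # visibly too wide: sigma = 1.2

ss_res = np.sum((data - fit)**2)
ss_tot = np.sum((data - data.mean())**2)
print('R-squared:', 1 - ss_res / ss_tot)   # about 0.96: looks deceptively good

# With a per-point uncertainty of 1 and 3 fit parameters, the same mis-fit
# gives a reduced chi-square of roughly 50, clearly signalling a bad fit.
print('reduced chi-square:', ss_res / (len(x) - 3))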

user171780

I found the problem: I was giving zero uncertainty to bins with zero entries. Assigning an uncertainty of 1 to those bins makes the chi-squared test work properly.
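For reference, a sketch of that fix, assuming histogram (count) data with Poisson uncertainties; samples is a hypothetical array of raw measurements, and model/params are as in the question:

import numpy as np

counts, edges = np.histogram(samples, bins=50)
centers = 0.5 * (edges[:-1] + edges[1:])

# sqrt(n) is the usual 1-sigma estimate for Poisson counts; empty bins get
# an uncertainty of 1 so the weights 1/sigma stay finite.
sigma = np.sqrt(counts).astype(float)
sigma[counts == 0] = 1.0

result = model.fit(counts, params, x=centers, weights=1.0/sigma)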