curve_fit seems to overestimate error of estimated parameters

I have some data from the lab, which I'm fitting to the function

$$f(s; s_p, \mathrm{amp}, \mathrm{bg}) = \mathrm{amp}\,\frac{s/s_p}{1 + s/s_p + (20/19.6)^2} + \mathrm{bg}$$

I set absolute_sigma=True in curve_fit to get the absolute uncertainties of the parameters (sp, amp, bg), but the errors calculated from np.sqrt(np.diag(pcov)) seem unreasonably large. See the plot below: the blue dots are the data, the red line is f(s, *popt), the green line replaces the optimal sp value with sp minus its error as calculated from np.sqrt(np.diag(pcov)), and the orange line uses sp plus the same value.

[Plot: data (blue dots), optimal fit (red), and the sp ± error curves (green/orange)]

I would expect the +/- lines to be much tighter to the red line.

Here's a minimal example:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Function for fitting
def scattering_rate(s, sp, amp, bg):
    return amp * (s/sp) / (1 + s/sp + (-20/19.6)**2) + bg

# data
s = np.array([0.6, 1.2, 2.3, 4.3, 8.1, 15.2, 28.5, 53.4])
y = np.array([8.6, 8.5, 8.9, 9.5, 10.6, 12.6, 15.5, 18.3])

# Fit data to saturated scattering rate
popt, pcov = curve_fit(scattering_rate, s, y, absolute_sigma=True)
print('Fit parameters', popt)
print('Fit uncertainties', np.sqrt(np.diag(pcov)))

# Construct fit from optimized parameters
fit_s = np.linspace(np.min(s), np.max(s), 100)
fit = scattering_rate(fit_s, *popt)

# Consider a one-standard-error shift in the sp value
perr = np.sqrt(np.diag(pcov))
fit_plus_err = scattering_rate(fit_s, popt[0] + perr[0], popt[1], popt[2])
fit_minus_err = scattering_rate(fit_s, popt[0] - perr[0], popt[1], popt[2])

# Plot
plt.plot(s, y, '-o', markeredgecolor='k', label='data')
plt.plot(fit_s, fit_plus_err, label='sp + err')
plt.plot(fit_s, fit_minus_err, label='sp - err')
plt.plot(fit_s, fit, '-r', label='opt sp')
plt.xlabel('s')
plt.ylabel('y')
plt.legend()
plt.grid()
plt.show()

Edit

Following @jlandercy's answer, we need the error bars on the original data, which are y_err = np.array([0.242, 0.231, 0.282, 0.31, 0.373]). Passing these to curve_fit's sigma argument, the results look much better, though the ± curves are still a bit distant:

[Plot: fit with sigma=y_err and absolute_sigma=True]

Accepted Answer

Why

When using absolute_sigma=True you are advised to pass the sigma argument as well; if you don't, the uncertainties are replaced by 1, the chi-square loss function behaves like a plain RSS, and pcov becomes more or less meaningless unless your uncertainties really are of unit size.
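As a quick illustration of this, here is a self-contained sketch on synthetic straight-line data (the model, noise level, and seed are my own choices, not from the question), comparing the covariance returned with and without a supplied sigma:

import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * x + b

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
y = model(x, 2.0, 1.0) + rng.normal(scale=0.3, size=x.size)

# absolute_sigma=True with no sigma: errors assumed to be exactly 1,
# so pcov is the unscaled inv(J^T J) -- inflated here because the
# actual noise (0.3) is well below 1.
_, pcov_abs = curve_fit(model, x, y, absolute_sigma=True)

# absolute_sigma=False (the default): pcov is rescaled by the reduced
# chi-square, i.e. the noise level is estimated from the residuals.
_, pcov_rel = curve_fit(model, x, y, absolute_sigma=False)

# Supplying the true sigma with absolute_sigma=True gives a
# comparable covariance to the rescaled version.
_, pcov_sig = curve_fit(model, x, y, sigma=np.full(y.size, 0.3),
                        absolute_sigma=True)

print(np.sqrt(np.diag(pcov_abs)))  # too large: unit-sigma assumption
print(np.sqrt(np.diag(pcov_rel)))
print(np.sqrt(np.diag(pcov_sig)))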

Fix

Provide the uncertainties, or at least an estimate of them, to get a meaningful pcov matrix.

Setting sigma to 0.15 for your data drastically improves the pcov matrix while keeping the chi-square statistic significant:

sy = np.full(y.size, 0.15)  # assumed uniform uncertainty on y
popt, pcov = curve_fit(scattering_rate, s, y, sigma=sy, absolute_sigma=True)
np.sqrt(pcov)  # element-wise square root of the covariance matrix

# array([[3.25479068, 2.07084837, 0.45391942],
#       [2.07084837, 1.35773667, 0.2563505 ],
#       [0.45391942, 0.2563505 , 0.09865549]]) 
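The individual parameter uncertainties are the square roots of the diagonal of pcov (the off-diagonal entries are covariances between parameters):

perr = np.sqrt(np.diag(pcov))  # one standard error per parameter: sp, amp, bg
print(perr)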


Update

You have two direct options to improve the error:

  • Collect more points (5 points for 3 parameters leaves a chi-square with dof = 2, which is very asymmetric);
  • Reduce the uncertainty on y.

The figures below show the impact on a synthetic dataset with parameters equivalent to yours:

[Figures: fits to synthetic data with more points and with smaller uncertainty on y]
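For reference, a minimal sketch of such a synthetic experiment (the true parameter values and noise levels are illustrative guesses of mine, not fitted values from the question), showing how the parameter errors shrink with more points or smaller uncertainty:

import numpy as np
from scipy.optimize import curve_fit

def scattering_rate(s, sp, amp, bg):
    return amp * (s/sp) / (1 + s/sp + (-20/19.6)**2) + bg

rng = np.random.default_rng(42)
true = (30.0, 22.0, 8.5)  # illustrative sp, amp, bg

# Compare: baseline, more points, smaller uncertainty on y
for n, sigma in [(8, 0.3), (40, 0.3), (8, 0.05)]:
    s = np.geomspace(0.6, 53.4, n)
    y = scattering_rate(s, *true) + rng.normal(scale=sigma, size=n)
    popt, pcov = curve_fit(scattering_rate, s, y, p0=true,
                           sigma=np.full(n, sigma), absolute_sigma=True)
    print(n, sigma, np.sqrt(np.diag(pcov)))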