Double exponential fit in Python


I have a problem with curve_fit in Python. Specifically, I am trying to model double-exponential data.

The fits look good by eye, but when I inspect the parameters and their errors I am shocked: the errors are a few orders of magnitude larger than the parameter values themselves. On top of that, I have to provide very specific initial guesses, otherwise the fit is useless. Do you have any tips?

The x-axis is time and the y-axis is intensity. The method works fine for the blue data, and the estimated parameters there are more or less reasonable. I run into problems when I try to fit the orange data. Here are the datasets:

Time [-0.5 -0.4 -0.3 -0.2 -0.1 0. 0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.2 0.21 0.22 0.23 0.24 0.25 0.27 0.3 0.33 0.36 0.39 0.42 0.45 0.48 0.51 0.54 0.57 0.6 0.63 0.66 0.69 0.72 0.75 0.78 0.81 0.84 0.87 0.9 0.93 0.96 0.99 1.02 1.05 1.08 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2. ]

Gain (blue) [0. 0.08362 0.05064 0.0498 0.03706667 0.27215333 1.47517 1.62150333 1.72074667 1.85051667 1.81913 1.89994 1.96591667 1.98604667 2.08784333 2.07969333 2.09007 2.06054333 2.1477 2.08343 2.08434 2.02111 1.96873 1.96056333 1.89549667 1.82184333 1.70469 1.47502667 1.30257667 1.09063333 0.95496333 0.87091333 0.73064667 0.59721333 0.53186333 0.47225 0.41566667 0.36025 0.31752 0.30499667 0.26868 0.21355667 0.19169 0.17645 0.14872667 0.10706667 0.12724667 0.1013 0.08720333 0.09477333 0.09134 0.10958 0.08022667 0.09363 0.08076667 0.09006 0.09902 0.07093333 0.09918 0.10192 0.09432333 0.09079333 0.10425 0.09839667]

Loss - orange [ 0. -0.12552667 -0.15764 -0.19444 -0.26223 -0.35566333 -1.2022 -1.30988667 -1.44820667 -1.49741333 -1.66169333 -1.72277667 -1.81840667 -1.92772333 -1.97253333 -2.10094667 -2.21817667 -2.24875333 -2.43420667 -2.47216 -2.50555667 -2.62232333 -2.65796667 -2.68747333 -2.69612667 -2.71442 -2.59840333 -2.56071667 -2.44539333 -2.28093667 -2.23737 -2.09984333 -2.07465667 -2.04232667 -1.95166667 -1.93045333 -1.89963333 -1.92825667 -1.98861667 -1.93057667 -2.00662333 -1.91338667 -1.92764 -2.03428 -2.04880667 -1.98233667 -1.90731333 -2.00939667 -1.92832 -1.93799 -1.89133 -1.90807 -1.97912 -1.99034667 -1.95203333 -1.85398667 -1.86426667 -1.85729667 -1.86018667 -1.92405 -1.88595667 -1.88160667 -1.82243333 -1.87084667]

How should I deal with double-exponential fitting?

Plots: https://ibb.co/mCs39VP
Model: a*e^(-x*b) + c*e^(-x*d) + e
Parameters (for the lower fit): [-1622.6092323, -15.038877, 1713.45499496, -15.33871455, -1.92259733]
Errors: [4.73773551e+06, 4.17637511e+02, 4.73736168e+06, 4.37257698e+02, 1.20927662e-02]
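To see how degenerate these estimates are, one can compare each reported uncertainty to the magnitude of its parameter. A minimal sketch, using only the values quoted above:

```python
import numpy as np

# Parameter estimates and their standard errors, as quoted above
par = np.array([-1622.6092323, -15.038877, 1713.45499496,
                -15.33871455, -1.92259733])
parerr = np.array([4.73773551e+06, 4.17637511e+02, 4.73736168e+06,
                   4.37257698e+02, 1.20927662e-02])

# Relative uncertainty of each parameter
rel = parerr / np.abs(par)
print(rel)
# The two amplitudes carry a relative error of roughly 2900x,
# a strong sign that the fit is degenerate; only the offset
# (last parameter) is actually well determined.
```

Relative errors far above 1 mean the data cannot distinguish the two exponential terms, which matches the difficulty described above.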


Here is my code for the orange data fit:


    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import optimize as opt

    def exp(x, a, b, c, d, e):
        return a*np.exp(b*x) + c*np.exp(d*x) + e

    # Skip the first n points (the pre-trigger region)
    n = 25
    y_l = y_l[n:]
    x_l = x_l[n:]

    par, cov = opt.curve_fit(
        exp, x_l, y_l,
        p0=[-5, -7, 30, -20, -1.9],
        bounds=((-np.inf, -np.inf, -np.inf, -np.inf, -np.inf),
                (np.inf, 0, np.inf, 0, np.inf)),
        maxfev=5000,
    )

    print(par)
    parerr = np.sqrt(np.diag(cov))
    print("Errors", parerr)
    mx_l = np.linspace(np.min(x_l), np.max(x_l), 100)
    my_l = exp(mx_l, *par)
    plt.plot(mx_l, my_l, label="fit", color="blue")
    plt.show()

There are 2 answers below.

Answer by jlandercy

TL;DR

You are facing at least three issues:

  • Your complete dataset does not fit your model (the points with x <= 0);
  • The parameters you quote describe a divergent model (increasing exponentials), so they cannot be a solution for the dataset you showed;
  • You need to help the solver discriminate between the two exponential effects, because the model is redundant, which confuses the optimizer.
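To illustrate the second point: plugging the quoted parameters into the model (written here with this answer's exp(-b*x) sign convention) gives exponentials that explode over the data range. A quick sketch, assuming those quoted values:

```python
import numpy as np

def model(x, a, b, c, d, e):
    return a * np.exp(-b * x) + c * np.exp(-d * x) + e

# Parameters quoted in the question; with b and d negative,
# exp(-b*x) and exp(-d*x) grow without bound as x increases.
p = [-1622.6092323, -15.038877, 1713.45499496, -15.33871455, -1.92259733]

print(model(2.0, *p))  # astronomically large -- the data is of order ~2
```

A model that diverges over the fitted range can only match the data through a near-cancellation of two huge terms, which is exactly what the enormous amplitude uncertainties reflect.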

MCVE

Let's build an MCVE to highlight the challenges of your fitting problem.

import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
from sklearn.metrics import r2_score

For the model you referenced we have the following function:

def model(x, a, b, c, d, e):
    return a * np.exp(-b * x) + c * np.exp(-d * x) + e

This is not trivial to fit because it is redundant: it contains two identical exponential terms, so the solver gets confused during the gradient descent when started from a generic initial point.
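The redundancy is easy to demonstrate: swapping the two (amplitude, rate) pairs leaves the model output unchanged, so the loss surface has symmetric minima that the solver cannot distinguish. A small sketch:

```python
import numpy as np

def model(x, a, b, c, d, e):
    return a * np.exp(-b * x) + c * np.exp(-d * x) + e

x = np.linspace(-0.5, 2.0, 35)

# Swapping (a, b) with (c, d) produces exactly the same curve,
# so nothing a priori tells the two exponential terms apart.
y1 = model(x, 1.7, 1.1, -0.4, 4.2, 1.0)
y2 = model(x, -0.4, 4.2, 1.7, 1.1, 1.0)
print(np.allclose(y1, y2))  # True
```

This permutation symmetry is why the initial guess below deliberately breaks the symmetry by giving one amplitude a negative sign.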

First we pick ground-truth parameters for data similar to your dataset:

p0 = np.array([1.7, 1.1, -0.4, 4.2, 1.])

Notice that your quoted parameter set does not define a convergent function, because both exponentials are increasing. Such a solution certainly cannot fit your data, which is why its uncertainties are huge.

#p0 = np.array([-1622.6092323, -15.038877, 1713.45499496, -15.33871455, -1.92259733])

Then we generate a noisy synthetic dataset:

np.random.seed(12345)
x = np.linspace(-0.5, 2.0, 35)
y = model(x, *p0)
s = 0.01 * np.ones_like(y)
n = s * np.random.normal(size=y.size)
yexp = y + n

Here is the key to the optimization: we tweak just one initial parameter to be negative, in order to help the solver discriminate between the two exponentials and optimize each against the other:

popt, pcov = optimize.curve_fit(
    model, x, yexp, sigma=s,
    p0=[1, 1, -1, 1, 1],    # Hint: discriminate both effect
    absolute_sigma=True
)    
# array([ 1.70657559,  1.04436978, -0.36944364,  4.29809242,  0.96715622])

spopt = np.sqrt(np.diag(pcov))
# array([0.01738214, 0.03964849, 0.02884002, 0.09962603, 0.01952317])

We can see that in this case the convergence is good: we recover the parameters with reasonable errors on them.

yhat = model(x, *popt)
score = r2_score(yexp, yhat)  # 0.9993365605205807

The fit quality is also good.


Other models

Increasing the order of one of the exponentials

Another model of interest for the points x >= 0, with only 4 parameters, could be the following:

def model(x, a, b, c, d):
    return a * x * np.exp(-b * x) + c *(1. - np.exp(-d * x))

This gives a reasonable fit for the gain curve while having no problem converging:

t0 = np.array([-0.5, -0.4, -0.3, -0.2, -0.1, 0., 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 0.27, 0.3, 0.33, 0.36, 0.39, 0.42, 0.45, 0.48, 0.51, 0.54, 0.57, 0.6, 0.63, 0.66, 0.69, 0.72, 0.75, 0.78, 0.81, 0.84, 0.87, 0.9, 0.93, 0.96, 0.99, 1.02, 1.05, 1.08, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.])
x1 = np.array([0., 0.08362, 0.05064, 0.0498, 0.03706667, 0.27215333, 1.47517, 1.62150333, 1.72074667, 1.85051667, 1.81913, 1.89994, 1.96591667, 1.98604667, 2.08784333, 2.07969333, 2.09007, 2.06054333, 2.1477, 2.08343, 2.08434, 2.02111, 1.96873, 1.96056333, 1.89549667, 1.82184333, 1.70469, 1.47502667, 1.30257667, 1.09063333, 0.95496333, 0.87091333, 0.73064667, 0.59721333, 0.53186333, 0.47225, 0.41566667, 0.36025, 0.31752, 0.30499667, 0.26868, 0.21355667, 0.19169, 0.17645, 0.14872667, 0.10706667, 0.12724667, 0.1013, 0.08720333, 0.09477333, 0.09134, 0.10958, 0.08022667, 0.09363, 0.08076667, 0.09006, 0.09902, 0.07093333, 0.09918, 0.10192, 0.09432333, 0.09079333, 0.10425, 0.09839667])
x2 = np.array([0., -0.12552667, -0.15764, -0.19444, -0.26223, -0.35566333, -1.2022, -1.30988667, -1.44820667, -1.49741333, -1.66169333, -1.72277667, -1.81840667, -1.92772333, -1.97253333, -2.10094667, -2.21817667, -2.24875333, -2.43420667, -2.47216, -2.50555667, -2.62232333, -2.65796667, -2.68747333, -2.69612667, -2.71442, -2.59840333, -2.56071667, -2.44539333, -2.28093667, -2.23737, -2.09984333, -2.07465667, -2.04232667, -1.95166667, -1.93045333, -1.89963333, -1.92825667, -1.98861667, -1.93057667, -2.00662333, -1.91338667, -1.92764, -2.03428, -2.04880667, -1.98233667, -1.90731333, -2.00939667, -1.92832, -1.93799, -1.89133, -1.90807, -1.97912, -1.99034667, -1.95203333, -1.85398667, -1.86426667, -1.85729667, -1.86018667, -1.92405, -1.88595667, -1.88160667, -1.82243333, -1.87084667])

q = (t0 >= 0.)
t0 = t0[q]
x1 = x1[q]
x2 = x2[q]

popt1, pcov1 = optimize.curve_fit(model, t0, x1)
# (array([ 3.85903575e+01,  6.84925174e+00,  2.42361762e-02, -9.12847012e-01]),
#  array([[0.44760132, 0.05386743, 0.01048778, 0.19496296],
#         [0.05386743, 0.00822338, 0.00224154, 0.04191002],
#         [0.01048778, 0.00224154, 0.00249973, 0.04831067],
#         [0.19496296, 0.04191002, 0.04831067, 0.9471787 ]]))
# R2 = 0.9932075193845648

popt2, pcov2 = optimize.curve_fit(model, t0, x2)
# (array([-23.77125047,   5.21925912,   1.95317437,   2.20996689]),
#  array([[ 0.90925921,  0.00741302, -0.04303857,  0.25044186],
#         [ 0.00741302,  0.14738189, -0.02429899,  0.22796133],
#         [-0.04303857, -0.02429899,  0.00861674, -0.05510877],
#         [ 0.25044186,  0.22796133, -0.05510877,  0.43753774]]))
# R2 = 0.8745479038673547


But it clearly lacks fit for the loss curve. On the other hand, it converges seamlessly because it is built from two distinct (nearly orthogonal) functions instead of two copies of the same function.

Sum of logistic functions

@JJacquelin's proposal clearly offers a better fit for the complete dataset:

def model(x, a, b, c, p, r, q, s):
    return a + b /(1 + np.exp(p * (x - r))) + c / (1 + np.exp(q * (x - s)))

Since it is again a sum of two identical functions, it only requires an educated initial guess to discriminate between the two effects:

popt1, pcov1 = optimize.curve_fit(model, t0, x1, p0=[1, 1, 1, 1, 1, -1, 1])
# R2 = 0.9965718405681585
popt2, pcov2 = optimize.curve_fit(model, t0, x2, p0=[-1, 1, 5, 1, 1, -1, 1])
# R2 = 0.9905364976009451


Answer by JJacquelin

The model equation made of a sum of two exponentials isn't compatible with the expected shape of the curve (except if one uses partial, i.e. piecewise, fits).

One can instead propose a combination of other kinds of functions, such as Gaussian or logistic.

For example with the sum of two logistic functions :

"Gain" curve: (fitted plot omitted)

"Loss" curve: (fitted plot omitted)