Wrong Exponential Power Plot - How to improve curve fit


Unfortunately, fitting a power law with scipy's curve_fit does not give a good fit. I tried passing initial guesses close to the expected values through the p0 argument, but that did not help.

I would be very glad if someone could point out my problem to me.

# Imports 
from scipy.optimize import curve_fit
import numpy as np 
import matplotlib.pyplot as plt

# Data
data = [[0.004408724185371062, 78.78011887652593], [0.005507091456466967, 65.01330508350753], [0.007073553026306459, 58.13364205119446], [0.009417452253958304, 50.12258366028477], [0.01315330108197482, 44.22980301062208], [0.019648758406406834, 35.436139354228956], [0.03248060063099905, 28.359815190205957], [0.06366197723675814, 21.54769216720596], [0.17683882565766149, 14.532777174472574], [1.5915494309189533, 6.156872080264581]]

# Fill lists to store x and y values
x_data, y_data = [], []
for i in data:
    x_data.append(i[0])
    y_data.append(i[1])

# Power-law function y = c * x**m
def func(x, m, c):
    return x**m * c

# Curve fit
coeff, _ = curve_fit(func, x_data, y_data)
m, c = coeff[0], coeff[1]

# Plot data and fitted function
x_function = np.linspace(0, 1.5, 100)
plt.scatter(x_data, y_data, s=30, marker="v")
yfunction = x_function**m * c
plt.plot(x_function, yfunction, '-')
plt.show()

Another dataset for which the fit is really bad would be:

data = [[0.004408724185371062, 194.04075083542443], [0.005507091456466967, 146.09194314074864], [0.007073553026306459, 120.2115882821158], [0.009417452253958304, 74.04014371874908], [0.01315330108197482, 34.167114633194736], [0.019648758406406834, 12.775528348369871], [0.03248060063099905, 7.903195816871708], [0.06366197723675814, 5.186092050500438], [0.17683882565766149, 3.260540592404184], [1.5915494309189533, 2.006254812978579]]

Best answer

I might be missing something, but I think curve_fit works fine. When I compare the residuals from curve_fit with those obtained from the Excel parameters you provided in the comments, the Python results give lower residuals for both data sets (code below). You say "Unfortunately, the power fit with scipy does not return a good fit", but what exactly is your measure of a "good fit"? In terms of the residuals, the Python fit is always better than the Excel fit.
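One way to make "good fit" concrete is the sum of squared residuals (SSE) in the original y units, which is the quantity curve_fit minimizes. The sketch below computes it for the first data set, using the Excel parameters for that data set (m = -0.425, c = 7.027) both as a baseline and as the starting guess p0. The helper name `sse` is hypothetical. Because curve_fit's default Levenberg-Marquardt solver only accepts steps that lower the SSE, the fitted parameters should never come out worse than the starting point.

```python
import numpy as np
from scipy.optimize import curve_fit

# First data set from the question: columns are (x, y)
data = np.array([
    [0.004408724185371062, 78.78011887652593],
    [0.005507091456466967, 65.01330508350753],
    [0.007073553026306459, 58.13364205119446],
    [0.009417452253958304, 50.12258366028477],
    [0.01315330108197482, 44.22980301062208],
    [0.019648758406406834, 35.436139354228956],
    [0.03248060063099905, 28.359815190205957],
    [0.06366197723675814, 21.54769216720596],
    [0.17683882565766149, 14.532777174472574],
    [1.5915494309189533, 6.156872080264581],
])
x, y = data[:, 0], data[:, 1]

def func(x, m, c):
    """Power law y = c * x**m."""
    return c * np.power(x, m)

def sse(params):
    """Hypothetical helper: sum of squared residuals in linear y units."""
    return float(np.sum((y - func(x, *params)) ** 2))

excel_params = (-0.425, 7.027)  # Excel's power-trendline values for this data set

# Start the least-squares search from the Excel solution via p0
coeff, _ = curve_fit(func, x, y, p0=excel_params)

print("SSE Excel:", sse(excel_params))
print("SSE scipy:", sse(coeff))
```

Whichever number is smaller tells you which parameter set is "better" under this particular measure; a different measure (for example, errors in log y) can rank them the other way.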

I am not sure whether it has to be exactly this function, but if not, you could also consider adding a third parameter to your function (named "d" below), which should lead to lower residuals.
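A minimal sketch of the three-parameter idea, assuming d enters as an additive constant, y = c*x**m + d. The function names `power2` and `power3` are hypothetical, and the second data set is used because that is where the two-parameter fit looks worst. The three-parameter fit is seeded with the two-parameter solution and d = 0, so it starts from exactly the two-parameter SSE and, since the solver only accepts cost-reducing steps, cannot end up worse. `maxfev` is raised only as a safety margin.

```python
import numpy as np
from scipy.optimize import curve_fit

# Second data set from the question: columns are (x, y)
data = np.array([
    [0.004408724185371062, 194.04075083542443],
    [0.005507091456466967, 146.09194314074864],
    [0.007073553026306459, 120.2115882821158],
    [0.009417452253958304, 74.04014371874908],
    [0.01315330108197482, 34.167114633194736],
    [0.019648758406406834, 12.775528348369871],
    [0.03248060063099905, 7.903195816871708],
    [0.06366197723675814, 5.186092050500438],
    [0.17683882565766149, 3.260540592404184],
    [1.5915494309189533, 2.006254812978579],
])
x, y = data[:, 0], data[:, 1]

def power2(x, m, c):
    # Two-parameter power law y = c * x**m
    return c * np.power(x, m)

def power3(x, m, c, d):
    # Hypothetical three-parameter variant: power law plus constant offset d
    return c * np.power(x, m) + d

(m2, c2), _ = curve_fit(power2, x, y, p0=(-0.841, 1.0823), maxfev=10000)
# Seed the three-parameter fit with the two-parameter solution and d = 0
(m3, c3, d3), _ = curve_fit(power3, x, y, p0=(m2, c2, 0.0), maxfev=10000)

sse2 = float(np.sum((y - power2(x, m2, c2)) ** 2))
sse3 = float(np.sum((y - power3(x, m3, c3, d3)) ** 2))
print("2 parameters:", m2, c2, "SSE", sse2)
print("3 parameters:", m3, c3, d3, "SSE", sse3)
```

A lower SSE with an extra free parameter is expected almost by construction, so it says little on its own about whether the offset is meaningful for your data.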

Here is the modified code. I changed your "func", increased the plot resolution, and also print the sum of squared residuals. For the first data set, this is about 79.35 with the Excel parameters and about 34.29 with Python. For the second data set, it is 15220.79 with Excel and 601.08 with Python (assuming I did not mess anything up).

from scipy.optimize import curve_fit
import numpy as np 
import matplotlib.pyplot as plt

# Data
data = [[0.004408724185371062, 78.78011887652593], [0.005507091456466967, 65.01330508350753], [0.007073553026306459, 58.13364205119446], [0.009417452253958304, 50.12258366028477], [0.01315330108197482, 44.22980301062208], [0.019648758406406834, 35.436139354228956], [0.03248060063099905, 28.359815190205957], [0.06366197723675814, 21.54769216720596], [0.17683882565766149, 14.532777174472574], [1.5915494309189533, 6.156872080264581]]
#data = [[0.004408724185371062, 194.04075083542443], [0.005507091456466967, 146.09194314074864], [0.007073553026306459, 120.2115882821158], [0.009417452253958304, 74.04014371874908], [0.01315330108197482, 34.167114633194736], [0.019648758406406834, 12.775528348369871], [0.03248060063099905, 7.903195816871708], [0.06366197723675814, 5.186092050500438], [0.17683882565766149, 3.260540592404184], [1.5915494309189533, 2.006254812978579]]
# Fill lists to store x and y values
x_data,y_data = [], []
for i in data:
    x_data.append(i[0])
    y_data.append(i[1])

# Power-law function y = c * x**m
def func(x, m, c):
    # slightly rewritten; you could also consider using a third parameter d
    return c*np.power(x, m)  # + d

# Curve fit
coeff, _ = curve_fit(func, x_data, y_data)
m, c = coeff[0], coeff[1]  # with d: m, c, d = coeff[0], coeff[1], coeff[2]
print(m, c)  # , d

# Plot function
a = plt.scatter(x_data, y_data, s=30, marker = "v")
x_function = np.linspace(0, 1.5, 1000) 
yfunction = c*np.power(x_function,m) # + d
plt.plot(x_function, yfunction, '-')
plt.show()
print("residuals python:", ((y_data - func(x_data, *coeff))**2).sum())
# compare to Excel, first data set
print("residuals excel:", ((y_data - func(x_data, -0.425, 7.027))**2).sum())
# compare to Excel, second data set
print("residuals excel:", ((y_data - func(x_data, -0.841, 1.0823))**2).sum())
Another answer

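Taking your second data set as an example: if you plot the raw data, one difficulty becomes obvious: the x values are very non-uniformly spaced (eight of the ten points lie below x = 0.07, the last one at about 1.59). Since your function is a pure power law, y = c*x**m, its logarithm ln y = m*ln x + ln c is a straight line in ln x, so it is easiest to do the fitting in log scale: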

In [1]: import numpy as np

In [2]: import matplotlib.pyplot as plt

In [3]: plt.ion()

In [4]: data = [[0.004408724185371062, 194.04075083542443], [0.005507091456466967, 146.09194314074864], [0.007073553026306459, 120.2115882821158], [0.009417452253958304, 74.04014371874908], [0.01315330108197482, 34.167114633194736], [0.019648758406406834, 12.775528348369871], [0.03248060063099905, 7.903195816871708], [0.06366197723675814, 5.186092050500438], [0.17683882565766149, 3.260540592404184], [1.5915494309189533, 2.006254812978579]]

In [5]: data = np.asarray(data)   # just for convenience

In [6]: data.shape
Out[6]: (10, 2)

In [7]: x, y = data[:, 0], data[:, 1]

In [8]: lx, ly = np.log(x), np.log(y)

In [9]: plt.plot(lx, ly, 'ro')
Out[9]: [<matplotlib.lines.Line2D at 0x323a250>]

In [10]: def lfunc(x, a, b):
   ....:     return a*x + b
   ....: 

In [11]: from scipy.optimize import curve_fit

In [12]: opt, cov = curve_fit(lfunc, lx, ly)

In [13]: opt
Out[13]: array([-0.84071518,  0.07906558])

In [14]: plt.plot(lx, lfunc(lx, *opt), 'b-')
Out[14]: [<matplotlib.lines.Line2D at 0x3be0f90>]
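For this straight-line model, the fitted slope opt[0] is the exponent m and the intercept opt[1] is ln c, so the power-law prefactor is c = exp(opt[1]). The sketch below repeats the log-space fit with np.polyfit, swapped in for curve_fit because a degree-1 polynomial fit is ordinary linear least squares, and converts the intercept back to c. Note that this minimizes squared errors in ln y, not in y, so it will generally not give the lowest residuals in the original units.

```python
import numpy as np

# Second data set from the question: columns are (x, y)
data = np.array([
    [0.004408724185371062, 194.04075083542443],
    [0.005507091456466967, 146.09194314074864],
    [0.007073553026306459, 120.2115882821158],
    [0.009417452253958304, 74.04014371874908],
    [0.01315330108197482, 34.167114633194736],
    [0.019648758406406834, 12.775528348369871],
    [0.03248060063099905, 7.903195816871708],
    [0.06366197723675814, 5.186092050500438],
    [0.17683882565766149, 3.260540592404184],
    [1.5915494309189533, 2.006254812978579],
])
x, y = data[:, 0], data[:, 1]

# Fit ln(y) = m*ln(x) + ln(c); polyfit returns [slope, intercept] for degree 1
m, ln_c = np.polyfit(np.log(x), np.log(y), 1)
c = np.exp(ln_c)  # back-transform the intercept to the prefactor
print("m =", m, "c =", c)
```

With the numbers from the session, m is about -0.8407 and exp(0.07906558) is about 1.0823. These agree with the Excel values used for this data set elsewhere in this thread (m = -0.841, c = 1.0823). That agreement is consistent with Excel fitting its power trendline in log space, which would also explain why its residuals in the original units are larger than those from curve_fit.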

Whether this is an adequate model for the data is a separate concern.
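One quick way to probe that concern is to look at the log-space residuals of the straight-line fit. Because the model has an intercept, these residuals sum to numerically zero, so the useful signal is whether their signs form a systematic pattern along x rather than scattering randomly. The sketch below prints them for the second data set, again using np.polyfit as a stand-in for curve_fit. Reading the pattern is a judgment call, not a formal test.

```python
import numpy as np

# Second data set from the question: columns are (x, y)
data = np.array([
    [0.004408724185371062, 194.04075083542443],
    [0.005507091456466967, 146.09194314074864],
    [0.007073553026306459, 120.2115882821158],
    [0.009417452253958304, 74.04014371874908],
    [0.01315330108197482, 34.167114633194736],
    [0.019648758406406834, 12.775528348369871],
    [0.03248060063099905, 7.903195816871708],
    [0.06366197723675814, 5.186092050500438],
    [0.17683882565766149, 3.260540592404184],
    [1.5915494309189533, 2.006254812978579],
])
x, y = data[:, 0], data[:, 1]
lx, ly = np.log(x), np.log(y)

# Straight-line fit in log space, then residuals of ln(y)
slope, intercept = np.polyfit(lx, ly, 1)
residuals = ly - (slope * lx + intercept)

for xi, r in zip(x, residuals):
    print(f"x = {xi:.4g}   log residual = {r:+.3f}")
```

For this data set, the residuals come out positive at both ends and negative through the middle. A single straight line in log-log space cannot capture that kind of curvature, which suggests that a pure power law may not describe these points well.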