I have data where I want to fit the Fourier3 series, I looked to this answer: here and tried different algorithms from different packages (like symfit, and scipy). But when I plot the data, different packages give me get this result: enter image description here
Currently, I'm using the curve_fit package from scipy and here is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import pandas as pd
def fourier(x, *as_bs):
sum_a = 0
sum_b = 0
j = 1
w = as_bs[0]
a0 = as_bs[1]
for i in range(2, len(as_bs)-1, 2):
sum_a += as_bs[i] * np.cos(j * w * x)
sum_b += as_bs[i+1] * np.sin(j * w * x)
j = j + 1
return a0 + sum_a + sum_b
T = pd.read_excel('FS_data.xlsx')
A = pd.DataFrame(T)
xdata = np.array(A.iloc[:, 0])
ydata = np.array(A.iloc[:, 1])
# fits
popt, pcov = curve_fit(fourier, xdata, ydata, [np.random.rand(1)] * 8)
print(popt)
data_fit = fourier(ydata, *popt)
print(data_fit)
plt.plot(ydata)
plt.plot(data_fit, label='after fitting')
plt.legend()
plt.show()
So, my code basically will read random 8 numbers and assign them as initial guesses for (f, a0, a1, b1, a2, b2, a3, b3) respectively.
I tried to fit the data on Matlab to check if the data can be fitted with the fourier3 and the results there are great: enter image description here
I printed the output on both Python and Matlab to compare and here is the results for both: Python:
w = 5.66709943e-01
a0 = 3.80499132e+01
a1 = 5.56883486e-04
b1 = -3.88408379e-04
a2 = -3.88408379e-04
b2 = 3.32951592e-04
a3 = 3.15641900e-04
b3 = 1.96414168e-04
Matlab:
a0 = 38.07 (38.07, 38.08)
a1 = 0.5352 (0.4951, 0.5753)
b1 = -0.5788 (-0.5863, -0.5714)
a2 = -0.3728 (-0.413, -0.3326)
b2 = 0.5411 (0.492, 0.5901)
a3 = 0.2357 (0.2226, 0.2488)
b3 = 0.05895 (0.02773, 0.09018)
w = 0.0003088
So as noted, only the value for a0 was correct, but the others are very far from Matlab. So why I'm getting this result in Python? What I'm doing wrong?
Here is the data for those who like to test it out:
I am not into Matlab, so I don't know, which additional work the Matlab fit does to estimate starting values for a non-linear fit. I can say, though, that
curve_fit
does non at all, i.e. all values are assumed to be on the order of1
. The easiest way, would have been to rescale thex
axis to the range[0, 2 pi]
. Hence, the problem of the OP is, once again, wrong starting values. Rescaling requires, however, the knowledge that the main wave to be fitted is approximately the width of the data set. Moreover, we need to assume that all other fit parameters are also of the order1
. Luckily, this is the case, so this would have worked:If we cannot make the assumptions above, we may only assume that there is a base frequency with a dominant contribution to the signal (Note that this is not always true). In this case we can pre-calculate starting guesses in an non-iterative way.
The solution looks a bit more complicated:
Both providing basically the same solution, with the latter code being applicable to more cases with less assumptions.