I have to use the scipy curve_fit function over a large set of data (5,000,000 fits). So basically I've created a 2D array: the first dimension is the number of fittings to perform, the second dimension is the number of points used for each fitting.
```python
import numpy as np
from scipy.optimize import curve_fit

t = np.array([0, 1, 2, 3, 4])
for d in np.ndindex(data.shape[0]):
    try:
        # one independent fit per row of the 2D data array
        popt, pcov = curve_fit(func, t, np.squeeze(data[d, :]), p0=[1000, 100])
    except RuntimeError:
        print("Error - curve_fit failed")
```
Multiprocessing can be used to speed up the full process, but it is still quite slow. Is there a way to use curve_fit in a "vectorized" manner?
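For reference, this is roughly how my multiprocessing version looks (a minimal sketch; `func`, `t` and `data` are as above, and `fit_row` is just an illustrative helper name):

```python
import numpy as np
from multiprocessing import Pool
from scipy.optimize import curve_fit

def fit_row(row):
    # fit a single 5-point row; return NaNs when the fit does not converge
    try:
        popt, _ = curve_fit(func, t, row, p0=[1000, 100])
        return popt
    except RuntimeError:
        return np.full(2, np.nan)

if __name__ == "__main__":
    with Pool() as pool:
        # one independent fit per row of the (n_fits, n_points) array
        params = pool.map(fit_row, data, chunksize=1000)
```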
`curve_fit` extends the functionality of `scipy.optimize.leastsq`, which is itself a wrapper for the underlying MINPACK `lmdif` and `lmder` Fortran routines. It looks like multi-threading is not possible; check out this link, which says, "There is still an open ticket to develop this but it looks like it can not be finished..." You would either need to use a different library or write a wrapper/function in lower-level code. There are papers on implementations of parallel Levenberg-Marquardt algorithms.
Maybe there is another solution: use less data. As a rough estimate, you could randomly split your data into parts, curve fit each part in a separate process (with multiprocessing), and take an average of the coefficients at the end, as in the sketch below.
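A rough sketch of that splitting idea (assuming, as in the question, a model `func` and the 5-point abscissa `t`; `fit_chunk` and the chunk count of 8 are illustrative, not prescriptive):

```python
import numpy as np
from multiprocessing import Pool
from scipy.optimize import curve_fit

def fit_chunk(chunk):
    # fit every row in one chunk; average the coefficients that converged
    fits = []
    for row in chunk:
        try:
            popt, _ = curve_fit(func, t, row, p0=[1000, 100])
            fits.append(popt)
        except RuntimeError:
            pass  # skip rows where the fit fails
    return np.mean(fits, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # random split of the rows, e.g. one part per core
    parts = np.array_split(rng.permutation(data), 8)
    with Pool(8) as pool:
        part_coeffs = pool.map(fit_chunk, parts)
    rough_coeffs = np.mean(part_coeffs, axis=0)  # rough overall estimate
```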