How to cast array data from object to float64 after reading an Excel file with Pandas/NumPy?


I am trying to import a number of Excel files in a for loop and cast a column from each file to an array of type float64, to be used later in an lmfit function. To do so, I read each Excel file (indexed by i), put the data from one column into a list, and then convert the list to an array, as shown below:

isolated_peak_df = pd.read_excel(path + r'/Residuals_{}.xlsx'.format(i), header=None)
isolated_peak = [a for a in isolated_peak_df.transpose().iloc[0].loc[0:50]]
isolated_peak = np.array(isolated_peak)
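
A minimal sketch of the same extraction without the intermediate list, assuming the values sit in the first column of the sheet. The in-memory DataFrame stands in for the Excel file, and its values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Stand-in for pd.read_excel(..., header=None); the values are invented.
isolated_peak_df = pd.DataFrame({0: np.linspace(0.0, 1.0, 60)})

# First column, row labels 0 through 50 inclusive, which is the same
# selection as .transpose().iloc[0].loc[0:50], converted straight to float64.
isolated_peak = isolated_peak_df.iloc[:, 0].loc[0:50].to_numpy(dtype=float)

print(isolated_peak.dtype, isolated_peak.shape)
```

Requesting dtype=float at this point means a non-numeric cell raises an error here instead of producing an object array that fails later.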

This works for other purposes, but when I attempt to use the newly-created array to do some math in an lmfit function, I get the error:

TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'

I've seen a number of questions (e.g., here, here, here) asking for workarounds to the same error, but none of them resolve this specific situation, and trying to use advice in the answers continues to lead to errors. For example,

  • I tried to add dtype='float64' as an argument to the np.array() call, as Adrine Correya suggested here, but the same error remained.

  • I tried adding isolated_peak_df = isolated_peak_df.astype('float') on the line after isolated_peak_df is defined. I also tried adding pd.to_numeric(isolated_peak), both on the line after isolated_peak is defined as a list and on the line after it is converted to an array, as Nick suggested in the comments below. In all cases, the same error remained.

  • Additionally, as MHO suggested here, I made sure that the size of the array shown above matches the size of the array it is combined with in the mathematical operations. I also did a basic subtraction with an np.zeros() array of the same size to verify that the issue lies with the newly created array above and not with the other array.

  • And, as xagg suggested here and hpaulj advised in the comments below, I made sure that all of the values passed were indeed floating-point numbers, not strings or other non-numeric data.
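
The checks in the list above can be sketched roughly as follows. The helper describe_array and the sample data are hypothetical and only illustrate how to inspect an array's dtype together with the Python types of its elements:

```python
import numpy as np
import pandas as pd

def describe_array(arr):
    """Return the array's dtype and the Python type names of its elements."""
    arr = np.asarray(arr)
    type_names = sorted({type(v).__name__ for v in arr.ravel().tolist()})
    return arr.dtype, type_names

# Made-up data: one value is text, so the array is stored with dtype=object.
mixed = np.array([1.0, "2.5", 3.0], dtype=object)
print(describe_array(mixed))

# pd.to_numeric returns a new array; the result must be assigned to take effect.
converted = pd.to_numeric(mixed)
print(describe_array(converted))
```

If the array already reports float64 here, the object dtype in the error message is probably coming from some other input to the fit.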

Other answers provided were specific to scipy/sci-kit and not lmfit. What is going wrong?


Edit: I now suspect that the error has less to do with the data type of the array, as the error message would imply, and more to do with the size of the parameter values I am using in the lmfit function. When I decrease the starting values of the "x" and "y" parameters by several orders of magnitude (e.g., down to 5), the error no longer appears...

import numpy as np
import pandas as pd
from lmfit import Parameters, minimize
freeParams = Parameters()
freeParams.add("x", value = 5 * (10 ** 20), vary=True)
freeParams.add("y", value = 5 * (10 ** 15), vary=True)

To my knowledge, this shouldn't cause an issue, because float64 can store numbers with magnitudes from about 2.2E-308 to 1.8E+308, so I'm not sure why 5e15 or 5e20 would induce an error. The rest of the code is shown below:

epsfcn=0.01
ftol=1.e-10
xtol=1.e-10
max_nfev=300

for i in absorption.index:
    isolated_peak_df = pd.read_excel(path + r'/Residuals_{}.xlsx'.format(i), header=None)
    isolated_peak = [a for a in isolated_peak_df.transpose().iloc[0].loc[0:50]]
    isolated_peak = np.array(isolated_peak)
    
    def calc_residual(freeParams, isolated_peak):
        residual = isolated_peak[5:22] - np.zeros(17)
        return residual

    mini = minimize(calc_residual, freeParams, args=(isolated_peak,), epsfcn=epsfcn, ftol=ftol, xtol=xtol, max_nfev=max_nfev, calc_covar=True, nan_policy="omit")

Note that "x" and "y" aren't used in the above code only because I wanted to test out what was going wrong, and so I chose to do a simple subtraction operation that should have given a result (in residual) equal to the originally imported array (isolated_peak) -- ultimately, however, I will need to incorporate both parameters, "x" and "y."


Edit: Full error message, per hpaulj's request:

TypeError                                 Traceback (most recent call last)
Cell In[202], line 19
     16     residual = isolated_peak[5:22] - np.zeros(17)
     17     return residual
---> 19 mini = minimize(calc_residual, freeParams, args=(isolated_peak,), epsfcn=epsfcn, ftol=ftol, xtol=xtol, max_nfev=max_nfev, calc_covar=True, nan_policy="omit")

File ~/opt/anaconda3/lib/python3.9/site-packages/lmfit/minimizer.py:2600, in minimize(fcn, params, method, args, kws, iter_cb, scale_covar, nan_policy, reduce_fcn, calc_covar, max_nfev, **fit_kws)
   2460 """Perform the minimization of the objective function.
   2461 
   2462 The minimize function takes an objective function to be minimized,
   (...)
   2594 
   2595 """
   2596 fitter = Minimizer(fcn, params, fcn_args=args, fcn_kws=kws,
   2597                    iter_cb=iter_cb, scale_covar=scale_covar,
   2598                    nan_policy=nan_policy, reduce_fcn=reduce_fcn,
   2599                    calc_covar=calc_covar, max_nfev=max_nfev, **fit_kws)
-> 2600 return fitter.minimize(method=method)

File ~/opt/anaconda3/lib/python3.9/site-packages/lmfit/minimizer.py:2369, in Minimizer.minimize(self, method, params, **kws)
   2366         if (key.lower().startswith(user_method) or
   2367                 val.lower().startswith(user_method)):
   2368             kwargs['method'] = val
-> 2369 return function(**kwargs)

File ~/opt/anaconda3/lib/python3.9/site-packages/lmfit/minimizer.py:1693, in Minimizer.leastsq(self, params, max_nfev, **kws)
   1691 result.call_kws = lskws
   1692 try:
-> 1693     lsout = scipy_leastsq(self.__residual, variables, **lskws)
   1694 except AbortFitException:
   1695     pass

File ~/opt/anaconda3/lib/python3.9/site-packages/scipy/optimize/_minpack_py.py:426, in leastsq(func, x0, args, Dfun, full_output, col_deriv, ftol, xtol, gtol, maxfev, epsfcn, factor, diag)
    424     if maxfev == 0:
    425         maxfev = 200*(n + 1)
--> 426     retval = _minpack._lmdif(func, x0, args, full_output, ftol, xtol,
    427                              gtol, maxfev, epsfcn, factor, diag)
    428 else:
    429     if col_deriv:

TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'

There is 1 solution below


Using the example code from the scipy.optimize.leastsq documentation:

In [122]: from scipy.optimize import leastsq
     ...: def func(x):
     ...:     return 2*(x-3)**2+1
     ...: leastsq(func, 0)
Out[122]: (array([2.99999999]), 1)

Providing a numeric array as the starting value also works:

In [124]: leastsq(func, np.array([1,2,3]))
Out[124]: (array([1., 2., 3.]), 2)

But if I make it an object dtype array, I get an error that's a lot like your lmfit one:

In [125]: leastsq(func, np.array([1,2,3]).astype(object))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[125], line 1
----> 1 leastsq(func, np.array([1,2,3]).astype(object))

File ~\miniconda3\lib\site-packages\scipy\optimize\_minpack_py.py:426, in leastsq(func, x0, args, Dfun, full_output, col_deriv, ftol, xtol, gtol, maxfev, epsfcn, factor, diag)
    424     if maxfev == 0:
    425         maxfev = 200*(n + 1)
--> 426     retval = _minpack._lmdif(func, x0, args, full_output, ftol, xtol,
    427                              gtol, maxfev, epsfcn, factor, diag)
    428 else:
    429     if col_deriv:

TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'

The printed form of an object dtype array may look the same as that of a numeric array, but the fuller repr display shows the dtype:

In [127]: print(np.array([1,2,3]).astype(object))
[1 2 3]

In [128]: np.array([1,2,3]).astype(object)
Out[128]: array([1, 2, 3], dtype=object)
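
One possible source of an object dtype starting vector in the question, offered as an assumption and not confirmed in this thread: 5 * (10 ** 20) is Python integer arithmetic, and the result is too large for a 64-bit integer, so NumPy stores it as a Python object. The sketch below only compares the dtypes NumPy picks for the two ways of writing the starting values; the variable names are illustrative:

```python
import numpy as np

# Integer arithmetic: 5 * (10 ** 20) is a Python int beyond the 64-bit integer range.
int_start = [5 * (10 ** 20), 5 * (10 ** 15)]

# Float literals: the same magnitudes written as Python floats.
float_start = [5e20, 5e15]

int_dtype = np.array(int_start).dtype
float_dtype = np.array(float_start).dtype

print(int_dtype)    # object
print(float_dtype)  # float64
```

If this is what is happening, writing the starting values as 5e20 and 5e15 (or wrapping them in float()) would be worth trying, since 5 * (10 ** 15) on its own still fits in a 64-bit integer.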