How do I subtract float values from a datetime64 array in vectorized form?
Data:
import numpy as np
import pandas as pd
some_dates = np.array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64')
some_ints = np.array([1, 2, 3], dtype='int64')
some_float = np.array([1.00, 2.00, 3.00], dtype='float64')
data_dict = {'dates': some_dates,
             'ints': some_ints,
             'floats': some_float}
test_data = pd.DataFrame(data_dict)
Looks like this:
Out[1]:
       dates  floats  ints
0 2007-07-13       1     1
1 2006-01-13       2     2
2 2010-08-13       3     3
What I want to do:
#===============================================================================
# Works well
#===============================================================================
test_data['dates'] = test_data['dates'].sub(test_data['ints'])
But this breaks down when the vector contains NaN values. NaN is not supported in int vectors, so they are automatically converted to float:
#------------------------------------------------------------------------------
# Converts ints to floats
test_data.dtypes
> Out[2]:
> dates datetime64[ns]
> floats float64
> ints int64
> dtype: object
test_data.loc[2:2, 'ints'] = None
> Out[3]:
> dates datetime64[ns]
> floats float64
> ints float64
> dtype: object
> Out[4]:
>        dates  floats  ints
> 0 2007-07-13       1     1
> 1 2006-01-13       2     2
> 2 2010-08-13       3   NaN
But then I cannot subtract floats from my dates:
#----------------------------------------------------------------------------- #
# But this way also doesn't work
test_data['dates'] = test_data['dates'].sub(test_data['floats'])
> TypeError: ufunc subtract cannot use operands with types dtype('<M8[ns]') and dtype('float64')
I have found a workaround, but it is extremely slow because of the pure-Python apply:
# from dateutil.relativedelta import relativedelta
def sub_float(df_row):
    if pd.notnull(df_row['floats']):
        # df_row['dates'] = df_row['dates'] - relativedelta(days = df_row['floats'])
        df_row['dates'] = df_row['dates'] - pd.DateOffset(days=df_row['floats'])
    return df_row['dates']
test_data['dates'] = test_data.apply(sub_float, axis=1)
Are there any suggestions for how I can subtract floats from a datetime in a vectorized way?
Change the floats to timedeltas, which are able to handle NaNs (a missing value becomes NaT).
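A minimal sketch of that suggestion, assuming the floats represent a number of days (as in the DateOffset workaround from the question). The names deltas, shifted and shifted_keep are only illustrative, and the sample frame puts a NaN in the floats column to show how missing values behave:

```python
import numpy as np
import pandas as pd

test_data = pd.DataFrame({
    'dates': pd.to_datetime(['2007-07-13', '2006-01-13', '2010-08-13']),
    'floats': [1.0, 2.0, np.nan],
})

# Interpret the floats as a number of days; NaN becomes NaT.
deltas = pd.to_timedelta(test_data['floats'], unit='D')

# Vectorized subtraction; a row with a missing offset yields NaT.
shifted = test_data['dates'] - deltas

# To mirror the apply-based workaround (date left unchanged when the
# float is missing), treat missing offsets as a zero-length timedelta.
shifted_keep = test_data['dates'] - deltas.fillna(pd.Timedelta(0))
```

Whether a missing float should produce NaT or leave the date untouched depends on what you need; the fillna variant reproduces what sub_float does in the question.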