Subtracting a float from a datetime in Pandas (numpy)


How do I subtract float values from a datetime64 array in a vectorized way?

Data:

import numpy as np
import pandas as pd

some_dates = np.array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64')
some_ints = np.array([1 ,2 ,3], dtype = 'int64')
some_float = np.array([1.00 ,2.00 ,3.00], dtype = 'float64')

data_dict = {'dates':some_dates, 
             'ints':some_ints, 
             'floats':some_float}

test_data = pd.DataFrame(data_dict)

Looks like this:

Out[1]: 
       dates  floats  ints
0 2007-07-13       1     1
1 2006-01-13       2     2
2 2010-08-13       3     3

What I want to do:

#===============================================================================
# Works well
#===============================================================================
test_data['dates'] = test_data['dates'].sub(test_data['ints'])

The problem appears when the vector contains NaN values. NaN is not supported in integer vectors, so the values are automatically converted to float:

#------------------------------------------------------------------------------ 
# Converts ints to floats 

test_data.dtypes

> Out[2]: 
> dates     datetime64[ns]
> floats           float64
> ints               int64
> dtype: object

test_data.loc[2:2, 'ints'] = None

> Out[3]: 
> dates     datetime64[ns]
> floats           float64
> ints             float64
> dtype: object

>  Out[4]: 
>        dates  floats  ints
> 0 2007-07-13       1     1
> 1 2006-01-13       2     2
> 2 2010-08-13       3   NaN

But then I cannot subtract floats from my dates:

#----------------------------------------------------------------------------- #
# But this way also doesn't work
test_data['dates'] = test_data['dates'].sub(test_data['floats'])

> TypeError: ufunc subtract cannot use operands with types dtype('<M8[ns]') and dtype('float64')

I have found a workaround, but it is extremely slow because apply runs the function row by row in pure Python:

# from dateutil.relativedelta import relativedelta
def sub_float(df_row):
    if pd.notnull(df_row['floats']):
#         df_row['dates'] = df_row['dates'] - relativedelta(days = df_row['floats'])
        df_row['dates'] = df_row['dates'] - pd.DateOffset(days=df_row['floats'])
    return df_row['dates']
test_data['dates'] = test_data.apply(sub_float, axis=1)

Are there any suggestions for how I can subtract floats from a datetime in a vectorized way?

Best answer:

Convert the floats to timedeltas (which can handle NaNs):

In [22]: df
Out[22]:
       dates  floats  ints
0 2007-07-13     NaN     1
1 2006-01-13       2     2
2 2010-08-13       3     3

In [23]: df.dates - pd.to_timedelta(df.floats.astype(str), unit='D')
Out[23]:
0          NaT
1   2006-01-11
2   2010-08-10
dtype: datetime64[ns]
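
For reference, here is a self-contained sketch of the same idea. It assumes a reasonably recent pandas, where pd.to_timedelta accepts a float Series directly with a unit, so the astype(str) step is left out. The frame is rebuilt from the example data above. The last step (keeping the original date where the float is missing, as the apply workaround in the question does) is an optional variation, not part of the answer itself.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the answer (NaN in the first row).
df = pd.DataFrame({
    'dates': pd.to_datetime(['2007-07-13', '2006-01-13', '2010-08-13']),
    'floats': [np.nan, 2.0, 3.0],
})

# Interpret each float as a number of days; NaN becomes NaT.
deltas = pd.to_timedelta(df['floats'], unit='D')

# Vectorized subtraction; rows whose delta is NaT yield NaT.
result = df['dates'] - deltas

# Optional variation: treat a missing float as "subtract nothing",
# so the original date is kept instead of becoming NaT.
kept = df['dates'] - deltas.fillna(pd.Timedelta(0))
```

Because the conversion happens once on the whole column, no Python-level loop over rows is involved.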