Python Pandas: How do I apply a function requiring an extended class (datetime)?


How can I use pandas apply for a function that requires an extension of a standard class (datetime)?

Specifically, I would like to import datetime_modulo from the excellent gist at https://gist.github.com/treyhunner/6218526.

This code extends the standard datetime class to allow the modulo operation to be applied to datetime objects, e.g.

from datetime_modulo import datetime
from datetime import timedelta
d = datetime.now()
print(d % timedelta(seconds=60))

Now I need to apply this modulo operation to a pandas DataFrame column/Series, e.g.

df['dates'] = pd.to_datetime(df.index.values)
df['datetime_mod'] = df['dates'].apply(lambda x: x % timedelta(minutes=15))

But pandas is not able to detect the extended datetime class (unless I am just using it wrongly):

TypeError: unsupported operand type(s) for %: 'Timestamp' and 'datetime.timedelta'

How should I proceed?


There are 3 best solutions below

Best answer

You can try, as per this suggestion, converting the operand to datetime explicitly:

import pandas as pd
from datetime_modulo import datetime
from datetime import timedelta

df = pd.DataFrame({'Time': [pd.to_datetime('now')]})

def modulo(x):
    # Rebuild the pandas Timestamp as a datetime_modulo.datetime so that % is defined.
    # Microseconds are not copied over.
    dt = datetime(year=x.year, month=x.month, day=x.day,
                  hour=x.hour, minute=x.minute, second=x.second)
    return dt % timedelta(seconds=60)

df['Time'] = df['Time'].apply(modulo)

You are right, you are just using it wrongly.

See the error: TypeError: unsupported operand type(s) for %: 'Timestamp' and 'datetime.timedelta'.

This error means you cannot simply write x % timedelta(minutes=15) and hope it can work. It cannot. x, which is an instance of Timestamp, doesn't know how to % a datetime.timedelta. If you want it to work, you at least need to convert x to datetime_modulo.datetime.
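To make the conversion step concrete, here is a minimal sketch. It does not import the gist; `ModDatetime` is a hypothetical stand-in I wrote for `datetime_modulo.datetime`, and `to_mod_datetime` is likewise an illustrative helper, so treat both as assumptions rather than the gist's actual code. It only handles naive (timezone-free) values.

```python
from datetime import datetime, timedelta

import pandas as pd


class ModDatetime(datetime):
    """Hypothetical stand-in for datetime_modulo.datetime (not the gist's code)."""

    def __mod__(self, delta):
        # Remainder of the time elapsed since datetime.min, as a timedelta.
        # Assumes a naive (timezone-free) datetime.
        return (self - datetime.min) % delta


def to_mod_datetime(ts):
    # Rebuild the pandas Timestamp as the subclass so that % is defined.
    return ModDatetime(ts.year, ts.month, ts.day,
                       ts.hour, ts.minute, ts.second, ts.microsecond)


ts = pd.Timestamp('2071-12-12 10:04:44')

# A plain Timestamp has no % defined for timedelta, as in the question.
try:
    ts % timedelta(minutes=15)
    raw_modulo_failed = False
except TypeError:
    raw_modulo_failed = True

# After conversion, the subclass's __mod__ is used.
remainder = to_mod_datetime(ts) % timedelta(minutes=15)
```

With a DataFrame, the same helper could be used as `df['dates'].apply(lambda x: to_mod_datetime(x) % timedelta(minutes=15))`.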


In general, you should try to avoid calls to apply in pandas, because it calls a Python function once per element and is usually much slower than a vectorized operation. For example, if you only need the minute offset within each quarter of an hour, you can use:

import pandas as pd
df = pd.DataFrame({'dates': pd.to_datetime(['2071-12-12 10:04:44', '2071-12-12 10:30:44'])})
>>> df.dates.dt.minute.mod(15)
0    4
1    0
Name: dates, dtype: int64
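
If you want the full remainder (minutes and seconds) rather than only the minute part, one possible vectorized approach is to floor each timestamp to its 15-minute bucket and subtract. This is a sketch that swaps the datetime_modulo class for pandas' own `dt.floor`; the column name `datetime_mod` is taken from the question and `bucket_start` is just an illustrative variable name.

```python
import pandas as pd

df = pd.DataFrame({'dates': pd.to_datetime(['2071-12-12 10:04:44',
                                            '2071-12-12 10:30:44'])})

# Floor each timestamp to the start of its 15-minute bucket, then subtract
# to get the full remainder (minutes and seconds) as a Timedelta.
bucket_start = df['dates'].dt.floor('15min')
df['datetime_mod'] = df['dates'] - bucket_start
```

The result is a Timedelta column, so no apply call and no conversion to a datetime subclass is needed.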