I've recently stumbled upon an awesome new `pendulum` library that makes working with datetimes easier.
In `pandas`, there is this handy `to_datetime()` method that converts series and other objects to datetimes:
raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'], format='%d%b%Y:%H:%M:%S.%f')
What would be the canonical way to create a custom `to_<something>` method, in this case a `to_pendulum()` method that could convert a Series of date strings directly to `Pendulum` objects?
This could give `Series` various interesting capabilities, for instance converting a series of date strings to a series of "offsets from now" (human-readable datetime diffs).
After looking through the API a bit, I must say I'm impressed with what they've done. Unfortunately, I don't think `Pendulum` and `pandas` can work together (at least, with the current latest version, `v0.21`).

The most important reason is that
`pandas` does not natively support `Pendulum` as a datatype. The natively supported datatypes (`np.int`, `np.float` and `np.datetime64`) all support vectorisation in some form. With `Pendulum` objects, you are not going to get a shred of performance improvement using a dataframe over, say, a vanilla loop and list. If anything, calling `apply` on a `Series` with `Pendulum` objects is going to be slower (because of all the API overheads).

Another reason is that
`Pendulum` is a subclass of `datetime`. This is important because, as mentioned above, `datetime` is a supported datatype, so pandas will attempt to coerce `datetime` to pandas' native datetime format, `Timestamp`. Here's an example.

So, with some difficulty (involving
`dtype=object`), you could load `Pendulum` objects into dataframes. Here's how you'd do that.

However, this is essentially useless, because calling any
`pendulum` method (via `apply`) will now not only be super slow, but will also end up with the result being coerced to `Timestamp` again: an exercise in futility.
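
To make the coercion and the `dtype=object` workaround concrete without depending on Pendulum itself, here is a minimal sketch. It assumes a made-up `datetime` subclass, `LabelledDateTime`, standing in for any `datetime` subclass; the class and its `label()` method are purely illustrative and are not part of pandas or Pendulum. The exact dtype printed may vary with your pandas version.

```python
from datetime import datetime

import pandas as pd


class LabelledDateTime(datetime):
    """Hypothetical stand-in for a datetime subclass (not a real library class)."""

    def label(self):
        return f"moment:{self.isoformat()}"


values = [
    LabelledDateTime(2017, 11, 1, 12, 0),
    LabelledDateTime(2017, 11, 2, 12, 0),
]

# Default construction: pandas infers a datetime64 dtype, and the
# subclass instances come back as pandas Timestamps.
coerced = pd.Series(values)
print(coerced.dtype, type(coerced.iloc[0]))

# Forcing dtype=object keeps the original instances, so the custom
# method is still reachable, but only one element at a time.
kept = pd.Series(values, dtype=object)
print(kept.dtype, type(kept.iloc[0]))
labels = [v.label() for v in kept]
print(labels)
```

Note that the object-dtype series is little more than a list with an index: every call to `label()` goes through a plain Python loop, which is the performance point made above.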