I have some duration type data (lap times) as pl.Utf8 that fails to convert using strptime, whereas regular datetimes work as expected.
Minutes (before the :) and seconds (before the .) are always padded to two digits; milliseconds are always three digits.
Lap times are always < 2 min.
    import polars as pl

    df = pl.DataFrame({
        "lap_time": ["01:14.007", "00:53.040", "01:00.123"]
    })
    df = df.with_columns(
        [
            # pl.col('release_date').str.strptime(pl.Date, fmt="%B %d, %Y"),  # works
            pl.col('lap_time').str.strptime(pl.Time, fmt="%M:%S.%3f").cast(pl.Duration),  # fails
        ]
    )
I used the chrono format specifiers from https://docs.rs/chrono/latest/chrono/format/strftime/index.html, which the polars docs for strptime refer to.
The second conversion (for lap_time) always fails, no matter whether I use .%f, .%3f, or %.3f. Apparently, strptime doesn't allow creating a pl.Duration directly, so I tried pl.Time instead, but it fails with the error:
ComputeError: strict conversion to dates failed, maybe set strict=False
but setting strict=False yields all null values for the whole Series.
Am I missing something, or is this some weird behavior on chrono's or python-polars' part?
General case
If your durations may exceed 24 hours, you can extract the components (minutes, seconds, and so on) from the string with a regex pattern and build the duration from them.
About pl.Time
To convert data to pl.Time, you need to specify hours as well. When you prepend 00 hours to your time, the code works.