Here is an example of my .csv file:
date,time,value
20240112,085917,11
20240112,085917,22
I used to import it into a DataFrame like this:
import pandas as pd
df = pd.read_csv(csv_file, parse_dates=[['date', 'time']]).set_index('date_time')
And I was getting the following structure:
date_time value
2023-10-02 10:00:00 11
2023-10-02 10:01:00 22
Now, after updating to pandas 2.2.0, I get this warning:
FutureWarning: Support for nested sequences for 'parse_dates' in pd.read_csv is deprecated. Combine the desired columns with pd.to_datetime after parsing instead.
So to get the same result, I now have to do:
df['datetime'] = df.date.astype(str) + ' ' + df.time.astype(str)
df['datetime'] = pd.to_datetime(df.datetime, format="%Y%m%d %H%M%S")
df = df.drop(['date', 'time'], axis=1).set_index('datetime')
Is there a way to do this in newer versions of pandas without string concatenation, which is usually slow?
Since parsing the date involves strings anyway, and your time format has no separator, this seems like the most reasonable option.
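You could simplify your code by reading the date and time columns directly as strings and using df.pop('date') and df.pop('time'), which returns each column and removes it from the DataFrame, so no separate drop is needed.
NB. If your days and hours/minutes/seconds are reliably zero-padded, you can use df.pop('date')+df.pop('time') and format="%Y%m%d%H%M%S".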
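A minimal sketch of the string approach described above, assuming the CSV has the header `date,time,value` shown in the question. The sample is inlined with `io.StringIO` (an assumption for self-containment; pass your file path instead). Reading both columns with `dtype=str` keeps the leading zero in `085917`, and `pop` avoids the separate `drop` call.

```python
import io

import pandas as pd

# Sample data from the question; in practice, pass the path to your CSV file.
csv_text = """date,time,value
20240112,085917,11
20240112,085917,22
"""

# Read date and time as strings so leading zeros (e.g. "085917") are preserved.
df = pd.read_csv(io.StringIO(csv_text), dtype={"date": str, "time": str})

# pop() returns each column and removes it from df, so no drop() is needed.
df["datetime"] = pd.to_datetime(
    df.pop("date") + " " + df.pop("time"), format="%Y%m%d %H%M%S"
)

# If both fields are reliably zero-padded, the separator can be omitted:
# pd.to_datetime(df.pop("date") + df.pop("time"), format="%Y%m%d%H%M%S")

df = df.set_index("datetime")
print(df)
```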
A variant with numeric operations and a timedelta is actually much slower (27.7 ms ± 4.11 ms per loop vs 350 µs ± 44.5 µs per loop for the string approach).
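The answer's numeric variant itself is not shown, so the following is only a hypothetical sketch of what such a variant could look like, and it has not been benchmarked here. It lets `read_csv` parse `date` and `time` as integers, splits them with integer arithmetic, assembles the dates with `pd.to_datetime` from `year`/`month`/`day` columns, and adds the time of day as a timedelta. No strings are built, but as the timings above suggest, avoiding strings does not necessarily make this faster.

```python
import io

import pandas as pd

# Sample data from the question; in practice, pass the path to your CSV file.
csv_text = """date,time,value
20240112,085917,11
20240112,085917,22
"""

# Default parsing reads date and time as integers (e.g. 085917 -> 85917).
df = pd.read_csv(io.StringIO(csv_text))

# Split YYYYMMDD into components and assemble dates without strings.
d = df.pop("date")
dates = pd.to_datetime(
    pd.DataFrame({"year": d // 10000, "month": d // 100 % 100, "day": d % 100})
)

# Convert HHMMSS to seconds since midnight with integer arithmetic.
t = df.pop("time")
seconds = (t // 10000) * 3600 + (t // 100 % 100) * 60 + t % 100

# Add the time of day as a timedelta and use the result as the index.
df["datetime"] = dates + pd.to_timedelta(seconds, unit="s")
df = df.set_index("datetime")
print(df)
```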