FutureWarning: Support for nested sequences for 'parse_dates' in pd.read_csv is deprecated. How to combine date and time columns with pd.to_datetime?

Asked by user164863 on 12 February 2024 at 18:28

Here is an example of my .csv file:

date, time, value
20240112,085917,11
20240112,085917,22

I used to import it into a DataFrame like this:

df = pd.read_csv(csv_file, parse_dates=[['date', 'time']]).set_index('date_time')

This gave me the following structure:

date_time             value
2024-01-12 08:59:17   11
2024-01-12 08:59:17   22

Now, after updating to pandas 2.2.0, I get this warning:

FutureWarning: Support for nested sequences for 'parse_dates' in pd.read_csv is deprecated. Combine the desired columns with pd.to_datetime after parsing instead.

To get the same result now, I have to do:

df['datetime'] = df.date.astype(str) + ' ' + df.time.astype(str)
df['datetime'] = pd.to_datetime(df.datetime, format="%Y%m%d %H%M%S")
df = df.drop(['date', 'time'], axis=1).set_index('datetime')

Is there a way to do this in newer versions of pandas without string concatenation, which is usually slow?


Accepted answer by mozway on 12 February 2024 at 18:45

Since parsing the dates involves strings anyway, and your time format has no separators, this seems like the most reasonable option.

You could simplify your code by reading the columns as strings directly and popping them:

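import pandas as pd

# sep=', *' (a regex, so engine='python') also strips the spaces after the commas
df = pd.read_csv(csv_file, sep=', *', engine='python',
                 dtype={'date': str, 'time': str})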

df['datetime'] = pd.to_datetime(df.pop('date')+' '+df.pop('time'),
                                format="%Y%m%d %H%M%S")
df = df.set_index('datetime')
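To try the string approach end to end, here is a self-contained sketch on the sample rows from the question, with `io.StringIO` standing in for `csv_file` (an assumption for illustration only). Reading `date` and `time` as `str` is what keeps the leading zero of `085917`:

```python
import io

import pandas as pd

# Sample rows from the question; io.StringIO stands in for csv_file.
csv_text = """date, time, value
20240112,085917,11
20240112,085917,22
"""

# The regex separator also removes the spaces after the commas in the header.
df = pd.read_csv(io.StringIO(csv_text), sep=', *', engine='python',
                 dtype={'date': str, 'time': str})

# pop() removes each column from df and returns it, so no drop() is needed.
df['datetime'] = pd.to_datetime(df.pop('date') + ' ' + df.pop('time'),
                                format="%Y%m%d %H%M%S")
df = df.set_index('datetime')
```

Because both columns are popped, the only remaining column is `value`, indexed by the combined timestamps.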

NB: if your months, days, hours, minutes and seconds are reliably zero-padded, you can drop the space and use df.pop('date')+df.pop('time') with format="%Y%m%d%H%M%S".
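A minimal sketch of that no-separator variant, again on the question's sample rows via `io.StringIO` (a stand-in for `csv_file`). It assumes every date has exactly 8 digits and every time exactly 6, so the fixed-width format can split the 14-character string unambiguously:

```python
import io

import pandas as pd

# Sample rows from the question; io.StringIO stands in for csv_file.
csv_text = """date, time, value
20240112,085917,11
20240112,085917,22
"""

# Read as str so that 085917 keeps its leading zero.
df = pd.read_csv(io.StringIO(csv_text), sep=', *', engine='python',
                 dtype={'date': str, 'time': str})

# "20240112" + "085917" -> "20240112085917"; only safe with zero-padded values.
df['datetime'] = pd.to_datetime(df.pop('date') + df.pop('time'),
                                format="%Y%m%d%H%M%S")
df = df.set_index('datetime')
```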

Output:

                     value
datetime                  
2024-01-12 08:59:17     11
2024-01-12 08:59:17     22

A variant with numeric operations and a timedelta:

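import numpy as np
import pandas as pd

df = pd.read_csv(csv_file, sep=', *', engine='python',
                 dtype={'date': str})

# time is read as an integer (085917 -> 85917): split it into h, m, s
a = df.pop('time').to_numpy()
a, s = np.divmod(a, 100)
h, m = np.divmod(a, 100)

df['datetime'] = (pd.to_datetime(df.pop('date'), format='%Y%m%d')
                  + pd.to_timedelta(h*3600 + m*60 + s, unit='s')
                  )
df = df.set_index('datetime')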

This variant is actually much slower: 27.7 ms ± 4.11 ms per loop vs 350 µs ± 44.5 µs per loop for the string approach.
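If the goal is only to avoid building strings in your own code, another option (not part of the timings above) is to split the integers arithmetically and let `pd.to_datetime` assemble timestamps from a DataFrame with `year`/`month`/`day`/`hour`/`minute`/`second` columns. This is a sketch on the question's sample rows, with `io.StringIO` standing in for `csv_file`. It has not been benchmarked here, and pandas may still convert values internally, so time it against the string approach before preferring it:

```python
import io

import pandas as pd

# Sample rows from the question; io.StringIO stands in for csv_file.
csv_text = """date, time, value
20240112,085917,11
20240112,085917,22
"""

# Both columns are read as integers here (085917 becomes 85917).
df = pd.read_csv(io.StringIO(csv_text), sep=', *', engine='python')

date = df.pop('date')
time = df.pop('time')

# Integer arithmetic: 20240112 -> 2024, 1, 12 and 85917 -> 8, 59, 17.
parts = pd.DataFrame({
    'year': date // 10000,
    'month': date // 100 % 100,
    'day': date % 100,
    'hour': time // 10000,
    'minute': time // 100 % 100,
    'second': time % 100,
})

# pd.to_datetime assembles one timestamp per row from the named columns.
df['datetime'] = pd.to_datetime(parts)
df = df.set_index('datetime')
```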
