I have a dataframe like this:
lat lon year month day hour minute second millisecond
0.0 0.0 2023.0 11.0 22.0 10.0 15.0 34.0 345.0
0.0 0.0 2023.0 11.0 22.0 10.0 23.0 53.0 0.0
...
I want to create a DatetimeIndex from the date/time columns, keeping millisecond precision. The times are in UTC.
What I did was extract those columns and create a DatetimeIndex using to_datetime. The code is:
utc_df = df.iloc[:, 2:]
datetimeindex = pd.to_datetime(utc_df, utc=True)
the result is:
>>> datetimeindex
0 2023-11-22 10:15:34.345000+00:00
1 2023-11-22 10:23:53+00:00
...
Length: 23179, dtype: datetime64[ns, UTC]
The problem is with the millisecond precision.
If the millisecond column contains a non-zero value, it is displayed with microsecond precision; if it is zero, the fractional part is omitted.
I tried adding unit="ms" to to_datetime, but the result is the same.
If I remove utc=True, the visualisation is what I would like:
>>> pd.to_datetime(utc_df)
0 2023-11-22 10:15:34.345
1 2023-11-22 10:23:53.000
...
Length: 23179, dtype: datetime64[ns]
but if I print out just one element:
pd.to_datetime(utc_df)[0]
Timestamp('2023-11-22 10:15:34.345000')
the microseconds are back.
I tried to modify the format in this way:
datetimeindex = datetimeindex.map(
lambda x: x.isoformat(timespec="milliseconds")
)
but this also changes the type of the elements to string, and I want Timestamp.
Is there a way to show the milliseconds with just three digits while keeping the Timestamp type?
IMPORTANT NOTE:
As this is an exercise, the only libraries I can use are pandas and numpy; I cannot import anything else.
Since you don't have any timezone information, why use utc=True? Just go with:
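A minimal sketch of the timezone-naive route, assuming a small sample frame shaped like the one in the question (the `df` below is illustrative, and it uses integer values where the question's columns are floats). It also checks that the milliseconds are stored exactly, whatever number of digits the display shows:

```python
import pandas as pd

# Illustrative sample frame mirroring the question's column layout.
df = pd.DataFrame({
    "lat": [0.0, 0.0],
    "lon": [0.0, 0.0],
    "year": [2023, 2023],
    "month": [11, 11],
    "day": [22, 22],
    "hour": [10, 10],
    "minute": [15, 23],
    "second": [34, 53],
    "millisecond": [345, 0],
})

# Keep only the date/time columns (year ... millisecond).
utc_df = df.iloc[:, 2:]

# Without utc=True the result is timezone-naive.
naive = pd.to_datetime(utc_df)
print(naive)

# The repr may pad or trim digits, but the stored value is exact:
# 345 ms is held as 345000 in the microsecond field.
first = naive.iloc[0]
print(first.microsecond)
```

Each element is still a Timestamp; only the printed representation differs between the Series view and a single element.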
Or
If you want to convert from a timezone aware timestamp to a non-timezone aware one:
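One way this conversion could look, as a sketch: `tz_localize(None)` drops the timezone and keeps the wall-clock time. The `aware` Series below is an assumed stand-in for the UTC-aware result from the question.

```python
import pandas as pd

# Two timezone-aware (UTC) timestamps like those in the question.
aware = pd.Series(pd.to_datetime(
    ["2023-11-22 10:15:34.345", "2023-11-22 10:23:53.000"], utc=True
))

# Drop the timezone; the wall-clock time is unchanged.
naive = aware.dt.tz_localize(None)
print(naive)
```

On a DatetimeIndex the same method is available directly, without the `.dt` accessor: `index.tz_localize(None)`.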
Output: