I have a dataframe with a time column. The values are in seconds, but with nanosecond precision.
I was struggling to make this column unique but, with help from a robot, managed to come up with this:
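import numpy as np
import pandas as pd
from gluonts.dataset.pandas import PandasDataset  # assumption: PandasDataset is the GluonTS class

# Convert the time column to integer nanoseconds and add a per-row sequence number for trades
df['time_ns'] = (
    pd.to_datetime(df['time'], unit='s').values.astype(np.int64)
    + np.arange(len(df)) % (10 ** 9)
)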
df.set_index('time_ns', inplace=True)
# Convert the integer time_ns index to a DatetimeIndex with nanosecond precision
df.index = pd.to_datetime(df.index, unit='ns')
# Get a list of the non-unique timestamps
non_unique = df.index[df.index.duplicated(keep=False)].unique()
# Print the non-unique timestamps
print("Non-unique values:")
print(non_unique)
dataset = PandasDataset(df, target="price")
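For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same trick. The sample data is made up (two trades sharing a timestamp), and the explicit cast to datetime64[ns] is my own assumption in case a pandas version defaults to a coarser unit; it is not part of the code above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: three trades, two of which share the same timestamp (in seconds)
df = pd.DataFrame({
    "time": [1_600_000_000.0, 1_600_000_000.0, 1_600_000_001.5],
    "price": [10.0, 10.5, 11.0],
})

# Seconds -> datetime64[ns] -> integer nanoseconds since the epoch
base_ns = (
    pd.to_datetime(df["time"], unit="s")
    .values.astype("datetime64[ns]")
    .astype(np.int64)
)

# Row position used as a tie-breaker, in nanoseconds
seq = np.arange(len(df)) % (10 ** 9)

df.index = pd.to_datetime(base_ns + seq, unit="ns")

print(df.index.is_unique)
print(df.index[df.index.duplicated(keep=False)])
```

Each row is shifted by its position in nanoseconds, so the two trades that shared a timestamp end up exactly 1 ns apart.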
Now, there are no non-unique values. However, the frequency calculation when creating the dataset is falling over, due to this in /pandas/tseries/frequencies.py:
if not self.is_unique_asi8:
    return None
Digging into this with the penetrating insight into Python I have developed over the last two weeks, I have discovered that this property, too, seems to be some kind of uniqueness check.
So how do I configure the dataset so that the index is considered unique, and is treated at nanosecond precision? The incoming dataframe, it seems, is now unique.
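In case it helps narrow things down, here is a small diagnostic sketch. It rests on my reading of frequencies.py, which I have not confirmed against the pandas documentation: that is_unique_asi8 asks whether the gaps between consecutive timestamps are all equal, not whether the timestamps themselves are distinct. The helper name describe_index and both sample indexes are hypothetical.

```python
import numpy as np
import pandas as pd


def describe_index(index: pd.DatetimeIndex) -> dict:
    """Report value uniqueness and spacing uniformity for a DatetimeIndex."""
    # Distinct gaps between consecutive timestamps, in the index's integer units
    deltas = np.unique(np.diff(index.asi8))
    return {
        "values_unique": bool(index.is_unique),
        "distinct_deltas": len(deltas),
        # pd.infer_freq needs at least 3 timestamps
        "inferred_freq": pd.infer_freq(index) if len(index) >= 3 else None,
    }


# Hypothetical examples: unique but unevenly spaced vs. unique and evenly spaced
irregular = pd.to_datetime([0, 1, 5, 6], unit="ns")
regular = pd.date_range("2023-01-01", periods=4, freq="min")

print(describe_index(irregular))
print(describe_index(regular))
```

If that reading is right, an index of trade timestamps can have no duplicates and still have no inferable frequency, simply because the trades are not evenly spaced.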