Struggling to create a unique datetime index in Pandas dataset

I have a DataFrame with a 'time' column. The values are in seconds, but carry nanosecond precision.

I was struggling to make this unique, but with help from a robot, managed to come up with this:

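import numpy as np
import pandas as pd
from gluonts.dataset.pandas import PandasDataset  # assuming PandasDataset is the GluonTS class

# Convert the time column to nanoseconds since the epoch and add a per-row
# sequence number (0, 1, 2, ...) so that trades sharing a timestamp differ
df['time_ns'] = pd.to_datetime(df['time'], unit='s').values.astype(np.int64) + \
                np.arange(len(df)) % (10 ** 9)
df.set_index('time_ns', inplace=True)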

# Convert the time_ns column to a DatetimeIndex with nanosecond precision
df.index = pd.to_datetime(df.index, unit='ns')

# Get a list of the non-unique timestamps
non_unique = df.index[df.index.duplicated(keep=False)].unique()

# Print the non-unique timestamps
print("Non-unique values:")
print(non_unique)

dataset = PandasDataset(df, target="price")
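
As a sanity check on the conversion above, here is a minimal sketch that rebuilds the index on a small made-up frame and confirms uniqueness with pandas' own index properties. The sample values and the names base_ns, index_is_unique, index_is_sorted and duplicate_count are hypothetical, not taken from the real data. One caveat, assuming the 'time' column is stored as float64: near 1.7e9 seconds a float64 resolves only about 238 ns, so true nanosecond precision would need an integer or string source.

```python
import numpy as np
import pandas as pd

# Hypothetical trade data: 'time' is in seconds, and the first two trades
# share exactly the same timestamp.
df = pd.DataFrame({
    "time": [1_700_000_000.0, 1_700_000_000.0, 1_700_000_000.5, 1_700_000_002.25],
    "price": [10.0, 10.1, 10.2, 10.3],
})

# Nanoseconds since the epoch, forced to ns resolution before casting to int.
base_ns = (
    pd.to_datetime(df["time"], unit="s")
    .astype("datetime64[ns]")
    .astype("int64")
    .to_numpy()
)

# Add a per-row sequence number so rows with equal timestamps become distinct.
df.index = pd.to_datetime(base_ns + np.arange(len(df)) % (10 ** 9), unit="ns")

# Check the index directly instead of listing duplicates by hand.
index_is_unique = df.index.is_unique
index_is_sorted = df.index.is_monotonic_increasing
duplicate_count = int(df.index.duplicated(keep=False).sum())

print(index_is_unique, index_is_sorted, duplicate_count)
```

If index_is_unique is True on the real data and frequency inference still returns None, duplicate timestamps are probably not what is stopping it.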

Now the index contains no duplicate values. However, frequency inference still fails when the dataset is created, because of this check in /pandas/tseries/frequencies.py:

if not self.is_unique_asi8:
    return None

Digging into this with the penetrating insight into Python I have developed over the last two weeks, I have discovered that this property, too, is an indication of uniqueness.

So how do I configure the dataset so that the index is treated as unique, and compared at nanosecond precision? As far as I can tell, the index of the incoming DataFrame is already unique.
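
One thing worth testing before blaming uniqueness: in the pandas source I have looked at, is_unique_asi8 appears to check that the gaps between consecutive timestamps are all identical (a single distinct delta), not that the timestamps themselves are distinct. The sketch below, using made-up timestamps and the hypothetical names irregular, regular, irregular_freq and regular_freq, shows pd.infer_freq returning None for an index that is unique but unevenly spaced.

```python
import pandas as pd

# Hypothetical timestamps: all distinct, but unevenly spaced (gaps of 1 s, 2 s, 1 s).
irregular = pd.DatetimeIndex([
    "2024-01-01 00:00:00",
    "2024-01-01 00:00:01",
    "2024-01-01 00:00:03",
    "2024-01-01 00:00:04",
])

# Evenly spaced timestamps for comparison (one per minute).
regular = pd.date_range("2024-01-01", periods=4, freq="min")

# No single frequency fits the irregular index, so inference gives None.
irregular_freq = pd.infer_freq(irregular)

# The regular index yields a frequency string (its spelling varies by pandas version).
regular_freq = pd.infer_freq(regular)

print(irregular.is_unique, irregular_freq, regular_freq)
```

If that is what is happening here, adding a sequence number will not help, because irregular trade times have no single frequency to infer. Resampling onto a regular grid first (for example with DataFrame.resample) might be one way around it, though I have not confirmed how the dataset class handles that.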
