Manipulate the Dataframe to start from the nearest varying Midnight timestamp

Question

Manipulate the Dataframe to start from the nearest varying Midnight timestamp

251 Views Asked by Royce Anton Jose At 08 August 2025 at 10:52

My goal:

I have a dataset that gets generated every day at random hours leading to the first row to start at a random time. I want to make this dataset start from the nearest midnight date. For example, if the date on the first row is 2022-05-09 15:00:00, I would have to slice the data to make it start from the nearest midnight, in this case: 2022-05-10 00:00:00

Here's what the dataset looks like:

What I have tried:

I had the idea of locating the index of the first occurrence of my desired timestamp and applying iloc to create the desired data set.

match_timestamp = "00:00:00"
[df[df.index.strftime("%H:%M:%S") == match_timestamp].first_valid_index()]

results: [Timestamp('2022-05-10 00:00:00')]

However, this would only result in extracting the timestamp where it first appears, and I would not be able to apply iloc to the row value. As of now, I'm stuck and can't think of a more elegant solution, which I'm sure exists.

I would be grateful if you could recommend a better method to this. Thank you in advance!

Here's the complete code to extract the df:

pip install ccxt

import pandas as pd
import ccxt

exchange = ccxt.okx({'options': {'defaultType': 'futures', 'enableRateLimit': True}})
markets = exchange.load_markets()

url = 'https://www.okex.com'
tickers = pd.DataFrame((requests.get(url+'/api/v5/market/tickers?instType=FUTURES').json())['data'])
tickers = tickers.drop('instType', axis=1)
futures_tickers = list(tickers['instId'])

symbol = 'LINK-USD-220930'
candlestick_chart= exchange.fetch_ohlcv(symbol, '1h', limit=500)
candlestick_df = pd.DataFrame(candlestick_chart)
candlestick_df.columns = ['date', 'open', 'high', 'low', 'close', 'volume']

candlestick_df['date'] = pd.to_datetime(candlestick_df['date'], unit='ms')
candlestick_df['date'] = candlestick_df['date'] + pd.Timedelta(hours=8)

df  = candlestick_df
df

The dictionary format: (as suggested)

{'open': {Timestamp('2022-05-09 15:00:00'): 9.742, Timestamp('2022-05-09 16:00:00'): 9.731, Timestamp('2022-05-09 17:00:00'): 9.743, Timestamp('2022-05-09 18:00:00'): 9.684, Timestamp('2022-05-09 19:00:00'): 9.206, Timestamp('2022-05-09 20:00:00'): 9.43, Timestamp('2022-05-09 21:00:00'): 9.316, Timestamp('2022-05-09 22:00:00'): 9.403, Timestamp('2022-05-09 23:00:00'): 9.215, Timestamp('2022-05-10 00:00:00'): 9.141}, 'high': {Timestamp('2022-05-09 15:00:00'): 9.835, Timestamp('2022-05-09 16:00:00'): 9.75, Timestamp('2022-05-09 17:00:00'): 9.788, Timestamp('2022-05-09 18:00:00'): 9.697, Timestamp('2022-05-09 19:00:00'): 9.465, Timestamp('2022-05-09 20:00:00'): 9.469, Timestamp('2022-05-09 21:00:00'): 9.515, Timestamp('2022-05-09 22:00:00'): 9.413, Timestamp('2022-05-09 23:00:00'): 9.308, Timestamp('2022-05-10 00:00:00'): 9.223}, 'low': {Timestamp('2022-05-09 15:00:00'): 9.699, Timestamp('2022-05-09 16:00:00'): 9.596, Timestamp('2022-05-09 17:00:00'): 9.674, Timestamp('2022-05-09 18:00:00'): 8.739, Timestamp('2022-05-09 19:00:00'): 9.11, Timestamp('2022-05-09 20:00:00'): 9.3, Timestamp('2022-05-09 21:00:00'): 9.208, Timestamp('2022-05-09 22:00:00'): 9.174, Timestamp('2022-05-09 23:00:00'): 9.035, Timestamp('2022-05-10 00:00:00'): 8.724}, 'close': {Timestamp('2022-05-09 15:00:00'): 9.725, Timestamp('2022-05-09 16:00:00'): 9.745, Timestamp('2022-05-09 17:00:00'): 9.682, Timestamp('2022-05-09 18:00:00'): 9.18, Timestamp('2022-05-09 19:00:00'): 9.426, Timestamp('2022-05-09 20:00:00'): 9.32, Timestamp('2022-05-09 21:00:00'): 9.397, Timestamp('2022-05-09 22:00:00'): 9.229, Timestamp('2022-05-09 23:00:00'): 9.152, Timestamp('2022-05-10 00:00:00'): 8.82}, 'volume': {Timestamp('2022-05-09 15:00:00'): 3663.0, Timestamp('2022-05-09 16:00:00'): 6603.0, Timestamp('2022-05-09 17:00:00'): 2855.0, Timestamp('2022-05-09 18:00:00'): 20084.0, Timestamp('2022-05-09 19:00:00'): 8972.0, Timestamp('2022-05-09 20:00:00'): 5551.0, Timestamp('2022-05-09 21:00:00'): 8218.0, Timestamp('2022-05-09 22:00:00'): 7651.0, Timestamp('2022-05-09 23:00:00'): 6935.0, Timestamp('2022-05-10 00:00:00'): 10409.0}}

Original Q&A

There are 1 best solutions below

**NoobVB** · Accepted Answer

my minimalistic approach of the pandaNewstarter you simply can apply it to your candlestick_df:

import pandas as pd
import datetime
df = pd.read_csv("data.csv")
df.dtypes
# convert date column to dtype timestamp
df.date = pd.to_datetime(df.date) 

# get min value from date colum
min_date = df.date.min()

# from min get next day midnight timestamp value
NextDay_Date = (min_date + datetime.timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)
NextDay_Date

# create new DateFrame by slicing original
df2 = df[df.date >= NextDay_Date].copy()

out:

print(NextDay_Date)
2022-05-10 00:00:00
print(df2)
                   date   open   high    low  close   volume
9   2022-05-10 00:00:00  9.141  9.223  8.724  8.820  10409.0
10  2022-05-10 01:00:00  8.755  8.979  8.558  8.832  11522.0
11  2022-05-10 02:00:00  8.815  8.880  8.304  8.593  20969.0
12  2022-05-10 03:00:00  8.618  8.720  8.370  8.610  15794.0
13  2022-05-10 04:00:00  8.610  8.929  8.610  8.736   9410.0
..                  ...    ...    ...    ...    ...      ...

Manipulate the Dataframe to start from the nearest varying Midnight timestamp

There are 1 best solutions below

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in DATETIME

Related Questions in TIME-SERIES

Related Questions in PANDAS-TIMEINDEX

Trending Questions

Popular # Hahtags

Popular Questions