Improving KDB HDB query performance

75 Views Asked by At

We're currently facing significant delays in query response times from the Historical Database (HDB) component of our tickerplant architecture. Allow me to provide some context to better understand our setup and the challenges we're encountering.

Our architecture revolves around two core tables: the orders table, storing all users' order histories, and the Real-time Database (RDB), which performs an end-of-day write-down to the HDB. As an exchange, We consistently query both the RDB and HDB to retrieve all historical data for a particular user, dating back to their initial joining of the exchange.

The issue lies in the query performance specifically related to accessing data from the HDB. While querying the RDB yields satisfactory response times, querying the HDB is proving to be time-consuming. One of the potential reasons for this could be the current partitioning strategy employed in the HDB.

As it stands, the HDB is partitioned by date, which means that retrieving user order histories requires multiple disk operations to locate the relevant data. Our typical order query resembles SELECT row FROM orders WHERE order_uuid = <order_uuid>. Although the table includes both user UUID (unique identifier of the user) and order UUID (newly created orders will have a new order UUID), partitioning by date necessitates additional disk operations to find the user's complete order history.

One possible solution we're considering is to partition the HDB by user instead of date. This adjustment would streamline the querying process, as the HDB would only need to perform a single disk operation to retrieve a user's entire order history, significantly improving query response times. However, this approach has its drawbacks, particularly for users who trade very frequently. Over time, the table for such users could become excessively large, potentially leading to performance degradation and storage issues.

Given this context, we're seeking insights and alternative approaches to optimize query performance from the HDB while balancing considerations such as query efficiency and storage management. Any suggestions or recommendations would be greatly appreciated.

Thank you in advance for your assistance.

1

There are 1 best solutions below

3
user20349 On

Fundamentally you are not able to cut your search space down given this setup.

I'd consider a two-stage query. First, store a start/end date per order id, potentially in a splayed table. Given a user field and GUID, and applying `p# on user to quickly narrow down the range, this should easily scale to a few hundred million orders or better.

Then this tells you the exact date range you need for the query to get at the order details. Again if you have `p# on sym or user it would help.