Optimizing time range filters in Amazon QLDB

978 Views Asked by At

I am working on a ledger with a table of Transactions. Each entry has a transaction_id, account_id, timestamp and other metadata. I need to query for all Transactions for a given account_id with a between operator on timestamp

My planned approach was to build an index on account_id, transaction_id and timestamp. However I have noted a limitation on inequalities and indexes from the AWS Documentation and I had planned applying this to timestamp

Query performance is improved only when you use an equality predicate; for example, fieldName = 123456789.

QLDB does not currently honor inequalities in query predicates. One implication of this is that range filtered scans are not implemented.

...

Warning

QLDB requires an index to efficiently look up a document. Without an indexed field in the WHERE predicate clause, QLDB needs to do a table scan when reading documents. This can cause more query latency and can also lead to more concurrency conflicts.

Transactions would be generated and grow indefinetly over time, and I would need to be able to query a weeks worth of transactions at a time.

Current Query:

SELECT * 
FROM Transactions 
WHERE "account_id" = 'test_account' and "timestamp" BETWEEN `2020-07-05T00:00Z` AND `2020-07-12T00:00Z`

I know it is possible to stream the data to a database more suited for this query, such as dynamodb, but I would like to know if my performance concerns performing the above query is valid, and if it is, what is the recommended indexes and query to ensure this scales and does not result in a scan across all transactions for the given account_id?

2

There are 2 best solutions below

0
On

QLDB has a history() function that works like a charm to generate statements, since you can pass one or two dates as arguments for start and/or end dates.

You see, this is where QLDB gets tricky: when you think of it as a relational database.

The caveat here is that you would have to change your transactions to be updates in the account table rather than new inserts in a different table. This is because, by design, QLDB gives you the ledger of any table. Meaning you can later check all versions of that record and filter them as well.

Here's an example of what a history query would look like in an Accounts table:

SELECT ha.data.* from Accounts a by Accounts_id
JOIN history(Accounts, `2022-04-10T00:00:00.000Z`, `2022-04-13T23:59:59.999Z`) ha 
ON ha.metadata.id = Accounts_id
WHERE a.account_id = 1234

This different segment by Accounts_id is what QLDB uses to get the index on your history table and how you can join both tables by an indexed column. In this case, account_id.

3
On

Thanks for your question (well written and researched)!

QLDB, at the time of writing, does not support range indexes. So, the short answer is "you can't."

I'd be interested to know what the intention behind your query is. For example, is getting a list of transactions between two dates something you need to do to form a new transaction or is it something you need for reporting purposes (e.g. displaying a user statement).

Nearly every use-case I've encountered thus far is the latter (reporting), and is much better served by replicating data to something like ElasticSearch or Redshift. Typically this can be done with a couple of lines of code in a Lambda function and the cost is extremely low.