Caching large lookup table in JVM memory

We have a large dataset of historical transactions, and a system that must check each new transaction against every historical transaction in the dataset.

This involves running an algorithm on each historical transaction that produces a matching score against the new transaction. It means going through the transactions sequentially; we can't use indexing or hashing to reduce the number of transactions that need to be checked.

A couple of other points: transactions are only ever added to the dataset and are never evicted. We also already distribute the processing by splitting the dataset across workers on different servers.
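For context, the split is static and roughly along these lines; the round-robin assignment and the names below are illustrative, not our exact scheme:

    import java.util.ArrayList;
    import java.util.List;

    class Partitioner {
        // Round-robin split of the historical dataset across workers.
        // Since the dataset only ever grows, new transactions can be
        // appended to shards the same way without rebalancing.
        static <T> List<List<T>> split(List<T> dataset, int numWorkers) {
            List<List<T>> shards = new ArrayList<>(numWorkers);
            for (int i = 0; i < numWorkers; i++) {
                shards.add(new ArrayList<>());
            }
            for (int i = 0; i < dataset.size(); i++) {
                shards.get(i % numWorkers).add(dataset.get(i));
            }
            return shards;
        }
    }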

At the moment the system uses a Java Collection class to cache the transaction dataset in memory. This is mainly for speed, as it gives fast sequential access to the transactions.
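Conceptually the cache looks like the sketch below, where matchScore is a stand-in for our real scoring algorithm (its details aren't relevant here):

    import java.util.ArrayList;
    import java.util.List;

    class TransactionCache {
        // The whole historical dataset held in memory; an ArrayList gives
        // cheap sequential iteration, which is all the scan needs.
        private final List<byte[]> history = new ArrayList<>();

        void add(byte[] transaction) {
            history.add(transaction); // append-only, never evicted
        }

        // Best matching score of a new transaction against the full history.
        double bestScore(byte[] newTransaction) {
            double best = 0.0;
            for (byte[] old : history) { // unavoidable full sequential scan
                best = Math.max(best, matchScore(newTransaction, old));
            }
            return best;
        }

        // Placeholder only: the real scoring algorithm is not shown here.
        private double matchScore(byte[] a, byte[] b) {
            int n = Math.min(a.length, b.length);
            int same = 0;
            for (int i = 0; i < n; i++) {
                if (a[i] == b[i]) same++;
            }
            return n == 0 ? 0.0 : (double) same / n;
        }
    }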

What I'd like to know is: are there any caching systems, such as EHCache, that would help us distribute the dataset across different servers while still providing fast sequential access to the records in the cache?

1 Answer

Reinventing the wheel is so tempting! When Oracle has an in-memory database, why can't we do the same... Let me try too. What about hashing each array of bytes and keeping only the hashes? And when two hashes collide, go to the real database and double-check the whole array. So tempting...
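A minimal sketch of that scheme, assuming the full byte arrays live in a database and only their hashes stay in JVM memory; loadFromDatabase is a hypothetical stand-in for that lookup:

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    abstract class HashFilter {
        // In-memory index: hash of a byte array -> ids of the rows
        // that produced that hash.
        private final Map<Integer, Set<Long>> index = new HashMap<>();

        void add(long rowId, byte[] payload) {
            index.computeIfAbsent(Arrays.hashCode(payload), h -> new HashSet<>())
                 .add(rowId);
        }

        // True if an identical array already exists. The database is hit
        // only when a hash matches, to rule out collisions.
        boolean contains(byte[] candidate) {
            Set<Long> suspects = index.get(Arrays.hashCode(candidate));
            if (suspects == null) {
                return false; // no hash match, definitely absent
            }
            for (long rowId : suspects) {
                if (Arrays.equals(loadFromDatabase(rowId), candidate)) {
                    return true;
                }
            }
            return false;
        }

        // Hypothetical: fetch the full byte array for a row from the database.
        abstract byte[] loadFromDatabase(long rowId);
    }

The trade-off: only a few bytes of hash per row stay in memory, at the cost of a database round-trip whenever a hash matches.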