I am still new to NoSQL database and have been using RDBMS (Oracle, MySQL) for years. Now, we are considering migrating one of our database to In-memory NoSQL DB and we are stuck with the best design approach.
We are considering Redis, but whether it will be in conjunction with another Key-value stores (like RocksBD or LMDB) or will best be used in isolation will be based on the advice I receive from you. (You are free to advice on a completely different approach to solving our problem).
The table to migrate to In-memory NoSQL contains both demographic (such as firstname, lastname, address,birthdate, country of origin etc. about 40 different demographic fields) and biometric data such as photograph, signature and all 10 fingerprints.
Queries will be run on demographics such as retrieve where firstname = 'jones' and lastname = 'Andre' and DateofBirth > 13 sept 1984
We can very easily save everything in redis key-value store (including photograph, signature, fingerprints and all demographic), but we are worried that it require a crazy amount of RAM especially since the DB will eventual grow to some 200million records. So we considered storing some of the demographics that will be frequently searched against in redis (e.g firstname, lastname, dateofbirth etc) and then storing the rest of the data in key-value store such as LMDB or RocksDB (since this will naturally require far less memory than redis). In this implementation, when someone desire to retrieve where firstname=jones and lastname=mark, it search redis, gets the id of retrieved record and then retrieve those records from the key-value store (lmdb or rocksDB).
We are bothered mainly about read performance with very little worry regarding write. We desire very fast read.
Is this a good design approach or can someone advice better design approach that will lead to better performance. Please remember that the goal is to minimize RAM requirement and get very good read performance.
And by the way, Is it even a good approach to store biometrics of this nature in memory?
How are challenges like this solved?
Also note that while we query against a subset of the demographic, doing retrieval, we mostly retrieve the entire set of data. (that is for each match individual we retrieve oth demographic and biometric)
I am great fan of Redis as it is a great storage and indexing tool. As far as I can see, your requirements does not really fit well for a 100% NoSQL design.
I may suggest to keep the data on your SQL and build composite indexes with Redis. Get a PK-lookup super-fast SQL (PostgreSQL) and index your data by PK in Redis. You won't have any problem with memory usage and everything will end up in running multiple PK queries to fetch the bulk of data. Or you can apply a strategy to index/cache only the CHARS column and keep the images and the over-sized values in SQL. Or temporary caching the sized data that has been accessed and evict keys of recently not accessed data.
About memory, what you point out is solved by the use of Redis Cluster.
[UPDATE] Usually I try to create a Redis key for every value I need to index; if you need to index a string use mono-scored Sorted Sets and leverage
ZINDEXBYRANGE
, for datetime you can set the score to a timestamp and useZRANGEBYSCORE
. Depending on your access/storage patterns you can decide to store part of your data and leave the bulk in a SQL. About speed, I cannot really say because it depends how you decide to design your keys/values and how much RAM you can allocate for the task.