High Performance DB for Fast Read and Fast Write. No Update or Delete

47.9k Views Asked by At

I am looking for the database/mechanism to store the data where I can write the data and read the data with high performance.

This storage is used to for storing the Logging like important information across multiple systems. Since it's critical data which will be logged, read performance should be pretty fast as these data will be used to show history. Since we never do update on them/delete on them/or do any kinda joins, I am looking for right solution. Probably we might archive the data in long time but that's something ok to deal with.

I tried looking at different sources to understand different NoSql databases, experts opinion is always better :)

Must Have:
1. Fast Read without fail
2. Fast Write without fail
3. Random access Performance
4. Replication kinda feature, one goes down, immediately another should be up and working
5. Concurrent write/read data

Good to Have:
1. Search content like analysing the data for auditing with/without Indexes

Don't required:
1. Transactions are not required at all
2. Update never happens
3. Delete never happens
4. Joins are not required

Referred: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

3

There are 3 best solutions below

3
On

Disclosure: Kevin Porter is a Senior Software Engineer at Aerospike, Inc. since May 2013. (ref)

Be sure to consider Aerospike; Aerospike dominates in the adtech space where high throughput reads and writes are a required. Aerospike is frequently touted as having "the speed of Redis with the scalability of Cassandra." For searching/querying see Aerospike's secondary index documentation.

For more information see the discussion/articles below:

  1. Aerospike vs Cassandra
  2. Aerospike vs Redis and Mongo
  3. Aerospike Benchmarks

Lastly verify the performance for yourself with the One million TPS on EC2 Instructions.

3
On

Let me be the Cassandra sponsor.

Disclaimer: I don't say Cassandra is better than the others because I don't even know so deeply mongo/redis/whatever and I don't want even come into this kind of stuffs.

The reason why I suggest Cassandra is because your needs match perfectly with what Cassandra offers and your "don't required list" is a set of feature that are either not supported in Cassandra (joins for instances) or considered an anti-pattern (deletes and in some situations updates).

From your "Must Have" list, point by point

  1. Fast Read without fail: Supported. You can choose the consistency level of each read operation deciding how much important is to retrieve the most fresh information and how much important is speed

  2. Fast Write without fail: Same as point 1

  3. Random access Performance: When coming in the Cassandra world you have to consider many parameters to get a random access performance but the most important that comes into my mind is the data model -- if you create a data model that scales horizontally (give a look here) and you avoid hotspots you get what you need. If you model your DB in a good way you should have O(1) for each operation since data are structured to be queried

  4. Replication: In this Cassandra is even better than what you might think. If one node goes down nothing changes to the cluster and everything(*) keep working perfectly. Cassandra spots no single point of failure. I can tell you with older Cassandra version I've had an uptime of more than 3 years

  5. Concurrent write/read data: Cassandra uses the lww policy (last-write-wins) to handle concurrent writes on the same key. The system supports multiple read-write and with newer protocols also async operations.

There are lots of other interesting features Cassandra offers: linear horizontal scaling is the one I appreciate more but there is also the fact that you can know the instant in which every piece of data has been updated (the timestamp of lww), counters features and so on.

(*) - if you don't use Consistency Level All which, imho, should NEVER be used in such a system.

0
On

Here's a few more links on how you can span In-Memory with Disk (DRAM, SSM, and disk storage) w/ Aerospike:

http://www.aerospike.com/hybrid-memory/

http://www.aerospike.com/docs/architecture/storage.html

I think everyone is right in terms of matching the specific DB to your specific use case. For instance, Aerospike is optimal for key-value data. Other options might be better.

By way of analogy, I'll always remember how, decades ago, a sister of mine once borrowed my computer and wrote her term paper in Microsoft Excel. Line after line was a different row of a spreadsheet. It looked ugly as heck, but, uh, okay. She got the task done. She cursed and swore at how difficult it was to edit the thing. No kidding!

Choosing the right NoSQL database for the right task will either make your job a breeze, or could cause you to curse a blue streak if you decided on the wrong basic tool for the task at hand.

Of course, every vendor's going to defend their product. I think it's best the community answer the question. Here's another Stack Overflow thread answering a similar question:

Has anyone worked with Aerospike? How does it compare to MongoDB?

btw: Do you have any more specific insights for us on what type of problem you are trying to solve?