Do memtables ever actually use trees over skip lists?

184 Views Asked by At

I keep seeing articles that mention how memtables are usually backed by a self balancing tree like AVL tree or red-black tree.

Examples:

If you actually look at Cassandra or rocksDB implementation, they both use skip lists as the underlying data structure. I assume this is because trees have poor concurrency support since a lot of nodes would need to be locked when writing due to the self balancing property.

If this is the case, why are so many articles mentioning trees rather than skip lists when discussing memtables? Am I missing something?

2

There are 2 best solutions below

0
Paul On BEST ANSWER

As I understand it, Cassandra uses a skip list for indexing partitions in the memtable, and then b-trees to index the rows inside each partition, but that's the most recent implementation. If you want a deep dive on some of the more recent memtable implementation, I recommend my colleague Branimir's article, he is far more knowledgable than I on the subject: https://www.vldb.org/pvldb/vol15/p3359-lambov.pdf

0
Nick On

I am the author of HM4 key value database (https://github.com/nmmmnu/HM4).

Here is why I choose SkipList over self balanced trees:

  • Is much easier to program it.
  • Have almost same speed.
  • Code is more understandable - I am not sure if this holds, because my SkipList implementation got very complicated over the years.
  • This is more or less same as previous point, but SkipList, if done "right" have only one critical function - the one that locate the node. This function may or may be not duplicated for const / mutable, but the idea is the same. This function is usually one screen only and you can easily understand it. Here are mine:
  • SkipList implementation can be checked easily using similar LinkList implementation. What I mean is following - most of optimizations and weird ideas, can be first check / test on LinkList and then ported on SkipList, once you know they works. For example in my case, I am comparing strings as numbers, skipping some if statements and loops when I can etc.

Here are the self balanced trees tests I did over the years:

Hope this helps too.