Why must skip lists hold duplicate elements within the same tower?

109 Views Asked by At

I'm learning about skip lists currently, and I'm struggling to understand why a tower holds duplicates of the same element on each level. My intuition tells me that this is redundant and the same probabilistic qualities of a skip list can be achieved without duplicating elements within a tower during insertion.

Consider this example of searching a skip list for the key 50 from Data Structures & Algorithms in C++ by Goodrich, Tamassia & Mount, where visited positions are highlighted in blue: Example of a search in a skip list (Here is a video illustrating each step of the search)

This skip list itself holds 37 nodes, 25 of them being duplicates that reference the same element. Further, each drop-down in the search is guaranteed to be the same element, or NULL if at S0. So I say, let's get rid of the duplicates under the top of each tower and instead have them point down to the only nodes that they could traverse from. Any given search traversal within this skip list can only ever result in the same paths of key comparisons, and so I believe it could be structured more efficiently like so:

S_5:               -∞
                 /    \
S_4:           17      12
             /  |  \
S_3:       25  55   20
            |
S_2:       31
          /   \
S_1:    38     44
         |      |
S_0:    39     50

In this structure, we can compare each child left-to-right and drop down to that child if desired_key >= child, or return the current node if no children meet that criterion. I believe insertion could be worked out by flipping a coin at each level until you get tails and using the same criterion to insert within the children, I think that would have similar or the same probabilistic qualities as the original skip list.

I understand that skip lists are supposed to be an alternative to binary search trees and perhaps this is possibly just a less efficient search tree, but I still have this nagging feeling of space redundancy. Do you think removing duplicate key entries is possible in skip lists using this or a similar method?

1

There are 1 best solutions below

0
Jim Mischel On

The video is a good example of how a skip list works, but it's not a particularly good example of a skip list, in general. In practice, a skip list will have thousands or perhaps millions of data items and, on average (assuming a 1/2 probability of adding a link at the next level), two links per node. That gives it the same space complexity as a binary search tree. Empirical evidence shows that skip lists are faster than balanced trees in lookup, and in some cases actually use less space. The kicker for me is that inserting into and removing from a skip list is much simpler and faster than maintaining a balanced binary search tree.

That said, if you were to remove the lower levels of the tower then there's no guarantee that you could find an item at all. You would for sure need to keep the lowest level of the tower. Otherwise there'd be no guaranteed way to find a particular node. And if you don't have the intermediate levels of the tower, then the "skip ahead" functionality becomes much less useful: you quickly skip ahead as far as you can at the top level and then poke along testing each node after that.

If you're worried about space (but again, the skip list actually uses the same amount of space as a balanced tree), you could always use a different probability for adding the next level of links. As I said, 1/2 is common: when you insert a node you create the lowest level of links and with probability of 1/2, add it to another level. You could change that to 1/4 or 1/7 or whatever you like. That would reduce the space required for the list, but would slow searching time.