How to skip some keys by using iterator?

557 Views Asked by At

As an example, I have add several keys to DB just like ,

<1 + 2> 
<1 + 3>
<2 + 1>
<2 + 4>
<3 + 2>

First, Seek() to <1, 2> and then Next() to <1, 3> After that, I want to skip the key <2, 1> and <2, 4> (whose prefix are all 2) and move the iterator to <3, 2> without a new seek operation. The use of a new Seek() operation is unexpected due to that Seek() is expensive. Which method should I use?

This skip scan approach is similar to this

I prefer to program like the following lines:

DBIter* it = NewDBIterator(...);
set = {key1, key2, key3, ...};
Iterator key_iter = set.begin();
for (it->SeekToFirst(); it->Valid() && key_iter != set.end(); it->SkipToNext(*key_iter), ++ key_iter) {
  // do something
}

1

There are 1 best solutions below

0
On

As described in the post you are linking the skip scan works by looking at a key-prefix under the assumption that keys are stored in order. If you are looking for any value of the second key part smaller than 3 in:

1,2
1,3
1,4
2,1
2,2
2,3
...

what you know when you reach 1,3 is that there will be no more keys matching your predicate which have key-prefix 1, so you can skip ahead to the next key-prefix. This typically still means you have to look at least at every key-prefix on the way to find the next prefix, or look it up in some way. Whether or not this is good depends. For an operation on a distinct set of keys separate lookups are almost certainly generally the better option, because unless you know very well what your data looks like you don't know the number of keys you have to prefix-scan, and you might have to look at every single one (O(n)), where as k lookups take only O(k) * O(log(n)) time. So as long as k << n, definitely do a lookup. The optimization you are talking about applies to predicates on keys, for which you would otherwise have to evaluate the predicate on every key in the table. Hence skipping over the keys is an optimization in that case, because you have to evaluate the predicate less often, and get away with a cheap predicate comparison.