I have multikey index over field that store an array of integers. How I can implement cursor-based pagination using it?
In real if tuple has more than one value in array field it tuple will be selected #tuple[array_field_idx]
times.
Yesterday I implemented "distinct" select (using ffi to get tuple pointer address), but it seems that it will not be works with pagination.
Have you any ideas how it can be implemented in Tarantool?
In order to implement pagination for multikey index you should know something about Tarantool internals.
I will write code that works for Tarantool 2.3+ (but it could be broken in future). Please test them carefully and update FFI definition in case of upgrade Tarantool version.
So, let's start. Firstly you should know that Tarantool BTree stores data in special structure memtx_tree_data that contains a pointer to your tuple and
hint
. It's special number that allows to speedup comparisons between tuples for simple tree index and it's a position of indexed element in array.Firstly, we should to understand how to extract tuple hint with a tuple. It could be done using some piece of FFI code and tree iterator.
Then consider following example:
Output is:
As you see the order is strictly determined. Tarantool returns me tuples in order that is determined with (a) indexed value - tuple[path_to_array][hint+1] and primary key. The second condition is common for all Tarantool secondary non-unique indexes. Tarantool internally merge primary key to every non-unique index. All you need is to specify it explicitly in your schema.
So the next term is
cursor
. Cursor allows you to continue iteration since place where you previously stopped. For unique indexes cursor is fields of this indexes, for non-unique index it's fields of this indexes with merger primary key (for details see key_def.merge function, currently it doesn't support multikey indexes but it's useful if you need to understand how to work index parts merge). Following combination (merge(secondary_index_parts, primary_index_parts)
) is always unique value that allows to to continue iteration since strictly determined place.Let's return back to my example. E.g. I stopped at the row
[1, ['a', 'b', 'c', 'd']] 3ULL c
. My cursor is{'c', 1}
.Well, now I could continue since this point:
You can compare with previous snippet and understand that I continue scan since needed value and don't scan redundant values and don't lose anything.
Such approach isn't quite clear and it's not completely comfortable. You need to extract some magic value from Tarantool internals, store them anywhere. But we use such approach in our projects because we don't have any alternatives yet :)