How to implement pagination for multikey indexes in Tarantool?

224 Views Asked by At

I have multikey index over field that store an array of integers. How I can implement cursor-based pagination using it?

In real if tuple has more than one value in array field it tuple will be selected #tuple[array_field_idx] times. Yesterday I implemented "distinct" select (using ffi to get tuple pointer address), but it seems that it will not be works with pagination.

Have you any ideas how it can be implemented in Tarantool?

1

There are 1 best solutions below

0
On

In order to implement pagination for multikey index you should know something about Tarantool internals.

I will write code that works for Tarantool 2.3+ (but it could be broken in future). Please test them carefully and update FFI definition in case of upgrade Tarantool version.

So, let's start. Firstly you should know that Tarantool BTree stores data in special structure memtx_tree_data that contains a pointer to your tuple and hint. It's special number that allows to speedup comparisons between tuples for simple tree index and it's a position of indexed element in array.

Firstly, we should to understand how to extract tuple hint with a tuple. It could be done using some piece of FFI code and tree iterator.

local ffi = require('ffi')

ffi.cdef([[

typedef struct index_def;

typedef struct index;

typedef struct memtx_tree;

typedef struct mempool;

typedef uint64_t hint_t;

enum iterator_type {
    /* ITER_EQ must be the first member for request_create  */
    ITER_EQ               =  0, /* key == x ASC order                  */
    ITER_REQ              =  1, /* key == x DESC order                 */
    ITER_ALL              =  2, /* all tuples                          */
    ITER_LT               =  3, /* key <  x                            */
    ITER_LE               =  4, /* key <= x                            */
    ITER_GE               =  5, /* key >= x                            */
    ITER_GT               =  6, /* key >  x                            */
    ITER_BITS_ALL_SET     =  7, /* all bits from x are set in key      */
    ITER_BITS_ANY_SET     =  8, /* at least one x's bit is set         */
    ITER_BITS_ALL_NOT_SET =  9, /* all bits are not set                */
    ITER_OVERLAPS         = 10, /* key overlaps x                      */
    ITER_NEIGHBOR         = 11, /* tuples in distance ascending order from specified point */
    iterator_type_MAX
};


typedef struct iterator {
    /**
     * Iterate to the next tuple.
     * The tuple is returned in @ret (NULL if EOF).
     * Returns 0 on success, -1 on error.
     */
    int (*next)(struct iterator *it, struct tuple **ret);
    /** Destroy the iterator. */
    void (*free)(struct iterator *);
    /** Space cache version at the time of the last index lookup. */
    uint32_t space_cache_version;
    /** ID of the space the iterator is for. */
    uint32_t space_id;
    /** ID of the index the iterator is for. */
    uint32_t index_id;
    /**
     * Pointer to the index the iterator is for.
     * Guaranteed to be valid only if the schema
     * state has not changed since the last lookup.
     */
    struct index *index;
};


struct memtx_tree_key_data {
    /** Sequence of msgpacked search fields. */
    const char *key;
    /** Number of msgpacked search fields. */
    uint32_t part_count;
    /** Comparison hint, see tuple_hint(). */
    hint_t hint;
};

struct memtx_tree_data {
    /* Tuple that this node is represents. */
    struct tuple *tuple;
    /** Comparison hint, see key_hint(). */
    hint_t hint;
};

typedef int16_t bps_tree_pos_t;
typedef uint32_t bps_tree_block_id_t;

typedef uint32_t matras_id_t;

struct matras_view {
    /* root extent of the view */
    void *root;
    /* block count in the view */
    matras_id_t block_count;
    /* all views are linked into doubly linked list */
    struct matras_view *prev_view, *next_view;
};

struct memtx_tree_iterator {
    /* ID of a block, containing element. -1 for an invalid iterator */
    bps_tree_block_id_t block_id;
    /* Position of an element in the block. Could be -1 for last in block*/
    bps_tree_pos_t pos;
    /* Version of matras memory for MVCC */
    struct matras_view view;
};

typedef struct tree_iterator {
    struct iterator base;
    struct memtx_tree_iterator tree_iterator;
    enum iterator_type type;
    struct memtx_tree_key_data key_data;
    struct memtx_tree_data current;
    /** Memory pool the iterator was allocated from. */
    struct mempool *pool;
};

]])


local function get_tree_comparison_hint(box_iterator_state)
    if box_iterator_state == nil then
        return nil
    end

    local casted = ffi.cast("struct tree_iterator*", box_iterator_state)
    --
    -- IMPORTANT: hint is zero-based (as arrays in C)
    -- Lua arrays is one-based.
    --
    return casted.current.hint
end

return {
    get_tree_comparison_hint = get_tree_comparison_hint,
}

Then consider following example:

local box_iterator = require('common.box_iterator')

box.cfg{}

local space = box.schema.create_space('dict', {
    format = {
        {name = 'id', type = 'number'},
        {name = 'bundles', type = 'array'}
    },
    if_not_exists = true,
})

space:create_index('pk', {
    unique = true,
    parts = {
        {field = 1, type = 'number'}
    },
    if_not_exists = true,
})

space:create_index('multikey', {
    unique = false,
    parts = {
        {field = 2, type = 'string', path = '[*]'},
        -- Note: I intentionally add primary index parts here
        {field = 1, type = 'number'}
    },
    if_not_exists = true,
})

space:replace({1, {'a', 'b', 'c', 'd'}})
space:replace({2, {'b', 'c'}})
space:replace({3, {'a', 'd'}})
space:replace({4, {'c', 'd'}})

for iter_state, tuple in space.index.multikey:pairs({'a'}, {iterator = 'GE'}) do
    local position = box_iterator.get_tree_comparison_hint(iter_state) + 1
    print(
        string.ljust(tostring(tuple), 30),
        position,
        tuple[2][tonumber(position)]
    )
end

os.exit()

Output is:

# Tuple                         Hint   Indexed element
[1, ['a', 'b', 'c', 'd']]       1ULL    a
[3, ['a', 'd']]                 1ULL    a
[1, ['a', 'b', 'c', 'd']]       2ULL    b
[2, ['b', 'c']]                 1ULL    b
[1, ['a', 'b', 'c', 'd']]       3ULL    c
[2, ['b', 'c']]                 2ULL    c
[4, ['c', 'd']]                 1ULL    c
[1, ['a', 'b', 'c', 'd']]       4ULL    d
[3, ['a', 'd']]                 2ULL    d
[4, ['c', 'd']]                 2ULL    d

As you see the order is strictly determined. Tarantool returns me tuples in order that is determined with (a) indexed value - tuple[path_to_array][hint+1] and primary key. The second condition is common for all Tarantool secondary non-unique indexes. Tarantool internally merge primary key to every non-unique index. All you need is to specify it explicitly in your schema.

So the next term is cursor. Cursor allows you to continue iteration since place where you previously stopped. For unique indexes cursor is fields of this indexes, for non-unique index it's fields of this indexes with merger primary key (for details see key_def.merge function, currently it doesn't support multikey indexes but it's useful if you need to understand how to work index parts merge). Following combination (merge(secondary_index_parts, primary_index_parts)) is always unique value that allows to to continue iteration since strictly determined place.

Let's return back to my example. E.g. I stopped at the row [1, ['a', 'b', 'c', 'd']] 3ULL c. My cursor is {'c', 1}.

Well, now I could continue since this point:

-- "GE" is changed to "GT" to skip already scanned tuple: [1, ['a', 'b', 'c', 'd']]
for iter_state, tuple in space.index.multikey:pairs({'c', 1}, {iterator = 'GT'}) do
    local position = box_iterator.get_tree_comparison_hint(iter_state) + 1
    print(
        string.ljust(tostring(tuple), 30),
        position,
        tuple[2][tonumber(position)]
    )
end
--[[
Result:
[2, ['b', 'c']]                 2ULL    c
[4, ['c', 'd']]                 1ULL    c
[1, ['a', 'b', 'c', 'd']]       4ULL    d
[3, ['a', 'd']]                 2ULL    d
[4, ['c', 'd']]                 2ULL    d
--]]

You can compare with previous snippet and understand that I continue scan since needed value and don't scan redundant values and don't lose anything.

Such approach isn't quite clear and it's not completely comfortable. You need to extract some magic value from Tarantool internals, store them anywhere. But we use such approach in our projects because we don't have any alternatives yet :)