Indexing several vector fields using multi_index

70 Views Asked by At

I have the following class and I need to index employee's hobbies.

class Employee {
  ...
  std::vector<std::string> hobbies;
}            

The suggested way to do it is to create a flat structure:

struct EmployeeWithHobby {
    Employee employee;
    std::string hobby;
};

And then index by this field:

typedef boost::multi_index::multi_index_container<
      EmployeeWithHobby,
      boost::multi_index::indexed_by<
          boost::multi_index::hashed_non_unique<
              boost::multi_index::tag<ByHobby>,
              boost::multi_index::member<
                  EmployeeWithHobby, std::string,
                  &EmployeeWithHobby::hobby>>>> index;

This works great so far. But what if I need to index another vector. Let's say employee has different skills:

class Employee {
  ...
  std::vector<std::string> hobbies;
  std::vector<std::string> skills;
}  

How would I index both of these fields? Would I create a separate struct and a separate index?

struct EmployeeWithSkill {
    Employee employee;
    std::string skill;
};
typedef boost::multi_index::multi_index_container<
      EmployeeWithHobby, ...

Or should I add a field to the existing struct?

struct EmployeeWithHobbyAndSkill {
    Employee employee;
    std::string hobby;
    std::string skill;
};

That would require O(N*M) insertions because I would have to iterate both hobbies and skills. If there is another vector, it quickly becomes unmanageable...

Or can I use std::optional to avoid O(N*M) and turns into O(N+M) entries?

struct EmployeeWithHobbyAndSkill {
    Employee employee;
    std::optional<std::string> hobby;
    std::optional<std::string> skill;
};
1

There are 1 best solutions below

0
On

Optimizing The Base Problem

You could use pre-indexed hobbies/skills (they will likely form a slowly growing static set).

A very minimal example:

Live On Coliru

#include <boost/dynamic_bitset.hpp>
#include <boost/multi_index/key.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index_container.hpp>

#include <iomanip>
#include <iostream>

using Pool = std::vector<std::string_view>; // append-only or immutable
static const Pool hobbies{"paleontelogy", "taxidermy", "procrastination"};
static const Pool skills{"dig", "mummify", "procrastinate"};

using RefSet = boost::dynamic_bitset<>;

RefSet reference(Pool const& p, auto&&... key) {
    RefSet result(p.size());
    result.set(std::size_t((std::find(p.begin(), p.end(), key) - p.begin()))...);
    return result;
}

struct Employee {
    std::string name;
    RefSet      hobbies, skills;
};

int main() {
    auto taxidermy       = reference(hobbies, "taxidermy");
    auto paleontelogy    = reference(hobbies, "paleontelogy");
    auto procrastination = reference(hobbies, "procrastination");
    auto dig             = reference(skills, "dig");
    auto mummify         = reference(skills, "mummify");
    auto procrastinate   = reference(skills, "procrastinate");

    std::vector<Employee> ee{
        Employee{"John", {taxidermy}, {}},
        Employee{"Noah", {taxidermy | paleontelogy}, {mummify}},
        Employee{"Henoch", {}, {procrastinate}},
    };

    for (auto& [n, h, s] : ee) {
        std::cout << quoted(n) << " h:" << h << " s:" << s << "\n";
        for (size_t ref = 0; ref < h.size(); ++ref)
            if (h[ref])
                std::cout << " - hobby: " << quoted(hobbies.at(ref)) << "\n";
        for (size_t ref = 0; ref < s.size(); ++ref)
            if (s[ref])
                std::cout << " - skill: " << quoted(skills.at(ref)) << "\n";
    }
}

Printing

"John" h:010 s:
 - hobby: "taxidermy"
"Noah" h:011 s:010
 - hobby: "paleontelogy"
 - hobby: "taxidermy"
 - skill: "mummify"
"Henoch" h: s:100
 - skill: "procrastinate"

Of course for "real" applications you'd hide the Pool behind a class instead of manually doing the (reverse) lookups like here.

Enter Multi Index

With Multi Index, it could be that you hope to actually lookup all the entries that match a given skill/hobby. You can achieve it with a custom comparison predicate and many indices, but it will not be very flexible.

Instead, you may want to have several indexes over the same table:

Live On Coliru

#include <boost/multi_index/key.hpp>
#include <boost/multi_index/member.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/random_access_index.hpp>
#include <boost/multi_index_container.hpp>
#include <boost/range/iterator_range.hpp>

#include <iomanip>
#include <iostream>
#include <map>
#include <ranges>
#include <set>

struct Employee {
    std::string name;
};

namespace bmi = boost::multi_index;

using Table = bmi::multi_index_container< //
    Employee,
    bmi::indexed_by< //
        bmi::ordered_unique<bmi::key<&Employee::name>>
        // , ... // other secondary indices
        >>;

using Index = std::multimap<std::string_view, std::reference_wrapper<Employee const>>;

int main() {
    Table ee{
        {"John"},
        {"Noah"},
        {"Henoch"},
    };

    Index hobby_idx{
        {"taxidermy", *ee.find("John")},
        {"taxidermy", *ee.find("Noah")},
        {"paleontelogy", *ee.find("Noah")},
    };
    Index skill_idx{
        {"mummify", *ee.find("Noah")},
        {"procrastinate", *ee.find("Henoch")},
    };

    for (auto [h, ref] : hobby_idx)
        std::cout << quoted(h) << "\texercised by\t" << quoted(ref.get().name) << "\n";
    for (auto [s, ref] : skill_idx)
        std::cout << quoted(s) << "\tmastered by\t" << quoted(ref.get().name) << "\n";
}

Printing

"paleontelogy"  exercised by    "Noah"
"taxidermy" exercised by    "John"
"taxidermy" exercised by    "Noah"
"mummify"   mastered by "Noah"
"procrastinate" mastered by "Henoch"

TL;DR Summary

Abstract your database to optimize for your usage patterns. If you many-to-many relations, this invariably means either de-normalizing data or using multiple relation tables in normal-form.

Of course you will give thought to guaranteeing the constraints, which is why you need to have a strict interface on your abstraction, so it cannot be used "wrong".

Note that you might well be able to write your bespoke query directly on a

struct Employ {
    std::string name;
    std::set<std::string_view> skills, hobbies;
};

And it might suffice for your needs.