Efficient node extract() + insert() in `unordered_map`

512 Views Asked by At

Using the C++17 node-extract() functions, I can change the key without having to re-allocate the node. In my particular use-case, I'm replacing the key with an equal one, so I'd like to use insert()-with-hint to avoid a full second lookup, and that works fine when using std::map:

struct Replacement { std::string from, to; };
std::map<std::string_view, Replacement> replacements;
~~~~
std::string from = ~~~;
std::string to = ~~~:
auto [it, inserted] = replacements.try_emplace(from, std::move(from), std::move(to));
if (inserted) {
    // need to patch up the key, which points to invalid data now:
    auto node = replacements.extract(it++);    // 1
    node.key() = node.mapped().from;           // 2
    replacements.insert(it, std::move(node));  // 3
}

As I insert() a new node, its key_type will continue to point to from.data(), which, however, gets moved from as part of the construction of the mapped_type in the emplacement. Because of SSO (small-string optimisation), the moved string's data() may differ from the source string's data(), invalidating the key_type as soon as from's lifetime ends, possibly sooner. So we need to re-set the key_type to point into the mapped_type, for which we (1) extract the node, making sure to keep it valid by incrementing it before the extract(), then (2) patching up the key as required and finally (3) re-inserting the node at the old position, given by the hint it.

So far so good. Now try the same with unordered_map:

~~~~
std::string from = ~~~;
std::string to = ~~~:
auto [it, inserted] = replacements.try_emplace(from, std::move(from), std::move(to));
if (inserted) {
    // need to patch up the key, which points to invalid data now:
    auto node = replacements.extract(it++);    // only thing possible, but wrong direction
    // auto node = replacements.extract(it--); // right direction, but ERROR: unordered_map isn't bidirectional
    node.key() = node.mapped().from;           // 2
    replacements.insert(it, std::move(node));  // 3
}

Here, I run into the problem that unordered_map, being a vector<forward_list>, only has forward iterators, so I cannot, as I would have to, go backwards to remember the exact insertion point for the final insert-with-hint (3) in iteration order. It seems I have but two choices:

  1. use insert(node) without a hint, causing a full second lookup
  2. increment the iterator and pass that as the hint. The std says that the insertion is performed as close as possible to the hint, but with the intra-bucket iterators, at least, being forward-only, if my hash table is as sparse as is required for good performace, the incremented iterator may be several buckets down the bucket list, making the hint useless.

So, ok, the hint is useless for unordered_map</rubberducking>. To save the question, let's talk about unordered_multimap, then :)

In unordered_map, the hint could be useful, as, unlike unordered_map, unordered_multimap needs to scan the bucket to put the new node into a potential equal_range() of the key. Is there a better way to produce a hint than post-increment or not-at-all?

1

There are 1 best solutions below

1
On

If your only aim is value-preserving key adjustments, you needn’t extract anything: just wrap your key in a class that has it as a mutable member (or holds it via a std::unique_ptr) so that you can legitimately change it in the map. You’ll have to define comparison/hashing operations for the wrapper type, but that’s no more code than the iterator trickery.