I'm trying to debug a caching issue in our codebase, where we use std::unordered_map
to hash a struct custom_key
which is defined below, but I believe it should only be POD datatypes so I think whatever the C++ library implemented for the hash function will be used. I am wondering if this function is deterministic?
struct custom_key {
// bunch of uint8_t variables
// some int8_t variables
// some boolean variables
another_struct params;
}
struct another_struct {
// an enum variable
// some int variables
int dimensions[5];
}
Our caching algorithm is
// some_input_key is coming in from a stream
std::unordered_map<custom_key, some_output_type, custom_hasher<custom_key>, custom_equal<custom_key>> ht;
if (ht.find(some_input_key) != ht.end()) {
// do something
} else {
ht[some_input_key] = ....
}
In our input stream, we have a set of custom_keys
where there may be identical values, but we're unable to find some subsequent identical keys in the hash table.
e.g., our stream would be key1, key2, key3, key4, ....
and suppose key1
and key4
are identical, sometimes it will not be able to find key4
in the hash table. I'm pretty sure the hashing function is deterministic, so I'm perplexed what could be causing the issue. I've printed the contents of both keys and they look identical.
Edit: it appears that we do have a custom hash function
template <typename T>
// based on Fowler–Noll–Vo hash function
struct custom_hasher {
static_assert(std::is_pod<T>::value, "T is not POD");
size_t operator()(const T& input) const {
auto ptr = reinterpret_cast<const uint8_t*>(&input);
uint32_t value = 0x811C9DC5;
// we have an implementation for irange that does something similar to the one in python
for (const auto i : irange((int)sizeof(T))) {
value ^= ptr[i];
value *= 0x01000193;
}
return (size_t)value;
}
};
template <typename T>
struct custom_equal {
static_assert(std::is_pod<T>::value, "T is not POD");
bool operator()(const T& lhs, const T& rhs) const {
auto ptr1 = reinterpret_cast<const uint8_t*>(&lhs);
auto ptr2 = reinterpret_cast<const uint8_t*>(&rhs);
return memcmp(ptr1, ptr2, sizeof(T)) == 0;
}
};
cppreference.com:
std::unordered_map
will use hash on its keys. Objects with the same HASH go into the same buckets. However, if two things have the same VALUE, then the second pair is ignored. If key1 and key4 are identical, then key4 and its value will be discarded.What you want to use is
std::unordered_multimap
(see here): cppreference.com:EDIT: As the others are saying, padding might be messing with your hash result. An alternative is to individually hash each member of the struct and then combine the hashes.