Hash Function for a 7 digits int

440 Views Asked by At

I'm new to hash tables and functions, so I apologize in advance if I got anything wrong.

I'm trying to create a hash table in C++ for a list of about 100k entries comprised of a 7 digit number. The thing is, I got stuck while trying to figure out what hash function to use.

When using %100000 I got ~65k unique keys, while there are ~90k unique entries. Which means that about 1/3 of the data will have collisions.

Is this a good hash function to use? Or is there a better function to use in that case in order to have less collisions? What size should my table be?

Thanks again!

Edit- The entries are numbers between 1 and 2 mil. Is it possible to use the number itself as the key? Or should keys for hash table always start at 0?

1

There are 1 best solutions below

0
Aconcagua On BEST ANSWER

The standard library comes with two types internally using a hash table: std::unordered_map and std::unordered_set.

As your key type is an integral type you get a hash table pretty conveniently by std::unordered_map<YourIdType, YourDataType. You can easily access the data via theMap[someId], but be aware that if the key is not found a new object is created! If that's not desired you'd rather use theMap.find(someId), which returns an iterator.

The drawback, though, is that you store the id twice then (internally as a std::pair<YourIdType, YourDataType>). You can avoid that by using a std::unordered_set. To be able to do so, though, you need to specialise std::hash and std::equal_to for your type:

namespace std // you are not allowed to add to – with exception of specialisations
{
template<>
struct hash<YourDataType>
{
    size_t operator()(YourDataType const& object) const
    {
        return hash<YourIdType>()(object.id);
    }
};

// analogously equal_to with appropriate comparisons, possibly by
// comparing the object's addresses

Alternatively you can provide a custom hasher type (with C++20 that can even be a lambda packed into decltype) to the set as second template parameter and just implement operator== for your object type, or provide a custom equality comparer type if you need it to compare differently than the operator, maybe like:

// C++20 required:
using YourMapType = std::set
<
    YourDataType,
    decltype
    (
        [](YourDataType const& object)
        { return std::hash<YourIdType>()(object.id); }
    ),
    decltype
    (
        [](YourDataType const& o1, YourDataType const& o2)
        { return &o1 == &o2; } // TODO: comparisons as you need! 
    )
>;
// alternatively create custom types with appropriate operator() implementations

Drawback here is – apart from additional complexity for the specialisations – that you cannot lookup objects by the id only, instead you need a complete object of your data type.

So which one is more appropriate/suitable? That depends on your concrete requirements...