Does a perfect hash function guarantee no collisions?

3.5k Views Asked by At

I have been reading and learning hashing and hashtables and experemented with some code(I am still very new to this so I might say something wrong that I missunderstood). I came to the issue for perfect hash functions. Provided that I have my own custom type that somehow has a perfect hash function:

class Foo
{
    private int data;

    override int GetHashCode()
    {
        return data.GetHashCode();
    }
}

An int's hash code is the int itself so I have a perfect hash function, right? But when we use the hashing function to map the objects to a hashtable by the simple formula:

index = foo.GetHashCode() % hashtable.Length

we get a variable index that depends on also how many elements we have in the hashtable. If the hashtable's size was int.MaxValue only then we will have a perfect hash function. For example lets say that we have a hashtable with size of 2. And if we hash for example the numbers 1 and 3 we get

1 % 2 = 1
3 % 2 = 1

A collision! Have I understood anything wrong about hashing and hashtables? It comes out that a perfect hash function is not perfect.

3

There are 3 best solutions below

0
On
  1. Yes! (as said, by definition)

  2. Where do you get a p.h.f from in the first place? You want to hash a fixed, i.e. constant set S of different (i.e. no multiset) values to the set 1..|S|, bijectively. Apparently then, the p.h.f depends on the set S.

  3. Also, remove a single element from S, and add another one, you almost surely get a collision (of the new element with an old one).

  4. So, you actually want "a p.h.f. for such-and-such well defined/described set". And then we can try to find one.

2
On

Yes, a perfect hash function is guaranteed not to have collisions.

That's its very definition!

From Wikipedia (http://en.wikipedia.org/wiki/Perfect_hash_function)

A perfect hash function for a set S is a hash function that maps distinct elements in S to a set of integers, with no collisions. A perfect hash function has many of the same applications as other hash functions, but with the advantage that no collision resolution has to be implemented

4
On

You have it all right until this point

index = foo.GetHashCode() % hashtable.Length

Your hash function is perfect, but when you calculate the modulo, you're actually using a different hash function. In this case, your hash function int.GetHashCode is perfect, but your data structure using foo.GetHashCode() % hashtable.Length is not. That is, one thing is the hash of your objects, and a different thing is the hash used by the structure holding those objects.

For your data structure to be perfect too, its maximum size must also be the number of ints.

So why don't we have collisions in Dictionary? Actually, we do. If two objects A and B do have the same hash in the dictionary, we have a collision. What happens is that the dictionary runs A.Equals(B) as the final check to see if the two objects actually are the same or not. If they are, you get an exception for having duplicates. If they don't, they are both kept under the same dictionary hash.