I am trying to understand how universal hashing works. It is defined h(x) = [(a*x + b) mod p] mod m
where a,b
- random numbers, m
- size of hash table, x
- key, and p
- prime number. For example, I have several different keys:
92333
23347
20313
And in order to create a universal hash function I have to to following:
Let a = 10, b = 22, p = 313, m = 100
h(92333) = [(10 * 92333 + 22) mod 313] mod 100 = 2 mod 100 = 2
h(23347) = [(10 * 23347 + 22) mod 313] mod 100 = 307 mod 100 = 7
...
But probably every time I will get number in range from 0 to 99 and there might be lots of collisions.
So my question is: I understood and applied universal hashing correctly?
Assuming that the numbers you're hashing have a uniform distribution, your function is biased toward buckets 0 through 12.
Assume that the hash operation up to and including the
mod 313
operation occurs. The result of that operation gets you a value in the range 0..312. Again, if the result of this operation is even distributed, then take themod 100
you get the following effect:Notice how the number of opportunities to get a particular result drop after 12? There's your bias. Here's more evidence of that effect taken from counting the results of hashing the numbers 0 through 5,000,000: