What hash function is better?

Question


664 Views Asked by gizzy At 27 July 2025 at 17:03

I am writing my own implementation of HashMap in Java, using open addressing for collision resolution. To get a better key distribution, I want to apply a good hash function to the int hashCode of the key. Which hash function is best suited for this?

public int getIndex(K key) { return hash(key.hashCode()) % capacity; }

I need a hash function to apply to the key's hashCode().


There are 2 best solutions below

Borealid On 04 February 2012 at 07:07

Any hash that distributes the values you're expecting to receive evenly is a good hash function.

Your goal is to maximize performance (well, maximize performance while maintaining correctness). The primary concern there is to minimize bucket collisions. This means that the ideal hash is tailored to your input data: if you know what you'll receive, you can choose the hash that produces a minimal number of collisions and maybe even a cache-optimal access pattern.

However, that's not usually a realistic option, so you just choose a hash whose output is unbiased and unpredictable (one that behaves like a pseudorandom number generator, but is deterministic). The "murmur" hash family is one example of such functions.
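As a rough illustration of a murmur-style mixer, here is a minimal sketch of the 32-bit finalizer step from MurmurHash3 (often called fmix32) applied to an int hash code. The class name `Murmur3Mix` and the helper `indexFor` are hypothetical names chosen for this example, and the helper assumes the table capacity is a power of two.

```java
final class Murmur3Mix {
    // 32-bit finalizer from MurmurHash3: alternating xor-shifts and
    // multiplications by odd constants, so every input bit can
    // influence every output bit.
    static int fmix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Hypothetical helper: maps a key to a slot.
    // Assumes capacity is a power of two, so masking keeps the low bits.
    static int indexFor(Object key, int capacity) {
        return fmix32(key.hashCode()) & (capacity - 1);
    }
}
```

Each step of the finalizer is invertible, so distinct int inputs always produce distinct outputs; collisions can only come from the final reduction to a table index or from equal hashCode() values.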

**Peter Lawrey** · Accepted Answer

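The main problem with using % capacity is that it can return negative as well as positive values. In Java, the result of % takes the sign of the left operand, so a negative hash produces a negative index.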
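A small sketch of the sign problem, assuming a hash value of -17 and a capacity of 16 picked purely for illustration. `Math.floorMod` (available since Java 8) is shown as one possible alternative; it is not what the answer itself proposes.

```java
final class RemainderSign {
    // Java's % keeps the sign of the dividend, so a negative hash
    // yields a negative (and therefore invalid) array index.
    static int naiveIndex(int hash, int capacity) {
        return hash % capacity;
    }

    // Math.floorMod returns a value in [0, capacity) for any hash
    // when capacity is positive.
    static int floorModIndex(int hash, int capacity) {
        return Math.floorMod(hash, capacity);
    }

    public static void main(String[] args) {
        int hash = -17; // illustrative value only
        System.out.println(naiveIndex(hash, 16));    // prints -1
        System.out.println(floorModIndex(hash, 16)); // prints 15
    }
}
```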

HashMap avoids this issue by using a capacity that is a power of 2 and masking off the low bits:

 public int getIndex(K key) { return hash(key.hashCode()) & (capacity-1); }

If the capacity is not a power of 2, you can ignore the high bit (which is often not very random):

 public int getIndex(K key) { return (hash(key.hashCode()) & 0x7FFFFFFF) % capacity; }
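The two variants above can be combined into one helper, sketched here under the assumption that the capacity is positive. The class name `IndexFor` and both method names are made up for this example.

```java
final class IndexFor {
    // True when n has exactly one bit set.
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    // Returns a slot in [0, capacity) for any int hash.
    static int index(int hash, int capacity) {
        if (isPowerOfTwo(capacity)) {
            // Masking keeps only the low bits; the result is never negative.
            return hash & (capacity - 1);
        }
        // Clear the sign bit first so that % cannot return a negative value.
        return (hash & 0x7FFFFFFF) % capacity;
    }
}
```

Note that with masking only the low bits of the hash decide the slot, which is why the quality of those bits matters for power-of-two tables.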

The hash function you actually use can matter. HashMap (in the JDK at the time of writing) uses the following:

/**
 * Applies a supplemental hash function to a given hashCode, which
 * defends against poor quality hash functions.  This is critical
 * because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits. Note: Null keys always map to hash 0, thus index 0.
 */
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
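To see what the supplemental hash buys you, here is a small sketch that counts how many distinct slots a 16-entry table uses for hash codes that differ only in bits 16 to 19. The test inputs (`i << 16`) and the class and method names are assumptions made for the demonstration; the `hash` method is copied from the snippet above.

```java
final class SupplementalHashDemo {
    // Same supplemental hash as quoted above.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Counts the distinct slots used by the 16 hash codes i << 16
    // (i = 0..15) in a power-of-two table of the given capacity.
    static int distinctSlots(boolean useSupplementalHash, int capacity) {
        boolean[] used = new boolean[capacity];
        int count = 0;
        for (int i = 0; i < 16; i++) {
            int hashCode = i << 16; // low 16 bits are always zero
            int h = useSupplementalHash ? hash(hashCode) : hashCode;
            int slot = h & (capacity - 1);
            if (!used[slot]) {
                used[slot] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(distinctSlots(false, 16)); // prints 1
        System.out.println(distinctSlots(true, 16));  // prints 16
    }
}
```

Without the supplemental hash all 16 keys land in slot 0, because masking only looks at the low bits; with it, the high bits are folded down and the keys spread across the table.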

I would use this unless you have a good reason not to. For example, for security reasons: if you have a service which could be the subject of a denial-of-service attack, you will want to use a different hash to stop a malicious user from turning your HashMap into a linked list. Unfortunately, you would also have to use a different hashCode(), because an attacker can create a long list of Strings that share the same underlying hash code, and mutating it later in hash() is too late.

Here is a list of strings which all have a hashCode() of 0; there is nothing a hash() function can do about that.

Why doesn't String's hashCode() cache 0?
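The same limitation applies to any set of keys with equal hash codes, not just those that hash to 0. As a small sketch, the well-known colliding pair "Aa" and "BB" is used here instead of the zero-hash strings from the linked question; the class and method names are hypothetical, and `hash` is the supplemental function quoted above.

```java
final class CollidingKeys {
    // Same supplemental hash as quoted above.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Equal hashCode() values stay equal after any deterministic
    // post-processing, so the keys always share a slot.
    static boolean sameSlot(String a, String b, int capacity) {
        int slotA = hash(a.hashCode()) & (capacity - 1);
        int slotB = hash(b.hashCode()) & (capacity - 1);
        return slotA == slotB;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode()); // prints 2112
        System.out.println("BB".hashCode()); // prints 2112
        // Concatenating colliding blocks gives more collisions.
        System.out.println("AaBB".hashCode() == "BBAa".hashCode()); // prints true
        System.out.println(sameSlot("Aa", "BB", 1 << 20));          // prints true
    }
}
```

Because n such two-character blocks can be combined into 2^n strings with one shared hash code, only a different hashCode() (or a different way of handling collisions) helps against this kind of input.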
