Properties of a good hash function from MIT recitation:
- satisfies (approximately) the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots.
- The hash function shouldn’t bias towards particular slots does not hash similar keys to the same slot (e.g. compiler’s symbol table shouldn’t hash variables i and j to the same slot since they are used in conjunction a lot)
- is quick to calculate, should have O(1) runtime
- is deterministic. h(k) should always return the same value for a given k
Can some one explain the point 2 further . What does variable name has to do with hash functions ?
Edit: I work with Java . So if an answer contains explanation using java semantics it is okay to me .
Since hash tables are often used for building lookup tables that compilers use to look up information about symbols, such as variable names and function names, a compiler scenario is used to explain the point of #2.
The authors took a pair of variable names that are very commonly found in the same program,
i
andj
, and said that strings representing the names of these variables,"i"
and"j"
, should not be hashed to the same slot. This makes sense, because resolving hash collisions is the part of the lookup process that most negatively influences the speed.