I am currently studying for my exams and have came up against this question:
(5d) Suppose we are using linear hashing, and start with an empty table with 2 buckets (M = 2), split = 0 and a load factor of 0.9. Explain the steps we go through when the following hashes are added (in order):
5,7,12,11,9
The answer provided for this is:
*— —5— (0,1)
* — —5,7 —
- split —*—5,7— — (0,1,2)
—12*—5,7— — —
- split —12—5—*—7— (0,1,2,3)
- split =M, M = 2*M, split = 0
*—12—5— —7—
*—12—5— —7,11—
- split —*—5— —7,11—12— (0,1,2,3,4)
—*—5,9— —7,11—12—
- split — —9*— —7,11—12—5— (0,1,2,3,4,5)
This answer doesn't make any sense to me and the lecturer did not go through this.
How do I tackle this question?
I edited your question because the answer looks like a list of descriptions of the hash table state as each operation is performed. Did your professor cover linear hashing at all? The Wikipedia description mention a load factor precisely, but it's in the original LH paper by Witold Litwin. it's integral to when a controlled split occurs. I also found these descriptions:
Linear Hashing by Zhang, et al (PDF)
Linear Hashing: A new Tool for File and Table Addressing Witold Litwin, Summary by: Steve Gribble and Armando Fox, Online Berkley.edu retrieved June 16
So basically, a load factor is a means of predictably controlling when a split will occur. One implementation of linear hashing appears to be called 'uncontrolled split' which adds a new bucket and performs a split whenever a collision occurs. Using a load factor of 0.9 only has a split occur when 90% of the tables buckets are full - or rather, would be full based on the prediction that the buckets are uniformly assigned to.
Based on this and the Wikipedia article I just read, the setup is this:
- -
(numbered 0 and 1)This follows the wikipedia article description I linked to above. Now we need to cover the hash and bucket assignment.
mod
(N x 2L), which is really just Imod
(N x 2L). I'm going to call this B(I) below for brevity (also for bucket). Call this the assignment address A.mod
(N x 2L + 1), giving us an actual assignment address of A'.I think the * in the answer's notation above denotes the location of the split pointer S. In my notation for the rest of the question below:
-
denote an empty bucket,i
denote a bucket with the Integeri
in it, andi,j
denote a bucket with bothi
andj
in it.- 5
for clarity.I'm thinking your answer breakdown reads like this:
mod
(2 x 20) = 5mod
(2 x 1) = 5mod
2 = 1- 5
(0th bucket empty, 1st bucket with 5 in it.Add 7 to the table.
mod
2 = 1, so 7 is added to the same bucket as 5. S still hasn't changed, so again 1 is used as the address.- 5,7
- 5,7 -
, with buckets 0 and 2 empty, and 1 still with 5 and 7 in it.Add 12 to the table.
mod
(2 x 20) = 12mod
2 = 0mod
(2 x 20 + 1) = 12mod
4, which is still 0, so 12 is added to the 0th bucket.12 5,7 -
, only the 3rd, new bucket is empty.12 5,7 - -
5,7
is split. That means new buckets are picked for 5 and 7.mod
2 = 1, 1 < S, calculate 5mod
2 x 21 = 5mod
4 = 1. 5 is re-assigned to its same bucket.mod
2 = 1, 1 < S, calculate 7mod
2 x 21 = 7mod
4 = 3. 7 is re-assigned to 3.12 5 - 7
Add 11 to the table.
mod
(2 x 21) = 11mod
4 = 3. 11 is assigned to the 3rd bucket.12 5 - 7,11
, 4 items and 4 buckets, so a split occurs again.mod
(2 x 21) = 12mod
4 = 0. 0 < 1, so recalculatemod
(2 x 21+1) = 12mod
8 = 4. 12 is assigned to the 4th bucket.- 5 - 7,11 12
Add 9 to the table.
I'll leave the steps to the last one for you. There are a few nuances to the LH algorithm that I'm not quite grasping. I might ask additional questions about them. But hopefully that's enough for you to get going on. In the future, I would recommend asking the course instructor directly.