I'm learning about LSH and minhashing, and I'm trying to understand the rationale for hashing the signature matrix:
We divide the signature matrix into bands, and within each band we hash (using which hash function?) every column's portion into k
buckets. Why does this make sense? If we use a regular hash function, then even a slight difference between two columns would probably send them to different buckets.
I do understand the relation between the signature matrix and Jaccard distance, but I don't understand the next step, which is essentially hashing with a function that distributes items evenly.
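
To make the step I'm asking about concrete, here is a minimal sketch of the banding procedure as I understand it. The names `band_buckets` and `candidate_pairs`, the toy signatures, and the use of Python's built-in `hash` as the "regular" hash function are my own assumptions for illustration, not taken from any particular library or textbook implementation.

```python
from collections import defaultdict

def band_buckets(signatures, bands, rows, k):
    """Hash each band of each signature column into one of k buckets.

    signatures: dict mapping a column name to its minhash signature,
                a list of ints of length bands * rows (hypothetical layout).
    Returns one bucket table per band: bucket id -> list of column names.
    """
    tables = [defaultdict(list) for _ in range(bands)]
    for name, sig in signatures.items():
        for b in range(bands):
            # Only the rows belonging to this band are hashed together.
            chunk = tuple(sig[b * rows:(b + 1) * rows])
            # Assumption: an ordinary hash function, reduced to k buckets.
            bucket = hash(chunk) % k
            tables[b][bucket].append(name)
    return tables

def candidate_pairs(tables):
    """Columns that share a bucket in at least one band become candidates."""
    pairs = set()
    for table in tables:
        for names in table.values():
            for i in range(len(names)):
                for j in range(i + 1, len(names)):
                    pairs.add(tuple(sorted((names[i], names[j]))))
    return pairs

# Toy signatures: A and B agree on the first band only; C agrees with neither.
signatures = {
    "A": [1, 2, 3, 4, 5, 6],
    "B": [1, 2, 3, 9, 9, 9],
    "C": [7, 7, 7, 8, 8, 8],
}
tables = band_buckets(signatures, bands=2, rows=3, k=1_000_003)
pairs = candidate_pairs(tables)
```

In this sketch each band has its own bucket table, so two columns that are identical within one band always land in the same bucket for that band, even if they differ in the other bands. Columns that differ within a band can still collide by chance, which depends on k.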