Creating a hash key to serve as unique key

447 Views Asked by At

I have a very large table that has duplicate name and address information. This table feeds a process which performs a task and appends results back to the table. I'd like to reduce the volume of what's fed into this process by creating a hash key on the name and address information. That way I can feed one record per hash key, reducing my input by 75%. And I need this key to be persistent over time.

However, since this hash key would serve as the key on which I'd join my results table, I need it to be unique. I can create the hash key as a persisted column, and give it a unique constraint, but I'm concerned about the admittedly tiny chance of collision. If two different name and address strings could possibly produce the same hash output, I still need a unique key for both of them.

Even if it's incredibly unlikely that this would ever happen, I'm not happy knowing that if it did, I wouldn't have a plan.

I've also considered using the table's surrogate ID, and assigning every record within a group of Names & Addresses with the MIN(surrogateID). However, if the record corresponding to the MIN(surrogateID) for a given group was deleted, now my ID has changed.

I could create a lookup table of distinct names and addresses, and give each a simple integer ID. But I'd prefer to avoid the storage cost.

Are there any other options I might not be considering?

0

There are 0 best solutions below