I am trying to generate a hash based on multiple columns in a table. The columns could be of different types.
I currently have some complex code that one needs to remember to update anytime another column is added. In addition the coalesce ends up adding to the complexity, plus the need to cast non-string columns.
, murmur3(to_utf8(concat_ws('::',coalesce(value1,'-'),coalesce(value2,'-'),coalesce(cast(value3 as varchar))))) as hash1
, xxhash64(to_utf8(concat_ws('::',coalesce(value1,'-'),coalesce(value2,'-'),coalesce(cast(value3 as varchar))))) as hash2
The valuex fields could be of any type. In my example above, they are string, string, int. The coalesce is needed to make sure null values are handled correctly.
Is there a quicker/easier way to generate a hash from multiple fields without the use of coalesce and/or concat_ws?
I will be using the computed hash to determine if the record changed and only update the record if the record changed using a merge command.
merge into .....
WHEN MATCHED and src.hash <> tgt.hash
THEN UPDATE SET ....
I need this to work with Athena Engine 3.0