I am trying to match the SHA1 values of certain table data between SQL Server and Snowflake.
I've got the SHA1 for a Latin character in SQL Server in the following way:
select sys.fn_varbintohexsubstring(0, HASHBYTES('SHA1',cast('á' as varchar(1))),1,0)
This returns b753d636f6ee46bb9242d01ff8b61f715e9a88c3
The sha1 function in Snowflake returns a different value for the same character.
select sha1(cast('á' as varchar))
Result - 2b9cc8d86a48fd3e4e76e117b1bd08884ec9691d
Note - The datatype in SQL Server is nvarchar, while the datatype in Snowflake is varchar with the default collation. For English characters, the SHA1 values match after casting nvarchar to varchar. However, this is not the case with Latin characters.
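To illustrate with literals instead of my actual table (assuming the SQL Server database uses a code page 1252 default collation):

-- ASCII character: a single byte (0x61) in both CP1252 and UTF-8,
-- so this matches Snowflake's sha1('a')
select hashbytes('SHA1', cast(N'a' as varchar(10)));

-- accented character: one byte (0xE1) in CP1252 but two bytes (0xC3 0xA1)
-- in UTF-8, so this does not match Snowflake's sha1('á')
select hashbytes('SHA1', cast(N'á' as varchar(10)));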
Is there a way to match SHA1 values for non-English characters? I need to get the value '2b9cc8d86a48fd3e4e76e117b1bd08884ec9691d' in SQL Server 2017 and below, since that is what other databases such as Oracle, Snowflake and Hive return.
Thanks
TL;DR: Never use varchar when calculating hashes. There are simply too many rakes you can step on in the process.

Just as an example, I adapted your code for easier understanding and ran it in the context of a database which has the Latin1_General_100_CI_AS default collation.
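Something along these lines (the variable and the [Char]/[Hash] column aliases are illustrative, not the exact adaptation):

declare @Source varchar(10) = 'á';

-- the varchar literal is stored using the code page of the current
-- database's default collation, and those stored bytes are what get hashed
select @Source as [Char],
       hashbytes('SHA1', @Source) as [Hash];

Under the Latin1_General_100_CI_AS database, [Char] comes back as 'á', exactly as it was entered.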
However, if I change the database context to another DB, with the Cyrillic_General_100_CI_AS collation, the same code suddenly returns different values: the [Char] in the first line is a different character now (a small Latin "a" without the accent), and the hash changes accordingly. This kind of implicit codepage adjustment cannot be prevented unless your data is in Unicode, or in a binary form.
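You can reproduce roughly the same effect without switching database contexts by forcing a collation on the Unicode source before converting it to varchar; this is only an illustrative sketch, not the code I ran above, and the collation names are just the two mentioned here:

-- the collation of the source expression picks the code page used for the
-- nvarchar-to-varchar conversion, so the same character can end up as
-- different bytes and therefore a different hash
select cast(N'á' collate Latin1_General_100_CI_AS   as varchar(10)) as [Char_cp1252],
       hashbytes('SHA1', cast(N'á' collate Latin1_General_100_CI_AS   as varchar(10))) as [Hash_cp1252],
       cast(N'á' collate Cyrillic_General_100_CI_AS as varchar(10)) as [Char_cp1251],
       hashbytes('SHA1', cast(N'á' collate Cyrillic_General_100_CI_AS as varchar(10))) as [Hash_cp1251];

Code page 1251 has no 'á', so the conversion silently substitutes a plain 'a', and the hashed bytes change with it.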
Your options