Just wondering if there is a better way to hash an array of strings than this:
var hashStringArray = function(array) {
  // Sort a copy so the caller's array is left untouched
  var sorted = array.slice().sort();
  return sorted.join('|');
};
I don't like sorting much, and using that delimiter is not safe either, since it may appear inside one of the strings. Overall, I need to produce the same hash no matter the order of the strings. The arrays will be fairly short (up to 10 items), but hashing will be needed very often, so it shouldn't be too slow.
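To make the delimiter problem concrete, here is a small sketch showing two different arrays that collapse to the same join-based key. The `jsonKey` function is an alternative swapped in for illustration (it is not from this thread): `JSON.stringify` quotes and escapes each string, so element boundaries stay unambiguous, although it still relies on sorting.

```javascript
// The join-based key from the question: sort a copy, then join with '|'
function joinKey(array) {
  return array.slice().sort().join('|');
}

// Different arrays, same key, because '|' appears inside one of the strings
console.log(joinKey(['a|b', 'c'])); // a|b|c
console.log(joinKey(['a', 'b|c'])); // a|b|c

// Illustrative alternative: JSON.stringify keeps each element quoted,
// so no string content can be mistaken for a separator
function jsonKey(array) {
  return JSON.stringify(array.slice().sort());
}

console.log(jsonKey(['a|b', 'c']) === jsonKey(['a', 'b|c'])); // false
```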
I intend to use it with an ES6 Map object, and I need to easily find the same collection of strings regardless of their order.
Updated usage example:
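var theMap = new Map();
var lookup = function(arr) {
  var hashed = hashStringArray(arr);
  // Return the cached item if this set of strings was seen before
  if (theMap.has(hashed)) {
    return theMap.get(hashed);
  }
  // itemBasedOnInput is a placeholder for whatever object is built from arr
  var itemBasedOnInput = { strings: arr.slice() };
  theMap.set(hashed, itemBasedOnInput);
  return itemBasedOnInput;
};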
var arr1 = ['alpha','beta','gama'];
var arr2 = ['beta','alpha','gama'];
lookup(arr1) === lookup(arr2); // should be true
Two things occurred to me as the basis of a solution:
1. Summing doesn't depend on order, which is actually a flaw in simple checksums (they don't catch changes in block order within a word), and
2. We can convert strings to summable numbers using their char codes.
Here's a function to do (2):
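A minimal sketch of (2), assuming `charsum` simply adds up the values returned by `charCodeAt` for each character:

```javascript
// Sum the char codes of a string so it can be combined arithmetically.
// Note: anagrams such as "ab" and "ba" get the same charsum here.
function charsum(s) {
  var sum = 0;
  for (var i = 0; i < s.length; i++) {
    sum += s.charCodeAt(i);
  }
  return sum;
}

console.log(charsum('a'));   // 97
console.log(charsum('abc')); // 294
```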
Here's a version of (1) that computes an array hash by summing the charsum values:
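A sketch of (1), assuming each string contributes 65027 / charsum(s) to the total, following the collision reasoning in the next paragraph. `arrayHash` is a hypothetical name, and `charsum` is repeated so the snippet runs on its own:

```javascript
// Plain char-code sum of one string
function charsum(s) {
  var sum = 0;
  for (var i = 0; i < s.length; i++) {
    sum += s.charCodeAt(i);
  }
  return sum;
}

// Combine the strings without sorting: addition ignores element order
// (up to floating-point rounding, see the note below).
// Assumes no empty strings: charsum('') is 0, and 65027 / 0 is Infinity.
function arrayHash(a) {
  var sum = 0;
  for (var i = 0; i < a.length; i++) {
    sum += 65027 / charsum(a[i]);
  }
  return sum;
}
```

The result is a plain number, so it can be used directly as a `Map` key. One caveat: floating-point addition is not associative, so for some orderings of longer arrays the sums can differ in the last bits.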
Fiddle here: http://jsfiddle.net/WS9dC/11/
If we did a straight sum of the charsum values, the array ["a", "d"] would have the same hash as the array ["b", "c"] (97 + 100 = 98 + 99), leading to unwanted collisions. So, assuming strings whose char codes go up to 255 (e.g. Latin-1 text), and allowing up to 255 characters in each string, the maximum return value of charsum is 255 * 255 = 65025. I picked the next prime number up, 65027, and used (65027 / cs), where cs is a string's charsum, as that string's contribution to the hash. I am not 100% convinced this removes collisions (perhaps more thought is needed), but it certainly fixes the [a, d] versus [b, c] case.
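A quick check with the reordered arrays from the question, assuming a plain `charsum` and the 65027 / charsum(s) sum described above (`arrayHash` is an illustrative name). It also shows the hash working as a key in an ES6 `Map`:

```javascript
// Plain char-code sum of one string
function charsum(s) {
  var sum = 0;
  for (var i = 0; i < s.length; i++) sum += s.charCodeAt(i);
  return sum;
}

// Sum of 65027 / charsum(s) over the array
function arrayHash(a) {
  return a.reduce(function (acc, s) { return acc + 65027 / charsum(s); }, 0);
}

var arr1 = ['alpha', 'beta', 'gama'];
var arr2 = ['beta', 'alpha', 'gama'];

var theMap = new Map();
theMap.set(arrayHash(arr1), 'cached item');

console.log(arrayHash(arr1) === arrayHash(arr2)); // true
console.log(theMap.get(arrayHash(arr2)));         // cached item
```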
And testing a case that shows different hashes:
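A sketch of the [a, d] versus [b, c] case under the same assumptions (plain `charsum`, illustrative `arrayHash` summing 65027 / charsum(s)):

```javascript
// Plain char-code sum of one string
function charsum(s) {
  var sum = 0;
  for (var i = 0; i < s.length; i++) sum += s.charCodeAt(i);
  return sum;
}

// Sum of 65027 / charsum(s) over the array
function arrayHash(a) {
  return a.reduce(function (acc, s) { return acc + 65027 / charsum(s); }, 0);
}

// A straight sum of charsums ties: 97 + 100 === 98 + 99
console.log(charsum('a') + charsum('d') === charsum('b') + charsum('c')); // true

// Summing 65027 / charsum(s) separates them
console.log(arrayHash(['a', 'd']) === arrayHash(['b', 'c'])); // false
```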
Edit:
Here's a revised version, which ignores duplicates in the array as it goes and returns a hash based on unique items only:
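A minimal sketch of that idea, assuming a `Set` is used to remember strings already seen (how the original skipped duplicates is not shown here; `uniqueArrayHash` is a hypothetical name):

```javascript
// Plain char-code sum of one string
function charsum(s) {
  var sum = 0;
  for (var i = 0; i < s.length; i++) sum += s.charCodeAt(i);
  return sum;
}

// Each distinct string contributes 65027 / charsum(s) exactly once
function uniqueArrayHash(a) {
  var seen = new Set();
  var sum = 0;
  for (var i = 0; i < a.length; i++) {
    if (seen.has(a[i])) continue; // skip duplicates as we go
    seen.add(a[i]);
    sum += 65027 / charsum(a[i]);
  }
  return sum;
}

console.log(uniqueArrayHash(['a', 'a', 'b']) === uniqueArrayHash(['b', 'a'])); // true
```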
http://jsfiddle.net/WS9dC/7/
Edit:
I've revised the answer above to account for arrays containing words made of the same letters (anagrams such as "ab" and "ba"). These need to return different hashes, which they now do.
The fix was to add a multiplier to the charsum func based on the char index:
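A sketch of that fix, assuming the multiplier is simply the 1-based character position (the exact multiplier is not given here), together with the same illustrative `arrayHash` sum:

```javascript
// charsum with a position multiplier: each char code is weighted by (index + 1),
// so anagrams such as "ab" and "ba" no longer produce the same value
function charsum(s) {
  var sum = 0;
  for (var i = 0; i < s.length; i++) {
    sum += s.charCodeAt(i) * (i + 1);
  }
  return sum;
}

// Sum of 65027 / charsum(s) over the array
function arrayHash(a) {
  return a.reduce(function (acc, s) { return acc + 65027 / charsum(s); }, 0);
}

console.log(charsum('ab'), charsum('ba'));            // 293 292
console.log(arrayHash(['ab']) === arrayHash(['ba'])); // false
```

With this weighting, charsum can exceed 65025 (up to 255 * 32640 = 8323200 for 255 characters of code 255), so the earlier 65027 bound no longer caps it.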