I'm using the Java implementation of MurmurHash3 from Google Guava (com.google.common.hash.HashFunction and com.google.common.hash.Hashing) to create n independent hash functions (using n different seeds) that hash an ID to a long. Here is a snippet of the code:
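
    // One signature entry per seed
    for (int i = 0; i < seeds.length; i++) {
        signature[i] = hash(id, seeds[i]);
    }

    private long hash(int id, int seed) {
        HashFunction hf = Hashing.murmur3_128(seed);
        // asLong() keeps only the first 8 bytes of the 128-bit hash
        long signature = hf.hashLong((long) id).asLong();
        return signature;
    }
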
I've tried to replicate the above code in Python 2.7 using mmh3 (https://pypi.org/project/mmh3/), but the Python version accepts only strings (or NumPy integers) as input, and with the same seed it returns a different result. Here is a snippet of the code:

    import mmh3
    import numpy as np

    def create_signature(self, id):
        v = np.int64(id)
        signature = []
        # One 128-bit hash per seed
        for i in range(len(self.__seeds)):
            h = mmh3.hash128(v, self.__seeds[i], signed=True)
            signature.append(h)
        return signature

Applying the mmh3 version to a set of distinct IDs also produces many collisions, whereas the Java version produces none. Is there a way to get the same results as the Java version in Python?
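
For reference, here is a minimal pure-Python sketch of what I understand the Java call to compute. It rests on assumptions that still need to be checked against the Java side: that Guava's hashLong feeds the 8 little-endian bytes of the long to MurmurHash3 x64 128, that asLong() returns the first 64-bit half of the result as a signed long, and that the seeds are non-negative. The name guava_murmur3_128_as_long is hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
C1 = 0x87C37B91114253D5
C2 = 0x4CF5AD432745937F


def _rotl64(x, r):
    # Rotate a 64-bit value left by r bits.
    return ((x << r) | (x >> (64 - r))) & MASK64


def _fmix64(k):
    # MurmurHash3 64-bit finalization mix.
    k ^= k >> 33
    k = (k * 0xFF51AFD7ED558CCD) & MASK64
    k ^= k >> 33
    k = (k * 0xC4CEB9FE1A85EC53) & MASK64
    k ^= k >> 33
    return k


def guava_murmur3_128_as_long(value, seed):
    # Assumption: seed is non-negative (negative seeds may be widened
    # differently by different implementations).
    h1 = h2 = seed & MASK64

    # An 8-byte input has no full 16-byte block, only a tail: the 8
    # little-endian bytes read back as one 64-bit word.
    k1 = value & MASK64
    k1 = (k1 * C1) & MASK64
    k1 = _rotl64(k1, 31)
    k1 = (k1 * C2) & MASK64
    h1 ^= k1

    # Finalization with input length 8.
    h1 ^= 8
    h2 ^= 8
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    h1 = _fmix64(h1)
    h2 = _fmix64(h2)
    h1 = (h1 + h2) & MASK64

    # Reinterpret the first half as a signed 64-bit integer, like a Java long.
    return h1 - (1 << 64) if h1 >= (1 << 63) else h1


# One signature entry per seed, as in the Java loop.
signature = [guava_murmur3_128_as_long(42, seed) for seed in (1, 2, 3)]
```

If those assumptions hold, mmh3.hash64(struct.pack('<q', id), seed, signed=True)[0] should give the same value, since mmh3.hash64 returns the two 64-bit halves as a tuple. That would also need to be verified against the installed mmh3 version.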