MurmurHash3 - Java and Python return different results on long input


I'm using the Java implementation of MurmurHash3 from Google Guava (com.google.common.hash.HashFunction and com.google.common.hash.Hashing) to create n independent hash functions (using n different seeds) that hash an ID to a long. Here is a snippet of the code:

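    for (int i = 0; i < seeds.length; i++) {
        signature[i] = hash(id, seeds[i]);
    }

    private long hash(int id, int seed) {
        // 128-bit MurmurHash3 seeded with the given value
        HashFunction hf = Hashing.murmur3_128(seed);
        // asLong() keeps only the first eight bytes of the 128-bit hash
        return hf.hashLong((long) id).asLong();
    }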

I've tried to replicate the above code in Python 2.7 using mmh3 (https://pypi.org/project/mmh3/), but the Python version accepts only strings (or NumPy ints) as input, and with the same seed it returns a different result. Here is a snippet of the code:

    def create_signature(self, id):
        v = np.int64(id)
        signature = []
        for i in range(len(self.__seeds)):
            h = mmh3.hash128(v, self.__seeds[i], signed=True)
            signature.append(h)
        return signature

Applying the mmh3 library to a set of different IDs also produces many collisions, whereas the Java version produces none. Is there a way to get the same results as the Java version in Python?
