I need to compute an XOR operation on very high-dimensional (~30'000) boolean vectors to get Hamming distances. For example, I need to XOR one vector that is all False except for 16 sparsely located True entries with each row of a 50'000x30'000 matrix.
So far, the quickest way I have found is to skip scipy.sparse and apply the plain ^ operation to each row.
This:
l1distances = (self.hashes[index, :] ^ self.hashes[all_points, :]).sum(axis=1)
Happens to be ten times faster than this:
sparse_hashes = scipy.sparse.csr_matrix(self.hashes).astype('bool')
for i in range(all_points.shape[0]):
    # != gives the elementwise XOR of two boolean sparse rows; the original
    # `-` lets mismatches in opposite directions cancel out in the sum
    l1distances[0, i] = (sparse_hashes[index] != sparse_hashes[all_points[i]]).sum()
But ten times faster is still quite slow: in theory, a sparse vector with only 16 active entries should cost about as much to process as a 16-dimensional one, not a 30'000-dimensional one.
Is there any solution? I'm really struggling here, thanks for the help!
If your query vector is highly sparse (like 16 out of 30'000), I'd skip fiddling with sparse XOR entirely and exploit the sparsity directly.
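For boolean vectors, popcount(a ^ b) = popcount(a) + popcount(b) - 2 * popcount(a & b). That means you can precompute each row's popcount once, and per query only touch the handful of columns where the query is True. Here is a minimal sketch of that idea; the names query, hashes, row_sums, and sparse_hamming are mine, not from your code:

import numpy as np

def sparse_hamming(query, hashes, row_sums=None):
    # Hamming distance from a sparse boolean query vector to every row of
    # `hashes`, using |a ^ b| = |a| + |b| - 2 * |a & b| so that only the
    # query's True columns are ever read.
    if row_sums is None:
        row_sums = hashes.sum(axis=1)       # popcount of every row; cache this
    idx = np.flatnonzero(query)             # positions of the query's True bits
    overlap = hashes[:, idx].sum(axis=1)    # |a & b| restricted to those columns
    return idx.size + row_sums - 2 * overlap

In your notation that would be roughly l1distances = sparse_hamming(self.hashes[index], self.hashes[all_points]). The per-query cost is proportional to rows x 16 instead of rows x 30'000, and row_sums can be computed once and reused across all queries against the same matrix.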