np.dot takes a very long time to multiply GF(4) field matrices


Multiplying large matrices takes a very long time. How can this be solved? I am using the galois library together with numpy, so I would expect it to perform reasonably well. I also tried implementing my own GF(4) arithmetic and multiplying the matrices with plain numpy, but that was even slower. Thank you for your reply.

For r = 2, 3, 4, 5, 6 the multiplication is fast; after that it takes a long time. To me, these are not very large matrices. The snippet below is just a minimal example: given r, I compute the sizes n and k for a certain family of matrices, and I need to multiply matrices with those dimensions.

import numpy as np
import galois


def family_Hamming(q, r):
    # (n, k) parameters of the q-ary Hamming code family for a given r
    n = int((q**r - 1) / (q - 1))
    k = int((q**r - 1) / (q - 1) - r)
    return (n, k)


q = 4
r = 7

n, k = family_Hamming(q, r)

GF = galois.GF(2**2)

# a has shape (5454, 5454)
a = GF(np.random.randint(4, size=(k, k)))
# b has shape (5454, 5461)
b = GF(np.random.randint(4, size=(k, n)))
c = np.dot(a, b)
print(c)
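For reference (this check is my addition, not part of the original question), the dimensions returned by family_Hamming grow quickly with r:

# Sizes (n, k) of the q-ary Hamming family for q = 4 and r = 2..7,
# using the same formula as family_Hamming above
q = 4
for r in range(2, 8):
    n = (q**r - 1) // (q - 1)
    k = n - r
    print(r, (n, k))
# r = 2..7 gives n = 5, 21, 85, 341, 1365, 5461

At r = 7 the product involves roughly 5454 * 5454 * 5461 ≈ 1.6 * 10^11 field multiply-accumulate operations, which explains the sudden jump in runtime compared to r = 6.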

There are 3 best solutions below

Answer 1 (2 votes):

I'm not sure whether it is actually faster, but np.dot is meant for the dot product of two vectors; for matrix multiplication, use A @ B. As far as I know, that is as efficient as you can get in Python.
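As a minimal sketch of what that looks like with galois arrays (the shapes here are smaller than in the question, chosen just for a quick check):

import numpy as np
import galois

GF = galois.GF(2**2)

A = GF(np.random.randint(4, size=(300, 400)))
B = GF(np.random.randint(4, size=(400, 500)))

# @ performs finite-field matrix multiplication on galois FieldArrays
C = A @ B
print(C.shape)  # (300, 500)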

Answer 2 (2 votes):

Try using JAX on a CUDA runtime. For example, you can run it on Google Colab's free GPU (open a notebook -> Runtime -> Change runtime type -> GPU).

import numpy as np
import galois
import jax.numpy as jnp
from jax import device_put

GF = galois.GF(2**2)  # GF, k, n as in the question
k, n = 5454, 5461

a = GF(np.random.randint(4, size=(k, k)))
b = GF(np.random.randint(4, size=(k, n)))

# device_put copies the integer array data to the GPU; note that the galois
# field wrapper is lost, so jnp.dot below is an ordinary integer matrix
# product on the GPU rather than GF(4) arithmetic
a, b = device_put(a), device_put(b)
c = jnp.dot(a, b)

# bring the result back to the host as a plain numpy array
c = np.asarray(c)

Timing test:

%timeit jnp.dot(a, b).block_until_ready()
# 765 ms ± 96.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Answer 3 (0 votes):

I'm the author of galois. I added performance improvements to matrix multiplication in v0.3.0 by parallelizing the arithmetic over multiple cores. The next performance improvement will come once GPU support is added.

I'm open to other performance improvement suggestions, but as far as I know the algorithm is running as fast as possible on a CPU.

In [1]: import galois

In [2]: GF = galois.GF(2**2)

In [3]: A = GF.Random((300, 400), seed=1)

In [4]: B = GF.Random((400, 500), seed=2)

# v0.2.0
In [5]: %timeit A @ B
1.02 s ± 7.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# v0.3.0
In [5]: %timeit A @ B
99 ms ± 1.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
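
If you upgrade and want to reproduce the comparison, here is a rough sketch (my addition) for checking the installed version and timing the full-size product from the question; the numbers will of course depend on your machine and core count:

import time

import numpy as np
import galois

print(galois.__version__)  # the parallelized matmul landed in v0.3.0

GF = galois.GF(2**2)
k, n = 5454, 5461  # sizes from the question (q = 4, r = 7)

a = GF(np.random.randint(4, size=(k, k)))
b = GF(np.random.randint(4, size=(k, n)))

start = time.perf_counter()
c = a @ b
print(f"elapsed: {time.perf_counter() - start:.1f} s")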