I have two arrays. One is
A = np.random.uniform(size=(48, 1000000, 2))
and the other is
B = np.random.uniform(size=(48))
I want to do the following summation:
np.einsum("i, ijk -> jk", B, A)
as fast as possible.
The summation needs to be done tens of millions of times, so speed matters. At each iteration, B would change and A would stay the same. I have tried two options (see below), but both are comparable. Is there a way to speed this up?
Code:
import numpy as np

def computation_einsum(x, y):
    z = np.einsum("i, ijk -> jk", y, x)
    return z

def computation_dot(x, y):
    z = y @ x
    return z

A = np.random.uniform(size=(48, 1000000, 2))
B = np.random.uniform(size=(48))
C = A.transpose(1, 0, 2)
Timings:
%timeit -n 10 computation_einsum(A, B)
100 ms ± 238 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit -n 10 computation_dot(C, B)
107 ms ± 2.11 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
I know that for larger matrices there are options which would make a difference, but I have these specific shapes and sizes.
We can improve matmul by giving it arrays that it can pass 'intact' to the compiled code. A transpose reorders the shape and strides, but does not change the underlying data. A copy takes time, but may be worth it if you are reusing A many times.

An alternate transpose puts the 48 axis last, so the summed axis is contiguous in memory.
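A sketch of the two copied layouts described above (the names D and E and the smaller N are my own; the question uses N = 1000000):

```python
import numpy as np

N = 10_000  # the question uses 1_000_000; smaller here so the sketch runs fast
A = np.random.uniform(size=(48, N, 2))
B = np.random.uniform(size=48)

C = A.transpose(1, 0, 2)         # view: (N, 48, 2), strides reordered, no data copied
D = C.copy()                     # contiguous copy of the same layout
E = A.transpose(1, 2, 0).copy()  # (N, 2, 48): the summed 48-axis is now last

ref = np.einsum("i, ijk -> jk", B, A)
assert np.allclose(B @ D, ref)   # matmul on the contiguous copy
assert np.allclose(E @ B, ref)   # matmul with the 48-axis last
```

Both copies pay a one-time cost, which amortizes if B changes every iteration while A (and hence D or E) is fixed.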
I'm a little surprised that this einsum is faster, since I thought it passed the calculation on to matmul in simple cases.
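For concreteness, the equivalence I have in mind is the same contraction written against the copied layout; a minimal check (smaller N and variable names are my own):

```python
import numpy as np

N = 10_000
A = np.random.uniform(size=(48, N, 2))
B = np.random.uniform(size=48)
D = A.transpose(1, 0, 2).copy()  # (N, 48, 2), contiguous

# The same contraction spelled against D's axis order:
out_einsum = np.einsum("jik, i -> jk", D, B)
out_matmul = B @ D               # the matmul I expected einsum to dispatch to
assert np.allclose(out_einsum, out_matmul)
```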
It's worth playing with the optimize parameter. The effect seems to vary with the numpy version, so I can't predict when it will help, but here:
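Concretely, the flag in question is passed like this, and np.einsum_path reports what the optimizer would do (a sketch at a smaller N):

```python
import numpy as np

N = 10_000
A = np.random.uniform(size=(48, N, 2))
B = np.random.uniform(size=48)

# optimize=True lets einsum search for a contraction order and possibly
# hand the work to BLAS; the result is identical either way.
plain = np.einsum("i, ijk -> jk", B, A)
opted = np.einsum("i, ijk -> jk", B, A, optimize=True)
assert np.allclose(plain, opted)

# einsum_path reports the contraction path the optimizer chose.
path, info = np.einsum_path("i, ijk -> jk", B, A, optimize="greedy")
print(info)
```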
It doesn't help in the E case:
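With the 48 axis last, the plain matmul is already a single contiguous stacked matrix-vector call, so there is little left for einsum's optimizer to rearrange; a sketch (smaller N, my own names):

```python
import numpy as np

N = 10_000
A = np.random.uniform(size=(48, N, 2))
B = np.random.uniform(size=48)
E = A.transpose(1, 2, 0).copy()  # (N, 2, 48), summed axis last

ref = np.einsum("i, ijk -> jk", B, A)
# Same contraction spelled against E's axis order, with and without the flag:
assert np.allclose(np.einsum("jki, i -> jk", E, B), ref)
assert np.allclose(np.einsum("jki, i -> jk", E, B, optimize=True), ref)
assert np.allclose(E @ B, ref)
```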