I'm trying to figure out a way to use numpy to perform the following algebra in the most time-efficient way possible: given a 3D matrix/tensor A with shape (n, m, p) and a 2D matrix/tensor B with shape (n, p), calculate C_ij = sum_over_k (A_ijk * B_ik), where the resulting matrix C has shape (n, m).
I've tried two ways to do this. One is to loop over the first dimension and calculate a regular dot product each time. The other is to use np.tensordot(A, B.T) to calculate a result with shape (n, m, n), and then take the diagonal elements along the 1st and 3rd dimensions. Both methods are shown below.
First method:
    C = np.zeros((n, m))
    for i in range(n):
        C[i] = np.dot(A[i], B[i])
Second method:
    C = np.diagonal(np.tensordot(A, B.T, axes=1), axis1=0, axis2=2).T
However, because n is very large, the loop over n in the first method costs a lot of time. The second method calculates far too many unnecessary entries to obtain that huge (n, m, n) matrix, and is also too slow. Is there an efficient way to do this?
Define 2 arrays:
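The answer's original array definitions are not reproduced here; a minimal sketch, assuming shapes (2, 3, 4) and (2, 4) to match the (2, 4, 1) expansion mentioned further down, with arbitrary arange values:

```python
import numpy as np

# Assumed demo arrays (exact values are illustrative, not from the answer):
# A has shape (2, 3, 4), i.e. (n, m, p); B has shape (2, 4), i.e. (n, p).
A = np.arange(24).reshape(2, 3, 4)
B = np.arange(8).reshape(2, 4)
```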
Your iterative approach:
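A sketch of the question's loop applied to the assumed (2, 3, 4) and (2, 4) demo arrays:

```python
import numpy as np

# Assumed demo arrays (shapes (n, m, p) = (2, 3, 4) and (n, p) = (2, 4)):
A = np.arange(24).reshape(2, 3, 4)
B = np.arange(8).reshape(2, 4)

# The question's iterative approach: one dot product per batch index i.
C = np.zeros((2, 3))
for i in range(2):
    C[i] = np.dot(A[i], B[i])
```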
The einsum practically writes itself from your C_ij = sum_over_k (A_ijk * B_ik):
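A sketch on the assumed demo arrays; the subscript string mirrors the formula symbol for symbol (i, j free; k summed):

```python
import numpy as np

# Assumed demo arrays (shapes (2, 3, 4) and (2, 4)):
A = np.arange(24).reshape(2, 3, 4)
B = np.arange(8).reshape(2, 4)

# 'ijk,ik->ij' reads directly as C_ij = sum_over_k (A_ijk * B_ik).
C = np.einsum('ijk,ik->ij', A, B)
```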
@, i.e. matmul, was added to perform batch dot products; here the i dimension is the batch one. Since it uses the last axis of A and the 2nd-to-last axis of B for the dot summation, we have to temporarily expand B to (2, 4, 1):
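A sketch of that expansion on the assumed demo arrays: B gains a trailing length-1 axis, the batched matmul produces (2, 3, 1), and the extra axis is dropped again:

```python
import numpy as np

# Assumed demo arrays (shapes (2, 3, 4) and (2, 4)):
A = np.arange(24).reshape(2, 3, 4)
B = np.arange(8).reshape(2, 4)

# B[:, :, None] has shape (2, 4, 1); matmul contracts A's last axis (4)
# with B's 2nd-to-last axis (4), batching over i, giving (2, 3, 1).
C = (A @ B[:, :, None])[:, :, 0]
```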
Typically matmul is fastest, since it passes the task to BLAS-like code.