I'm trying to figure out a way to use numpy to perform the following algebra in the most time-efficient way possible: given a 3D matrix/tensor A with shape (n, m, p) and a 2D matrix/tensor B with shape (n, p), calculate C_ij = sum_over_k (A_ijk * B_ik), where the resulting matrix C has shape (n, m).
I've tried two ways to do this. One is to loop over the first dimension and calculate a regular dot product each time. The other is to use np.tensordot(A, B.T) to calculate a result with shape (n, m, n), and then take the diagonal elements along the 1st and 3rd dimensions. Both methods are shown below.
First method:
    C = np.zeros((n, m))
    for i in range(n):
        C[i] = np.dot(A[i], B[i])
Second method:
    C = np.diagonal(np.tensordot(A, B.T, axes=1), axis1=0, axis2=2).T
However, because n is very large, the loop over n in the first method costs a lot of time. The second method calculates far too many unnecessary entries to obtain that huge (n, m, n) matrix, and is also too slow. Is there an efficient way to do this?
Define 2 arrays:
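The answer's original array definitions are not reproduced here; a minimal sketch, assuming shapes (2, 3, 4) and (2, 4) to match the (2, 4, 1) expansion mentioned further down, with arbitrary arange values:

```python
import numpy as np

# Assumed demo arrays (exact values are illustrative, not from the answer):
# A has shape (2, 3, 4), i.e. (n, m, p); B has shape (2, 4), i.e. (n, p).
A = np.arange(24).reshape(2, 3, 4)
B = np.arange(8).reshape(2, 4)
```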
Your iterative approach:
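A sketch of the question's loop applied to the assumed (2, 3, 4) and (2, 4) demo arrays:

```python
import numpy as np

# Assumed demo arrays (shapes (n, m, p) = (2, 3, 4) and (n, p) = (2, 4)):
A = np.arange(24).reshape(2, 3, 4)
B = np.arange(8).reshape(2, 4)

# The question's iterative approach: one dot product per batch index i.
C = np.zeros((2, 3))
for i in range(2):
    C[i] = np.dot(A[i], B[i])
```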
The einsum practically writes itself from your C_ij = sum_over_k (A_ijk * B_ik):
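A sketch on the assumed demo arrays; the subscript string mirrors the formula symbol for symbol (i, j free; k summed):

```python
import numpy as np

# Assumed demo arrays (shapes (2, 3, 4) and (2, 4)):
A = np.arange(24).reshape(2, 3, 4)
B = np.arange(8).reshape(2, 4)

# 'ijk,ik->ij' reads directly as C_ij = sum_over_k (A_ijk * B_ik).
C = np.einsum('ijk,ik->ij', A, B)
```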
@, i.e. matmul, was added to perform batch dot products; here the i dimension is the batch one. Since it uses the last axis of A and the 2nd-to-last axis of B for the dot summation, we have to temporarily expand B to (2, 4, 1):
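A sketch of that expansion on the assumed demo arrays: B gains a trailing length-1 axis, the batched matmul produces (2, 3, 1), and the extra axis is dropped again:

```python
import numpy as np

# Assumed demo arrays (shapes (2, 3, 4) and (2, 4)):
A = np.arange(24).reshape(2, 3, 4)
B = np.arange(8).reshape(2, 4)

# B[:, :, None] has shape (2, 4, 1); matmul contracts A's last axis (4)
# with B's 2nd-to-last axis (4), batching over i, giving (2, 3, 1).
C = (A @ B[:, :, None])[:, :, 0]
```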
Typically matmul is fastest, since it passes the task to BLAS-like code.