Dot product between two 3D tensors


I have two 3D tensors: tensor A, which has shape [B,N,S], and tensor B, which also has shape [B,N,S]. What I want is a third tensor C, which I expect to have shape [B,B,N], where the element C[i,j,k] = np.dot(A[i,k,:], B[j,k,:]). I also want to achieve this in a vectorized way.

Some further info: the two tensors A and B have shape [Batch_size, Num_vectors, Vector_size]. The tensor C is supposed to represent the dot product between each element in the batch from A and each element in the batch from B, across all of the different vectors.
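For reference, here is the intended computation written as explicit loops (far too slow for real batch sizes, but it pins down the definition of C, assuming A and B are NumPy arrays of shape [B,N,S]):

# slow reference implementation of C[i,j,k] = np.dot(A[i,k,:], B[j,k,:])
import numpy as np
C = np.empty((A.shape[0], B.shape[0], A.shape[1]))
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        for k in range(A.shape[1]):
            C[i, j, k] = np.dot(A[i, k, :], B[j, k, :])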

Hope that is clear enough; looking forward to your answers!


There are 3 answers below.

Accepted answer:
In [331]: A=np.random.rand(100,200,300)                                                              
In [332]: B=A

The suggested einsum, working directly from the

C[i,j,k] = np.dot(A[i,k,:], B[j,k,:])

expression:

In [333]: np.einsum( 'ikm, jkm-> ijk', A, B).shape                                                   
Out[333]: (100, 100, 200)
In [334]: timeit np.einsum( 'ikm, jkm-> ijk', A, B).shape                                            
800 ms ± 25.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

matmul does a dot on the last 2 dimensions, and treats the leading one(s) as batch. In your case 'k' is the batch dimension, and 'm' is the one that should obey the matmul rule: the last axis of A is contracted with the 2nd-to-last axis of B. So rewriting the ikm,jkm... to fit, and transposing A and B accordingly:

In [335]: np.einsum('kim,kmj->kij', A.transpose(1,0,2), B.transpose(1,2,0)).shape                     
Out[335]: (200, 100, 100)
In [336]: timeit np.einsum('kim,kmj->kij',A.transpose(1,0,2), B.transpose(1,2,0)).shape              
774 ms ± 22.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Not much difference in performance. But now use matmul:

In [337]: (A.transpose(1,0,2)@B.transpose(1,2,0)).transpose(1,2,0).shape                             
Out[337]: (100, 100, 200)
In [338]: timeit (A.transpose(1,0,2)@B.transpose(1,2,0)).transpose(1,2,0).shape                      
64.4 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

and verify that the values match (though more often than not, if the shapes match, the values do too):

In [339]: np.allclose((A.transpose(1,0,2)@B.transpose(1,2,0)).transpose(1,2,0), np.einsum('ikm, jkm-> ijk', A, B))
Out[339]: True

I won't try to measure memory usage, but the time improvement suggests it too is better.

In some cases einsum is optimized to use matmul. Here that doesn't seem to be the case, though we could play with its optimize parameter. I'm a little surprised that matmul is doing so much better.
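For what it's worth, einsum's optimize keyword asks it to search for a cheaper contraction path, which on some NumPy builds dispatches to BLAS/matmul under the hood; whether it closes the gap here depends on the version (a sketch, not measured here):

C = np.einsum('ikm, jkm-> ijk', A, B, optimize=True)   # same result, possibly a faster path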

===

I vaguely recall another SO question about matmul taking a shortcut when the two arguments are the same array, as in A@A. I used B=A in these tests.

In [350]: timeit (A.transpose(1,0,2)@B.transpose(1,2,0)).transpose(1,2,0).shape                      
60.6 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [352]: B2=np.random.rand(100,200,300)                                                             
In [353]: timeit (A.transpose(1,0,2)@B2.transpose(1,2,0)).transpose(1,2,0).shape                     
97.4 ms ± 164 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

But that only made a modest difference.

In [356]: np.__version__                                                                             
Out[356]: '1.16.4'

My BLAS etc is standard Linux, nothing special.

Another answer:

Try:

C = np.diagonal( np.tensordot(A,B, axes=(2,2)), axis1=1, axis2=3)

from https://docs.scipy.org/doc/numpy/reference/generated/numpy.tensordot.html#numpy.tensordot

Explanation

The solution is a composition of two operations. First, take the tensor product between A and B over their third axis, as you want. This outputs a rank-4 tensor, which you then reduce to a rank-3 tensor by taking equal indices on axes 1 and 3 (your k; note that tensordot gives a different axis order than your notation). That reduction is done by taking the diagonal, just as you can reduce a matrix to the vector of its diagonal entries.
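A small sketch of the two steps on toy shapes (the sizes here are purely illustrative):

import numpy as np
A = np.random.rand(4, 5, 6)                 # [Batch_size, Num_vectors, Vector_size]
B = np.random.rand(4, 5, 6)
full = np.tensordot(A, B, axes=(2, 2))      # rank-4, shape (4, 5, 4, 5): indices (i, k, j, k')
C = np.diagonal(full, axis1=1, axis2=3)     # keep k == k'; shape (4, 4, 5) = [B, B, N]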

Another answer:

I think you can use einsum such as:

np.einsum( 'ikm, jkm-> ijk', A, B)

With the subscripts 'ikm, jkm-> ijk', you specify which dimensions are reduced following the Einstein summation convention. The third dimension of both arrays A and B, here named 'm', is reduced, just as the dot operation does on vectors.
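A quick sanity check of one entry against the original definition, on small random inputs (sizes are illustrative):

import numpy as np
A = np.random.rand(4, 5, 6)
B = np.random.rand(4, 5, 6)
C = np.einsum('ikm, jkm-> ijk', A, B)
# C[i,j,k] should equal np.dot(A[i,k,:], B[j,k,:])
assert np.isclose(C[1, 2, 3], np.dot(A[1, 3, :], B[2, 3, :]))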