Understanding how einsum combines the elements of two tensors

We have two tensors:

 a = np.arange(8.).reshape(4,2,1)
 b = np.arange(16.).reshape(2,4,2)

We want to compute

np.einsum('ijk,jil->kl', a, b)

Although we can obtain the result, we want to understand in detail how the summation over the tensors' elements is carried out.

First, we know how

np.einsum('jil', b)

reorders the elements of the tensor b.

But we cannot understand how np.einsum('ijk,jil->kl', a, b) combines (sums) the tensors' elements.

To track the process, we used strings:

aa=[[['e'],
  ['r']],

 [['t'],
  ['y']],

 [['u'],
  ['o']],

 [['p'],
  ['q']]]

and

bb=[[[ 'x', 'c'],
  [ 'v' , 'n'],
  [ 'm',  'h'],
  [ 'f' , 'd']],

 [[ 's',  'w'],
  [ 'a','z'],
  ['j', 'k'],
  ['l', 'b']]]

We did this because we wanted to see how the different elements combine to produce np.einsum('ijk,jil->kl', aa, bb).

However, although np.einsum('jil', bb) works correctly, it does not show the details of the summation over the elements.

There is 1 best solution below

There are a few ways to understand this.

One is to use the example @Onyambu suggests.

>>> np.einsum('ijk,jil->ijkl', a, b)

array([[[[  0.,   0.]],
        [[  8.,   9.]]],

       [[[  4.,   6.]],
        [[ 30.,  33.]]],

       [[[ 16.,  20.]],
        [[ 60.,  65.]]],

       [[[ 36.,  42.]],
        [[ 98., 105.]]]])

By including i and j in the output subscripts, the output array no longer has shape (k, l) but (i, j, k, l), which here is (4, 2, 1, 2). None of the products are summed together: each element of the output array is the product of exactly one element of a and one element of b.
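As a quick check of that claim, the sketch below rebuilds the arrays from the question and compares every entry of the four-index result against the corresponding single product. The variable name full is introduced here only for illustration.

```python
import numpy as np

a = np.arange(8.).reshape(4, 2, 1)
b = np.arange(16.).reshape(2, 4, 2)

# Keep all four indices in the output: nothing is summed yet.
full = np.einsum('ijk,jil->ijkl', a, b)
print(full.shape)  # (4, 2, 1, 2)

# Each entry is one element of a times one element of b.
for i, j, k, l in np.ndindex(full.shape):
    assert full[i, j, k, l] == a[i, j, k] * b[j, i, l]

# For example, i=1, j=1, k=0, l=0 gives a[1, 1, 0] * b[1, 1, 0] = 3 * 10.
print(full[1, 1, 0, 0])  # 30.0
```

Note that b is indexed as b[j, i, l], following the order of the letters in 'jil', not the order in which the letters appear in the output.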

To get back to the original behavior, we can first sum over axis 1 (the j index):

>>> np.einsum('ijk,jil->ijkl', a, b).sum(axis=1)
array([[[  8.,   9.]],
       [[ 34.,  39.]],
       [[ 76.,  85.]],
       [[134., 147.]]])

Then sum over axis 0 (the i index):

>>> np.einsum('ijk,jil->ijkl', a, b).sum(axis=1).sum(axis=0)

array([[252., 280.]])
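The same multiply-then-sum steps can also be written with plain broadcasting, which may make the role of each index easier to see. This is only a sketch of equivalent arithmetic, not a description of how einsum works internally; the names b_t, prod, and result are illustrative.

```python
import numpy as np

a = np.arange(8.).reshape(4, 2, 1)
b = np.arange(16.).reshape(2, 4, 2)

# 'jil' says b is indexed as b[j, i, l]; swap its first two axes so that
# both operands are ordered (i, j, ...).
b_t = b.transpose(1, 0, 2)  # shape (4, 2, 2), with b_t[i, j, l] == b[j, i, l]

# Broadcast both operands to a common (i, j, k, l) shape and multiply.
prod = a[:, :, :, None] * b_t[:, :, None, :]  # shape (4, 2, 1, 2)

# i and j do not appear in the output 'kl', so sum over both axes.
result = prod.sum(axis=(0, 1))
print(result)  # [[252. 280.]]

# Same values as the einsum call from the question.
assert np.allclose(result, np.einsum('ijk,jil->kl', a, b))
```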

Another way to understand this is to convert it to an explicit loop.

The following code is equivalent to this einsum, but slower. (It also does not check that the shapes of A and B are compatible.)

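import numpy as np

def sum_array(A, B):
    # A is indexed [i, j, k]; B is indexed [j, i, l].
    i_len, j_len, k_len = A.shape
    _, _, l_len = B.shape

    # The output keeps only k and l, as in '->kl'.
    ret = np.zeros((k_len, l_len))
    for i in range(i_len):
        for j in range(j_len):
            for k in range(k_len):
                for l in range(l_len):
                    # i and j are summed over; k and l select the output cell.
                    ret[k, l] += A[i, j, k] * B[j, i, l]
    return ret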

Calling sum_array(a, b) gives us the same result, array([[252., 280.]]).

Notice how the innermost line of the loop, ret[k, l] += A[i, j, k] * B[j, i, l], mirrors the einsum subscript string 'ijk,jil->kl': ijk indexes A, jil indexes B, and the output subscripts kl have moved to the left-hand side. The indices i and j do not appear in the output, so they are the ones being summed over.
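The same loop can be run on the letter arrays aa and bb from the question to see which pairs end up in each output entry. The sketch below uses plain Python lists rather than einsum, and the helper name trace_einsum is hypothetical, introduced only for this illustration.

```python
aa = [[['e'], ['r']],
      [['t'], ['y']],
      [['u'], ['o']],
      [['p'], ['q']]]

bb = [[['x', 'c'], ['v', 'n'], ['m', 'h'], ['f', 'd']],
      [['s', 'w'], ['a', 'z'], ['j', 'k'], ['l', 'b']]]


def trace_einsum(A, B):
    """Spell out 'ijk,jil->kl' symbolically for nested lists of labels."""
    i_len, j_len, k_len = len(A), len(A[0]), len(A[0][0])
    l_len = len(B[0][0])
    # One list of product terms per output cell (k, l).
    out = [[[] for _ in range(l_len)] for _ in range(k_len)]
    for i in range(i_len):
        for j in range(j_len):
            for k in range(k_len):
                for l in range(l_len):
                    out[k][l].append(f"{A[i][j][k]}*{B[j][i][l]}")
    return [[" + ".join(terms) for terms in row] for row in out]


traced = trace_einsum(aa, bb)
for row in traced:
    for expr in row:
        print(expr)
# e*x + r*s + t*v + y*a + u*m + o*j + p*f + q*l
# e*c + r*w + t*n + y*z + u*h + o*k + p*d + q*b
```

Each printed line is one entry of the (1, 2) result: eight products, one for every (i, j) pair, added together.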

More information

Understanding NumPy's einsum