How does `numpy.einsum` work?

The correct way to write summation in terms of Einstein summation is a mystery to me, so I want to try it in my code. I have succeeded on a few occasions, but mostly with trial and error.

Now, there is a case that I cannot understand. First, the main question. For two matrices A

and B

, which Nx1

and 1xN

, respectively, AB

is NxN

, but BA

is 1x1

. When I want to calculate case NxN

with np.einsum

, I can do:

import numpy as np

a = np.asarray([[1,2]])
b = np.asarray([[2,3]])
print np.einsum('ij,ji->ij', a, b)


and the final array is 2x2

. However

a = np.asarray([[1,2]])
b = np.asarray([[2,3]])
print np.einsum('ij,ij->ij', a, b)


returns an array 1x2

. I don't really understand why this doesn't give the correct result. For example, in the above case, the manual numpy

says that arrows can be used to force a summation or stop it. But that seems pretty vague to me; in the above case, I don't understand how numpy determines the final size of the output array based on the order of the indices (which seems to change).

Formally, I know the following: when there is nothing on the right side of the arrow, you can mathematically write the summation as $ \ sum \ limits_ {i = 0} ^ {N} \ sum \ limits_ {j = 0} ^ {M} A_ {ij} B_ {ij} $ for np.einsum('ij,ij',A,B)

, but when there is an arrow, I don't know how to interpret it in terms of a formal mathematical expression.


source to share

1 answer

In [22]: a
Out[22]: array([[1, 2]])
In [23]: b
Out[23]: array([[2, 3]])
In [24]: np.einsum('ij,ij->ij',a,b)
Out[24]: array([[2, 6]])
In [29]: a*b
Out[29]: array([[2, 6]])


Here, repeating indices in all parts, including the output, is interpreted as elemental multiplication. Nothing is cumulative. a[i,j]*b[i,j] = c[i,j]

for everyone i,j


In [25]: np.einsum('ij,ji->ij',a,b)
array([[2, 4],
       [3, 6]])
In [28]:,b).T
array([[2, 4],
       [3, 6]])
In [38]: np.outer(a,b)
array([[2, 3],
       [4, 6]])


Again not cumulative because the same indices appear on the left and right. a[i,j]*b[j,i] = c[i,j]

, in other words:

[[1*2, 2*2],
 [1*3, 2*3]]


Essentially an external product. A look at how it is a

broadcast against b.T

can help:

In [69]: np.broadcast_arrays(a,b.T)
[array([[1, 2],
        [1, 2]]), 
 array([[2, 2],
        [3, 3]])]


On the left side of the statement, the repeated indices indicate which dimensions are being multiplied. The correspondence of the left and right sides determines whether they are summed or not.

np.einsum('ij,ji->j',a,b) # array([ 5, 10]) sum on i only
np.einsum('ij,ji->i',a,b) # array([ 5, 10]) sum on j only
np.einsum('ij,ji',a,b) # 15 sum on i and j


A back I developed a pure Python equivalent for einsum

which focused on how it parsed a string. The goal is a creation nditer

by which the calculation of the sum of products is performed. But this is not a trivial script, even in Python:

A simple sequence showing these summation rules:

In [53]: c=np.array([[1,2],[3,4]])
In [55]: np.einsum('ij',c)
array([[1, 2],
       [3, 4]])
In [56]: np.einsum('ij->i',c)
Out[56]: array([3, 7])
In [57]: np.einsum('ij->j',c)
Out[57]: array([4, 6])
In [58]: np.einsum('ij->',c)
Out[58]: 10


Using arrays with no size 1

removes the complexity of broadcasting:

In [71]: b2=np.arange(1,7).reshape(2,3)
In [72]: np.einsum('ij,ji',a2,b2)
ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (2,3)->(2,3) (2,3)->(3,2) 


Or should I say that it provides a streaming attempt.

Ellipsis adds complexity to the interpretation of the einsum. I worked out the above github code when I solved the error in use ...

. But I didn't put much effort into clarifying the documentation.

Ellipsis streaming to numpy.einsum

Ellipses are most useful when you need an expression that can handle different array sizes. If your arrays are always 2D, this doesn't do anything extra.

As an example, consider a generalization dot

that multiplies the last dimension a

from the second to the last of B

. With ellipsis, we can write an expression that can handle mixed 2d, 3D and larger arrays:

np.einsum('...ij,...jk',np.ones((2,3)),np.ones((3,4)))  # (2,4)
np.einsum('...ij,...jk',np.ones((5,2,3)),np.ones((3,4)))  # (5,2,4)
np.einsum('...ij,...jk',np.ones((5,2,3)),np.ones((5,3,4))) # (5,2,4)
np.einsum('...ij,...jk',np.ones((5,2,3)),np.ones((7,5,3,4)))  # (7,5,2,4)
np.einsum('...ij,...jk->...ik',np.ones((5,2,3)),np.ones((7,5,3,4)) # (7, 5, 2, 4)


The last expression uses the default right-hand indexing ...ik

, ellipsis plus no-sum indexes.

Your original example can be written as



It actually fills i

(or more dimensions) according to the sizes of the arrays.

which would also work if a

or B

was 1d:


the way to generalize to large sizes is different

dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])


expressed in einsum as:



which can be summarized with






But I don't think I can fully reproduce it in einsum

. That is, I cannot say that it populates the indices for a

, followed by different ones for B




All Articles