Numerical multiplication of vectors of different sizes for loops
I have a matrix of, say, P
size (X,Y)
. Also, I have two matrices of, say, Kx
and Ky
size (M,N)
and a matrix of pk
size (M,N)
and two vectors u
and v
from X
and Y
respectively. For example, they can be defined as follows:
import numpy as np P = np.zeros((X,Y)); pk = np.random.rand(M,N); Kx = np.random.rand(M,N); Ky = np.random.rand(M,N); u = np.random.rand(X); v = np.random.rand(Y);
In real code, they are certainly not random, but that doesn't matter for this example. The question is whether there is a pure number equivalent to the following:
for m in range(0, M):
for n in range(0, N):
for i in range(0,X):
for j in range(0,Y):
Arg = Kx[m,n]*u[i] + Ky[m,n]*v[j];
P[i,j] += pk[m,n]*np.cos(Arg);
All M
, N
, X
, Y
different, but X
and Y
may be the same, if there is no other solution.
source to share
A common elimination strategy for-loop
in NumPy computation is to work with multidimensional arrays.
Consider, for example, the line
Arg = Kx[m,n]*u[i] + Ky[m,n]*v[j]
This line depends on the indices m
, n
, i
and j
. Therefore, Arg
it depends on the m
, n
, i
, and j
. This means that Arg
you can be considered as a 4-dimensional array, indexed m
, n
, i
and j
. So we can eliminate 4 for-loops
- as far as it goes Arg
- by calculating
Kxu = Kx[:,:,np.newaxis]*u
Kyv = Ky[:,:,np.newaxis]*v
Arg = Kxu[:,:,:,np.newaxis] + Kyv[:,:,np.newaxis,:]
Kx[:,:,np.newaxis]
has a shape (M, N, 1)
, but u
has a shape (X,)
. By multiplying them together, NumPy broadcasting creates an array of the shape (M, N, X)
. Thus, the higher the new axes are used more as placeholders so that Arg
ends four axes, indexed m
, n
, i
, j
in that order.
Similarly, P
one can define as
P = (pk[:,:,np.newaxis,np.newaxis]*np.cos(Arg)).sum(axis=0).sum(axis=0)
sum(axis=0)
(called twice) is summed along the axes m
and n
, so it P
ends up with a two-dimensional array indexed only by i
and j
.
By working with these 4-dimensional arrays, we are able to apply NumPy operations to entire NumPy arrays. On the contrary, when using 4, for-loops
we had to perform calculations on the value over scalars. Consider, for example, what it does np.cos(Arg)
when it Arg
is a 4-dimensional array. This disables the computation of all cosines in a single call to the NumPy function, which executes a basic loop in the compiled C code. This is much faster than calling np.cos
once for each scalar. It is for this reason that working with more massive arrays ends up so much faster than the code for-loop
.
import numpy as np
def orig(Kx, Ky, u, v, pk):
M, N = Kx.shape
X = u.size
Y = v.size
P = np.empty((X, Y), dtype=pk.dtype)
for m in range(0, M):
for n in range(0, N):
for i in range(0,X):
for j in range(0,Y):
Arg = Kx[m,n]*u[i] + Ky[m,n]*v[j]
P[i,j] += pk[m,n]*np.cos(Arg)
return P
def alt(Kx, Ky, u, v, pk):
Kxu = Kx[:,:,np.newaxis]*u
Kyv = Ky[:,:,np.newaxis]*v
Arg = Kxu[:,:,:,np.newaxis] + Kyv[:,:,np.newaxis,:]
P = (pk[:,:,np.newaxis,np.newaxis]*np.cos(Arg)).sum(axis=0).sum(axis=0)
return P
M, N = 10, 20
X, Y = 5, 15
Kx = np.random.random((M, N))
Ky = np.random.random((M, N))
u = np.random.random(X)
v = np.random.random(Y)
pk = np.random.random((M, N))
Health check (showing that alt and orig return the same result):
In [57]: P2 = alt(Kx, Ky, u, v, pk)
In [58]: P1 = orig(Kx, Ky, u, v, pk)
In [59]: np.allclose(P1, P2)
Out[59]: True
Test showing alt
is significantly faster than orig
:
In [60]: %timeit orig(Kx, Ky, u, v, pk)
10 loops, best of 3: 33.6 ms per loop
In [61]: %timeit alt(Kx, Ky, u, v, pk)
1000 loops, best of 3: 349 µs per loop
source to share