# NumPy: multiplying vectors of different sizes without for loops

I have a matrix `P` of size `(X,Y)`. I also have two matrices `Kx` and `Ky` of size `(M,N)`, a matrix `pk` of size `(M,N)`, and two vectors `u` and `v` of length `X` and `Y` respectively. For example, they can be defined as follows:

```
import numpy as np
P = np.zeros((X, Y))
pk = np.random.rand(M, N)
Kx = np.random.rand(M, N)
Ky = np.random.rand(M, N)
u = np.random.rand(X)
v = np.random.rand(Y)
```

In the real code they are certainly not random, but that doesn't matter for this example. The question is whether there is a pure NumPy equivalent of the following:

```
for m in range(0, M):
    for n in range(0, N):
        for i in range(0, X):
            for j in range(0, Y):
                Arg = Kx[m,n]*u[i] + Ky[m,n]*v[j]
                P[i,j] += pk[m,n]*np.cos(Arg)
```

`M`, `N`, `X`, and `Y` are all different, but `X` and `Y` may be the same if there is no other solution.


A common strategy for eliminating `for-loop`s in NumPy computations is to work with multidimensional arrays.

Consider, for example, the line

```
Arg = Kx[m,n]*u[i] + Ky[m,n]*v[j]
```

This line depends on the indices `m`, `n`, `i` and `j`. Therefore `Arg` can be regarded as a 4-dimensional array, indexed by `m`, `n`, `i` and `j`. So we can eliminate all four `for-loops`, as far as `Arg` is concerned, by computing

```
Kxu = Kx[:,:,np.newaxis]*u
Kyv = Ky[:,:,np.newaxis]*v
Arg = Kxu[:,:,:,np.newaxis] + Kyv[:,:,np.newaxis,:]
```

`Kx[:,:,np.newaxis]` has shape `(M, N, 1)`, while `u` has shape `(X,)`. When they are multiplied together, NumPy broadcasting produces an array of shape `(M, N, X)`. The new axes thus serve as placeholders, so that `Arg` ends up with four axes, indexed by `m`, `n`, `i`, `j` in that order.
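The broadcasting steps above can be sketched with small made-up dimensions to check each intermediate shape:

```python
import numpy as np

# Small, arbitrary sizes chosen only to illustrate the shapes.
M, N, X, Y = 2, 3, 4, 5
Kx = np.random.rand(M, N)
Ky = np.random.rand(M, N)
u = np.random.rand(X)
v = np.random.rand(Y)

# (M, N, 1) * (X,) broadcasts to (M, N, X)
Kxu = Kx[:, :, np.newaxis] * u
# (M, N, 1) * (Y,) broadcasts to (M, N, Y)
Kyv = Ky[:, :, np.newaxis] * v
# (M, N, X, 1) + (M, N, 1, Y) broadcasts to (M, N, X, Y)
Arg = Kxu[:, :, :, np.newaxis] + Kyv[:, :, np.newaxis, :]

print(Kxu.shape, Kyv.shape, Arg.shape)
```

Each element satisfies `Arg[m,n,i,j] == Kx[m,n]*u[i] + Ky[m,n]*v[j]`, exactly as in the scalar loop.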

Similarly, `P` can then be computed as

```
P = (pk[:,:,np.newaxis,np.newaxis]*np.cos(Arg)).sum(axis=0).sum(axis=0)
```

The two calls to `sum(axis=0)` sum along the `m` and `n` axes, so `P` ends up as a two-dimensional array indexed only by `i` and `j`.
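As a side note (not required for the answer), the same reduction can be written with a tuple `axis` argument, or with `np.einsum`, which contracts the `m`, `n` axes in a single call; a minimal sketch with made-up sizes:

```python
import numpy as np

M, N, X, Y = 2, 3, 4, 5
Kx = np.random.rand(M, N)
Ky = np.random.rand(M, N)
u = np.random.rand(X)
v = np.random.rand(Y)
pk = np.random.rand(M, N)

Arg = (Kx[:, :, np.newaxis] * u)[:, :, :, np.newaxis] \
    + (Ky[:, :, np.newaxis] * v)[:, :, np.newaxis, :]

# Sum over both leading axes in one call ...
P_a = (pk[:, :, np.newaxis, np.newaxis] * np.cos(Arg)).sum(axis=(0, 1))
# ... or contract the m, n axes with einsum: 'mnij,mn->ij'.
P_b = np.einsum('mnij,mn->ij', np.cos(Arg), pk)

print(np.allclose(P_a, P_b))  # both give the same (X, Y) result
```

Both variants produce the same `(X, Y)` array as the chained `.sum(axis=0).sum(axis=0)`.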

By working with these 4-dimensional arrays, we can apply NumPy operations to entire arrays at once. In contrast, with the four `for-loops`, every computation is performed on scalars, one value at a time. Consider, for example, what `np.cos(Arg)` does when `Arg` is a 4-dimensional array: it offloads the computation of all the cosines to a single call to the NumPy function, which executes the underlying loop in compiled C code. That is much faster than calling `np.cos` once per scalar, and it is why working with whole arrays ends up so much faster than the `for-loop` code.
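As a toy illustration of this point (with made-up data): one vectorized call to `np.cos` gives the same values as calling it per element in a Python loop, but with only a single trip into compiled code:

```python
import numpy as np

x = np.linspace(0.0, 3.0, 1000)

# One vectorized call: the loop over elements runs in compiled C code.
vectorized = np.cos(x)

# Equivalent scalar loop: same result, but 1000 separate function calls.
looped = np.array([np.cos(xi) for xi in x])

print(np.allclose(vectorized, looped))  # True
```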

```
import numpy as np

def orig(Kx, Ky, u, v, pk):
    M, N = Kx.shape
    X = u.size
    Y = v.size
    # np.zeros, not np.empty: P is accumulated with +=
    P = np.zeros((X, Y), dtype=pk.dtype)
    for m in range(0, M):
        for n in range(0, N):
            for i in range(0, X):
                for j in range(0, Y):
                    Arg = Kx[m,n]*u[i] + Ky[m,n]*v[j]
                    P[i,j] += pk[m,n]*np.cos(Arg)
    return P

def alt(Kx, Ky, u, v, pk):
    Kxu = Kx[:,:,np.newaxis]*u
    Kyv = Ky[:,:,np.newaxis]*v
    Arg = Kxu[:,:,:,np.newaxis] + Kyv[:,:,np.newaxis,:]
    P = (pk[:,:,np.newaxis,np.newaxis]*np.cos(Arg)).sum(axis=0).sum(axis=0)
    return P

M, N = 10, 20
X, Y = 5, 15
Kx = np.random.random((M, N))
Ky = np.random.random((M, N))
u = np.random.random(X)
v = np.random.random(Y)
pk = np.random.random((M, N))
```

Sanity check, showing that `alt` and `orig` return the same result:

```
In [57]: P2 = alt(Kx, Ky, u, v, pk)
In [58]: P1 = orig(Kx, Ky, u, v, pk)
In [59]: np.allclose(P1, P2)
Out[59]: True
```

A timing test showing that `alt` is significantly faster than `orig`:

```
In [60]: %timeit orig(Kx, Ky, u, v, pk)
10 loops, best of 3: 33.6 ms per loop
In [61]: %timeit alt(Kx, Ky, u, v, pk)
1000 loops, best of 3: 349 µs per loop
```
