Vector nesting for loops with loop dependency in NumPy

Question

Vector nesting for loops with loop dependency in NumPy

I have the following function that generates a series of grid points on a tetrahedron.

def tet_grid(n):

    xv = np.array([
        [-1.,-1.,-1.],
        [ 1.,-1.,-1.],
        [-1., 1.,-1.],
        [-1.,-1., 1.],
        ])

    nsize = int((n+1)*(n+2)*(n+3)/6);
    xg = np.zeros((nsize,3))
    p = 0

    for i in range ( 0, n + 1 ):
        for j in range ( 0, n + 1 - i ):
            for k in range ( 0, n + 1 - i - j ):
                l = n - i - j - k
                xg[p,0]=(i * xv[0,0] + j * xv[1,0] + k * xv[2,0] + l * xv[3,0])/n 
                xg[p,1]=(i * xv[0,1] + j * xv[1,1] + k * xv[2,1] + l * xv[3,1])/n 
                xg[p,2]=(i * xv[0,2] + j * xv[1,2] + k * xv[2,2] + l * xv[3,2])/n 
                p = p + 1

    return xg

Is there an easy way to vectorize this in NumPy?

+3

python vectorization numpy

Tasam Farkie May 21 '17 at 10:49

source to share

2 answers

John Zwinck · Answer 1 · 2017-05-21T11:09:17+0000

The first simple thing you can do is use a broadcast to turn three calculations into one:

xg[p]=(i * xv[0] + j * xv[1] + k * xv[2] + l * xv[3])/n

Further, it should be noted that division by n

can be carried over to the very end:

return xg / n

We can then split the four multiplications and store the results separately and then concatenate them at the end. We now have:

xg = np.empty((nsize,4)) # n.b. zeros not required
p = 0

for i in range ( 0, n + 1 ):
    for j in range ( 0, n + 1 - i ):
        for k in range ( 0, n + 1 - i - j ):
            l = n - i - j - k
            xg[p,0] = i
            xg[p,1] = j
            xg[p,2] = k
            xg[p,3] = l
            p = p + 1

return (xg[:,:,None] * xv).sum(1) / n

The three xg[:,:,None]

at the bottom - translation (nsize,n) * (n,3)

- we expand (nsize,n)

xg

to (nsize,n,3)

, because we i,j,k,l

do not depend on which column from xv

is multiplied by.

We can skip the calculation l

in the loop and instead do everything right before return

:

xg[:,3] = n - xg[:,0:3].sum(1)

Now all you need to do is figure out how to create i,j,k

in vector according to p

.

As a general comment, I find it easiest to deal with these problems from the inside out, looking at the code in the innermost loop and pushing as much outside of as many loops as possible. Do this over and over and in the end there will be no cycles.

Andras Deak · Answer 2 · 2017-05-22T12:49:05+0000

You can get rid of the dependent nested loop, but at a waste cost in memory (more than usual for vectorization problems). You are accessing a small subset of the 3d window in your for loop. If you don't mind creating (n+1)^3

items for use only (n+1)(n+2)(n+3)/6

, and you can fit in memory, then the vectorized version will be much faster indeed.

My suggestion, explanation below:

import numpy as np

def tet_vect(n):

    xv = np.array([
        [-1.,-1.,-1.],
        [ 1.,-1.,-1.],
        [-1., 1.,-1.],
        [-1.,-1., 1.],
        ])

    # spanning arrays of a 3d grid according to range(0,n+1)
    ii,jj,kk = np.ogrid[:n+1,:n+1,:n+1]
    # indices of the triples which fall inside the original for loop
    inds = (jj < n+1-ii) & (kk < n+1-ii-jj)
    # the [i,j,k] indices of the points that fall inside the for loop, in the same order
    combs = np.vstack(np.where(inds)).T
    # combs is now an (nsize,3)-shaped array

    # compute "l" column too
    lcol = n - combs.sum(axis=1)
    combs = np.hstack((combs,lcol[:,None]))
    # combs is now an (nsize,4)-shaped array

    # all we need to do now is to take the matrix product of combs and xv, divide by n in the end
    xg = np.matmul(combs,xv)/n

    return xg

The key step is to create indexes ii,jj,kk

that span the 3D mesh. This step is memory efficient as it np.ogrid

creates spanning arrays that can be used in broadcast operations to reference your complete mesh. In the next step, we generate a complete array (n+1)^3

. The boolean array inds

selects those points in the 3d field that lie within the region of interest (and this is done using an array broadcast). The next step np.where(inds)

selects the truthful elements of this large array and we get fewer elements nsize

. Thus, the single step for retrieving memory is creation inds

.

The rest is simple: we need to create an extra column for the array that contains the indices [i,j,k]

in each row, this can be done by summing over the columns of the array (again a vector operation). When we have a (nsize,4)

-shaped auxiliary array that each contains (i,j,k,l)

in its own rows: we need to perform a matrix multiplication of this object with xv

, and we are done.

Small size testing n

assumes that the above function produces the same results as yours. Time frame for n = 100

: 1.15 from original, 19 ms vectorized.

Vector nesting for loops with loop dependency in NumPy

More articles: