Broadcasting indices from shapes
I have two array shapes that are broadcast against each other,
e.g. (2, 2, 1) and (2, 3).
I need a function that takes these shapes and returns an iterator yielding, for each element of the broadcast result, the index into each input array along with the corresponding index in the output array.
iter, output_shape = broadcast_indices_iterator((2, 2, 1), (2, 3))
assert output_shape == (2, 2, 3)
for in1_ix, in2_ix, out_ix in iter:
    print(in1_ix, in2_ix, out_ix)
outputs the result:
(0, 0, 0), (0, 0), (0, 0, 0)
(0, 0, 0), (0, 1), (0, 0, 1)
(0, 0, 0), (0, 2), (0, 0, 2)
(0, 1, 0), (1, 0), (0, 1, 0)
(0, 1, 0), (1, 1), (0, 1, 1)
(0, 1, 0), (1, 2), (0, 1, 2)
(1, 0, 0), (0, 0), (1, 0, 0)
(1, 0, 0), (0, 1), (1, 0, 1)
(1, 0, 0), (0, 2), (1, 0, 2)
(1, 1, 0), (1, 0), (1, 1, 0)
(1, 1, 0), (1, 1), (1, 1, 1)
(1, 1, 0), (1, 2), (1, 1, 2)
np.broadcast does something close, but it wants actual arrays rather than just their shapes.
- Note for NumPy users: it would be nice if np.broadcast took an additional argument to skip iterating over, for example, the last 2 dimensions. That would also solve my problem.
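For context, a minimal sketch of what np.broadcast does expose and why it falls short here: it gives the output shape and yields element values, but not the per-input indices.

```python
import numpy as np

# np.broadcast needs real arrays and yields element *values*;
# the indices into each input array are not exposed.
x = np.zeros((2, 2, 1))
y = np.zeros((2, 3))
b = np.broadcast(x, y)
print(b.shape)  # (2, 2, 3)
for xv, yv in b:
    pass  # pairs of values only, no input indices
```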
import numpy as np

x = 10 * np.arange(4).reshape((2, 2, 1))
y = 100 * np.arange(6).reshape((2, 3))
z = np.nditer([x, y], flags=['multi_index', 'c_index'], order='C')
for a, b in z:
    print(np.unravel_index(z.index % x.size, x.shape),
          np.unravel_index(z.index % y.size, y.shape),
          z.multi_index)
gives
((0, 0, 0), (0, 0), (0, 0, 0))
((0, 1, 0), (0, 1), (0, 0, 1))
((1, 0, 0), (0, 2), (0, 0, 2))
((1, 1, 0), (1, 0), (0, 1, 0))
((0, 0, 0), (1, 1), (0, 1, 1))
((0, 1, 0), (1, 2), (0, 1, 2))
((1, 0, 0), (0, 0), (1, 0, 0))
((1, 1, 0), (0, 1), (1, 0, 1))
((0, 0, 0), (0, 2), (1, 0, 2))
((0, 1, 0), (1, 0), (1, 1, 0))
((1, 0, 0), (1, 1), (1, 1, 1))
((1, 1, 0), (1, 2), (1, 1, 2))
Note that the input indices here do not match the desired mapping above: z.index % x.size just cycles through the flat index, which is not how broadcasting repeats elements.
Great question, Peter. Here's your answer:
import numpy as np

def get_broadcast_shape(*shapes):
    '''
    Given a set of array shapes, return the shape of the output when arrays of
    those shapes are broadcast together.
    '''
    max_ndim = max(len(s) for s in shapes)
    equal_len_shapes = np.array([(1,) * (max_ndim - len(s)) + s for s in shapes])
    max_dim_shapes = np.max(equal_len_shapes, axis=0)
    assert np.all(np.bitwise_or(equal_len_shapes == 1,
                                equal_len_shapes == max_dim_shapes[None, :])), \
        'Shapes %s are not broadcastable together' % (shapes, )
    return tuple(max_dim_shapes)

def get_broadcast_indices(*shapes):
    '''
    Given a set of shapes of arrays that could be broadcast together, return:
        output_shape: the shape of the resulting output array
        broadcast_shape_iterator: an iterator that yields a len(shapes)+1 tuple
            of the indices into each input array and the corresponding index
            in the output array
    '''
    output_shape = get_broadcast_shape(*shapes)
    base_iter = np.ndindex(output_shape)

    def broadcast_shape_iterator():
        for out_ix in base_iter:
            # Size-1 dimensions always index at 0; otherwise follow the output index.
            in_ixs = tuple(tuple(0 if s[i] == 1 else ix
                                 for i, ix in enumerate(out_ix[-len(s):]))
                           for s in shapes)
            yield in_ixs + (out_ix, )

    return output_shape, broadcast_shape_iterator()

output_shape, ix_iter = get_broadcast_indices((2, 2, 1), (2, 3))
assert output_shape == (2, 2, 3)
for in1_ix, in2_ix, out_ix in ix_iter:
    print(in1_ix, in2_ix, out_ix)
which prints:
((0, 0, 0), (0, 0), (0, 0, 0))
((0, 0, 0), (0, 1), (0, 0, 1))
((0, 0, 0), (0, 2), (0, 0, 2))
((0, 1, 0), (1, 0), (0, 1, 0))
((0, 1, 0), (1, 1), (0, 1, 1))
((0, 1, 0), (1, 2), (0, 1, 2))
((1, 0, 0), (0, 0), (1, 0, 0))
((1, 0, 0), (0, 1), (1, 0, 1))
((1, 0, 0), (0, 2), (1, 0, 2))
((1, 1, 0), (1, 0), (1, 1, 0))
((1, 1, 0), (1, 1), (1, 1, 1))
((1, 1, 0), (1, 2), (1, 1, 2))
If anyone knows of any built-in numpy functions that solve this problem, that would be better.
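As a partial answer to that: newer NumPy versions (1.20 and later) do provide a built-in for the shape-computation half, np.broadcast_shapes, which works on shapes directly; a minimal sketch:

```python
import numpy as np

# np.broadcast_shapes (NumPy >= 1.20) computes the broadcast output shape
# from shapes alone, raising ValueError for incompatible shapes.
out = np.broadcast_shapes((2, 2, 1), (2, 3))
print(out)  # (2, 2, 3)
```

The index-iterator half still has no direct built-in equivalent.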
Here's a start:

import numpy as np

array1 = np.arange(4).reshape(2, 2, 1) * 10
array2 = np.arange(6).reshape(2, 3)
I, J = np.broadcast_arrays(array1, array2)
print(I.shape)
K = np.empty(I.shape, dtype=int)
for ijk in np.ndindex(I.shape):
    K[ijk] = I[ijk] + J[ijk]
print(K)
producing
(2, 2, 3) # broadcasted shape
[[[ 0 1 2]
[13 14 15]]
[[20 21 22]
[33 34 35]]]
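As a small sanity check (not the point of the answer, which is about exposing indices): plain broadcasting computes the same K in one step, with no explicit loop.

```python
import numpy as np

array1 = np.arange(4).reshape(2, 2, 1) * 10
array2 = np.arange(6).reshape(2, 3)

# The ndindex loop above only exists to expose the indices;
# the broadcast addition itself is just:
K = array1 + array2
print(K.shape)  # (2, 2, 3)
```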
I has shape (2, 2, 3), but it shares its data with array1 - it's a broadcast view, not a copy (look at its .__array_interface__).
You can also iterate over just the first 2 dimensions, by giving ndindex only those dimensions:
K = np.empty(I.shape, dtype=int)
for i, j in np.ndindex(I.shape[:2]):
    K[i, j, :] = I[i, j, :] + J[i, j, :]
    print(K[i, j, :])
You can refine this by looking at the code for broadcast_arrays and ndindex to find the pieces you need. For example, at fooobar.com/questions/533771 / ... I call nditer directly to generate a multi_index (an approach that can be adapted to Cython):
xx = np.zeros(y.shape[:2])
it = np.nditer(xx, flags=['multi_index'])
while not it.finished:
    print(y[it.multi_index])   # y here is the 3-D array from that linked question
    it.iternext()
# [242 14 211] [198 7 0] [235 60 81] [164 64 236]
To make the "dummy arrays" practically cost-free, I can take a cue from ndindex and build each one from a single np.zeros(1):
from numpy.lib.stride_tricks import as_strided

def make_dummy(shape):
    # Zero strides: every index maps to the same single element of np.zeros(1).
    x = as_strided(np.zeros(1), shape=shape, strides=np.zeros_like(shape))
    return x

array1 = make_dummy((2, 2, 1))
array2 = make_dummy((2, 3))
I could dig into np.broadcast_arrays to see how it combines the shapes of the 2 input arrays to come up with the shape for I.
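To confirm that such dummies really cost nothing and still broadcast correctly, one can inspect their strides; a self-contained sketch (make_dummy as defined above):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# A zero-stride "dummy" array: every index maps to the one element
# of np.zeros(1), so the array occupies no extra memory.
def make_dummy(shape):
    return as_strided(np.zeros(1), shape=shape,
                      strides=(0,) * len(shape))

d = make_dummy((2, 2, 1))
print(d.shape, d.strides)  # (2, 2, 1) (0, 0, 0)

# The dummies broadcast exactly like real arrays of those shapes:
print(np.broadcast(make_dummy((2, 2, 1)), make_dummy((2, 3))).shape)
# (2, 2, 3)
```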
There is a difference between your desired solution and mine, which I have glossed over:
(0, 0, 0), (0, 0), (0, 0, 0)
(0, 0, 0), (0, 1), (0, 0, 1)
...
(1, 1, 0), (1, 1), (1, 1, 1)
(1, 1, 0), (1, 2), (1, 1, 2)
Your desired output keeps a separate index tuple for each input array - one that ranges over (2, 2, 1), another that ranges over (2, 3), and so on.
My approach, which I believe the numpy C code (at least the parts built on nditer) uses, generates one iterator over (2, 2, 3) and massages the arrays with as_strided so that they accept that larger range. This makes it easier to implement a common broadcast mechanism: it decouples the broadcasting complexity from the core of the computation.
Here's a good introduction to nditer:
http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html