Broadcasting indices from shapes
I have two array shapes that are broadcast against each other,
e.g. (2, 2, 1) and (2, 3).
I need a function that takes these shapes and returns an iterator yielding, for each element of the broadcast result, the index into each input array along with the corresponding index in the output array.
iter, output_shape = broadcast_indices_iterator((2, 2, 1), (2, 3))
assert output_shape == (2, 2, 3)
for in1_ix, in2_ix, out_ix in iter:
    print(in1_ix, in2_ix, out_ix)
outputs the result:
(0, 0, 0), (0, 0), (0, 0, 0)
(0, 0, 0), (0, 1), (0, 0, 1)
(0, 0, 0), (0, 2), (0, 0, 2)
(0, 1, 0), (1, 0), (0, 1, 0)
(0, 1, 0), (1, 1), (0, 1, 1)
(0, 1, 0), (1, 2), (0, 1, 2)
(1, 0, 0), (0, 0), (1, 0, 0)
(1, 0, 0), (0, 1), (1, 0, 1)
(1, 0, 0), (0, 2), (1, 0, 2)
(1, 1, 0), (1, 0), (1, 1, 0)
(1, 1, 0), (1, 1), (1, 1, 1)
(1, 1, 0), (1, 2), (1, 1, 2)
np.broadcast does something close, but it wants actual arrays rather than just their shapes.
- Note for NumPy users: it would be nice if np.broadcast took an additional argument to skip iterating over, for example, the last 2 dimensions. That would also solve my problem.
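For context, a minimal sketch of what np.broadcast does expose and why it falls short here: it gives the output shape and yields element values, but not the per-input indices.

```python
import numpy as np

# np.broadcast needs real arrays and yields element *values*;
# the indices into each input array are not exposed.
x = np.zeros((2, 2, 1))
y = np.zeros((2, 3))
b = np.broadcast(x, y)
print(b.shape)  # (2, 2, 3)
for xv, yv in b:
    pass  # pairs of values only, no input indices
```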
import numpy as np

x = 10 * np.arange(4).reshape((2, 2, 1))
y = 100 * np.arange(6).reshape((2, 3))
z = np.nditer([x, y], flags=['multi_index', 'c_index'], order='C')
for a, b in z:
    print(np.unravel_index(z.index % x.size, x.shape),
          np.unravel_index(z.index % y.size, y.shape),
          z.multi_index)
gives
((0, 0, 0), (0, 0), (0, 0, 0))
((0, 1, 0), (0, 1), (0, 0, 1))
((1, 0, 0), (0, 2), (0, 0, 2))
((1, 1, 0), (1, 0), (0, 1, 0))
((0, 0, 0), (1, 1), (0, 1, 1))
((0, 1, 0), (1, 2), (0, 1, 2))
((1, 0, 0), (0, 0), (1, 0, 0))
((1, 1, 0), (0, 1), (1, 0, 1))
((0, 0, 0), (0, 2), (1, 0, 2))
((0, 1, 0), (1, 0), (1, 1, 0))
((1, 0, 0), (1, 1), (1, 1, 1))
((1, 1, 0), (1, 2), (1, 1, 2))
Note that the input indices here do not match the desired mapping above: z.index % x.size just cycles through the flat index, which is not how broadcasting repeats elements.
Great question, Peter. Here's your answer:
import numpy as np

def get_broadcast_shape(*shapes):
    '''
    Given a set of array shapes, return the shape of the output when arrays of
    those shapes are broadcast together.
    '''
    max_ndim = max(len(s) for s in shapes)
    equal_len_shapes = np.array([(1,) * (max_ndim - len(s)) + s for s in shapes])
    max_dim_shapes = np.max(equal_len_shapes, axis=0)
    assert np.all(np.bitwise_or(equal_len_shapes == 1,
                                equal_len_shapes == max_dim_shapes[None, :])), \
        'Shapes %s are not broadcastable together' % (shapes, )
    return tuple(max_dim_shapes)

def get_broadcast_indices(*shapes):
    '''
    Given a set of shapes of arrays that could be broadcast together, return:
        output_shape: the shape of the resulting output array
        broadcast_shape_iterator: an iterator that yields a len(shapes)+1 tuple
            of the indices into each input array and the corresponding index
            in the output array
    '''
    output_shape = get_broadcast_shape(*shapes)
    base_iter = np.ndindex(output_shape)

    def broadcast_shape_iterator():
        for out_ix in base_iter:
            # Size-1 dimensions always index at 0; otherwise follow the output index.
            in_ixs = tuple(tuple(0 if s[i] == 1 else ix
                                 for i, ix in enumerate(out_ix[-len(s):]))
                           for s in shapes)
            yield in_ixs + (out_ix, )

    return output_shape, broadcast_shape_iterator()

output_shape, ix_iter = get_broadcast_indices((2, 2, 1), (2, 3))
assert output_shape == (2, 2, 3)
for in1_ix, in2_ix, out_ix in ix_iter:
    print(in1_ix, in2_ix, out_ix)
which prints:
((0, 0, 0), (0, 0), (0, 0, 0))
((0, 0, 0), (0, 1), (0, 0, 1))
((0, 0, 0), (0, 2), (0, 0, 2))
((0, 1, 0), (1, 0), (0, 1, 0))
((0, 1, 0), (1, 1), (0, 1, 1))
((0, 1, 0), (1, 2), (0, 1, 2))
((1, 0, 0), (0, 0), (1, 0, 0))
((1, 0, 0), (0, 1), (1, 0, 1))
((1, 0, 0), (0, 2), (1, 0, 2))
((1, 1, 0), (1, 0), (1, 1, 0))
((1, 1, 0), (1, 1), (1, 1, 1))
((1, 1, 0), (1, 2), (1, 1, 2))
If anyone knows of any built-in numpy functions that solve this problem, that would be better.
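As a partial answer to that: newer NumPy versions (1.20 and later) do provide a built-in for the shape-computation half, np.broadcast_shapes, which works on shapes directly; a minimal sketch:

```python
import numpy as np

# np.broadcast_shapes (NumPy >= 1.20) computes the broadcast output shape
# from shapes alone, raising ValueError for incompatible shapes.
out = np.broadcast_shapes((2, 2, 1), (2, 3))
print(out)  # (2, 2, 3)
```

The index-iterator half still has no direct built-in equivalent.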
Here's a start:

import numpy as np

array1 = np.arange(4).reshape(2, 2, 1) * 10
array2 = np.arange(6).reshape(2, 3)
I, J = np.broadcast_arrays(array1, array2)
print(I.shape)
K = np.empty(I.shape, dtype=int)
for ijk in np.ndindex(I.shape):
    K[ijk] = I[ijk] + J[ijk]
print(K)
producing
(2, 2, 3) # broadcasted shape
[[[ 0 1 2]
[13 14 15]]
[[20 21 22]
[33 34 35]]]
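As a small sanity check (not the point of the answer, which is about exposing indices): plain broadcasting computes the same K in one step, with no explicit loop.

```python
import numpy as np

array1 = np.arange(4).reshape(2, 2, 1) * 10
array2 = np.arange(6).reshape(2, 3)

# The ndindex loop above only exists to expose the indices;
# the broadcast addition itself is just:
K = array1 + array2
print(K.shape)  # (2, 2, 3)
```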
I has shape (2, 2, 3), but it shares its data with array1 - it's a broadcast view, not a copy (look at its .__array_interface__).
You can also iterate over just the first 2 dimensions, by giving ndindex only those dimensions:
K = np.empty(I.shape, dtype=int)
for i, j in np.ndindex(I.shape[:2]):
    K[i, j, :] = I[i, j, :] + J[i, j, :]
    print(K[i, j, :])
You can refine this by looking at the code for broadcast_arrays and ndindex to find the pieces you need. For example, at fooobar.com/questions/533771 / ... I call nditer directly to generate a multi_index (an approach that can be adapted to Cython):
xx = np.zeros(y.shape[:2])
it = np.nditer(xx, flags=['multi_index'])
while not it.finished:
    print(y[it.multi_index])   # y here is the 3-D array from that linked question
    it.iternext()
# [242 14 211] [198 7 0] [235 60 81] [164 64 236]
To make the "dummy arrays" practically cost-free, I can take a cue from ndindex and build each one from a single np.zeros(1):
from numpy.lib.stride_tricks import as_strided

def make_dummy(shape):
    # Zero strides: every index maps to the same single element of np.zeros(1).
    x = as_strided(np.zeros(1), shape=shape, strides=np.zeros_like(shape))
    return x

array1 = make_dummy((2, 2, 1))
array2 = make_dummy((2, 3))
I could dig into np.broadcast_arrays to see how it combines the shapes of the 2 input arrays to come up with the shape for I.
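To confirm that such dummies really cost nothing and still broadcast correctly, one can inspect their strides; a self-contained sketch (make_dummy as defined above):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# A zero-stride "dummy" array: every index maps to the one element
# of np.zeros(1), so the array occupies no extra memory.
def make_dummy(shape):
    return as_strided(np.zeros(1), shape=shape,
                      strides=(0,) * len(shape))

d = make_dummy((2, 2, 1))
print(d.shape, d.strides)  # (2, 2, 1) (0, 0, 0)

# The dummies broadcast exactly like real arrays of those shapes:
print(np.broadcast(make_dummy((2, 2, 1)), make_dummy((2, 3))).shape)
# (2, 2, 3)
```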
There is a difference between your desired solution and mine, which I have glossed over:
(0, 0, 0), (0, 0), (0, 0, 0)
(0, 0, 0), (0, 1), (0, 0, 1)
...
(1, 1, 0), (1, 1), (1, 1, 1)
(1, 1, 0), (1, 2), (1, 1, 2)
Your desired output keeps a separate index tuple for each input array - one that ranges over (2, 2, 1), another that ranges over (2, 3), and so on.
My approach, which I believe the numpy C code (at least the parts built on nditer) uses, generates one iterator over (2, 2, 3) and massages the arrays with as_strided so that they accept that larger range. This makes it easier to implement a common broadcast mechanism: it decouples the broadcasting complexity from the core of the computation.
Here's a good introduction to nditer:
http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html