Best practice for fancy indexing a numpy array along multiple axes

I am trying to optimize an algorithm to reduce memory usage, and I have identified this particular operation as a pain point.

I have a symmetric matrix, an array of indices along the rows, and another array of indices along the columns (the latter is simply every value that I did not select in the row index). I feel like I should be able to pass both index arrays at the same time, but instead I have to select along one axis and then the other. That causes memory problems, because I don't need a copy of the returned array, only the statistics I compute from it. Here's what I'm trying to do:

from scipy.spatial.distance import pdist, squareform
from sklearn import datasets
import numpy as np

iris = datasets.load_iris().data

dx = pdist(iris)
mat = squareform(dx)

outliers = [41,62,106,108,109,134,135]
inliers = np.setdiff1d( range(iris.shape[0]), outliers)

# What I want to be able to do:
scores = mat[inliers, outliers].min(axis=0)

      

Here's what I actually do to make this work:

# What I'm being forced to do:
s1 = mat[:,outliers]
scores = s1[inliers,:].min(axis=0)

      

Because I'm using fancy indexing, s1 is a new array rather than a view. I only need this array for one operation, so I would prefer either to avoid the copy entirely or at least to make the new array smaller (i.e. apply the second fancy index at the same time as the first, instead of performing two separate fancy-indexing operations).
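For reference, here is a minimal sketch of why the "what I want" line does not work as written. It assumes a random symmetric 150x150 array as a hypothetical stand-in for the real distance matrix, so it does not need scipy or sklearn. Two 1-D index arrays are paired element by element rather than combined into a row/column block:

```python
import numpy as np

# Hypothetical stand-in for the distance matrix: a random symmetric 150x150 array.
rng = np.random.default_rng(0)
a = rng.random((150, 150))
mat = (a + a.T) / 2

outliers = np.array([41, 62, 106, 108, 109, 134, 135])
inliers = np.setdiff1d(np.arange(mat.shape[0]), outliers)

# Two 1-D index arrays are broadcast against each other and paired
# element by element, so shapes (143,) and (7,) cannot be combined.
try:
    mat[inliers, outliers]
    paired_ok = True
except IndexError:
    paired_ok = False

# Equal-length index arrays do work, but they pick individual elements
# (here mat[i, i] for each outlier i), not a rectangular block.
diag_like = mat[outliers, outliers]
```

So the direct form either fails or silently selects something other than the inlier-by-outlier block.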

+3




3 answers


Broadcasting applies to indexing too. You can convert inliers to a column array (for example, inliers.reshape(-1,1) or inliers[:, np.newaxis], so that it has shape (m, 1)) and index mat with that as the first index:



s1 = mat[inliers.reshape(-1,1), outliers]
scores = s1.min(axis=0)
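As a quick check, the sketch below compares this broadcast form with the two-step version from the question. The matrix is a hypothetical random symmetric array standing in for the real distance matrix; the index values are the ones from the question.

```python
import numpy as np

# Hypothetical stand-in for the distance matrix: a random symmetric 150x150 array.
rng = np.random.default_rng(0)
a = rng.random((150, 150))
mat = (a + a.T) / 2

outliers = np.array([41, 62, 106, 108, 109, 134, 135])
inliers = np.setdiff1d(np.arange(mat.shape[0]), outliers)

# Two-step version from the question: the intermediate mat[:, outliers]
# has shape (150, 7) before the rows are selected.
two_step = mat[:, outliers][inliers, :].min(axis=0)

# Broadcast version: index shapes (143, 1) and (7,) broadcast to (143, 7),
# so only the needed block is gathered.
block = mat[inliers.reshape(-1, 1), outliers]
one_step = block.min(axis=0)
```

Note that block is still a copy rather than a view, since fancy indexing always copies, but it is the only array created and it contains just the selected rows and columns.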

      

+5




Try:



outliers = np.array(outliers)  # just to be sure they are arrays
result = mat[inliers[:, np.newaxis], outliers[np.newaxis, :]].min(0)
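A small sketch on a toy 4x4 array (hypothetical data, chosen only so the values are easy to read) suggests that the explicit np.newaxis on the column index is optional, because a 1-D array already broadcasts against an (m, 1) column:

```python
import numpy as np

mat = np.arange(16).reshape(4, 4)   # toy matrix: mat[i, j] == 4*i + j
rows = np.array([0, 2])
cols = np.array([1, 3])

# Explicit shapes (2, 1) and (1, 2), as in the answer above.
explicit = mat[rows[:, np.newaxis], cols[np.newaxis, :]]

# Shapes (2, 1) and (2,): broadcasting yields the same (2, 2) block.
implicit = mat[rows[:, np.newaxis], cols]
```

The conversion with np.array does matter for whichever index receives [:, np.newaxis], since a plain Python list does not accept that kind of subscript.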

      

+1




There's a more readable way to write this, using np.ix_:

result = mat[np.ix_(inliers, outliers)].min(0)

      

https://docs.scipy.org/doc/numpy/reference/generated/numpy.ix_.html#numpy.ix_
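To illustrate what np.ix_ does, here is a short sketch on a toy 4x4 array (hypothetical data). It builds the same broadcastable index arrays as the manual reshape approach, one shaped as a column and one as a row:

```python
import numpy as np

mat = np.arange(16).reshape(4, 4)   # toy matrix: mat[i, j] == 4*i + j
rows = [0, 2]                       # plain lists are accepted by np.ix_
cols = [1, 3]

ix = np.ix_(rows, cols)             # tuple of two index arrays
shapes = [idx.shape for idx in ix]  # one column-shaped, one row-shaped

block = mat[ix]                     # rows 0 and 2 crossed with columns 1 and 3
col_min = block.min(0)              # column-wise minimum, as in the answer
```

Because np.ix_ handles the reshaping, neither index needs to be converted to an array first.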

0

