Best practice for fancy indexing a numpy array along multiple axes
I am trying to optimize an algorithm to reduce memory usage, and I have identified this particular operation as a pain point.
I have a symmetric matrix, an index array along the rows, and another index array along the columns (the column indices are simply all the values that I did not select in the row index). I feel like I should be able to pass both index arrays at once, but instead I have to select along one axis and then the other. This causes memory problems, because I don't actually need the copy of the array that is returned, only the statistics I compute from it. Here's what I'm trying to do:
from scipy.spatial.distance import pdist, squareform
from sklearn import datasets
import numpy as np
iris = datasets.load_iris().data
dx = pdist(iris)
mat = squareform(dx)
outliers = [41,62,106,108,109,134,135]
inliers = np.setdiff1d(range(iris.shape[0]), outliers)
# What I want to be able to do:
scores = mat[inliers, outliers].min(axis=0)
Here's what I actually do to make this work:
# What I'm being forced to do:
s1 = mat[:,outliers]
scores = s1[inliers,:].min(axis=0)
Because I'm fancy indexing, s1 is a new array instead of a view. I only need this array for one operation, so it would be preferable to avoid the copy altogether, or at least to make the new array smaller (i.e. apply the second fancy index selection at the same time as the first, instead of performing two separate fancy indexing operations).
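To illustrate the copy-versus-view point, here is a minimal sketch on a small made-up matrix (the names demo, cols, basic and fancy are placeholders, not part of the iris example). It uses np.shares_memory to check whether a result still refers to the original buffer:

```python
import numpy as np

# Small made-up symmetric matrix standing in for the distance matrix.
rng = np.random.default_rng(0)
a = rng.random((6, 6))
demo = (a + a.T) / 2

cols = [1, 4]              # hypothetical "outlier" columns
basic = demo[:, 1:5]       # basic slicing: a view onto demo
fancy = demo[:, cols]      # fancy indexing: a freshly allocated copy

print(np.shares_memory(demo, basic))  # True
print(np.shares_memory(demo, fancy))  # False
print(fancy.shape)                    # (6, 2)
```

The intermediate copy has one row per row of the full matrix, which is the allocation I would like to shrink or avoid.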
Broadcasting applies to indexing as well. You can convert inliers into a column matrix (for example, inliers.reshape(-1,1) or inliers[:, np.newaxis], so that it has shape (m, 1)) and index mat with it as the first index:
s1 = mat[inliers.reshape(-1,1), outliers]
scores = s1.min(axis=0)
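As a rough check of the shapes involved, the sketch below uses a small made-up symmetric matrix rather than the iris data (demo, in_idx and out_idx are illustrative names). It compares the broadcast selection against the two-step selection from the question:

```python
import numpy as np

# Made-up symmetric matrix standing in for the distance matrix.
rng = np.random.default_rng(0)
a = rng.random((8, 8))
demo = (a + a.T) / 2

out_idx = np.array([2, 5])                                # hypothetical outlier indices
in_idx = np.setdiff1d(np.arange(demo.shape[0]), out_idx)  # the remaining 6 rows

rows = in_idx[:, np.newaxis]   # shape (6, 1)
block = demo[rows, out_idx]    # index shapes (6, 1) and (2,) broadcast to (6, 2)

# Same values as selecting columns first and rows second,
# but without allocating the (8, 2) intermediate array.
two_step = demo[:, out_idx][in_idx, :]
assert block.shape == (6, 2)
assert np.array_equal(block, two_step)

scores = block.min(axis=0)     # one minimum per outlier column
```

The result is still a copy, but only of the selected block. Without the reshape, the two index arrays are paired element by element, so arrays of different lengths cannot be broadcast together and NumPy raises an IndexError.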
There's a better way in terms of readability:
result = mat[np.ix_(inliers, outliers)].min(0)
https://docs.scipy.org/doc/numpy/reference/generated/numpy.ix_.html#numpy.ix_
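To see what np.ix_ builds, here is a small sketch with made-up index arrays (in_idx, out_idx and demo are illustrative names). It returns one index array per axis, shaped so that they broadcast into the full row-by-column grid, which should make it equivalent to the manual reshape approach:

```python
import numpy as np

in_idx = np.array([0, 1, 3, 4])   # hypothetical row indices
out_idx = np.array([2, 5])        # hypothetical column indices

# np.ix_ returns a tuple of open-mesh index arrays, one per axis.
rows, cols = np.ix_(in_idx, out_idx)
print(rows.shape, cols.shape)     # (4, 1) (1, 2)

demo = np.arange(36).reshape(6, 6)
via_ix = demo[np.ix_(in_idx, out_idx)]
via_reshape = demo[in_idx[:, np.newaxis], out_idx]
assert np.array_equal(via_ix, via_reshape)
print(via_ix.min(0))              # column-wise minimum over the selected rows
```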