Find n smallest elements in an array of numpy arrays
There are many questions here about how to find the nth smallest element in a numpy array. However, what if you have an array of arrays? For example:
>>> print matrix
[[ 1. 0.28958002 0.09972488 ..., 0.46999924 0.64723113
0.60217694]
[ 0.28958002 1. 0.58005657 ..., 0.37668355 0.48852272
0.3860152 ]
[ 0.09972488 0.58005657 1. ..., 0.13151364 0.29539992
0.03686381]
...,
[ 0.46999924 0.37668355 0.13151364 ..., 1. 0.50250212
0.73128971]
[ 0.64723113 0.48852272 0.29539992 ..., 0.50250212 1. 0.71249226]
[ 0.60217694 0.3860152 0.03686381 ..., 0.73128971 0.71249226 1. ]]
How can I get the n smallest elements from this array of arrays?
>>> print type(matrix)
<type 'numpy.ndarray'>
This is how I did it to find the coordinates of the smallest element:
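min_cordinates = []
smallest = numpy.amin(matrix)  # compute the global minimum once
for i in matrix:
    # columns of this row that hold the global minimum
    cols = numpy.where(i == smallest)[0]
    if cols.size:  # numpy.any(cols) would wrongly skip a match at column 0
        min_cordinates.append(int(cols[0]) + 1)  # 1-based column index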
Now I would like to find, for example, the 10 smallest items.
Flatten the matrix, sort and then select the first 10.
print(numpy.sort(matrix.flatten())[:10])
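Since the question also asks about coordinates, here is a minimal sketch of the same flatten-and-sort idea that returns the positions as well. The 3x3 `matrix` is a made-up stand-in for the one in the question, and the names `n`, `flat`, `order` and `coords` are only illustrative.

```python
import numpy as np

# Hypothetical small symmetric matrix standing in for the question's data.
matrix = np.array([[1.0, 0.29, 0.10],
                   [0.29, 1.0, 0.58],
                   [0.10, 0.58, 1.0]])

n = 4
flat = matrix.ravel()  # 1-D view of all elements

# The n smallest values, in ascending order.
smallest = np.sort(flat)[:n]

# Flat indices of the n smallest values; a stable sort keeps ties in
# their original order.
order = np.argsort(flat, kind="stable")[:n]

# Convert flat indices back to (row, column) pairs (0-based).
rows, cols = np.unravel_index(order, matrix.shape)
coords = list(zip(rows.tolist(), cols.tolist()))

print(smallest)
print(coords)
```

Note that in a symmetric matrix like the one in the question every off-diagonal value appears twice, once at (i, j) and once at (j, i), so you may want to look only at one triangle.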
If your array is small, the accepted answer is fine. For large arrays, np.partition will be much more efficient. Here's an example where an array contains 10,000 elements and you want the smallest 10 values:
In [56]: np.random.seed(123)
In [57]: a = 10*np.random.rand(100, 100)
Use np.partition to get the 10 smallest values:
In [58]: np.partition(a, 10, axis=None)[:10]
Out[58]:
array([ 0.00067838, 0.00081888, 0.00124711, 0.00120101, 0.00135942,
0.00271129, 0.00297489, 0.00489126, 0.00556923, 0.00594738])
Note that the values are not in ascending order. np.partition does not guarantee that the first 10 values will be sorted. If you need them in ascending order, you can sort the selected values afterwards. It will still be faster than sorting the entire array.
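A minimal sketch of the partition-then-sort step, assuming you only need the k smallest values in ascending order. The names `k`, `smallest_unsorted` and `smallest_sorted` are mine, and I pass `k - 1` as the kth argument, which also places the k smallest values in the first k slots.

```python
import numpy as np

rng = np.random.RandomState(123)
a = 10 * rng.rand(100, 100)

k = 10

# Partition the flattened array so that the k smallest values occupy
# the first k positions (in no particular order).
smallest_unsorted = np.partition(a, k - 1, axis=None)[:k]

# Sort only those k values instead of all 10,000.
smallest_sorted = np.sort(smallest_unsorted)

print(smallest_sorted)
```

The final sort only touches k elements, so its cost is negligible next to the partition of the whole array.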
Here's the result using np.sort:
In [59]: np.sort(a, axis=None)[:10]
Out[59]:
array([ 0.00067838, 0.00081888, 0.00120101, 0.00124711, 0.00135942,
0.00271129, 0.00297489, 0.00489126, 0.00556923, 0.00594738])
Now compare the times:
In [60]: %timeit np.partition(a, 10, axis=None)[:10]
10000 loops, best of 3: 75.1 µs per loop
In [61]: %timeit np.sort(a, axis=None)[:10]
1000 loops, best of 3: 465 µs per loop
In this case, np.partition is more than six times faster.
You can use the heapq.nsmallest function to return a list of the 10 smallest items.
In [84]: import heapq
In [85]: heapq.nsmallest(10, matrix.flatten())
Out[85]:
[-1.7009047695355393,
-1.4737632239971061,
-1.1246243781838825,
-0.7862983016935523,
-0.5080863016259798,
-0.43802651199959347,
-0.22125698200832566,
0.034938408281615596,
0.13610084041121048,
0.15876389111565958]
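For a self-contained illustration, here is a small sketch with a made-up 2x3 array (not the matrix from the output above). It shows that heapq.nsmallest returns a plain Python list that is already in ascending order; `smallest_floats` is just an illustrative name.

```python
import heapq
import numpy as np

# Hypothetical data, chosen only to keep the example short.
matrix = np.array([[0.5, -1.2, 3.0],
                   [2.2, 0.1, -0.7]])

# nsmallest accepts any iterable, so a flattened array works directly.
smallest = heapq.nsmallest(3, matrix.flatten())

# The items are numpy scalars; convert them if plain floats are needed.
smallest_floats = [float(v) for v in smallest]

print(smallest_floats)
```

Because heapq walks the elements one by one in Python, I would expect it to be slower than the numpy-based approaches on large arrays, but it is convenient when the data is small or not a numpy array at all.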