Getting min value from Python heapq
From the Python docs:
The last two functions [heapq.nlargest and heapq.nsmallest] are best suited for smaller n values. For higher values, it is more efficient to use the sorted() function. Also, when n == 1, it is more efficient to use the built-in min() and max().
If I want to get the smallest item in a min-heap, why do the Python docs suggest using min(), which I assume runs in O(n) time, when I can instead get the first item in the heap in O(1) time? (I am assuming the first item in the heap is the minimum.)
The nsmallest and nlargest functions available from heapq do not assume that the argument passed to them is already in heap format. Instead, they "heapify" the argument as they traverse it, which is more efficient than a full sort for the top-k elements when k is small. For k exactly equal to one, though, it is even faster to skip the heapify-as-you-traverse overhead and just use min directly.
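A small sketch of that point, using an arbitrary sample list that is deliberately not heap-ordered (the values are only for illustration):

```python
import heapq

data = [7, 3, 9, 1, 4]  # arbitrary sample data, not heap-ordered

# nsmallest accepts any iterable; it does not require a heap.
two_smallest = heapq.nsmallest(2, data)
assert two_smallest == sorted(data)[:2]

# For n == 1 the result is the same value min() returns, wrapped in a list.
smallest = heapq.nsmallest(1, data)
assert smallest == [min(data)]
```

Because nsmallest cannot know whether its input is a heap, it always has to look at every element, which is why the docs compare it to min() rather than to indexing.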
Your statement is correct. If you are given an array that you can guarantee has been heapified and has not been changed since, then accessing the first element will give you the min (or, correspondingly, the max for a max-heap).
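As a minimal illustration of the O(1) peek, assuming the list is modified only through heapq functions after it has been heapified (the sample values are arbitrary):

```python
import heapq

heap = [7, 3, 9, 1, 4]
heapq.heapify(heap)        # O(n), rearranges the list in place

# The heap invariant guarantees the smallest item sits at index 0.
peek = heap[0]             # O(1), does not remove the item
assert peek == min(heap)

# Mutating only through heapq keeps the invariant intact.
heapq.heappush(heap, 0)
new_peek = heap[0]
assert new_peek == 0
```

If the list is changed in any other way (for example by assigning to an index), heap[0] is no longer guaranteed to be the minimum.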
Looking at the source code for heapq (maybe I'm looking at old code?), this still seems strange to me. nsmallest has a special case for n == 1, implemented like this (line 397):
def nsmallest(n, iterable, key=None):
    """Find the n smallest elements in a dataset.

    Equivalent to:  sorted(iterable, key=key)[:n]
    """
    # Short-cut for n==1 is to use min() when len(iterable)>0
    if n == 1:
        it = iter(iterable)
        head = list(islice(it, 1))
        if not head:
            return []
        if key is None:
            return [min(chain(head, it))]
        return [min(chain(head, it), key=key)]
    # ... rest of function
Just playing with this expression in the interpreter gives an odd result:
In [203]: foo = list(itertools.islice([1,2,3], 1)); it = iter([1,2,3]); x = itertools.chain(foo, it);
In [204]: x.next()
Out[204]: 1
In [205]: x.next()
Out[205]: 1
In [206]: x.next()
Out[206]: 2
In [207]: x.next()
Out[207]: 3
In [208]: x.next()
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
<ipython-input-208-e05f366da090> in <module>()
----> 1 x.next()
StopIteration:
It seems that a generator is being created (and immediately turned into a list) that only takes the first element (as you would expect with a min-heap), but then it is weirdly chained with a plain old iterator that will traverse the entire array.
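One thing worth noting: the interpreter session above is not quite the same as the quoted source. In the source, head is sliced from the very iterator that is later chained, so the first element is consumed and then put back; in the session, head comes from the list and it is a fresh iterator, which is why 1 shows up twice. A small Python 3 sketch of the difference (the variable names other than it and head are mine):

```python
from itertools import chain, islice

data = [1, 2, 3]

# As in the quoted source: head is taken from the same iterator that is chained.
it = iter(data)
head = list(islice(it, 1))            # consumes the first element of `it`
same_iter = list(chain(head, it))     # every element appears exactly once

# As in the interpreter session: head comes from the list, `it` is a fresh iterator.
fresh_head = list(islice(data, 1))
fresh_it = iter(data)
separate = list(chain(fresh_head, fresh_it))   # first element is duplicated

assert same_iter == [1, 2, 3]
assert separate == [1, 1, 2, 3]
```

Read this way, the islice step looks like an emptiness check, so that an empty iterable returns [] instead of letting min() raise ValueError. Either way the chain still walks the whole input, so this path is O(n) and makes no use of any heap ordering in the argument.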
I agree that if you are starting with a list and want the minimum element, it is best to leave it as a list and use min. However, if you are handed a min-heap, then yes, you only need to inspect the first item; that is part of the point of heapifying it in the first place.
Even so, this source code looks pretty weird for passing a min-heap to min. I'd be very happy to get more explanation of what it does, and perhaps a pointer to more recent C-level code for the heapq implementation, if there is one.