Getting min value from Python heapq

From the Python docs:

The last two functions [heapq.nlargest and heapq.nsmallest] are best suited for smaller n values. For higher values, it is more efficient to use the sorted() function. Also, when n == 1, it is more efficient to use the built-in min() and max().

If I want to get the smallest item in a min-heap, why do the Python docs suggest using min(), which I assume runs in O(n) time, when I can instead get the first item in the heap in O(1) time? (I am assuming the first item in the heap is the minimum.)



1 answer


The nsmallest and nlargest functions available from heapq do not assume that the argument passed to them is already in heap format. Instead, they "heapify" the argument as they traverse it, which is more efficient than a full sort for finding the top k elements when k is small. For k exactly equal to one, though, it is faster still to avoid paying for heapify-as-you-traverse and just use min directly.
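As a small illustration of the point above, here is a sketch in which the input is deliberately not a heap. The list `data` and the variable names are made up for this example; only `heapq.nsmallest`, `sorted`, and `min` are real library calls.

```python
import heapq

# An ordinary, unordered list. It is NOT a valid heap (7 > 2 at the root).
data = [7, 2, 9, 4, 1, 8]

# nsmallest accepts any iterable and does not require heap order.
smallest_three = heapq.nsmallest(3, data)

# For n == 1 the result matches the built-in min(), wrapped in a list.
smallest_one = heapq.nsmallest(1, data)

print(smallest_three == sorted(data)[:3])
print(smallest_one == [min(data)])
```

Both comparisons should print True, since nsmallest is documented as equivalent to taking the first n items of the sorted input.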

Your statement is correct. If you are given an array that you can guarantee has been heapified and has not been modified since, then accessing the first element will give you the min (or, correspondingly, the max for a max-heap).
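A minimal sketch of that O(1) peek, assuming the list is only ever modified through heapq functions so the heap invariant holds (the values are arbitrary example data):

```python
import heapq

heap = [7, 2, 9, 4, 1, 8]
heapq.heapify(heap)          # O(n): rearrange in place into a min-heap

smallest = heap[0]           # O(1): the root of a min-heap is the minimum

heapq.heappush(heap, 0)      # O(log n): the invariant is maintained
new_smallest = heap[0]

print(smallest, new_smallest)
```

Note that `heap[0]` only peeks at the minimum; `heapq.heappop(heap)` is the call that removes and returns it.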

Looking at the source code for heapq (maybe I'm looking at old code?), something still seems strange to me. nsmallest has a special case for n == 1, implemented like this (line 397):

def nsmallest(n, iterable, key=None):
    """Find the n smallest elements in a dataset.

    Equivalent to:  sorted(iterable, key=key)[:n]
    """
    # Short-cut for n==1 is to use min() when len(iterable)>0
    if n == 1:
        it = iter(iterable)
        head = list(islice(it, 1))
        if not head:
            return []
        if key is None:
            return [min(chain(head, it))]
        return [min(chain(head, it), key=key)] 

    # ... rest of function

      

Just playing with this expression in the interpreter gives odd-looking results:



In [203]: foo = list(itertools.islice([1,2,3], 1)); it = iter([1,2,3]); x = itertools.chain(foo, it);

In [204]: x.next()
Out[204]: 1

In [205]: x.next()
Out[205]: 1

In [206]: x.next()
Out[206]: 2

In [207]: x.next()
Out[207]: 3

In [208]: x.next()
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-208-e05f366da090> in <module>()
----> 1 x.next()

StopIteration:

      

It seems that a generator is being created (and immediately turned into a list) that takes only the first element (as you would expect with a min-heap), but then it is oddly chained with a plain old generator that will traverse the entire array.
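One possible reading of the snippet, offered as a sketch and not as a statement of the heapq authors' intent: in the source, `islice` is applied to the same iterator `it` that is later passed to `chain`, so the first element is consumed from `it` and is not yielded twice. The interpreter session above builds `foo` from a separate list, which is why 1 appears twice there. Under this reading, `head` only serves to detect an empty iterable before calling `min`. The helper names below are hypothetical.

```python
from itertools import chain, islice


def first_then_rest(iterable):
    """Hypothetical helper: peek at one item, then re-chain it with the rest."""
    it = iter(iterable)
    head = list(islice(it, 1))      # consumes at most one item from `it`
    return list(chain(head, it))    # `it` resumes after the consumed item


def min_or_empty(iterable, key=None):
    """Hypothetical helper mirroring the n == 1 shortcut quoted above."""
    it = iter(iterable)
    head = list(islice(it, 1))
    if not head:
        return []                   # empty input: avoid min() raising ValueError
    if key is None:
        return [min(chain(head, it))]
    return [min(chain(head, it), key=key)]


print(first_then_rest([3, 1, 2]))   # each element appears exactly once
print(min_or_empty([3, 1, 2]))
print(min_or_empty([]))
```

The scan is still O(n), because the input is treated as an arbitrary iterable and not as a heap.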

I agree that if you are starting with a list and want the minimum element, it is best to leave it as a list and use min. However, if you are handed a min-heap, then yes, indeed, you just need to inspect the first item; that is part of the point of heapifying it in the first place.

But regardless, this source code looks pretty weird for the case of passing a min-heap to min. I'd be very happy to get more explanation of what it does, and perhaps a pointer to more recent C-level code for the heapq implementation, if there is any.
