The efficiency of the Python built-in sorted () function vs. list insert ()

I'm not new to this, but I don't use Python much and my knowledge is quite broad and not very deep in the language, maybe someone here more knowledgeable can answer my question. I am in a situation where I need to add items to a list and store it as added items. A quick way to do this would be.

list.append(item)                  // O(1)
list.sort()                        // ??

      

I would suggest that if this is the only way to add items to the list, I hope the sorting is pretty efficient because the list is sorted with each addition. However, there is one that works:

inserted = False
for i in range(len(list)):         // O(N)
    if (item < list[i]): 
        list.insert(i, item)       // ??
        inserted = True
        break
if not inserted: list.append(item)

      

Can anyone tell me if one of these is obviously more efficient? I'm leaning towards the second set of statements however I really have no idea.

+3


source to share


2 answers


What you are looking for is the bisect module and most possibly insort_left

So your expression can be equivalently written as

from



some_list.append(item)                  // O(1)
some_list.sort()                        // ??

      

to

bisect.insort_left(some_list, item)

      

+6


source


Pasting anywhere but the closest end takes O (n) time, since it has to move (copy) all elements that appear after the insertion point. But on the other hand, all comparison-based sorting algorithms should do Omega (n log n) comparisons on average. Many views (including timsort, which Python uses) will greatly improve performance on many inputs, probably including yours (the "almost sorted" case). They still have to move at least as many elements as inserting into the correct position right away. They also have to do quite a lot of extra work (checking all the elements to make sure they are in the correct order, plus more complex logic that often improves performance, but not in your case). For these reasons, it is probably slowerat least for large lists.



Due to the fact that it is written in C (in CPython, but similar reasoning applies to other Pythons as well), it can still be faster than your Python-based line scanner. This leaves the question of how to find the insertion point. Binary search can serve this role in O (log n), so it is very useful here (of course, inserting is still O (n), but there is no way around this if you want a sorted list). Unfortunately, binary search is quite difficult to implement. Fortunately, it has already been implemented in the standard library: bisect

.

+2


source







All Articles