heapq and PriorityQueue
In the previous lesson, I showed you two ways of using the built-in
list type as a priority queue. In this lesson, I’ll introduce you to
heapq and the
00:13 Heap queues are based on heaps. A heap is a special form of a tree data structure. There are several different kinds of heaps. The max-heap specifies that for any node in the tree, all children of that node have a value greater than or equal to their parent node. The min-heap is similar, but the value is less than or equal to the parent nodes. Binary heaps are heaps where no node has more than two children. Here’s an example.
00:42 The relationships between the nodes means that it is O(log n) to find the right spot to insert and also O(log n) to rearrange the tree when the smallest element is removed.
It turns out that a binary heap tree can be flattened and stored in a linear fashion, such as an array. You can think of this as a special way of sorting the array, but the sort order is the position in the heap. Inserting into a binary heap as an array is done by putting the element at the end of the array, then swapping parent and child elements until the newly inserted item is in the correct position in the array. In Python, the built-in
list type is implemented as an array. Inserting at the end of the
list is therefore cheap. Swapping items in a
list is also cheap. You are directly accessing two elements in memory.
01:35 Deleting an item from the binary heap is a bit more complicated, but similar to the insertion process. The nature of the relationship between nodes in the tree make insert and delete O(log n) operations.
heapq library provides methods for keeping a list in binary heap format. I’ll start off with a
list as my queue, import the
heapq library, then use the
heappush() method from the library to insert a new item in the heap. Nothing new here. Doing it again,
02:16 and you can see the result. One more time.
02:24 Notice that the order isn’t just a standard sort, but specific to the representation of a binary heap.
Popping off an item using
heappop() gets the lowest numbered item out of the heap. A relatively performant priority queue for your priority queue needs.
heapq to maintain your list is all well and good as long as you remember to do so. If you forget to use the
heapq methods at some point, you will damage the tree, further messing up its use in the future.
Complications arise from sort stability as well. First off, what is your desired behavior when two items have the same value? If you’re using tuples, you will get the tuple sort order, similar to the
insort() method in the previous lesson. If, on the other hand, you want the priority of items with the same priority value to be based on their insertion order—a common practice in scheduling—then you’re going to have to augment your tuples with more information.
Removing items from the heap can also be problematic, possibly damaging the sort order of the tree. The easiest way to do this is just to remove the item, then call
heapify() to return heap sorting order on the list.
In the case of moving an item—say, if its priority changes—do the delete first, then re-insert afterwards. Previous lessons talked about thread safety in concurrent programming. Python provides a thread-safe version of a
heapq in the
queue library called
PriorityQueue. It has the same performance and restrictions of
heapq, but also uses locks to ensure its methods are atomic. Similar to the other objects in the
.get_nowait(). To get started, do an import,
then create the object. Use
.put() to push items into the queue. Like the other members of the
queue library, not a particularly useful repr.
04:32 Let me put a few more things inside.
.get(), and another
.get(). And don’t forget those were blocking calls. For the non-blocking version, use
.get_nowait() with its expected
Pretty much the same as the other members of the
queue library. Isn’t consistency grand? Next up, I’ll talk about the pros and cons of the different priority queue mechanisms.
Become a Member to join the conversation.