This text is part of a Real Python tutorial by Leodanis Pozo Ramos.
Python’s collections
module provides a rich set of specialized container data types carefully designed to approach specific programming problems in a Pythonic and efficient way. The module also provides wrapper classes that make it safer to create custom classes that behave similar to the built-in types dict
, list
, and str
.
Learning about the data types and classes in collections
will allow you to grow your programming tool kit with a valuable set of reliable and efficient tools.
In this lesson, you’ll learn how to:
- Build efficient queues and stacks with
deque
- Count objects quickly with
Counter
- Handle missing dictionary keys with
defaultdict
- Guarantee the insertion order of keys with
OrderedDict
- Manage multiple dictionaries as a single unit with
ChainMap
Getting Started With Python’s collections
Back in Python 2.4, Raymond Hettinger contributed a new module called collections
to the standard library. The goal was to provide various specialized collection data types to approach specific programming problems.
At that time, collections
only included one data structure, deque
, which was specially designed as a double-ended queue that supports efficient append and pop operations on either end of the sequence. From this point on, several modules in the standard library took advantage of deque
to improve the performance of their classes and structures. Some outstanding examples are queue
and threading
.
With time, a handful of specialized container data types populated the module:
Data type | Python version | Description |
---|---|---|
deque |
2.4 | A sequence-like collection that supports efficient addition and removal of items from either end of the sequence |
defaultdict |
2.5 | A dictionary subclass for constructing default values for missing keys and automatically adding them to the dictionary |
namedtuple() |
2.6 | A factory function for creating subclasses of tuple that provides named fields that allow accessing items by name while keeping the ability to access items by index |
OrderedDict |
2.7, 3.1 | A dictionary subclass that keeps the key-value pairs ordered according to when the keys are inserted |
Counter |
2.7, 3.1 | A dictionary subclass that supports convenient counting of unique items in a sequence or iterable |
ChainMap |
3.3 | A dictionary-like class that allows treating a number of mappings as a single dictionary object |
Besides these specialized data types, collections
also provides three base classes that facilitate the creations of custom lists, dictionaries, and strings:
Class | Description |
---|---|
UserDict |
A wrapper class around a dictionary object that facilitates subclassing dict |
UserList |
A wrapper class around a list object that facilitates subclassing list |
UserString |
A wrapper class around a string object that facilitates subclassing string |
The need for these wrapper classes was partially eclipsed by the ability to subclass the corresponding standard built-in data types. However, sometimes using these classes is safer and less error-prone than using standard data types.
With this brief introduction to collections
and the specific use cases that the data structures and classes in this module can solve, it’s time to take a closer look at them.
You’ll focus on namedtuple()
later this week. Today, you’ll explore some of the other tools in the collections
module.
Building Efficient Queues and Stacks: deque
Python’s deque
was the first data structure in collections
. This sequence-like data type is a generalization of stacks and queues designed to support memory-efficient and fast append and pop operations on both ends of the data structure.
Note: The word deque
is pronounced “deck” and stands for double-ended queue.
In Python, append and pop operations on the beginning or left side of list
objects are inefficient, with O(n) time complexity. These operations are especially expensive if you’re working with large lists because Python has to move all the items to the right to insert new items at the beginning of the list.
On the other hand, append and pop operations on the right side of a list are normally efficient (O(1)) except for those cases in which Python needs to reallocate memory to grow the underlying list for accepting new items.
Python’s deque
was created to overcome this problem. Append and pop operations on both sides of a deque
object are stable and equally efficient because deques are implemented as a doubly linked list. That’s why deques are particularly useful for creating stacks and queues.
Take a queue as an example. It manages items in a First-In/First-Out (FIFO) fashion. It works as a pipe, where you push in new items at one end of the pipe and pop old items out from the other end. Adding an item to the end of a queue is known as an enqueue operation. Removing an item from the front or beginning of a queue is called dequeue.
Now say you’re modeling a queue of people waiting to buy tickets to a movie. You can do that with a deque
. Every time a new person arrives, you enqueue them. When the person at the front of the queue gets their tickets, you dequeue them.
Here’s how you can emulate the process using a deque
object: