Python's collections: A Buffet of Specialized Data Types

This text is part of a Real Python tutorial by Leodanis Pozo Ramos.


Python’s collections module provides a rich set of specialized container data types carefully designed to approach specific programming problems in a Pythonic and efficient way. The module also provides wrapper classes that make it safer to create custom classes that behave similar to the built-in types dict, list, and str.

Learning about the data types and classes in collections will allow you to grow your programming tool kit with a valuable set of reliable and efficient tools.

In this lesson, you’ll learn how to:

  • Build efficient queues and stacks with deque
  • Count objects quickly with Counter
  • Handle missing dictionary keys with defaultdict
  • Guarantee the insertion order of keys with OrderedDict
  • Manage multiple dictionaries as a single unit with ChainMap

Getting Started With Python’s collections

Back in Python 2.4, Raymond Hettinger contributed a new module called collections to the standard library. The goal was to provide various specialized collection data types to approach specific programming problems.

At that time, collections only included one data structure, deque, which was specially designed as a double-ended queue that supports efficient append and pop operations on either end of the sequence. From this point on, several modules in the standard library took advantage of deque to improve the performance of their classes and structures. Some outstanding examples are queue and threading.

With time, a handful of specialized container data types populated the module:

Data type Python version Description
deque 2.4 A sequence-like collection that supports efficient addition and removal of items from either end of the sequence
defaultdict 2.5 A dictionary subclass for constructing default values for missing keys and automatically adding them to the dictionary
namedtuple() 2.6 A factory function for creating subclasses of tuple that provides named fields that allow accessing items by name while keeping the ability to access items by index
OrderedDict 2.7, 3.1 A dictionary subclass that keeps the key-value pairs ordered according to when the keys are inserted
Counter 2.7, 3.1 A dictionary subclass that supports convenient counting of unique items in a sequence or iterable
ChainMap 3.3 A dictionary-like class that allows treating a number of mappings as a single dictionary object

Besides these specialized data types, collections also provides three base classes that facilitate the creations of custom lists, dictionaries, and strings:

Class Description
UserDict A wrapper class around a dictionary object that facilitates subclassing dict
UserList A wrapper class around a list object that facilitates subclassing list
UserString A wrapper class around a string object that facilitates subclassing string

The need for these wrapper classes was partially eclipsed by the ability to subclass the corresponding standard built-in data types. However, sometimes using these classes is safer and less error-prone than using standard data types.

With this brief introduction to collections and the specific use cases that the data structures and classes in this module can solve, it’s time to take a closer look at them.

You’ll focus on namedtuple() later this week. Today, you’ll explore some of the other tools in the collections module.

Building Efficient Queues and Stacks: deque

Python’s deque was the first data structure in collections. This sequence-like data type is a generalization of stacks and queues designed to support memory-efficient and fast append and pop operations on both ends of the data structure.

In Python, append and pop operations on the beginning or left side of list objects are inefficient, with O(n) time complexity. These operations are especially expensive if you’re working with large lists because Python has to move all the items to the right to insert new items at the beginning of the list.

On the other hand, append and pop operations on the right side of a list are normally efficient (O(1)) except for those cases in which Python needs to reallocate memory to grow the underlying list for accepting new items.

Python’s deque was created to overcome this problem. Append and pop operations on both sides of a deque object are stable and equally efficient because deques are implemented as a doubly linked list. That’s why deques are particularly useful for creating stacks and queues.

Take a queue as an example. It manages items in a First-In/First-Out (FIFO) fashion. It works as a pipe, where you push in new items at one end of the pipe and pop old items out from the other end. Adding an item to the end of a queue is known as an enqueue operation. Removing an item from the front or beginning of a queue is called dequeue.

Now say you’re modeling a queue of people waiting to buy tickets to a movie. You can do that with a deque. Every time a new person arrives, you enqueue them. When the person at the front of the queue gets their tickets, you dequeue them.

Here’s how you can emulate the process using a deque object:

Locked learning resources

Join us and get access to thousands of tutorials and a community of expert Pythonistas.

Unlock This Lesson

Already a member? Sign-In

Locked learning resources

The full lesson is for members only. Join us and get access to thousands of tutorials and a community of expert Pythonistas.

Unlock This Lesson

Already a member? Sign-In

You must own this product to join the conversation.