Diving Deeper Into defaultdict
00:00
Welcome back. In the last lesson, I showed you how you can use defaultdict
to solve concrete problems in different scenarios. In this lesson, we’ll look a bit deeper into defaultdict
and how it works under the hood.
00:11
I’m going to be telling you about four different things here. I’m going to contrast defaultdict
with dict
—remember that defaultdict
is a subclass of dict
.
00:20
After that, we’ll look at .default_factory
in a bit more detail. .default_factory
is really the heart and engine of defaultdict
.
00:27
Following that, we’ll contrast defaultdict
with .setdefault()
. Remember that .setdefault()
is a method which works on dict
right out of the box and allows you to address this missing key problem. And then finally, we will say a few words about .__missing__()
. Okay, so let’s start by comparing defaultdict
and dict
.
00:46
Here we are back in the terminal. The first thing I’m going to do is I’m going to import defaultdict
from collections
.
00:56
There we go. Now I have access to defaultdict
and all of its attributes and methods. To better contrast defaultdict
and dict
, I’m going to be using two built-in functions, set()
and dir()
.
01:07
I’ll start by typing out set(dir())
, and in each case I’m going to be passing an object to them. First, I’ll use defaultdict
without the parentheses, since I’m passing the class itself rather than creating an instance.
01:25
And then I’ll do the same for dict
—and don’t worry, I’m going to explain what I’m doing here in a minute. I have my code set up here. What dir()
does is it returns a list
of all the attributes and methods which an object has, and set()
in turn boils down that list
into a set
of unique items.
01:43
So when I subtract the second set from the first, what I get back is the attributes which are in defaultdict
which are not in dict
.
01:50
And they are '__missing__'
, 'default_factory'
, and '__copy__'
. .__copy__()
does what you would expect—it supports copying. .__missing__()
gets called when .__getitem__()
fails to find a key.
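Here is a minimal sketch of that comparison, assuming it is done by subtracting one set from the other. The variable name extra is only for illustration, and the exact result may differ slightly between Python versions, so no output is shown.

```python
from collections import defaultdict

# dir() lists attribute and method names; set() turns each list into a set.
# Subtracting leaves only the names defaultdict has that dict does not.
extra = set(dir(defaultdict)) - set(dir(dict))
print(sorted(extra))

# defaultdict is a subclass of dict, so everything else is inherited.
print(issubclass(defaultdict, dict))  # True
```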
02:02
And here’s the interesting bit: when .__missing__()
is called, it in turn triggers .default_factory()
, which as I mentioned is really the heart of defaultdict
.
02:10
We’re going to be talking a bit more about .default_factory
in the next item in this lesson, but just before we move on, I would really like to rub in how similar defaultdict
and dict
are.
02:21
The first thing I’m going to do here is I’m going to create a standard vanilla dictionary and it has a few random elements in it.
02:32
So we can have a look at this. It holds two lists: a list of numbers
and a list of letters
.
02:38
Next, I’ll create a defaultdict
which is very similar to that dict
. I’m passing the same inputs to it: a list of letters
, a list of numbers
.
02:47
The key difference is here. I am passing list
as a callable, which of course gets passed on to .default_factory
.
02:55
So let’s go ahead and create this and just have a look inside—so def_dict
, and here you can see. This is a defaultdict
and its contents are exactly the same, in fact, as in the dict
I just created.
03:12
The key difference is that I have 'list'
here as the callable. Let’s look at how similar these two dictionaries actually are. To do that, I’m going to take my std_dict
(standard dict
) and I’m going to check if it’s the same as def_dict
. So here we go—and you can see this is True
.
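A short sketch of this comparison. The names std_dict and def_dict come from the lesson, but the actual numbers and letters are not given in the transcript, so the values below are assumptions chosen for illustration.

```python
from collections import defaultdict

# Assumed contents: the transcript only says "a list of numbers" and
# "a list of letters".
std_dict = dict(numbers=[1, 2, 3], letters=["a", "b", "c"])

# Same keyword arguments, plus list as the first argument (the callable
# that ends up in .default_factory).
def_dict = defaultdict(list, numbers=[1, 2, 3], letters=["a", "b", "c"])

print(def_dict)
# defaultdict(<class 'list'>, {'numbers': [1, 2, 3], 'letters': ['a', 'b', 'c']})

# Equality only compares the key-value pairs, not the types.
print(std_dict == def_dict)  # True
print(type(std_dict) is type(def_dict))  # False
```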
03:32
So these two objects, one of which is a dict
and one of which is a defaultdict
, compare as equal, because equality only looks at their key-value pairs. So as I mentioned, .default_factory
is really the heart and soul of defaultdict
. In fact, it’s what really sets defaultdict
apart from dict
.
03:50
Let’s have a look at how .default_factory
works in a bit more detail. We’re back in the REPL, and keep in mind I’ve already imported defaultdict
from collections
, so I don’t need to do that again. So as mentioned earlier, .default_factory
is set to a callable and the callable is the first argument that you pass when you’re creating a defaultdict
.
04:10
But what happens if you don’t pass anything? Well, in that case… Like here, I’ve created a defaultdict
but I didn’t pass any arguments to it at all, so in particular I didn’t pass the first argument, which would be the callable.
04:25
So let’s see what happens when I try to access a missing key here.
04:30
I’m going to try finding a key called 'missing key'
.
04:35
That gives me a traceback, and it’s a KeyError
traceback. This is exactly the same traceback which I would have gotten with a normal dictionary.
04:44
Notice that the same thing happens if you don’t leave this empty but just pass None
to it.
04:52
So if I now try to access a missing key, I get the same traceback. In order to avoid this, what I need to do is pass a callable. So, something like list
or str
(string). Let’s go with list
—ha, and of course you should not capitalize list
, like I did.
05:09
So if I just pass list
like this, now that’s working. I didn’t get an error. And if I try to access a missing key, you see that a list
was generated here.
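The three cases from this part of the REPL session can be sketched as follows. The variable names are made up for illustration, and the KeyError is caught only so the snippet runs from top to bottom.

```python
from collections import defaultdict

no_factory = defaultdict()         # nothing passed: default_factory is None
explicit_none = defaultdict(None)  # passing None has the same effect

for d in (no_factory, explicit_none):
    try:
        d["missing key"]
    except KeyError as exc:
        # Same error a regular dict would raise.
        print("KeyError:", exc)

# With a callable such as list, the missing key is created on access.
with_list = defaultdict(list)
value = with_list["missing key"]
print(value)      # []
print(with_list)  # defaultdict(<class 'list'>, {'missing key': []})
```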
05:21
If you think of the previous point where I showed you the attributes which are specific to defaultdict
, there were three of them, right? So .__copy__()
, .__missing__()
, and .default_factory
.
05:31
Now let me reset my empty defaultdict
here, and if I inspect it, you can see there’s nothing in it at all. So I haven’t triggered the 'missing key'
yet.
05:43
But what happens if I try to get 'missing key'
—not with a normal key reference but by using .get()
, which is a normal dictionary method and it’s a method which is available to me here with defaultdict
.
05:59
But you see, nothing happens. The reason is that only .__getitem__()
triggers .default_factory
, not .get()
or other methods. So in this case, nothing happened.
06:10
My defaultdict
is still empty.
06:14
And if I try to access 'missing key'
using the conventional path, then .default_factory()
is triggered and it populates it with an empty list.
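A small sketch of the difference between .get() and square-bracket access. The helper variables result and size_after_get are only there to make the behavior visible.

```python
from collections import defaultdict

def_dict = defaultdict(list)

# .get() does not go through .__missing__(), so nothing is created.
result = def_dict.get("missing key")
size_after_get = len(def_dict)
print(result)    # None
print(def_dict)  # defaultdict(<class 'list'>, {})

# Square-bracket access calls .__getitem__(), which falls back to
# .__missing__() and then .default_factory().
def_dict["missing key"]
print(def_dict)  # defaultdict(<class 'list'>, {'missing key': []})
```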
06:23
The last thing I’d like to say here about .default_factory
is that this is an attribute of my defaultdict
. So I can inspect it just as I would any normal attribute, and you can see here it’s set to list
. And I can also update it, so I can set this to str
, and now if I try to access another missing key (I’ve already used this value, so let’s call it 'missing key 2'
), then this time an empty str
is generated, rather than a traceback or an empty list
, as I had in the previous examples.
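Inspecting and reassigning the attribute might look like this. It is a sketch that reuses the key names from the lesson.

```python
from collections import defaultdict

def_dict = defaultdict(list)
def_dict["missing key"]           # created with an empty list

print(def_dict.default_factory)   # <class 'list'>

# .default_factory is a plain attribute, so it can be reassigned.
def_dict.default_factory = str
def_dict["missing key 2"]         # created with an empty string

print(def_dict)
# defaultdict(<class 'str'>, {'missing key': [], 'missing key 2': ''})
```

Keys that already exist keep their old values; only keys created after the change use the new callable.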
06:58
So, these are the three main things about .default_factory
. You need to set it to a callable. That callable is passed as the first argument when you’re creating a defaultdict
.
07:09
You can also set it afterwards or update it by assigning to .default_factory
, like here.
07:17
If you leave it empty or set it to None
, then your defaultdict
behaves just like a normal dict
. And of course only .__getitem__()
triggers .default_factory()
, so other methods for looking up keys in dictionaries, such as .get()
, won’t trigger it. Okay.
07:34
Moving swiftly onwards, let’s compare defaultdict
with .setdefault()
. In one of the previous lessons I already mentioned .setdefault()
a bit, but it’s worth having a quick look at it here again.
07:46
I’m going to start by creating a normal empty dictionary. So, this is a dict
. There’s nothing in it. I can emulate what defaultdict
does by using .setdefault()
.
07:58
Imagine I am going to look for a key, and it’s a key which isn’t there. I can put anything here since this dictionary is completely empty. And then the second argument which .setdefault()
takes—and you can already see it down here—is default
. This is a default value which will be provided if the key is not in the dictionary.
08:20
Let’s enter an empty list
here, and you can see what I did was I called my dict
, I looked for the string 'a'
. It’s not there, so an empty list
was provided. This is exactly what defaultdict
would do.
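A minimal sketch of the same lookup done both ways. The name a_dict is an assumption, since the transcript only mentions an empty dictionary and the key 'a'.

```python
from collections import defaultdict

# Regular dict: .setdefault() inserts the default if the key is absent
# and returns the value stored under the key.
a_dict = {}
value = a_dict.setdefault("a", [])
print(value)   # []
print(a_dict)  # {'a': []}

# defaultdict: the default is supplied by the callable given at creation.
def_dict = defaultdict(list)
def_dict["a"]
print(def_dict == a_dict)  # True
```

One difference worth keeping in mind: the default passed to .setdefault() is built on every call, even when the key already exists, whereas .default_factory is only called for keys that are missing.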
08:35
And that kind of raises the question, “Why should I use one or the other since they seem to do the same thing?” Well, defaultdict
is arguably more readable, user-friendly, Pythonic, and straightforward.
08:48
If your code is heavily based on dictionaries and you’re dealing with missing keys all of the time—rather than as an occasional stumbling block—then you should really consider using defaultdict
rather than regular dict
.
08:59
If your dictionary items need to be initialized with a constant default value, then defaultdict
also makes sense for you. And finally, if you’re using the dictionaries in your code for things such as aggregating, accumulating, counting, grouping—basically the use cases we saw in the previous lesson—then also defaultdict
is a good option. Regular dict
, as opposed to defaultdict
, does have a slight speed advantage but in most cases where your code is heavily reliant on dictionaries, the convenience of defaultdict
usually outweighs that. Okay, so that brings us to the last item in this lesson.
09:35
The last thing I wanted to tell you about is .__missing__()
.
09:39
I touched on this briefly earlier, but the key thing to remember here is that when you look for a key in a dictionary, you’re triggering .__getitem__()
, which, if the key isn’t there, triggers .__missing__()
, which in turn triggers .default_factory()
.
09:52
The important thing about .__missing__()
is that it’s only triggered by .__getitem__()
, so other methods which can be used to look for keys, such as .get()
or .__contains__()
, won’t be triggering .default_factory()
.
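That lookup chain can be sketched with a plain dict subclass. The class name SketchDefaultDict is made up for illustration, and this is not how collections.defaultdict is actually implemented. It relies only on the documented behavior that dict.__getitem__() calls .__missing__() on a subclass when the key is absent.

```python
class SketchDefaultDict(dict):
    """Hypothetical, simplified stand-in for defaultdict."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # Called by dict.__getitem__() only when the key is not found.
        if self.default_factory is None:
            raise KeyError(key)
        value = self.default_factory()
        self[key] = value
        return value


d = SketchDefaultDict(list)
d["x"].append(1)        # d["x"] -> __getitem__ -> __missing__ -> list()
print(d)                # {'x': [1]}

# Neither .get() nor the in operator (.__contains__()) goes through
# .__missing__(), so no key is created.
print(d.get("y"))       # None
print("y" in d)         # False
print(len(d))           # 1
```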
10:06
It’s less likely that .__contains__()
will trip you up, but if you’re used to using .get()
to look for something, then be aware of this potential trap: it won’t trigger .default_factory()
, so your defaultdict
won’t create the missing key for you. Okay, so that was it for this lesson. In the next lesson, I’ll be telling you about different ways in which you can pass arguments to defaultdict
. I’ll see you there!
Bartosz Zaczyński RP Team on Nov. 12, 2020
@Dan B The method, which is part of Python’s data model, is named .__getitem__()
. Typically, iterable types implement it to return elements at the given index or key in the collection.
Dan B on Nov. 12, 2020
Ah… Can you update the slides? I didn’t know it was named .__getitem__()
when you write it .get_item()
! e.g. at 9:55
Chris Bailey RP Team on Nov. 23, 2020
Hi @Dan B and Bartosz, I have fixed the slide and posted an updated video. Thanks for spotting the error.
Dan B on Nov. 10, 2020
Am I misunderstanding what
.get_item()
is? I don’t see that attribute of dict or defaultdict.