
Diving Deeper Into defaultdict

00:00 Welcome back. In the last lesson, I showed you how you can use defaultdict to solve concrete problems in different scenarios. In this lesson, we’ll look a bit deeper into defaultdict and how it works under the hood.

00:11 I’m going to be telling you about four different things here. First, I’m going to contrast defaultdict with dict. Remember that defaultdict is a subclass of dict.

00:20 After that, we’ll look at .default_factory in a bit more detail. .default_factory is really the heart and engine of defaultdict.

00:27 Following that, we’ll contrast defaultdict with .setdefault(). Remember that .setdefault() is a method which works on dict right out of the box and allows you to address this missing key problem. And then finally, we will say a few words about .__missing__(). Okay, so let’s start by comparing defaultdict and dict.

00:46 Here we are back in the terminal. The first thing I’m going to do is I’m going to import defaultdict from collections.

00:56 There we go. Now I have access to defaultdict and all of its attributes and methods. To better contrast defaultdict and dict, I’m going to be using two built-in functions, set() and dir().

01:07 I’ll start by typing out set(dir()), and in each case I’m going to be passing an object to them. First, I’ll use defaultdict without call parentheses, since I want to inspect the class itself rather than create an instance.

01:25 And then I’ll do the same for dict—and don’t worry, I’m going to explain what I’m doing here in a minute. I have my code set up here. What dir() does is it returns a list of all the attributes and methods which an object has, and set() in turn boils down that list into a set of unique items.

01:43 So, taking the difference of the two sets, what I’m going to get back here is the attributes which are in defaultdict but not in dict.

01:50 And they are '__missing__', 'default_factory', and '__copy__'. .__copy__() does what you would expect—it supports copying. .__missing__() gets called when .__getitem__() fails to find a key.
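
A minimal sketch of the comparison described above, assuming the two sets are subtracted from one another. The variable names `extra` and `expected` are made up for illustration, and the exact contents of the resulting set may vary between Python versions, so the sketch only checks that the three names from the lesson are present.

```python
from collections import defaultdict

# Names that dir() reports for defaultdict but not for dict.
extra = set(dir(defaultdict)) - set(dir(dict))
print(sorted(extra))

# The three names discussed in the lesson should be among them.
expected = {"__copy__", "__missing__", "default_factory"}
print(expected <= extra)
```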

02:02 And here’s the interesting bit: when .__missing__() is called, it in turn triggers .default_factory(), which as I mentioned is really the heart of defaultdict.

02:10 We’re going to be talking a bit more about .default_factory in the next item in this lesson, but just before we move on, I would really like to rub in how similar defaultdict and dict are.

02:21 The first thing I’m going to do here is I’m going to create a standard vanilla dictionary and it has a few random elements in it.

02:32 So we can have a look at this. There are two lists: a list of numbers and a list of letters.

02:38 Next, I’ll create a defaultdict which is very similar to that dict. I’m passing the same inputs to it: a list of letters, a list of numbers.

02:47 The key difference is here. I am passing list as a callable, which of course gets passed on to .default_factory.

02:55 So let’s go ahead and create this and just have a look inside—so def_dict, and here you can see. This is a defaultdict and its contents are exactly the same, in fact, as in the dict I just created.

03:12 The key difference is that I have 'list' here as the callable. Let’s look at how similar these two dictionaries actually are. To do that, I’m going to take my std_dict (standard dict) and I’m going to check if it’s the same as def_dict. So here we go—and you can see this is True.

03:32 So these two objects, one of which is a dict and one of which is a defaultdict, hold exactly the same contents and compare as equal. So as I mentioned, .default_factory is really the heart and soul of defaultdict. In fact, it’s what really sets defaultdict apart from dict.
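
The equality check from the terminal session can be sketched roughly as follows. The transcript does not show the exact values, so the key names and list contents here are assumptions; only std_dict, def_dict, and the list callable come from the lesson.

```python
from collections import defaultdict

# Assumed contents: a list of numbers and a list of letters.
std_dict = dict(numbers=[1, 2, 3], letters=["a", "b", "c"])
def_dict = defaultdict(list, numbers=[1, 2, 3], letters=["a", "b", "c"])

print(def_dict)                          # the repr also shows the list callable
print(std_dict == def_dict)              # equality compares keys and values only
print(type(std_dict) is type(def_dict))  # the types themselves still differ
```

Equality ignores .default_factory, which is why the two objects compare as equal even though one of them is a defaultdict.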

03:50 Let’s have a look at how .default_factory works in a bit more detail. We’re back in the REPL, and keep in mind I’ve already imported defaultdict from collections, so I don’t need to do that again. So as mentioned earlier, .default_factory is set to a callable and the callable is the first argument that you pass when you’re creating a defaultdict.

04:10 But what happens if you don’t pass anything? Here, I’ve created a defaultdict but I didn’t pass any arguments to it at all, so in particular I didn’t pass a first argument, which would be the callable.

04:25 So let’s see what happens when I try to access a missing key here.

04:30 I’m going to try finding a key called 'missing key'.

04:35 That gives me a traceback, and it’s a KeyError traceback. This is exactly the same traceback which I would have gotten with a normal dictionary.

04:44 Notice that the same thing happens if you don’t leave this empty but just pass None to it.

04:52 So if I now try to access a missing key, I get the same traceback. In order to avoid this, what I need to do is pass a callable. So, something like list or str (string). Let’s go with list. Ha, and of course you should not capitalize list, like I did.

05:09 So if I just pass list like this, now that’s working. I didn’t get an error. And if I try to access a missing key, you see that a list was generated here.
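
As a rough sketch of the three cases just covered (no argument, None, and list), with variable names chosen only for illustration:

```python
from collections import defaultdict

no_factory = defaultdict()        # default_factory is left as None
none_factory = defaultdict(None)  # the same thing, spelled out

for d in (no_factory, none_factory):
    try:
        d["missing key"]
    except KeyError as err:
        # Behaves like a regular dict: the lookup fails.
        print("KeyError:", err)

with_list = defaultdict(list)
print(with_list["missing key"])   # an empty list is created and returned
print(with_list)
```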

05:21 If you think of the previous point where I showed you the attributes which are specific to defaultdict, there were three of them, right? So .__copy__(), .__missing__(), and .default_factory.

05:31 Now let me reset my empty defaultdict here, and if I display it, you can see there’s nothing in it at all. So I haven’t triggered the 'missing key' yet either.

05:43 But what happens if I try to get 'missing key', not with a normal key lookup but by using .get()? This is a normal dictionary method, and it’s also available to me here with defaultdict.

05:59 But you see, nothing happens. The reason is that only .__getitem__() triggers .default_factory(), not .get() or other methods. So in this case, nothing happened.

06:10 My defaultdict is still empty.

06:14 And if I try to access 'missing key' using the conventional path, then .default_factory() is triggered and it populates it with an empty list.
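
The difference between .get() and a square-bracket lookup might look like this in the REPL. This is a sketch; def_dict is simply the name used earlier in the lesson.

```python
from collections import defaultdict

def_dict = defaultdict(list)

print(def_dict.get("missing key"))  # None: .get() does not call default_factory
print(len(def_dict))                # 0: nothing was inserted

print(def_dict["missing key"])      # []: square brackets go through __getitem__
print(len(def_dict))                # 1: the key now exists
```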

06:23 The last thing I’d like to say here about .default_factory is that this is an attribute of my defaultdict. So I can inspect it just as I would any normal attribute, and you can see here it’s set to list. I can also update it, so I can set this to str. Now if I try to access another missing key (I’ve already used 'missing key', so let’s call it 'missing key 2'), this time an empty str is generated, rather than the traceback or the empty list I had in the previous examples.
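
Inspecting and then reassigning the attribute could be sketched like so, reusing the key names from the walkthrough:

```python
from collections import defaultdict

def_dict = defaultdict(list)
def_dict["missing key"]                  # creates an empty list under this key
print(def_dict.default_factory)          # currently the list class

# default_factory is a plain writable attribute.
def_dict.default_factory = str
print(repr(def_dict["missing key 2"]))   # now an empty string is created
print(dict(def_dict))
```

Keys that were already filled in keep their old values; only later misses use the new callable.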

06:58 So, these are the three main things about .default_factory. You need to set it to a callable. That callable is passed as the first argument when you’re creating a defaultdict.

07:09 You can also set it afterwards or update it through the .default_factory attribute, like here.

07:17 If you leave it empty or set it to None, then your defaultdict behaves just like a normal dict. And of course only .__getitem__() triggers .default_factory(), so other methods for getting values from dictionaries, such as .get(), won’t trigger it. Okay.

07:34 Moving swiftly onwards, let’s compare defaultdict with .setdefault(). In one of the previous lessons I already mentioned .setdefault() a bit, but it’s worth having a quick look at it here again.

07:46 I’m going to start by creating a normal empty dictionary. So, this is a dict. There’s nothing in it. I can emulate what defaultdict does by using .setdefault().

07:58 Imagine I am going to look for a key, and it’s a key which isn’t there. I can put anything here since this dictionary is completely empty. And then the second argument which .setdefault() takes—and you can already see it down here—is default. This is a default value which will be provided if the key is not in the dictionary.

08:20 Let’s enter an empty list here, and you can see what I did was I called my dict, I looked for the string 'a'. It’s not there, so an empty list was provided. This is exactly what defaultdict would do.
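
A short sketch of the .setdefault() call described above. The follow-up .append() line is an addition for illustration, not something shown in the lesson.

```python
std_dict = {}

# The key 'a' is missing, so the default is stored and returned.
value = std_dict.setdefault("a", [])
print(value)
print(std_dict)

# On later calls the existing value is returned and the default is ignored.
std_dict.setdefault("a", []).append(1)
print(std_dict)
```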

08:35 And that kind of raises the question, “Why should I use one or the other since they seem to do the same thing?” Well, defaultdict is arguably more readable, user-friendly, Pythonic, and straightforward.

08:48 If your code is heavily based on dictionaries and you’re dealing with missing keys all of the time—rather than as an occasional stumbling block—then you should really consider using defaultdict rather than regular dict.

08:59 If your dictionary items need to be initialized with a constant default value, then defaultdict also makes sense for you. And finally, if you’re using the dictionaries in your code for things such as aggregating, accumulating, counting, or grouping (basically the use cases we saw in the previous lesson), then defaultdict is also a good option. Regular dict, as opposed to defaultdict, does have a slight speed advantage, but in most cases where your code is heavily reliant on dictionaries, the convenience of defaultdict outweighs that. Okay, so that brings us to the last item in this lesson.
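
To make the readability comparison a little more concrete, here is a small grouping sketch written both ways. The data in `pairs` and all variable names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (category, item) pairs.
pairs = [("fruit", "apple"), ("veg", "carrot"), ("fruit", "pear")]

# Grouping with defaultdict: the missing-key handling is declared once.
grouped_dd = defaultdict(list)
for kind, name in pairs:
    grouped_dd[kind].append(name)

# Grouping with a regular dict and .setdefault(): repeated on every access.
grouped_sd = {}
for kind, name in pairs:
    grouped_sd.setdefault(kind, []).append(name)

print(grouped_dd == grouped_sd)
```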

09:35 The last thing I wanted to tell you about is .__missing__().

09:39 I touched on this briefly earlier, but the key thing to remember here is that when you look up a key with square brackets, you’re triggering .__getitem__(). If the key isn’t there, that in turn triggers .__missing__(), which in turn triggers .default_factory().

09:52 The important thing about .__missing__() is that it’s only triggered by .__getitem__(), so other methods which can be used to look for keys, such as .get() or .__contains__(), won’t be triggering .default_factory().
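
The chain from .__getitem__() to .__missing__() to .default_factory() can be approximated in pure Python. The class below, SketchDefaultDict, is a hypothetical stand-in written for this illustration and is not how collections.defaultdict is actually implemented; it only relies on the documented behavior that dict subclasses may define .__missing__().

```python
class SketchDefaultDict(dict):
    """Hypothetical approximation of defaultdict, for illustration only."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # dict.__getitem__ calls this hook only when the key is absent.
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value


d = SketchDefaultDict(list)
print("a" in d)    # __contains__ does not call __missing__
print(d.get("a"))  # neither does .get()
print(len(d))      # so nothing has been inserted yet
print(d["a"])      # square brackets call __getitem__, then __missing__
print(len(d))
```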

10:06 It’s less likely that .__contains__() will trip you up, but if you’re used to using .get() to look for something, then you do have to be aware that there is this potential trap that you can fall into if you’re expecting it to trigger .default_factory() and allow your defaultdict to work as such. Okay, so that was it for this lesson. In the next lesson, I’ll be telling you about different ways in which you can pass arguments to defaultdict. I’ll see you there!

Dan B on Nov. 10, 2020

Am I misunderstanding what .get_item() is? I don’t see that attribute of dict or defaultdict.

Bartosz Zaczyński RP Team on Nov. 12, 2020

@Dan B The method, which is part of Python’s data model, is named .__getitem__(). Typically, iterable types implement it to return elements at the given index or key in the collection.

Dan B on Nov. 12, 2020

Ah… Can you update the slides? I didn’t know it was named .__getitem__() when you write it .get_item()! e.g. at 9:55

Chris Bailey RP Team on Nov. 23, 2020

Hi @Dan B and Bartosz, I have fixed the slide and posted an updated video. Thanks for spotting the error.
