Welcome back. In this lesson, I’m going to show you how to use defaultdict. How exactly you use it depends on the use case, but the general pattern is that we take built-in types which Python offers, such as list, set, and int, and map values of those types to keys.
00:17 Which data structure we use is going to depend on our use case, and there are four of those which we’re going to be looking at.
00:23 The first one is grouping items. Next, we’ll look at grouping unique items. After that, we’ll look at counting items, and finally, at accumulating items.
This all sounds a bit abstract, but it’s much clearer once we start looking at the code, so let’s jump right in. Here we are in the REPL, and the very first thing I’m going to do is import defaultdict from collections. So now, I’m able to create a defaultdict, and that’s what I’ll do. I’m going to create a defaultdict, call it dd, and pass it list.
So what this allows me to do is append something to a key which isn’t present, such as the key 'key'. And as you can see, this didn’t cause my code to error out. In fact, if I have a look at my dictionary, you can see that there’s a single 'key' and the value is a list holding the item I just appended. Now what happens if I add another value, such as 2?
Well, there we go. That didn’t go as smoothly as I planned, but we can have a look at the defaultdict and see what it contains. And you can see that what happened was that the value 2 was smoothly appended to this list. Again, no errors, no issues of any kind.
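Here’s a minimal sketch of what that REPL session might look like. The first value, 1, is just an illustrative assumption. The second value, 2, is the one mentioned above.

```python
from collections import defaultdict

# list is the default factory: a missing key gets a new empty list
dd = defaultdict(list)

# 'key' isn't present yet, but this doesn't raise a KeyError
dd["key"].append(1)  # 1 is an illustrative value
dd["key"].append(2)

print(dd)  # defaultdict(<class 'list'>, {'key': [1, 2]})
```

With a regular dict, the first append would fail with a KeyError, because there would be no list under 'key' to append to.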
01:35 This is the basic logic which we’re going to be applying in each of the scenarios which we just discussed. Let’s look at grouping items. First of all, I’m going to clear my REPL. But remember, I’ve already imported defaultdict, so I don’t need to do that again. Now, to set up my example, I’m going to be working off this data.
01:54 It’s a list of tuples, and each tuple is a department and an employee. So in the Sales department, we have John Doe and Martin Smith. In Marketing, we have Elizabeth Smith and Adam Doe. And Jane Doe works all alone in Accounting.
What I want to do here is create a dictionary which will group people by department. So for example, I would like to have one key for 'Marketing', and then I would like to have 'Elizabeth Smith' and 'Adam Doe' both listed in a list under that key, since they both work in Marketing, right?
But we can imagine that this is a huge company: there are many departments, and I might not know all of them from the beginning. So now I’ve created a new defaultdict. This time it’s called dep_dd.
So now that I’ve done that setup, what I can do is iterate over each department and each employee in dep (so in my list of departments and employees), and I can append each employee to a list which is mapped to a key which is the department. And I don’t have to worry about the department keys already being present, so I won’t get an error even when I’m adding an employee to a department for the first time.
And since I’m using list as the .default_factory, I can always add more employees to a department. It’s not one department, one employee. So let’s try this.
As you can see, that ran without any issues. Let’s try looking at the defaultdict which I just created. And you can see here, for instance, that 'Sales' has two employees and indeed, they’re both in my list: 'John Doe' and 'Martin Smith' are both there.
03:35 So that works as expected.
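A sketch of the grouping example could look like the following. The names and departments come from the description above, but the order of the tuples in dep is an assumption.

```python
from collections import defaultdict

# (department, employee) pairs; the ordering here is assumed
dep = [
    ("Sales", "John Doe"),
    ("Sales", "Martin Smith"),
    ("Accounting", "Jane Doe"),
    ("Marketing", "Elizabeth Smith"),
    ("Marketing", "Adam Doe"),
]

dep_dd = defaultdict(list)
for department, employee in dep:
    # A department seen for the first time starts with an empty list
    dep_dd[department].append(employee)

print(dep_dd["Sales"])  # ['John Doe', 'Martin Smith']
```

Because the factory builds a fresh list per key, there’s no need to check whether a department already exists before appending to it.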
03:39 The next use case I would like to show you is grouping unique items.
Let’s come back to the REPL. First of all, I’m going to clear it so that we have a blank slate, and I’m going to create my dep list again. Except this time, the data isn’t as clean. In fact, it’s quite messy in the sense that I have multiple entries for the same values. So for example, if you look at the last three values, it’s three times 'Adam Doe' in the 'Marketing' department. And this is a very common situation, right? We’re often working with dirty data, which is not optimally presented to us.
What I want to create is a dictionary-like structure, so a defaultdict, which only has one entry for 'Adam Doe', one entry for 'Elizabeth Smith', and so on.
The way to do this is very similar to what we did previously. In the example just before this one, I created a defaultdict and called it dep_dd, just like here. There, I had passed list as the .default_factory. What I’m going to do now is pass set instead. And what that does is that a set accepts only one of each value. So if I add the same value to the set again, it won’t be entered again, since it’s already present.
So the syntax for doing this is very similar to what I had done just before. Again, I’m iterating over my tuples, over department and employee, and this time I’m adding them, rather than appending them, to keys which are the department values. But this time the mutable data structure which I’m using is a set instead of a list. So rather than prolonging the list with repetitive values, the set will only accept unique values. Let’s see how this went. I’ll have a look at my defaultdict to see what it contains, and you can see in 'Sales', I have each employee listed once.
But what’s more interesting is that in 'Marketing', I have 'Adam Doe' and 'Elizabeth Smith', and 'Adam Doe' only appears once even though in my original dataset up here, we had 'Adam Doe' three times.
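As a rough sketch of this use case, assume a messy dep list like the one below. The exact contents are hypothetical, apart from 'Adam Doe' appearing three times in 'Marketing' at the end, as described.

```python
from collections import defaultdict

# Hypothetical messy data with repeated (department, employee) pairs
dep = [
    ("Sales", "John Doe"),
    ("Sales", "Martin Smith"),
    ("Sales", "John Doe"),
    ("Accounting", "Jane Doe"),
    ("Marketing", "Elizabeth Smith"),
    ("Marketing", "Adam Doe"),
    ("Marketing", "Adam Doe"),
    ("Marketing", "Adam Doe"),
]

dep_dd = defaultdict(set)
for department, employee in dep:
    # Sets use .add() instead of .append(); duplicates are ignored
    dep_dd[department].add(employee)

# Sets are unordered, so sort the names for stable display
print(sorted(dep_dd["Marketing"]))  # ['Adam Doe', 'Elizabeth Smith']
```

The only changes from the list version are the factory (set instead of list) and the method (.add() instead of .append()).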
05:42 The next use case we’re going to be looking at is counting items.
For this example, I’m going to be using the same list I used in the very first example so there are no repetitions, just because it’s a bit cleaner and easier to work with. If you thought I was going to start out by creating a defaultdict, as I did in the previous examples, you would be right.
And if you thought that this example would be different in that I would pass a different type to defaultdict, then that would be correct as well. What I will pass this time is int instead of set and instead of list.
So there we go. I’ve created my defaultdict. Next, I will iterate over my list of tuples. And now what I’ll be doing is incrementing the int which defaultdict supplies, starting at 0, for each key that wasn’t already in my defaultdict. And again, as in previous examples, the key is the department name. Okay, so we ran this code. Let’s have a look at what my defaultdict contains.
As you can see, this time there is an int mapped to each key. The key is a department name. So the first one is 'Sales' and the value I have here is 2, and that’s because two people work in 'Sales': 'John Doe' and 'Martin Smith'.
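The counting pattern might be sketched like this. The variable name dep_dd and the order of the tuples are assumptions. The data is the clean list from the grouping example.

```python
from collections import defaultdict

# Same clean (department, employee) data; ordering is assumed
dep = [
    ("Sales", "John Doe"),
    ("Sales", "Martin Smith"),
    ("Accounting", "Jane Doe"),
    ("Marketing", "Elizabeth Smith"),
    ("Marketing", "Adam Doe"),
]

# int() returns 0, so every new department starts its count at 0
dep_dd = defaultdict(int)
for department, _ in dep:
    dep_dd[department] += 1

print(dep_dd["Sales"])  # 2
```

The employee name isn’t needed for the count, so it’s unpacked into the throwaway name _.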
07:03 We’ve reached the final use case, and that’s accumulating items. Again, this’ll be easier to understand when we’re looking at the REPL. In this use case, we’re going to be using this data.
What we have here is a series of departments again, or you can imagine these are sales types, and we have a value for each of them. So for instance, here at the very top, you can see that we have three entries for 'Books'. These could be sales of books, or perhaps amounts spent on books. But the point is we have different numerical values here, such as 1420.00, and so on. And what I’d like to do is add them.
I would like to end up with a dictionary-like structure, or a defaultdict, where I have one entry for 'Books' and the sum of these values, so I have consolidated totals.
And by now you’re probably expecting me to create a defaultdict and pass a different argument to it, and that’s correct. This time, I’m using float.
08:06 And as in the previous use cases, what I’m going to be doing is I’m going to be iterating over this list of tuples, and I’m going to be accumulating values. This time, I’m going to be using products as keys.
08:19 Now this should have run, and the best way to see what we came up with is to print this. And that’s exactly what I’m going to be doing here. Let’s see what this gives us.
08:30 And so you can see that the income for books was just under $4,000, for tutorials just under $2,000, and so on.
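A sketch of the accumulation pattern is below. The names incomes and dd, the use of float as the factory, and most of the figures are assumptions chosen to match the totals described above. Only 1420.00 and the 'Books' category are taken from the lesson.

```python
from collections import defaultdict

# Hypothetical (product, amount) data; figures are illustrative
incomes = [
    ("Books", 1250.00),
    ("Books", 1300.00),
    ("Books", 1420.00),
    ("Tutorials", 560.00),
    ("Tutorials", 630.00),
    ("Tutorials", 750.00),
]

# float() returns 0.0, so each product's running total starts at zero
dd = defaultdict(float)
for product, amount in incomes:
    dd[product] += amount

for product, total in dd.items():
    print(f"{product}: ${total:,.2f}")
# Books: $3,970.00
# Tutorials: $1,940.00
```

This is the same shape as the counting example, except that each step adds the amount instead of adding 1.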
That’s the end of this lesson. We looked at four different use cases and how defaultdict can help us in each of them. In each case, we are grouping or consolidating or somehow reducing items that we have, maybe to unique items, and we’re mapping them to different categories. We’re using keys as the way in which we can retrieve these consolidated values. So, I’ve shown you four use cases. I hope I’ve convinced you that defaultdict can be helpful in resolving concrete problems. In the next lesson, we’ll go deeper into defaultdict and see more of how it works under the hood.