Using defaultdict

Handling Missing Keys With the Python defaultdict Type (Christian Mondorf, 09:20)


00:00 Welcome back. In this lesson, I’m going to show you how to use defaultdict. Exactly how you use it depends on your use case, but the general pattern is that we take one of Python’s built-in types, such as list, and use it to create the default value that gets mapped to any key that isn’t there yet.

00:17 Which type we use depends on the use case, and we’ll be looking at four use cases.

00:23 The first one is grouping items. Next, we’ll look at grouping unique items. After that, we’ll look at counting items, and finally, at accumulating items.

00:33 This all sounds a bit abstract, but it’ll be much clearer once we look at the code, so let’s jump right in. Here we are in the REPL, and the very first thing I’ll do is import defaultdict from collections.

00:45 Now I’m able to create a defaultdict, so that’s what I’ll do. I’ll call it dd and pass it list.

00:55 This lets me append a value, 1 in this case, to a key that isn’t present yet, such as the key 'key'. As you can see, this didn’t raise an error. In fact, if I look at my dictionary, you can see that there’s a single key, 'key', and its value is a list containing 1.

01:13 Now what happens if I add another value, such as 2? Whoops.

01:20 Well, there we go. That didn’t go as smoothly as I’d planned, but we can look at the defaultdict to see what it contains, and you can see that the value 2 was appended to the same list. Again, no errors and no issues of any kind.
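Here is a minimal sketch of the REPL session above written as a script. The name dd comes from the lesson; the printed output in the comments is what Python 3 shows for a defaultdict’s repr.

```python
from collections import defaultdict

# list is the default factory: list() is called to create [] for any missing key
dd = defaultdict(list)

# 'key' doesn't exist yet, so defaultdict first stores [] under it, then 1 is appended
dd['key'].append(1)
print(dd)  # defaultdict(<class 'list'>, {'key': [1]})

# 'key' now exists, so 2 is appended to the same list
dd['key'].append(2)
print(dd)  # defaultdict(<class 'list'>, {'key': [1, 2]})
```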

01:35 This is the basic logic we’ll apply in each of the scenarios we just discussed. Let’s start with grouping items. First, I’m going to clear my REPL.

01:45 But remember, I’ve already imported defaultdict, so I don’t need to do that again. Now, to set up my example, I’m going to be working off this data.

01:54 It’s a list of tuples, and each tuple contains a department and an employee. In the Sales department, we have John Doe and Martin Smith. In Marketing, we have Elizabeth Smith and Adam Doe. And Jane Doe works all alone in Accounting.

02:11 What I want to do is create a dictionary that groups people by department. For example, I’d like to have one key for 'Marketing', with 'Elizabeth Smith' and 'Adam Doe' both listed under that key, since they both work in Marketing.

02:29 But imagine this is a huge company with many departments, and I might not know all of them in advance. So I’ve created a new defaultdict, again passing it list, and this time it’s called dep_dd.

02:41 With that setup done, I can iterate over each department and employee in dep, my list of department-employee pairs, and append each employee to the list that’s mapped to that employee’s department.

02:59 I don’t have to check whether a department key is already present, so I won’t get an error even when I add the first employee to a department.
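To see what the defaultdict saves you from, here’s a small comparison with a plain dict. The names are taken from the lesson’s data; dict.setdefault() is shown only as the usual plain-dict workaround, not as something the lesson uses.

```python
from collections import defaultdict

plain = {}
try:
    # A regular dict has no entry for 'Sales' yet, so the lookup fails
    plain['Sales'].append('John Doe')
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: 'Sales'

# With a regular dict, you have to create the empty list yourself first
plain.setdefault('Sales', []).append('John Doe')

# A defaultdict creates the empty list automatically on first access
dep_dd = defaultdict(list)
dep_dd['Sales'].append('John Doe')

print(plain == dep_dd)  # True: both map 'Sales' to ['John Doe']
```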

03:08 And since I’m using list as .default_factory, a department isn’t limited to a single employee: I can keep appending employees to it. So let’s try this.

03:20 As you can see, that ran without any issues. Let’s look at the defaultdict I just created. You can see, for instance, that 'Sales' has two employees, and indeed, both 'John Doe' and 'Martin Smith' are in its list.

03:35 So that works as expected.
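A sketch of the grouping example follows. The dep list is reconstructed from the description above (Sales: John Doe and Martin Smith; Marketing: Elizabeth Smith and Adam Doe; Accounting: Jane Doe), so the order of the tuples on screen may differ.

```python
from collections import defaultdict

# (department, employee) pairs, reconstructed from the lesson's description
dep = [
    ('Sales', 'John Doe'),
    ('Sales', 'Martin Smith'),
    ('Accounting', 'Jane Doe'),
    ('Marketing', 'Elizabeth Smith'),
    ('Marketing', 'Adam Doe'),
]

# Each new department key starts out as an empty list
dep_dd = defaultdict(list)
for department, employee in dep:
    dep_dd[department].append(employee)

print(dep_dd['Sales'])      # ['John Doe', 'Martin Smith']
print(dep_dd['Marketing'])  # ['Elizabeth Smith', 'Adam Doe']
```

Because the values are lists, they keep insertion order and also keep any duplicates, which is exactly what the next use case deals with.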

03:39 The next use case I would like to show you is grouping unique items.

03:43 Let’s go back to the REPL. First, I’ll clear it so that we have a blank slate, and then I’ll create my dep list again. This time, though, the data isn’t as clean. In fact, it’s quite messy: it contains duplicate entries.

04:01 For example, the last three entries are all 'Adam Doe' in the 'Marketing' department. This is a very common situation: we’re often working with dirty data that isn’t presented to us in the best shape.

04:14 What I want to create is a dictionary-like structure, so a defaultdict, which only has one entry for 'Adam Doe', one entry for 'Elizabeth Smith', and so on.

04:24 The way to do this is very similar to what we did before. In the previous example, I created a defaultdict called dep_dd, just like here, but I passed list as its .default_factory. What I’m going to do now is pass set instead.

04:42 A set stores each value only once, so if I add a value that’s already present, it won’t be added again.

04:53 The syntax is very similar to what I did before. Again, I’m iterating over my tuples of department and employee, but this time I’m adding each employee with .add() rather than appending it, and the keys are still the department names. The difference is that each value is now a set instead of a list.

05:12 So rather than growing a list with repeated values, the set keeps only unique ones. Let’s see how this went. I’ll look at my defaultdict to see what it contains, and you can see that 'Sales' has 'Martin Smith' and 'John Doe'.

05:29 But what’s more interesting is that in 'Marketing', I have 'Adam Doe' and 'Elizabeth Smith', and 'Adam Doe' only appears once even though in my original dataset up here, we had 'Adam Doe' three times.
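A sketch of the unique-grouping version. The messy dep list is an assumption based on the description: 'Adam Doe' appears three times at the end and 'Elizabeth Smith' is also duplicated, but the lesson’s exact data may differ. Sets are unordered, so the sketch sorts the names before printing.

```python
from collections import defaultdict

# Messy (department, employee) pairs with duplicates; an illustrative reconstruction
dep = [
    ('Sales', 'John Doe'),
    ('Sales', 'Martin Smith'),
    ('Accounting', 'Jane Doe'),
    ('Marketing', 'Elizabeth Smith'),
    ('Marketing', 'Elizabeth Smith'),
    ('Marketing', 'Adam Doe'),
    ('Marketing', 'Adam Doe'),
    ('Marketing', 'Adam Doe'),
]

# set as the default factory: each new department starts as an empty set
dep_dd = defaultdict(set)
for department, employee in dep:
    # .add() does nothing if the employee is already in the set
    dep_dd[department].add(employee)

print(sorted(dep_dd['Marketing']))  # ['Adam Doe', 'Elizabeth Smith']
```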

05:42 The next use case we’re going to be looking at is counting items.

05:47 For this example, I’m going to use the same list as in the grouping example, the one without repetitions, just because it’s a bit cleaner and easier to work with. If you thought I was going to start out by creating a defaultdict, as I did in the previous examples, you would be right.

06:04 And if you thought this example would differ in the type I pass to defaultdict, you’d be right as well.

06:14 This time, I’ll pass int instead of list or set.
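int works here because defaultdict calls its .default_factory with no arguments to produce the value for a missing key. This quick check shows what each factory used so far returns, and also that simply reading a missing key is enough to insert it.

```python
from collections import defaultdict

# Each factory is called with no arguments when a key is missing
print(list())  # []
print(set())   # set()
print(int())   # 0

counts = defaultdict(int)
# Looking up a missing key inserts int(), i.e. 0, under that key
print(counts['missing'])  # 0
print(dict(counts))       # {'missing': 0}
```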

06:20 There we go, I’ve created my defaultdict. Next, I’ll iterate over my list of tuples and increment the integer stored under each department. Whenever a department key isn’t present yet, the defaultdict first creates it with the value int(), which is 0.

06:36 And again, as in previous examples, the key is the department name. Okay, so we ran this code. Let’s have a look at what my defaultdict contains.

06:46 As you can see, this time each key is mapped to an int. The keys are department names, so the first one is 'Sales', and its value is 2 because two people work in 'Sales'.

06:59 That’s 'John Doe' and 'Martin Smith'.
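A sketch of the counting example, using the same reconstructed dep list as in the grouping sketch (the on-screen order may differ). Only the department is needed, so the employee is discarded with _.

```python
from collections import defaultdict

# (department, employee) pairs, reconstructed from the lesson's description
dep = [
    ('Sales', 'John Doe'),
    ('Sales', 'Martin Smith'),
    ('Accounting', 'Jane Doe'),
    ('Marketing', 'Elizabeth Smith'),
    ('Marketing', 'Adam Doe'),
]

dep_dd = defaultdict(int)
for department, _ in dep:
    # A missing department starts at int() == 0, then gets incremented
    dep_dd[department] += 1

print(dict(dep_dd))  # {'Sales': 2, 'Accounting': 1, 'Marketing': 2}
```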

07:03 We’ve reached the final use case, and that’s accumulating items. Again, this’ll be easier to understand when we’re looking at the REPL. In this use case, we’re going to be using this data.

07:15 What we have here is a series of departments again, or you can imagine these are product types, and we have a value for each entry. For instance, at the very top, you can see three entries for 'Books'. These might be sales income from books, or perhaps money spent on books.

07:34 The point is that we have different numerical values here, 1250.00, 1300.00, 1420.00, and so on, and I’d like to add them up.

07:43 I’d like to end up with a defaultdict that has a single entry for 'Books' holding the sum of these values, so that I have consolidated totals.

07:56 By now you’re probably expecting me to create a defaultdict and pass it a different argument, and you’re right. This time, I’m passing float.

08:06 As in the previous use cases, I’ll iterate over this list of tuples and accumulate the values, this time using the products as keys.

08:19 Now this should have run, and the best way to see what we came up with is to print this. And that’s exactly what I’m going to be doing here. Let’s see what this gives us.

08:30 And so you can see that the income for books was just under $4,000, for tutorials just under $2,000, and so on.
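A sketch of the accumulation example. Only the three 'Books' amounts (1250.00, 1300.00, 1420.00) come from the lesson; the 'Tutorials' amounts and the names incomes and income_dd are hypothetical, added just to show a second key.

```python
from collections import defaultdict

# (product, amount) pairs; the 'Tutorials' values are hypothetical
incomes = [
    ('Books', 1250.00),
    ('Books', 1300.00),
    ('Books', 1420.00),
    ('Tutorials', 560.00),
    ('Tutorials', 630.00),
    ('Tutorials', 750.00),
]

# float as the default factory: a missing product starts at float() == 0.0
income_dd = defaultdict(float)
for product, income in incomes:
    income_dd[product] += income

for product, total in income_dd.items():
    print(f'{product}: ${total:,.2f}')
# Books: $3,970.00
# Tutorials: $1,940.00
```

Passing float rather than int means the running totals are floats from the start, so they print consistently with two decimal places.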

08:40 That’s the end of this lesson. We looked at four use cases and how defaultdict can help with each of them. In every case, we’re grouping, deduplicating, counting, or summing the items we have and organizing the results by category.

08:57 The keys are how we retrieve these consolidated values. I hope these four use cases have convinced you that defaultdict is useful for solving concrete problems. In the next lesson, we’ll go deeper into defaultdict and see more of how it works under the hood.

09:17 See you there!
