Sorting More Complex Dictionaries Using Custom Rules
00:00
Let’s have a look at another example. Here’s a dictionary, data, which has numeric keys. The values are nested dictionaries with the names, ages, and skills of employees in a company, let’s say. The value of the skills key is another dictionary with the scores for Python, JavaScript, or other languages that these employees know.
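To make that structure concrete, here is a minimal sketch of what such a dictionary could look like. The entries below are illustrative sample data, assumed for this example rather than taken from the video's screen.

```python
# Illustrative sample data (assumed): numeric keys, nested dictionaries as values.
data = {
    193: {"name": "John", "age": 30, "skills": {"python": 8, "js": 7}},
    209: {"name": "Bill", "age": 15, "skills": {"python": 6}},
    483: {"name": "Anna", "age": 24, "skills": {"js": 10}},
}

# The skills sub-dictionary sits one level deeper than name and age.
john_skills = data[193]["skills"]
print(john_skills)
```

Note that the outer keys are the employee numbers themselves, so you look up John with data[193], not with a position such as data[0].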
00:25
You would like to sort this dictionary based on the combined Python and JavaScript skills. You can use data.items() to get an items view of this dictionary. Each item is a tuple. The first element is the numeric key, for example 193 in the first one, and the second element is the dictionary with the keys name, age, and skills.
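As a quick sketch of what the items view gives you, the snippet below unpacks the first item into its key and its value. The sample data is assumed for illustration.

```python
# Illustrative sample data (assumed).
data = {
    193: {"name": "John", "age": 30, "skills": {"python": 8, "js": 7}},
    209: {"name": "Bill", "age": 15, "skills": {"python": 6}},
}

# Each item in the view is a (key, value) tuple.
items = list(data.items())
first_key, first_value = items[0]

print(first_key)            # the numeric key
print(first_value["name"])  # the nested dictionary is the second element
```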
00:52
Before you use that and sort it, you need to define the rule you would like to use. So you can define a function, which you can call get_relevant_skills(), that takes one parameter. Let’s call it item.
01:07
The first thing you need to do is to extract skills, the sub-dictionary that has the scores for Python, JavaScript, or any other languages.
01:16
So from item, which is a tuple, the first element is the key, but you want the value. That’s a dictionary with name, age, and skills as keys, and you only want the value for skills.
01:32
And that will be the sub-dictionary with Python, JavaScript, Java, C, or any other language included in the skills sub-dictionary.
01:42
Next, you would like to return the sum of the Python and JavaScript scores. So from the sub-dictionary skills, you can get
01:56
the Python value with a default value of zero, so if Python is not there, the .get() method will return zero as the default value. That’s Python.
02:07
However, you also want to get the value for the JavaScript component if it is there. So get the value for js, and if it’s not there, the default value should be zero.
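Putting those steps together, the helper function might look like the sketch below. The key names "python" and "js" and the sample item are assumptions made for this example.

```python
def get_relevant_skills(item):
    """Return the combined Python and JavaScript score for one (key, value) item."""
    # item is a (key, value) tuple; the value holds the skills sub-dictionary.
    skills = item[1]["skills"]
    # .get() falls back to 0 when a language is missing.
    return skills.get("python", 0) + skills.get("js", 0)


# Illustrative item (assumed): an employee with a JavaScript score only.
item = (483, {"name": "Anna", "age": 24, "skills": {"js": 10}})
score = get_relevant_skills(item)
print(score)
```

Using .get() with a default means employees with neither language still get a score of zero instead of raising a KeyError.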
02:21
Now you’re ready to call sorted(). You don’t want to pass the dictionary data directly. Instead, you’d like the items view, which you get using the method items().
02:31
However, you also want to use the key parameter, and the argument you pass to key is the helper function you’ve just defined, get_relevant_skills(),
02:44
and you might also want to use reverse=True. Why? Because you might want the person with the highest combined value for Python and JavaScript to be first.
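Here is a minimal sketch of that sorted() call. The sample data is assumed for illustration, and the helper is repeated so the snippet runs on its own.

```python
def get_relevant_skills(item):
    """Return the combined Python and JavaScript score for one (key, value) item."""
    skills = item[1]["skills"]
    return skills.get("python", 0) + skills.get("js", 0)


# Illustrative sample data (assumed).
data = {
    193: {"name": "John", "age": 30, "skills": {"python": 8, "js": 7}},
    209: {"name": "Bill", "age": 15, "skills": {"python": 6}},
    765: {"name": "Penelope", "age": 76, "skills": {"python": 8, "go": 5}},
    483: {"name": "Anna", "age": 24, "skills": {"js": 10}},
}

# Pass the function itself as key (no parentheses); reverse=True puts
# the highest combined score first.
result = sorted(data.items(), key=get_relevant_skills, reverse=True)

names = [value["name"] for _, value in result]
print(names)
```

With these sample scores (15, 6, 8, and 10), the names come out as John, Anna, Penelope, Bill.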
02:55 That gives us the output. You’ll make this output look prettier and more importantly, more readable shortly in this lesson. But first, let’s review where we are so far.
03:08
You started with the dictionary data. Then, you chose to use the items view by calling data.items(), which you pass as the first argument to sorted(). sorted() also has a second argument. You pass get_relevant_skills, the name of the function you’ve just defined, as the key argument.
03:27 This function returns the combined Python and JavaScript scores for the employees.
03:33
reverse=True is the final argument in sorted(). It ensures that those with the highest combined scores are at the top.
03:42
sorted() always returns a list. Since the argument is data.items(), sorted() returns a list of tuples. If you want a dictionary instead, you can convert whatever comes back from sorted(), which is the list of tuples, into a dictionary with dict(),
04:03
and you can assign this to a new variable. You can call it sorted_data.
04:10
And sorted_data is now a dictionary.
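As a sketch of that conversion, you can wrap the sorted() call in dict(). The sample data is assumed for illustration, and the helper is repeated so the snippet is self-contained.

```python
def get_relevant_skills(item):
    """Return the combined Python and JavaScript score for one (key, value) item."""
    skills = item[1]["skills"]
    return skills.get("python", 0) + skills.get("js", 0)


# Illustrative sample data (assumed).
data = {
    209: {"name": "Bill", "age": 15, "skills": {"python": 6}},
    193: {"name": "John", "age": 30, "skills": {"python": 8, "js": 7}},
    483: {"name": "Anna", "age": 24, "skills": {"js": 10}},
}

# sorted() returns a list of (key, value) tuples; dict() rebuilds a
# dictionary from them, keeping the sorted order.
sorted_data = dict(sorted(data.items(), key=get_relevant_skills, reverse=True))

print(type(sorted_data))
print(list(sorted_data))  # keys in order of combined score, highest first
```

This relies on dictionaries preserving insertion order, which Python guarantees from version 3.7 onward.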
04:14 Still not easy to read. We’ll fix this shortly. The first item in this dictionary has the key 193, and its value is the dictionary that shows this is John, age 30. His Python skills are rated 8 and his JavaScript skills are rated 7.
04:30 And if you go through all of the others, add the Python and JavaScript scores, you’ll see that John comes on top and they’re listed in order of their combined Python and JavaScript scores.
04:43
So this works. You have a sorted dictionary from the unsorted dictionary you started with. Let’s try to display this in a way that’s slightly easier to read. For this, you can rely on the pprint module in Python, which is part of the standard library.
05:00
And from it, you can import a function also called pprint(). This is a stand-in for the built-in print() function that we all know very well.
05:10
pprint stands for pretty print, and if you pretty print the dictionary, the output, as you see, is much prettier. That’s our sorted data. And now you might look at those names and think that’s not the same order we had earlier when we had the not-so-pretty display.
05:29
And the reason is that the pprint module by default sorts the dictionary based on the keys. Notice the keys are in order: 109 is first, then 193 and 209.
05:43
pprint has always worked like this and for backward compatibility, it still does. Why? Because before Python 3.7, dictionaries weren’t guaranteed to keep their items in insertion order.
05:53
However, in this case, if you want your dictionary to keep its sorted order, pprint allows you to turn off the reordering it does. The sort_dicts parameter is True by default, and you want to set it to False.
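The difference between the two settings can be sketched like this. The already-sorted sample dictionary is assumed for illustration, and pformat() is used alongside pprint() only to capture the same text as a string.

```python
from pprint import pformat, pprint

# Illustrative dictionary (assumed), already ordered by combined score.
sorted_data = {
    193: {"name": "John", "skills": {"python": 8, "js": 7}},
    483: {"name": "Anna", "skills": {"js": 10}},
    109: {"name": "Jill", "skills": {"java": 10}},
}

# Default behavior: pprint sorts by key, so 109 moves to the front.
pprint(sorted_data)

# sort_dicts=False (available from Python 3.8) keeps the dictionary's own order.
pprint(sorted_data, sort_dicts=False)

# pformat() returns the text pprint() would print, which is handy for checks.
default_text = pformat(sorted_data)
ordered_text = pformat(sorted_data, sort_dicts=False)
```

Keep in mind that the default setting also sorts the keys of the nested dictionaries, so the skills may appear in a different order than you typed them.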
06:12
And there pprint shows you the dictionary printed in an easy-to-read way with the original order where John is first, followed by Anna, who only has JavaScript. We’ll have to help her with her Python, but her JavaScript is quite good, so she comes in second with a score of 10, followed by Penelope. Her Go score is not relevant in this case, it’s only her Python score that matters. And so on.
06:36 You can go through and make sure that the order is correct.
rs21763 on Dec. 20, 2024
data_dict = {
    193: {"name": "John", "age": 30, "skills": {"python": 8, "js": 7}},
    209: {"name": "Bill", "age": 15, "skills": {"python": 6}},
    746: {"name": "Jane", "age": 58, "skills": {"js": 2, "python": 5}},
    109: {"name": "Jill", "age": 83, "skills": {"java": 10}},
    984: {"name": "Jack", "age": 28, "skills": {"c": 8, "assembly": 7}},
    765: {"name": "Penelope", "age": 76, "skills": {"python": 8, "go": 5}},
    598: {"name": "Sylvia", "age": 62, "skills": {"bash": 8, "java": 7}},
    483: {"name": "Anna", "age": 24, "skills": {"js": 10}},
    277: {"name": "Beatriz", "age": 26, "skills": {"python": 2, "js": 4}},
}
print(data_dict[1]['skills'])
When I try to print the nested item I get a type error 1 i.e ‘int’ object is not subscriptable. What am I doing wrong here ?