Using itemgetter() to Improve Performance
00:00 Let’s conclude this course with some performance and design considerations for when you have dictionaries that you want to sort. Consider this dictionary.
00:09 It’s got 15 items. The keys are integers from 1 to 15 in this case, and the values are strings with names of Python modules. Let’s say you want to sort this dictionary based on the values so that the Python modules are in alphabetical order.
00:26
You’ve already seen how to do this earlier in this course. You can call the sorted() built-in function. You can’t pass the dictionary itself, though, because sorted() would then return an ordered list of the keys, which in this case are the numbers 1 to 15.
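As a minimal sketch of that behavior, the snippet below uses a shortened, hypothetical stand-in for the 15-item dictionary. Only the 1: "requests" pair and the name black come from the course; the other entries are assumptions.

```python
# Hypothetical, shortened stand-in for the course's 15-item dictionary.
modules = {1: "requests", 2: "black", 3: "numpy", 4: "flask"}

# Iterating over a dictionary yields its keys, so sorted() only sees the keys.
keys_only = sorted(modules)
print(keys_only)  # [1, 2, 3, 4]
```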
00:44
Instead, you can pass the items view by calling .items(). You then also need to pass a second argument, which is assigned to the key parameter.
00:55
You can either define a function with the rule to get the value, or you can use a lambda function directly within the function call. The lambda function takes a parameter, item, and returns item[1].
01:09
Each item will be a (key, value) tuple, so (1, "requests"), for example. Therefore, item[1] gives us back the value.
01:21
And that gives us a list of tuples ordered by the string, which is the value. In this case, the string is the name of a Python module. You can see black is first, and so on.
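Here is a rough sketch of that call. The dictionary is a shortened, hypothetical stand-in for the 15-item one in the course, so the exact output differs from what you see in the video.

```python
# Hypothetical, shortened stand-in for the course's 15-item dictionary.
modules = {1: "requests", 2: "black", 3: "numpy", 4: "flask"}

# Each item is a (key, value) tuple; item[1] picks the value to sort by.
by_value = sorted(modules.items(), key=lambda item: item[1])
print(by_value)
# [(2, 'black'), (4, 'flask'), (3, 'numpy'), (1, 'requests')]
```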
01:37
The step of getting a value from a tuple is a common one, and Python has a function for this. From the operator
module, which is one of the standard-library modules, you can import itemgetter()
, which will allow you to efficiently get an item from a data structure such as a tuple.
01:55
To see how this works, you can create a variable called get_second and assign itemgetter(1) to it. 1 is the index, which means we want the second item.
02:11
And get_second
is itself callable. To see how it works, you can call get_second
with, for example, a list [3, 19]
02:21
and get_second returns 19. It returns the second item, that is, the item with index 1. So itemgetter(), when you call it with an index, returns a callable that you can use in places such as sorted().
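A short sketch of this step, using the same get_second name and the [3, 19] list from the walkthrough. The second call, on a (key, value) tuple, is an extra illustration rather than something shown in the video.

```python
from operator import itemgetter

# itemgetter(1) returns a callable that fetches the item at index 1.
get_second = itemgetter(1)

print(get_second([3, 19]))          # 19
print(get_second((1, "requests")))  # requests
```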
02:38
So you can go back to the call to sorted(), but instead of the lambda function, you can now use itemgetter(1) or get_second, which is the same thing.
02:49
But if you hadn’t defined get_second, which you don’t need to, you can call itemgetter(1) directly. It returns the callable that you can use as the argument to the key parameter.
03:01 And this call returns the same list of tuples as the one you had with the lambda function earlier.
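To check that both approaches agree, you could compare the two results directly. This is a sketch on a shortened, hypothetical dictionary rather than the 15-item one from the course.

```python
from operator import itemgetter

# Hypothetical, shortened stand-in for the course's 15-item dictionary.
modules = {1: "requests", 2: "black", 3: "numpy", 4: "flask"}

with_lambda = sorted(modules.items(), key=lambda item: item[1])
with_itemgetter = sorted(modules.items(), key=itemgetter(1))

# Both key functions pick the value, so the sorted lists are identical.
print(with_lambda == with_itemgetter)  # True
```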
03:08
So what’s the difference between the two? Apart from using a different function, a key difference is in efficiency because itemgetter()
is a very efficient way of getting an item from, in this case, the tuples you get from .items()
.
03:23
To confirm this, you can use the timeit module, and from timeit you can import the function with the same name, as often happens in Python.
03:31
And this function will allow you to time the execution of certain statements. The statements you use in timeit
need to be strings. So you can go back to the calls such as the call to sorted()
using the lambda function.
03:48
And you can convert this call into a string, which you’ll need to pass to the timeit() function. You could call this sorted_with_lambda, for example,
04:00
and you can do the same thing with the other sorted() call, the one that uses itemgetter(). You could call this sorted_with_itemgetter to make it clear which is which, and that is also a string.
04:16
So now you have the two statements you need as strings. You can call timeit(), which requires, first of all, the statement. We can start with sorted_with_lambda.
04:31
You also need to pass the global variables to the globals
parameter, so that timeit
has access to the global variables.
04:42 Let’s see how long this call takes on my particular computer. Of course, timing will vary depending on what computer, what setup, and what system you have. It takes a bit over 1.17 seconds.
04:55
You can now repeat the process with sorted_with_itemgetter. Since itemgetter() requires an import, you can also pass the setup string "from operator import itemgetter"
05:12
and the globals
as we had before,
05:18
and this takes 0.87 seconds. Again, timing will vary depending on your particular setup. That’s about 35% faster, which in some situations can matter. So if you have an application where that type of performance improvement matters, using itemgetter()
is the preferred way to solve a problem such as this one.
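The whole comparison could be sketched roughly as follows. Several things here are assumptions: the dictionary is a shortened, hypothetical stand-in, number is lowered from timeit’s default of one million so the sketch finishes quickly, and an explicit namespace dictionary is passed to globals in place of globals(), so the snippet behaves the same wherever it runs. The timings you get will differ from the ones quoted above.

```python
from timeit import timeit

# Hypothetical, shortened stand-in for the course's 15-item dictionary.
modules = {1: "requests", 2: "black", 3: "numpy", 4: "flask"}

# The statements to time must be strings.
sorted_with_lambda = "sorted(modules.items(), key=lambda item: item[1])"
sorted_with_itemgetter = "sorted(modules.items(), key=itemgetter(1))"

# Names the timed statements need; in a REPL, globals=globals() also works.
namespace = {"modules": modules}

time_lambda = timeit(stmt=sorted_with_lambda, globals=namespace, number=10_000)
time_itemgetter = timeit(
    stmt=sorted_with_itemgetter,
    setup="from operator import itemgetter",  # import runs once, untimed
    globals=namespace,
    number=10_000,
)

print(f"lambda:     {time_lambda:.4f} s")
print(f"itemgetter: {time_itemgetter:.4f} s")
```

With such a small dictionary and a low number, the gap between the two can be small or noisy, so it is worth repeating the measurement before drawing conclusions.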