Mapping Trick for Membership Binning
This lesson reveals a mapping trick for membership binning. Let’s look at a simple situation where it can be useful.
Assume you have a Series and a corresponding “mapping table” where each value belongs to a multi-member group, or to no groups at all:
>>> import pandas as pd
>>> countries = pd.Series([
...     'United States',
...     'Canada',
...     'Mexico',
...     'Belgium',
...     'United Kingdom',
...     'Thailand'
... ])
>>> groups = {
...     'North America': ('United States', 'Canada', 'Mexico', 'Greenland'),
...     'Europe': ('France', 'Germany', 'United Kingdom', 'Belgium')
... }
In other words, you need to map countries to the following result:
0    North America
1    North America
2    North America
3           Europe
4           Europe
5            other
dtype: object
00:00
Now it’s time to learn a little trick for binning categorical data by membership. Let’s say you have a Series like this, a number of countries, and a mapping table like this, where each country either belongs to one of the groups or doesn’t belong anywhere. You basically need a function that’s similar to pandas’ cut() function but bins values based on categories instead of numbers.
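For contrast, here is what pd.cut() does with numbers: it assigns each value to an interval defined by numeric bin edges. The ages, edges, and labels below are made-up illustrative values; the point is that cut() has no way to say “these specific countries go in this bin.”

```python
import pandas as pd

# Numeric binning: each age falls into one of the intervals
# (0, 12], (12, 19], (19, 64], (64, 120] (right edge inclusive by default).
ages = pd.Series([5, 17, 42, 70])
age_bins = pd.cut(ages, bins=[0, 12, 19, 64, 120],
                  labels=['child', 'teen', 'adult', 'senior'])
print(age_bins.tolist())  # ['child', 'teen', 'adult', 'senior']
```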
00:26
We’re going to build on the Series.map() method that you used in the categorical data video to do this. So before we get started, let me just copy all of this
00:38
and open up the terminal.
00:44
Start the Python interpreter, import pandas as pd, and then paste in the countries Series and the groups dictionary. Before you get to writing the function, run from typing import Any so that you can add type hints to the function’s parameters.
01:01
Now define a new function called membership_map(),
01:06
which is going to take in a few parameters: s will be a Series, and groups will be a dict.
01:19
fillvalue, which is used for anything that’s not found in the mapping table, can be anything, so annotate it as Any and set its default value to -1. And this function is going to return a Series.
01:34
Like before, make a dictionary called groups, this time with a nested dictionary comprehension: {x: k for k, v in groups.items() for x in v}.
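Applied to the sample groups from the top of the lesson, the comprehension flattens and reverses the mapping into a country-to-continent lookup. The name lookup is only for illustration here (inside the function, the lesson reassigns groups instead).

```python
groups = {
    'North America': ('United States', 'Canada', 'Mexico', 'Greenland'),
    'Europe': ('France', 'Germany', 'United Kingdom', 'Belgium'),
}

# Reverse and flatten: each member x becomes a key pointing at its group k.
lookup = {x: k for k, v in groups.items() for x in v}
print(lookup['Canada'])   # North America
print(lookup['Belgium'])  # Europe
print(len(lookup))        # 8
```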
01:58
And then just return s.map(groups), chaining .fillna(fillvalue) to fill in any values that weren’t found. Before you use this, take a moment to think about what’s going on in this dictionary comprehension. So let me scroll up.
02:19
groups is going to be that mapping dictionary, like this one right here. When you call groups.items(), you get back each key together with its value. So in this case, the key would be 'Europe' and the value would be this entire tuple here.
02:36
The pandas .map() method doesn’t look inside this tuple, so you need to break it out further, and that’s where the second loop, for x in v, comes into play.
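To see why the tuples have to be broken out, here is a small sketch that passes the original, unreversed dictionary straight to .map(). Its keys are continent names, so no country matches any key and every result comes back as NaN.

```python
import pandas as pd

countries = pd.Series(['Canada', 'Belgium', 'Thailand'])
groups = {
    'North America': ('United States', 'Canada', 'Mexico', 'Greenland'),
    'Europe': ('France', 'Germany', 'United Kingdom', 'Belgium'),
}

# .map() looks up each value among the dict's keys ('North America', 'Europe');
# it never searches inside the tuples, so nothing matches.
unreversed = countries.map(groups)
print(unreversed.isna().all())  # True
```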
02:45
So if this is v, then x represents each of these countries. And that’s how you get the final dictionary, which maps each country to its continent, k.
02:58
If that’s still confusing, feel free to write this out as two nested for loops to see what’s going on. Okay. Time to see if this works.
03:08
So call membership_map(),
03:12
and now you’re going to pass in s, which is countries in this case. groups will be groups, the mapping dictionary.
03:21
And then for the fillvalue, which covers anything not found in that mapping dictionary, you can just pass 'other'. And there you go. The first three countries were from North America, the next two were from Europe, and 'Thailand' wasn’t in either of those, so it returned 'other'. This was a small dataset, so it wasn’t very noticeable, but building the reversed dictionary once and then mapping the values scales much better than running through the groups with nested for loops for every value: each value becomes a single dictionary lookup instead of a scan over every group’s members. This is a nice use of a dictionary for mapping, which, while it’s helpful with pandas, is useful in a lot of other situations in Python. So, that’s it!
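For comparison, here is a sketch of the slower approach the lesson alludes to. The helper find_group_naive is hypothetical, written only for this illustration: it scans every group’s members for every value, while the dictionary version pays for the reversal once and then does one hash lookup per value. No timings are claimed here; the sketch only checks that both approaches agree.

```python
import pandas as pd

countries = pd.Series(['United States', 'Canada', 'Mexico',
                       'Belgium', 'United Kingdom', 'Thailand'])
groups = {
    'North America': ('United States', 'Canada', 'Mexico', 'Greenland'),
    'Europe': ('France', 'Germany', 'United Kingdom', 'Belgium'),
}


def find_group_naive(value, groups, fillvalue='other'):
    # Hypothetical helper: linear scan over every group until one contains value.
    for k, v in groups.items():
        if value in v:
            return k
    return fillvalue


# Naive: one full scan of the groups per element.
naive = countries.apply(find_group_naive, args=(groups,))

# Dictionary-based: build the reversed lookup once, then map.
lookup = {x: k for k, v in groups.items() for x in v}
fast = countries.map(lookup).fillna('other')

assert naive.tolist() == fast.tolist()
```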
03:58
You’ve learned how to write a pretty useful function that you can use to map datasets to categorical bins quickly and concisely. Thanks for watching.
arcarlos00 on Sept. 2, 2021
Thanks for the tuto. I understand that it fills some entries with the string ‘Other’, but why do we need to do from typing import Any at all?
Bartosz Zaczyński RP Team on Sept. 2, 2021
@arcarlos00 Strictly speaking, you don’t need them. These are called annotations or type hints, which Python completely ignores. Some tools and libraries can leverage those type hints, for example, to improve auto-completion and type checking in your code editor. However, in this case, they seem to only serve as documentation for the reader.
raulfz on May 6, 2021
Thank you for your Tutorial. The accessor methods and groupby iteration are really great tricks.
However, about this mapping trick I don’t really see any benefit over this:
Best,
R.