Why Use the reduce() Function?
Now that you know the basics of how the reduce() function works, you’re going to look into some of the interesting ways in which you can use this building block of functional programming. You’ll see how the reduce() function allows you to group your data set into arbitrary categories.
You’ll also learn about Python’s defaultdict class, which is defined in the collections module. Next, you’ll familiarize yourself with some useful helpers in the itertools module, such as itertools.groupby.
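The description mentions itertools.groupby but doesn't show it. Here is a rough sketch of how it could group scientists by field. The two-field `Scientist` record and the sample data are hypothetical. One detail matters: groupby() only merges *consecutive* items with equal keys, so the data is sorted by the same key first.

```python
import itertools
import operator
from collections import namedtuple

# Hypothetical sample data for illustration only
Scientist = namedtuple("Scientist", ["name", "field"])
scientists = [
    Scientist("Ada Lovelace", "math"),
    Scientist("Marie Curie", "physics"),
    Scientist("Emmy Noether", "math"),
    Scientist("Ada Yonath", "chemistry"),
]

by_field = operator.attrgetter("field")

# groupby() only merges adjacent items with equal keys, so sort first
grouped = {
    field: [s.name for s in members]
    for field, members in itertools.groupby(sorted(scientists, key=by_field), key=by_field)
}
print(grouped)
# {'chemistry': ['Ada Yonath'], 'math': ['Ada Lovelace', 'Emmy Noether'], 'physics': ['Marie Curie']}
```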
00:00
At this point, you might be wondering, “Okay, why do I even need to use the reduce() function in this case at all?” And the reason is that the reduce() function can actually go far beyond what you’ve seen here.
00:13
We can talk about some pretty crazy examples here. Why don’t we look at some more interesting uses of the reduce() function? All right, so one interesting thing that I thought we could do with the reduce() function is grouping scientists by field.
00:28 Basically, I want to have some output like this. I want to fill up this dictionary here and populate it with the scientists grouped by their respective field.
00:41
We can do that by creatively using the accumulator. All right, so what we’re going to do here is define a function that we can pass to the reduce() function that will take our existing list of scientists and
00:57
assign them to these fields and group people that way. There is actually a way you can do this in a single lambda expression, but when I started recording that example, I thought, “Man, this actually really detracts from what I want to show you here.” So here, I’m just going to define my reducer() function, and it’s going to take the
01:23
accumulator and a value. I’m just going to say: in the accumulator, look up the .field that the scientist belongs to, and then append the .name of that particular scientist.
01:41
And then, of course, we need to return the accumulator. You might be wondering, “What is that? What does that do?” You’re going to see that in a minute. So here, I’m now typing out the reduce() function call, so I need to pass my reducer() function, I need to pass my scientists, and then I need to initialize the accumulator. This is where this thing comes in, because we need to make sure that this dictionary here actually has slots, or keys, for all the different fields, so that the reducer() can go over it and update the accumulator according to the field that each of these scientists belongs to. All right. So now, once I’ve run this…
02:28
let’s pretty-print it out. And yeah, you can see here it worked! So, for 'astronomy', you’ve got 'Vera Rubin', and 'chemistry', 'math', and 'physics' are filled in the same way: each scientist is sorted into the right category.
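The walkthrough above can be sketched as runnable code. This assumes a simplified `Scientist` record with only `name` and `field`, and the sample data is hypothetical, not the video’s exact list. The key point is that every field has to appear in the initial accumulator, each mapped to an empty list.

```python
from collections import namedtuple
from functools import reduce
from pprint import pprint

# Hypothetical sample data; the video uses its own list of scientists
Scientist = namedtuple("Scientist", ["name", "field"])
scientists = (
    Scientist("Ada Lovelace", "math"),
    Scientist("Vera Rubin", "astronomy"),
    Scientist("Marie Curie", "physics"),
    Scientist("Emmy Noether", "math"),
)

def reducer(acc, val):
    # Add this scientist's name to the list for their field...
    acc[val.field].append(val.name)
    # ...and hand the same dictionary back to reduce() for the next step
    return acc

# Every field must already be a key, each mapped to an empty list
scientists_by_field = reduce(
    reducer,
    scientists,
    {"astronomy": [], "math": [], "physics": []},
)
pprint(scientists_by_field)
```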
02:45
And I hope you can see how this worked here with this reducer() function. What I struggled with the most was figuring out how names like accumulator and value correspond to actual items, and what the accumulator looks like as it gets updated. Right?
03:05 That’s the biggest thing you need to wrap your head around. It makes sense to play through this, maybe on a piece of paper or, better yet, here in the Python REPL.
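Here is one way to play this through in the REPL, as suggested above. It uses a variant of the reducer that records a copy of the accumulator after each call. `traced_reducer` and `snapshots` are hypothetical names used only for this sketch, and the data is made up.

```python
import copy
from collections import namedtuple
from functools import reduce

Scientist = namedtuple("Scientist", ["name", "field"])
scientists = (
    Scientist("Ada Lovelace", "math"),
    Scientist("Vera Rubin", "astronomy"),
    Scientist("Emmy Noether", "math"),
)

snapshots = []  # one deep copy of the accumulator per reducer call

def traced_reducer(acc, val):
    acc[val.field].append(val.name)
    snapshots.append(copy.deepcopy(acc))  # copy, because acc keeps being mutated
    return acc

reduce(traced_reducer, scientists, {"astronomy": [], "math": []})

for step, state in enumerate(snapshots, start=1):
    print(step, state)
# 1 {'astronomy': [], 'math': ['Ada Lovelace']}
# 2 {'astronomy': ['Vera Rubin'], 'math': ['Ada Lovelace']}
# 3 {'astronomy': ['Vera Rubin'], 'math': ['Ada Lovelace', 'Emmy Noether']}
```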
03:13
One thing that I really don’t like about this is that I have to give it a list of categories upfront that then gets populated. That’s kind of stupid. If I make a mistake, say, a typo in one of the keys, then the whole thing blows up with a KeyError. But there is a better way to do it, and that is the defaultdict class in the collections module.
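To make the “blows up” concrete, here is a small sketch with a deliberately misspelled key in the initial accumulator. The data and the typo are hypothetical. The first reducer call whose field is missing from the dictionary raises a KeyError.

```python
from collections import namedtuple
from functools import reduce

Scientist = namedtuple("Scientist", ["name", "field"])
scientists = (Scientist("Ada Lovelace", "math"), Scientist("Marie Curie", "physics"))

def reducer(acc, val):
    acc[val.field].append(val.name)
    return acc

missing_key = None
# Deliberate typo: "phyiscs" instead of "physics"
try:
    reduce(reducer, scientists, {"math": [], "phyiscs": []})
except KeyError as exc:
    missing_key = exc.args[0]
    print("KeyError:", exc)  # KeyError: 'physics'
```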
03:37
So, I’m going to have to import collections, and what we’re going to do now is overwrite this scientist_by_field object here, again.
03:47
I’m going to pass it my reducer() function, I’m going to pass it the scientists, and instead of initializing the accumulator with a hand-written dictionary, I’m going to pass collections.defaultdict(list).
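Here is a sketch of that call, again with a hypothetical two-field `Scientist` and made-up data. No field names are listed up front. The defaultdict creates an empty list the first time each field is looked up.

```python
from collections import defaultdict, namedtuple
from functools import reduce

Scientist = namedtuple("Scientist", ["name", "field"])
scientists = (
    Scientist("Ada Lovelace", "math"),
    Scientist("Vera Rubin", "astronomy"),
    Scientist("Emmy Noether", "math"),
)

def reducer(acc, val):
    acc[val.field].append(val.name)
    return acc

# No field names up front: the defaultdict creates an empty list for each new key
scientists_by_field = reduce(reducer, scientists, defaultdict(list))
print(dict(scientists_by_field))
# {'math': ['Ada Lovelace', 'Emmy Noether'], 'astronomy': ['Vera Rubin']}
```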
04:02
Then, what that will do… I’ll show you that in a minute. It will lead to the same result because this collections.defaultdict thing is pretty magical.
04:19
So, let’s just create an instance of that class, so this is a defaultdict. And now, every time you access a key that doesn’t exist,
04:31 it will be created and populated with whatever factory function you pass in here. Right? So now, I can go
04:43 and put all these crazy keys here, and the dictionary will be updated. So, now I can do stuff like
04:54
dd['xyz'].append(...) with some value, and this will work
04:59 because it’s going to create that slot for me just in time. And I can keep doing that and it will keep updating, because from the second
05:13
call on, it knows that this key already exists in the defaultdict, and it doesn’t need to recreate it. So, this is a little trick you can use to avoid having to manually define the accumulator here. But of course, it also adds complication to this function call, and I bet it would take many people a while to read and figure out, so that’s actually a pretty good reason not to use it in production code.
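This sketch replays the defaultdict demo described above. The appended values 1 and 2 are arbitrary placeholders. It also shows a detail worth knowing: a membership test with `in` does not create a key, but any plain lookup of a missing key does.

```python
from collections import defaultdict

dd = defaultdict(list)  # list() is the factory called for missing keys
print("xyz" in dd)      # False: nothing created yet

dd["xyz"].append(1)     # first access: dd["xyz"] = list(), then append
dd["xyz"].append(2)     # second access: the existing list is reused
print(dd)               # defaultdict(<class 'list'>, {'xyz': [1, 2]})

# Even a plain lookup of a missing key inserts it
dd["abc"]
print(sorted(dd))       # ['abc', 'xyz']
```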
05:41 However, I think it makes for a really interesting thought exercise in Python, and I think it’s pretty cool to work with different programming paradigms and styles, like functional programming, object-oriented programming, and procedural programming.
05:59 And it makes sense to be comfortable with these different styles, because even if you don’t stick to any one style 100% of the time, you’re going to learn a lot about where each one is strong and when to apply it. I think that is really valuable, and it’s going to put you above and beyond what most people do when they learn programming, right? So, I think there’s a lot of value in exploring these things.
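For comparison with the procedural style mentioned above, here is a sketch of the same grouping written as a plain for loop. This version is not from the video. It is one common alternative that may be easier to read than reduce() plus defaultdict. The data is hypothetical.

```python
from collections import defaultdict, namedtuple

Scientist = namedtuple("Scientist", ["name", "field"])
scientists = (
    Scientist("Ada Lovelace", "math"),
    Scientist("Vera Rubin", "astronomy"),
    Scientist("Emmy Noether", "math"),
)

scientists_by_field = defaultdict(list)
for scientist in scientists:
    # Same update as reducer(), but the accumulator is just a local variable
    scientists_by_field[scientist.field].append(scientist.name)

print(dict(scientists_by_field))
# {'math': ['Ada Lovelace', 'Emmy Noether'], 'astronomy': ['Vera Rubin']}
```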
angelojwillems on March 28, 2020
Wow, I use the Counter class from collections as a means to the same end. I didn’t realize there was a defaultdict. From now on, I’ll just use that instead.
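For anyone choosing between the two: collections.Counter counts how many items share a key, while the defaultdict(list) approach collects the items themselves. Here is a small sketch with hypothetical data to show the difference.

```python
from collections import Counter, defaultdict, namedtuple

Scientist = namedtuple("Scientist", ["name", "field"])
scientists = (
    Scientist("Ada Lovelace", "math"),
    Scientist("Vera Rubin", "astronomy"),
    Scientist("Emmy Noether", "math"),
)

# Counter: how many scientists per field
counts = Counter(s.field for s in scientists)
print(counts)  # Counter({'math': 2, 'astronomy': 1})

# defaultdict(list): which scientists per field
groups = defaultdict(list)
for s in scientists:
    groups[s.field].append(s.name)
print(dict(groups))  # {'math': ['Ada Lovelace', 'Emmy Noether'], 'astronomy': ['Vera Rubin']}
```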
darth88vader88 on April 1, 2020
I get this error with bpython console:
>>> total_age = reduce(lambda acc,val: acc + val['age'],names_and_ages,0)
Traceback (most recent call last):
File "<input>", line 1, in <module>
total_age = reduce(lambda acc,val: acc + val['age'],names_and_ages,0)
NameError: name 'reduce' is not defined
cellist on April 1, 2020
If you are on Python 3, you have to do a
from functools import reduce  # needed in Python 3, where reduce() is no longer a built-in
first.
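Here is a runnable version of the snippet from the traceback above, with the import added. The contents of `names_and_ages` are an assumption, because the earlier lesson that defines it isn’t on this page. The sketch assumes it is a list of dictionaries with an 'age' key.

```python
from functools import reduce  # reduce() is not a built-in in Python 3

# Hypothetical data standing in for names_and_ages from an earlier lesson
names_and_ages = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
    {"name": "Carol", "age": 41},
]

total_age = reduce(lambda acc, val: acc + val["age"], names_and_ages, 0)
print(total_age)  # 96
```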
darth88vader88 on April 2, 2020
Thanks!
darth88vader88 on April 2, 2020
Hahaha! I missed that in the first video on reduce(). Dumb!
jignashreddy on April 3, 2020
There was an error when I used the same thing in a lambda function. I’m not sure why.
lambda x,y:x[y.field].append(y.name)
The error is
TypeError: 'NoneType' object is not subscriptable
Zarata on May 7, 2020
In production, how do people remember that a function such as “reducer” takes a “map” argument whose values are lists? Since Python isn’t strongly typed (is that a correct statement?), such functions will break if they aren’t called with arguments of the correct type. Just something I realized / wondered as I thought about what’s going on in “reducer”, trying to wrap my head around it (your words, video midpoint).
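On the typing question: Python is usually described as strongly but dynamically typed. Types are checked when an operation runs, not ahead of time. One common way to document the expected shapes is type hints, which a separate checker such as mypy can verify. Annotations are not enforced at run time. Here is a sketch for Python 3.9+, assuming a two-field `Scientist`.

```python
from collections import defaultdict
from functools import reduce
from typing import NamedTuple


class Scientist(NamedTuple):
    name: str
    field: str


# The annotations spell out the contract: the accumulator maps each
# field name to a list of scientist names (not enforced at run time)
def reducer(acc: dict[str, list[str]], val: Scientist) -> dict[str, list[str]]:
    acc[val.field].append(val.name)
    return acc


scientists = [Scientist("Ada Lovelace", "math"), Scientist("Vera Rubin", "astronomy")]
by_field = reduce(reducer, scientists, defaultdict(list))
print(dict(by_field))  # {'math': ['Ada Lovelace'], 'astronomy': ['Vera Rubin']}
```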
Varun Vaddiparty on May 10, 2020
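@jignash you get that error when x is None. In your lambda, x becomes None after the first item: list.append() returns None, and that return value is what the lambda hands back to the reduce() function as the next accumulator.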
Ivan Smalzer on June 8, 2020
defaultdict is one of the things that makes Python cool.
Another way to handle an expanding accumulator would be to explicitly check val.field in the reducer for each object and create a new key if it doesn’t exist.
Peter on July 3, 2020
Couldn’t the same thing be achieved with just specifying an empty dictionary as the starting value?
Something like:
def group_scientists(acc, val):
    if val.field not in acc:
        acc[val.field] = []  # a list, so .append() works below
    acc[val.field].append(val.name)
    return acc
Granted, defaultdict is somewhat simpler.
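A close relative of this explicit membership check is dict.setdefault(). It returns the existing value for a key, or inserts a default and returns that. The sketch below reuses the `group_scientists` name from the comment above with this technique swapped in. The sample data is hypothetical.

```python
from collections import namedtuple
from functools import reduce

Scientist = namedtuple("Scientist", ["name", "field"])
scientists = (
    Scientist("Ada Lovelace", "math"),
    Scientist("Vera Rubin", "astronomy"),
    Scientist("Emmy Noether", "math"),
)

def group_scientists(acc, val):
    # setdefault() inserts an empty list only if the key is missing,
    # then returns the list stored under that key
    acc.setdefault(val.field, []).append(val.name)
    return acc

# A plain empty dict works as the starting value
result = reduce(group_scientists, scientists, {})
print(result)  # {'math': ['Ada Lovelace', 'Emmy Noether'], 'astronomy': ['Vera Rubin']}
```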
Dan Bader RP Team on Aug. 12, 2020
“Couldn’t the same thing be achieved with just specifying an empty dictionary as the starting value?”
Yep…
“Granted, defaultdict is somewhat simpler.”
Exactly :)
Dan Bader RP Team on Aug. 12, 2020
I wanted to address the 'NoneType' object is not subscriptable error that some of you were seeing when trying to re-implement the reducer() function as a lambda function.
Let’s assume the following setup code:
>>> import collections
>>> Scientist = collections.namedtuple('Scientist', [
... 'name',
... 'field',
... 'born',
... 'nobel',
... ])
>>> scientists = (
... Scientist(name='Ada Lovelace', field='math', born=1815, nobel=False),
... Scientist(name='Emmy Noether', field='math', born=1882, nobel=False),
... Scientist(name='Marie Curie', field='math', born=1867, nobel=True),
... Scientist(name='Tu Youyou', field='physics', born=1930, nobel=True),
... Scientist(name='Ada Yonath', field='chemistry', born=1939, nobel=True),
... Scientist(name='Vera Rubin', field='chemistry', born=1928, nobel=False),
... Scientist(name='Sally Ride', field='physics', born=1951, nobel=False),
... )
>>> from functools import reduce
>>> from collections import defaultdict
Now, when I try to replace the reducer() function in the video with a lambda, it’s easy to get a TypeError: 'NoneType' object is not subscriptable error:
>>> reduce(lambda acc, val: acc[val.field].append(val.name), scientists, defaultdict(list))
Traceback (most recent call last):
File "<input>", line 1, in <module>
reduce(lambda acc, val: acc[val.field].append(val.name), scientists, defaultdict(list))
File "<input>", line 1, in <lambda>
reduce(lambda acc, val: acc[val.field].append(val.name), scientists, defaultdict(list))
TypeError: 'NoneType' object is not subscriptable
Let’s take a look at reducer() again:
def reducer(acc, val):
    acc[val.field].append(val.name)
    return acc
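Notice the return acc statement at the end. That’s the problem: the lambda version has the wrong return value. It returns whatever acc[val.field].append(val.name) evaluates to, which is None. reduce() then passes that None in as the next accumulator, leading to the TypeError.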
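To see where that None comes from, it helps to check what list.append() returns on its own. It mutates the list in place and returns None. The field and name values in this sketch are just examples.

```python
from collections import defaultdict

acc = defaultdict(list)
result = acc["math"].append("Ada Lovelace")  # mutates acc in place...
print(result)      # None  <- ...but the call itself returns None
print(dict(acc))   # {'math': ['Ada Lovelace']}

# reduce() would pass this None in as the next accumulator, and
# subscripting None raises the TypeError from the traceback above
message = None
try:
    result["physics"]
except TypeError as exc:
    message = str(exc)
    print(message)  # 'NoneType' object is not subscriptable
```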
So, how do we fix that?
The challenge with Python lambdas is that they’re single-expression functions, meaning they can’t include return statements:
>>> reduce(lambda acc, val: acc[val.field].append(val.name); acc, scientists, defaultdict(list))
File "<input>", line 1
reduce(lambda acc, val: acc[val.field].append(val.name); acc, scientists, defaultdict(list))
^
SyntaxError: invalid syntax
>>> reduce(lambda acc, val: acc[val.field].append(val.name) return acc, scientists, defaultdict(list))
File "<input>", line 1
reduce(lambda acc, val: acc[val.field].append(val.name) return acc, scientists, defaultdict(list))
^
SyntaxError: invalid syntax
(For more info about lambdas in Python, check out this course: realpython.com/courses/python-lambda-functions/)
However, you can kind of “cheat” your way around this and take advantage of how Python evaluates or expressions:
>>> reduce(lambda acc, val: acc[val.field].append(val.name) or acc, scientists, defaultdict(list))
defaultdict(<class 'list'>, {'math': ['Ada Lovelace', 'Emmy Noether', 'Marie Curie'], 'physics': ['Tu Youyou', 'Sally Ride'], 'chemistry': ['Ada Yonath', 'Vera Rubin']})
That worked!
Here’s the lambda version of reducer() by itself:
lambda acc, val: acc[val.field].append(val.name) or acc
Notice the or acc at the end.
Since .append() always returns None, which is falsy, the expression acc[val.field].append(val.name) or acc evaluates to the value of acc. An or expression returns its right operand whenever the left one is falsy, so this holds even while acc is still empty.
And because lambda functions consist of a single expression and return the value of that expression…we get the desired result:
A lambda function that updates the accumulator variable (acc) and then returns the updated accumulator (via or acc).
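Here is a quick check of the or behavior this trick relies on. `a or b` returns `a` if `a` is truthy, and otherwise returns `b` itself, whatever `b`’s truthiness. The values in this sketch are just examples.

```python
empty = {}
print(None or empty)             # {}  <- the right operand, even though it's falsy
print((None or empty) is empty)  # True: the same object, not a copy

acc = {"math": []}
updated = acc["math"].append("Ada Lovelace") or acc
print(updated is acc)            # True
print(updated)                   # {'math': ['Ada Lovelace']}
```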
At this point, you can probably think of a few good reasons why using a standalone function (with def) is usually the better way to go…
Imagine having to explain all of this stuff every time a new colleague starts working on this piece of code ;-)
Anyway, I hope this cleared up the confusion around the 'NoneType' object is not subscriptable TypeError that some of you were getting!
Happy Pythoning!
Brennan Barker on Nov. 30, 2020
In keeping with this course’s emphasis on immutability, I would suggest an implementation of reducer that doesn’t rely on the list.append mutator, or indeed modify acc at all:
def reducer(acc, val):
    # acc.get() instead of acc[...]: indexing a defaultdict with a missing key would insert it into acc
    return defaultdict(list, acc, **{val.field: acc.get(val.field, []) + [val.name]})
Alternatively, starting in Python 3.9, you could use the dictionary merge operator (|).
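Here is one possible pure reducer using the Python 3.9+ merge operator. This is an assumed variant, not the commenter’s code. It uses a plain dict and dict.get(), so no key is ever inserted into the incoming accumulator. `merge_reducer` is a hypothetical name, and the data is made up.

```python
from collections import namedtuple
from functools import reduce

Scientist = namedtuple("Scientist", ["name", "field"])
scientists = (
    Scientist("Ada Lovelace", "math"),
    Scientist("Vera Rubin", "astronomy"),
    Scientist("Emmy Noether", "math"),
)

def merge_reducer(acc, val):
    # acc | {...} builds a new dict (Python 3.9+); keys from the right
    # operand win, and acc itself is left unchanged
    return acc | {val.field: acc.get(val.field, []) + [val.name]}

start = {}
result = reduce(merge_reducer, scientists, start)
print(result)  # {'math': ['Ada Lovelace', 'Emmy Noether'], 'astronomy': ['Vera Rubin']}
print(start)   # {}  <- the initial accumulator was never modified
```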
In addition to the testability and composability benefits that functional programming touts, focusing on pure functions like the above instead of relying on side effects also helps eliminate a source of 'NoneType' object is not subscriptable errors like the ones Dan deals with in his last comment, because pure functions always have return values!
Andras on Jan. 12, 2025
Hi Dan,
Can you please add some explanation of why you need to explicitly return acc in your reducer function? Since acc is a mutable dictionary in your example, it would be updated/mutated even without the return statement. I think I know the answer, but I think this is an interesting bit that deserves some attention.
Great course, btw! Thank you.
Bartosz Zaczyński RP Team on Jan. 13, 2025
@Andras In general, relying on side effects is less explicit than passing values around. More importantly, however, the functools.reduce() function calls your binary reducer expecting it to return the reduced value to be accumulated in each iteration. Had the reducer not returned anything, it would return None implicitly. That None would then be passed back into it on the next call as the accumulator, causing an error since you can’t subscript None.
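To make the point about return values concrete, here is a simplified pure-Python sketch of what functools.reduce() does when given an initial value. It is not CPython’s actual implementation, and `simple_reduce` is a hypothetical name. It shows that whatever the reducer returns becomes the next accumulator.

```python
from collections import defaultdict, namedtuple

def simple_reduce(function, iterable, initial):
    acc = initial
    for item in iterable:
        # Whatever function returns replaces acc, so a reducer that
        # returns None turns the accumulator into None here
        acc = function(acc, item)
    return acc

Scientist = namedtuple("Scientist", ["name", "field"])
scientists = (Scientist("Ada Lovelace", "math"), Scientist("Emmy Noether", "math"))

def reducer(acc, val):
    acc[val.field].append(val.name)
    return acc

def reducer_without_return(acc, val):
    acc[val.field].append(val.name)  # implicitly returns None

good = simple_reduce(reducer, scientists, defaultdict(list))
print(dict(good))  # {'math': ['Ada Lovelace', 'Emmy Noether']}

error_message = None
try:
    simple_reduce(reducer_without_return, scientists, defaultdict(list))
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # 'NoneType' object is not subscriptable
```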
senatoduro8 on July 24, 2019
Dan, what is the difference between functional and procedural programming?