Grouping Data With itertools.groupby()

Now that you know how to use the reduce() function and Python’s defaultdict class, which is defined in the collections module, it’s time to look at some useful helpers in the itertools module, such as itertools.groupby().

In the next section of this course, you’ll learn how to do parallel programming in Python using functional programming principles and the multiprocessing module. You’ll start by taking the example data set based on an immutable data structure that you previously transformed using the built-in map() function. But this time, you’ll process the data in parallel, across multiple CPU cores using the Python multiprocessing module available in the standard library.

Comments & Discussion

andomar on April 2, 2020

The groupby example only works because your list is already sorted by field.

See the itertools documentation: “Generally, the iterable needs to already be sorted on the same key function.” docs.python.org/3.5/library/itertools.html#itertools.groupby
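
To illustrate the point, here is a minimal sketch using a made-up list of (name, field) tuples. The data, the field_of helper, and the variable names are assumptions for illustration, not the course's data set. It shows that groupby() only groups consecutive items, so an unsorted input can yield the same key more than once, and a dict comprehension then keeps only the last group for that key.

```python
import itertools

# Hypothetical sample data: "physics" appears in two separate runs.
people = [
    ("Ann", "physics"),
    ("Bob", "math"),
    ("Cid", "physics"),
]


def field_of(person):
    """Key function: group by the second element of each tuple."""
    return person[1]


# Unsorted input: groupby() starts a new group every time the key changes.
unsorted_keys = [key for key, _ in itertools.groupby(people, key=field_of)]

# Building a dict from the unsorted groups: the second "physics" group
# overwrites the first one.
unsorted_groups = {
    key: [name for name, _ in group]
    for key, group in itertools.groupby(people, key=field_of)
}

# Sorting on the same key first puts equal keys next to each other.
sorted_groups = {
    key: [name for name, _ in group]
    for key, group in itertools.groupby(sorted(people, key=field_of), key=field_of)
}

print(unsorted_keys)
print(unsorted_groups)
print(sorted_groups)
```

In this sketch "Ann" is missing from the unsorted result, while the sorted version keeps both physics entries.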

Chris James on April 20, 2020

It took me a little head scratching to figure out how to make the groupby version just display the names and not the whole Scientist object. This is what I came up with:

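import itertools

# Assumes scientists is already sorted by field; groupby() only groups
# consecutive items that share a key.
scientists_by_field = {
    field: [x.name for x in group]
    for field, group in itertools.groupby(scientists, key=lambda x: x.field)
}
scientists_by_field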

Because groupby yields each group as a ‘grouper’ iterator, you can also build a dictionary of tuples like so:

import itertools
scientists_by_field = {
    field: tuple(x.name for x in group)
    for field, group in itertools.groupby(scientists, key=lambda x: x.field)
}
scientists_by_field

Igor Conrado Alves de Lima on April 26, 2020

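The usage of itertools.groupby in the video is actually not correct. As @andomar pointed out, the iterable should already be sorted by the grouping key before it is passed to itertools.groupby. Because groupby only groups consecutive items, a field that shows up in separate runs produces several groups, and the dictionary keeps only the last one. That’s why we don’t see Marie Curie in the physics group.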

Here is the appropriate code:

import itertools

# Sort by the same key that groupby() uses, so equal fields are adjacent.
scientists_sorted_by_field = sorted(scientists, key=lambda x: x.field)
scientists_by_field = {
    field: tuple(group)
    for field, group in itertools.groupby(
        scientists_sorted_by_field, key=lambda x: x.field
    )
}
scientists_by_field

This will produce the following output:

{'astronomy': (Scientist(name='Vera Rubin', field='astronomy', born=1928, nobel=False),),
 'chemistry': (Scientist(name='Tu Youyou', field='chemistry', born=1930, nobel=True),
  Scientist(name='Ada Yonath', field='chemistry', born=1939, nobel=True)),
 'math': (Scientist(name='Ada Lovelace', field='math', born=1815, nobel=False),
  Scientist(name='Emy Noether', field='math', born=1882, nobel=False)),
 'physics': (Scientist(name='Marie Curie', field='physics', born=1867, nobel=True),
  Scientist(name='Sally Ride', field='physics', born=1951, nobel=False))}

Hope it helps.
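
If sorting the input first is not desirable, one alternative is the collections.defaultdict approach mentioned in the lesson description, which does not depend on input order. The sketch below assumes a Scientist namedtuple with the fields visible in the output above, and it uses a shortened, hand-typed sample rather than the course's actual data set.

```python
import collections

# Assumed record type, modeled on the fields visible in the output above.
Scientist = collections.namedtuple("Scientist", ["name", "field", "born", "nobel"])

# Shortened sample, deliberately not sorted by field.
scientists = (
    Scientist(name="Marie Curie", field="physics", born=1867, nobel=True),
    Scientist(name="Tu Youyou", field="chemistry", born=1930, nobel=True),
    Scientist(name="Sally Ride", field="physics", born=1951, nobel=False),
)

# Each missing key starts out as an empty list, so no sorting is needed.
names_by_field = collections.defaultdict(list)
for scientist in scientists:
    names_by_field[scientist.field].append(scientist.name)

print(dict(names_by_field))
```

Both physics entries end up in the same list here, even though they are not adjacent in the input.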

Dan Bader RP Team on April 27, 2020

Fantastic, thank you for the clarification andomar & Igor! Really appreciate it.