pandas GroupBy: Grouping Real World Data in Python (Overview)

Whether you’ve just started working with pandas and want to master one of its core capabilities, or you’re looking to fill in some gaps in your understanding about .groupby(), this course will help you to break down and visualize a pandas GroupBy operation from start to finish.

This course is meant to complement the official pandas documentation and the pandas Cookbook, where you’ll see self-contained, bite-sized examples. Here, however, you’ll focus on three more involved walkthroughs that use real-world datasets.

In this course, you’ll cover:

  • How to use pandas GroupBy operations on real-world data
  • How the split-apply-combine chain of operations works
  • How to decompose the split-apply-combine chain into steps
  • How to categorize methods of a pandas GroupBy object based on their intent and result

00:00 Welcome to pandas GroupBy: Grouping Real World Data in Python. My name is Christopher, and I will be your guide. This course is predominantly meant for someone familiar with pandas already, but does include refresher material.

00:15 The refresher reminds you how to create a data frame and manipulate the data within it.

00:21 Once you’ve got a data frame, you can run operations by grouping content together. You do this using something called the split-apply-combine pattern, and you use pandas’ built-in methods to operate on the groupings created through this pattern.
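To make the split-apply-combine pattern concrete, here is a minimal sketch that performs the three steps by hand and then compares the result with the one-call built-in. The `state` and `sales` columns are hypothetical sample data, not one of the course's datasets.

```python
import pandas as pd

# Hypothetical sample data, not taken from the course datasets.
df = pd.DataFrame(
    {
        "state": ["CA", "NY", "CA", "TX", "NY"],
        "sales": [10, 20, 30, 40, 50],
    }
)

# Split: groupby() partitions the rows by the values in "state".
grouped = df.groupby("state")

# Apply: run a calculation on each group's sub-frame separately.
totals = {}
for state, frame in grouped:
    totals[state] = frame["sales"].sum()

# Combine: gather the per-group results into a single Series.
manual = pd.Series(totals, name="sales")
manual.index.name = "state"

# The built-in method performs all three steps in one call.
builtin = df.groupby("state")["sales"].sum()

print(manual)
print(builtin)
```

Both Series hold the same per-state totals. The loop is only there to expose the individual steps; in practice you would normally call the built-in method directly.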

00:36 You can also provide a Python lambda. Finally, once you’ve seen several different ways of grouping on three different sets of real-world data, I’ll wrap up by showing you how your choice of approach can have a big impact on the speed of your code.
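As a rough illustration of the two approaches, the sketch below computes the same per-group value once with a lambda and once with built-in methods only. The `team` and `score` columns are made up for this example. No timings are shown here, but built-in aggregations are usually the faster choice on large data because the lambda is called once per group in Python.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame(
    {
        "team": ["a", "a", "b", "b", "b"],
        "score": [3, 9, 4, 12, 7],
    }
)

grouped = df.groupby("team")["score"]

# Lambda approach: pandas calls this function once for each group.
spread_lambda = grouped.apply(lambda s: s.max() - s.min())

# Built-in approach: two aggregations, then a vectorized subtraction.
spread_builtin = grouped.max() - grouped.min()

print(spread_lambda)
print(spread_builtin)
```

Both results give a spread of 6 for team `a` and 8 for team `b`.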

00:53 The code in this course was tested with Python 3.12 and pandas 2.2.2. None of the Python here uses features specific to Python 3.12, and so any currently supported version should be fine.
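If you are unsure which versions you have installed, a quick check along these lines may help. This is a small sketch, and the comparison against major version 2 is an assumption based on the pandas version named above.

```python
import sys

import pandas as pd

# Report the interpreter and library versions in use.
print("Python:", sys.version.split()[0])
print("pandas:", pd.__version__)

# Compare only the major version against the one the course was tested with.
# Assumes a conventional version string such as "2.2.2".
major = int(pd.__version__.split(".")[0])
if major != 2:
    print("Note: the course was tested with pandas 2.2.2; behavior may differ.")
```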

01:06 pandas, on the other hand, has changed a bit since version one, so be aware if you’re not running the same version I am.

01:14 pandas is a powerful tool for doing data processing in Python. Much of its core is implemented in low-level compiled code, meaning you can often get much better performance than an equivalent pure Python program.

01:27 pandas sees the world in terms of tables and columns. This is similar to the structure of an Excel sheet or a table in a database. There are all sorts of things you can do with tabular data, including slicing parts of it and running calculations.
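A short sketch of what slicing and calculating on tabular data can look like is shown below. The `city` and `temp_c` columns are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical table with one row per city.
df = pd.DataFrame(
    {
        "city": ["Lima", "Oslo", "Pune", "Kyiv"],
        "temp_c": [19.0, 4.0, 31.0, 8.0],
    }
)

# Select a single column, which gives you a Series.
temps = df["temp_c"]

# Slice rows by position: the first two rows.
first_two = df.iloc[:2]

# Slice rows by condition: keep only rows warmer than 10 degrees Celsius.
warm = df[df["temp_c"] > 10]

# Run a calculation over a whole column.
mean_temp = temps.mean()  # 15.5

# Run a vectorized calculation that produces a new column.
df["temp_f"] = df["temp_c"] * 9 / 5 + 32

print(first_two)
print(warm)
print(mean_temp)
print(df)
```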

01:42 This course is about grouping parts of your data together and running calculations on the groups. You do this grouping in pandas with the groupby() method, which is conceptually similar to the feature in SQL with the same name.
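To show the resemblance to SQL, the sketch below pairs a GROUP BY query, written as a comment, with a roughly equivalent pandas expression. The `orders` table and its columns are hypothetical.

```python
import pandas as pd

# Hypothetical orders table for illustration only.
orders = pd.DataFrame(
    {
        "customer": ["ann", "bob", "ann", "cid", "bob"],
        "amount": [12.5, 8.0, 7.5, 20.0, 2.0],
    }
)

# Rough SQL counterpart, for comparison only:
#   SELECT customer, SUM(amount) AS total
#   FROM orders
#   GROUP BY customer;
totals = (
    orders.groupby("customer")["amount"]
    .sum()
    .reset_index(name="total")  # turn the group keys back into a column
)

print(totals)
```

One difference to keep in mind is that pandas sorts the group keys by default, whereas SQL makes no ordering promise without an ORDER BY clause.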

01:57 Once you’ve grouped your data together, pandas provides a whole host of functions to run on the groups, such as counting, finding mins and maxes, filtering, and much more.
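A few of these operations are sketched below on a small, made-up table. The `species` and `weight_kg` columns are assumptions for the example. Note that counting and min/max return one row per group, while filtering returns rows of the original frame.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame(
    {
        "species": ["cat", "dog", "cat", "dog", "owl"],
        "weight_kg": [4.0, 20.0, 5.0, 30.0, 1.5],
    }
)

grouped = df.groupby("species")

# Counting: number of non-null weights in each group.
counts = grouped["weight_kg"].count()

# Mins and maxes: several aggregations at once, one column per aggregation.
summary = grouped["weight_kg"].agg(["min", "max"])

# Filtering: keep only the rows that belong to groups with more than one member.
multi = grouped.filter(lambda g: len(g) > 1)

print(counts)
print(summary)
print(multi)
```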

02:09 I’ll start out with a review of data frames and their operations in pandas.
