Episode 16: Thinking in Pandas: Python Data Analysis the Right Way
The Real Python Podcast
Jul 03, 2020 1h 2m
Are you using the Python library Pandas the right way? Do you wonder about getting better performance, or how to optimize your data for analysis? What does normalization mean? This week on the show we have Hannah Stepanek to discuss her new book “Thinking in Pandas”.
The inspiration behind Hannah’s book came out of her talk at PyCon US 2019 titled “Thinking Like a Panda: Everything You Need to Know to Use Pandas the Right Way.” We discuss several core concepts covered in the book. She shares techniques for getting more performance when working with your data in Pandas. We also talk about her recent PyCon US 2020 online presentation about databases and migration.
Course Spotlight: Finding the Perfect Python Code Editor
Find your perfect Python development setup with this review of Python IDEs and code editors. With this course, you’ll get an overview of the most common Python coding environments to help you make an informed decision.
Topics:
- 00:00:00 – Introduction
- 00:01:36 – Working for New Relic
- 00:03:14 – Thinking in Pandas book release
- 00:03:27 – Who is the intended reader?
- 00:05:27 – What is the underlying tech for Pandas?
- 00:09:04 – Why shouldn’t you use apply?
- 00:13:00 – When you have to use apply
- 00:16:06 – Normalizing your data
- 00:17:05 – Do you have a preferred format for a dataframe?
- 00:18:17 – More on multi-index dataframes
- 00:24:50 – Creating NumPy types
- 00:28:30 – Loading in your data
- 00:30:33 – Video Course Spotlight
- 00:31:41 – Pivoting data
- 00:34:34 – Considering outside libraries and performance
- 00:35:41 – What topic were you eager to share in the book?
- 00:37:52 – What resources did you use to learn pandas?
- 00:40:53 – PyCon 2020 talk about databases and migration
- 00:45:34 – Delving into migration and Alembic
- 00:53:15 – Speaking opportunities
- 00:56:13 – What are you excited about in the world of Python?
- 00:57:32 – What do you want to learn next?
- 00:58:49 – Do you read source code to learn?
- 01:00:16 – Is there a particularly well-written library?
- 01:01:28 – Final Thanks
Links:
- Thinking in Pandas: How to Use the Python Data Analysis Library the Right Way - Apress
- Thinking like a Panda: Everything you need to know to use pandas the right way - PyCon 2019 - Hannah Stepanek
- pandas
- CPython Internals: Your Guide to the Python 3 Interpreter
- MultiIndex / advanced indexing: pandas documentation
- NumPy Data type objects (dtype)
- pandas.DataFrame.pivot: pandas documentation
- Let’s talk Databases in Python: SQLAlchemy and Alembic - PyCon 2020 - Hannah Stepanek
- SQLAlchemy: The Python SQL Toolkit and Object Relational Mapper
- Alembic: A database migration tool for SQLAlchemy
- import asyncio: Learn Python’s AsyncIO #1 - The Async Ecosystem