Episode 193: Wes McKinney on Improving the Data Stack & Composable Systems

The Real Python Podcast

Feb 23, 2024 1h 9m data-science projects python

How do you avoid the bottlenecks of data processing systems? Is it possible to build tools that decouple storage and computation? This week on the show, creator of the pandas library Wes McKinney is here to discuss Apache Arrow, composable data systems, and community collaboration.

Wes briefly describes the humble beginnings of the pandas project in 2008 and moving the project to open source in 2011. Since then, he’s been thinking about improvements across the data processing ecosystem.

Wes collaborated with members of the broader data science community to build the in-memory analytics infrastructure of Apache Arrow. Arrow avoids the bottlenecks of repeated data serialization and format conversion. He shares examples of Arrow’s use across the spectrum in tools like Polars and DuckDB.
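To make the serialization point concrete, here is a small standard-library analogy. It does not use Arrow or any Arrow API. It only assumes that a column of numbers lives in one contiguous buffer, and the names `prices`, `view_a`, and `view_b` are made up for illustration. Two consumers that agree on the memory layout can read the same buffer directly, while a round trip through bytes produces a separate copy.

```python
from array import array

# Illustrative stand-in for one column of 64-bit floats held in a
# single contiguous buffer. This is NOT Arrow, only an analogy.
prices = array("d", [10.0, 20.0, 30.0])

# Two hypothetical "tools" receive views over the same memory
# instead of serialized copies.
view_a = memoryview(prices)
view_b = memoryview(prices)

# A write through one view is visible through the other,
# because no data was copied.
view_a[0] = 99.0
shared = view_b[0] == 99.0

# Contrast: converting to bytes and back builds an independent copy,
# which is the kind of work a shared format lets tools skip.
copied = array("d")
copied.frombytes(prices.tobytes())
copied[1] = -1.0
independent = prices[1] == 20.0
```

Arrow applies the same general idea across processes, languages, and libraries by standardizing the columnar layout itself, which is what lets tools such as Polars and DuckDB exchange data without converting it first.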

Wes advocates moving from vertically integrated tools toward composable data systems. We discuss his work on Ibis, a portable dataframe API for data manipulation and exploration in Python. Ibis supports multiple backends by decoupling the API from the execution engine.
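The idea of decoupling an API from its execution engine can be sketched with a toy example. This is not Ibis code and does not reflect the Ibis API. The `FilterSum` class and both `run_in_*` functions are hypothetical names invented here, and the example uses only the standard library. One query description is executed by two different engines: plain Python and SQLite.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterSum:
    """Hypothetical engine-agnostic query description:
    sum value_col over rows where filter_col > threshold."""
    table: str
    filter_col: str
    threshold: float
    value_col: str


def run_in_python(expr, tables):
    # Backend 1: evaluate the description over a list of dicts.
    rows = tables[expr.table]
    return sum(
        row[expr.value_col]
        for row in rows
        if row[expr.filter_col] > expr.threshold
    )


def run_in_sqlite(expr, conn):
    # Backend 2: compile the same description to SQL and let SQLite run it.
    sql = (
        f'SELECT COALESCE(SUM("{expr.value_col}"), 0) '
        f'FROM "{expr.table}" WHERE "{expr.filter_col}" > ?'
    )
    return conn.execute(sql, (expr.threshold,)).fetchone()[0]


# Made-up sample data, loaded into both backends.
rows = [
    {"qty": 1, "price": 5.0},
    {"qty": 3, "price": 7.5},
    {"qty": 4, "price": 2.5},
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (qty INTEGER, price REAL)")
conn.executemany("INSERT INTO orders VALUES (:qty, :price)", rows)

# The query is written once and handed to either engine.
expr = FilterSum(table="orders", filter_col="qty", threshold=2, value_col="price")
python_result = run_in_python(expr, {"orders": rows})
sqlite_result = run_in_sqlite(expr, conn)
```

Both backends return the same total for the same expression. A real portable dataframe layer handles far more operations and engines, but the separation between describing a query and running it is the same.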

This week’s episode is brought to you by Posit Connect.

Course Spotlight: Unleashing the Power of the Console With Rich

Rich is a powerful library for creating text-based user interfaces (TUIs) in Python. It enhances code readability by pretty-printing complex data structures and adds visual appeal with colored text, tables, animations, and more.

Topics:

00:00:00 – Introduction
00:02:26 – Dealing with limitations in early data science
00:04:53 – Making pandas open source
00:07:10 – Making changes to an existing platform
00:12:34 – Decoupling storage and computation
00:23:04 – Sponsor: Posit Connect
00:23:54 – Apache Arrow solving multiple issues
00:27:40 – DuckDB efficient analytic SQL database
00:30:24 – Polars dataframe library
00:31:04 – pandas 2.0 adding Arrow
00:35:56 – Video Course Spotlight
00:37:20 – Apache Software Foundation background
00:41:29 – Shifting from developer to organizer and collaborator
00:45:56 – Creating a portable query layer with Ibis
00:55:34 – Casualties of the language wars
00:57:57 – What’s your role at Posit?
01:01:23 – What are you excited about in the world of Python?
01:04:52 – What do you want to learn next?
01:06:21 – How can people follow your work online?
01:08:20 – Thanks and goodbye

Level Up Your Python Skills With These Courses:

The Pandas DataFrame: Make Working With Data Delightful

The pandas DataFrame: Working With Data Efficiently

Data Cleaning With pandas and NumPy

The Python Rich Package: Unleash the Power of Console Text

Unleashing the Power of the Console With Rich