Working With Python Polars (Summary)

Polars is a lightning-fast and rapidly growing DataFrame library. Polars’ optimized back end, familiar yet efficient syntax, lazy API, and integration with the Python ecosystem make the library stand out among the crowd. You’ve now gotten a broad overview of Polars, and you have the knowledge and resources necessary to get started using Polars in your own projects.

In this tutorial, you’ve learned:

  • Why Polars is so performant and attention-grabbing
  • How to work with DataFrames, expressions, and contexts
  • How to read data into DataFrames
  • How to group and aggregate data
  • What the lazy API is and how to build lazy queries

Here are additional resources mentioned in the course:

  • Course Slides (.pdf), 1.1 MB
  • Sample Code (.zip), 441.2 KB

00:00 In the previous lesson, I introduced you to the LazyFrame. This lesson is the summary of the course, along with some pointers to other sources of information.

00:09 Polars is a powerful structured data processing library. Within it, you use DataFrames to represent rows and columns similar to a spreadsheet. You operate on a DataFrame using expressions, which are a mini-language made from Python objects.

00:24 These expressions get evaluated in a context, which determines how the expression is applied to your data.

00:32 The most common contexts are: .with_columns() to add columns to a DataFrame, .select() to choose a subset of columns, .filter() to choose a subset of rows, and .group_by() to create subgroups, which you then operate on with the .agg() method to do aggregate calculations like sum, average, and count. Polars is capable of reading data out of a variety of common file formats using the family of read calls, such as read_csv() and read_parquet().

01:01 And my favorite feature has to be lazy evaluation, which lets you chain expressions into a query that Polars then combines with the reading of data from a file when you use the family of scan calls.

01:12 This means a lot less data has to be in memory at any given time, and it tends to mean faster evaluation.

01:20 I’ve only scratched the surface of Polars, a dangerous thing. Don’t poke the bear. The library also has joins, pivots, time series data, and cloud computing integration.

01:31 For more information, the online docs are quite well written. There’s both a user guide and a formal API reference.

01:42 For more on Polars from Real Python, the “How to Deal With Missing Data in Polars” tutorial might be of interest to you. Or, you could listen to The Real Python Podcast, episode 140, where they talk about Polars and just how fast it is. For other data science libraries, you might be interested in NumPy or pandas.

02:02 Real Python has many NumPy and pandas courses over and above the two I’ve shown on the screen here. Use the search box in the top right corner of the Real Python site to find loads more.

02:15 I really like Polars. I find it more intuitive than some of the other data libraries out there. I hope you found the course useful. Thanks for your attention.
