Writing DataFrame-Agnostic Python Code With Narwhals

Narwhals is intended for Python library developers who need to analyze DataFrames in a range of standard formats, including Polars, pandas, DuckDB, and others. It does this by providing a compatibility layer of code that handles any differences between the various formats.

In this tutorial, you’ll learn how to use the same Narwhals code to analyze data produced by the latest versions of two very common data libraries. You’ll also discover how Narwhals utilizes the efficiencies of your source data’s underlying library when analyzing your data. Furthermore, because Narwhals uses syntax that is a subset of Polars, you can reuse your existing Polars knowledge to quickly gain proficiency with Narwhals.

The table below will allow you to quickly decide whether or not Narwhals is for you:

Use Case                                        Use Narwhals    Use Another Tool
You need to produce DataFrame-agnostic code.    ✅              ❌
You want to learn a new DataFrame library.      ❌              ✅

Whether you’re wondering how to develop a Python library to cope with DataFrames from a range of common formats, or just curious to find out if this is even possible, this tutorial is for you. The Narwhals library could provide exactly what you’re looking for.

Get Ready to Explore Narwhals

Before you start, you’ll need to install Narwhals and have some data to play around with. You should also be familiar with the idea of a DataFrame. Although having an understanding of several DataFrame libraries isn’t mandatory, you’ll find a familiarity with Polars’ expressions and contexts syntax extremely useful. This is because Narwhals’ syntax is based on a subset of Polars’ syntax. However, Narwhals doesn’t replace Polars.

In this tutorial, you’ll use data stored in the presidents.parquet file included in your downloadable materials.

This file contains the following six fields to describe United States presidents:

Heading       Meaning
last_name     The president’s last name
first_name    The president’s first name
term_start    Start of the presidential term
term_end      End of the presidential term
party_name    The president’s political party
century       Century the president’s term started

To work through this tutorial, you’ll need to install the pandas, Polars, PyArrow, and Narwhals libraries:

Shell
$ python -m pip install pandas polars pyarrow narwhals

A key feature of Narwhals is that it’s DataFrame-agnostic, meaning your code can work with several formats. But you still need both Polars and pandas because Narwhals will use them to process the data you pass to it. You’ll also need them to create your DataFrames to pass to Narwhals to begin with.

You installed the PyArrow library to correctly read the Parquet files. Finally, you installed Narwhals itself.

With everything installed, create a project folder and place your downloaded presidents.parquet file inside it. You might also like to add the books.parquet and authors.parquet files, because you’ll need them later.

With that lot done, you’re good to go!

Understand How Narwhals Works

The documentation describes Narwhals as follows:

Extremely lightweight and extensible compatibility layer between dataframe libraries! (Source)

Narwhals is lightweight because it wraps the original DataFrame in its own object ecosystem while still using the source DataFrame’s library to process it. Any data passed into it for processing doesn’t need to be duplicated, removing an otherwise resource-intensive and time-consuming operation.

Narwhals is also extensible. You can write Narwhals code that works with the full API of some DataFrame libraries, while for others it supports their lazy API.

About Ian Eyre

Ian is an avid Pythonista and Real Python contributor who loves to learn and teach others.
