How to Reset a pandas DataFrame Index

How to Reset a pandas DataFrame Index

In this tutorial, you’ll learn how to reset a pandas DataFrame index, the reasons why you might want to do this, and the problems that could occur if you don’t.

Before you start your learning journey, you should familiarize yourself with how to create a pandas DataFrame. Knowing the difference between a DataFrame and a pandas Series will also prove useful to you.

In addition, you may want to use the data analysis tool Jupyter Notebook as you work through the examples in this tutorial. Alternatively, JupyterLab will give you an enhanced notebook experience, but feel free to use any Python environment you wish.

As a starting point, you’ll need some data. To begin with, you’ll use the band_members.csv file included in the downloadable materials that you can access by clicking the link below:

The table below describes the data from band_members.csv that you’ll begin with:

Column Name PyArrow Data Type Description
first_name string First name of member
last_name string Last name of member
instrument string Main instrument played
date_of_birth string Member’s date of birth

As you’ll see, the data has details of the members of the rock band Beach Boys. Each row contains information about its various members both past and present.

Throughout this tutorial, you’ll be using the pandas library to allow you to work with DataFrames, as well as the newer PyArrow library. The PyArrow library provides pandas with its own optimized data types, which are faster and less memory-intensive than the traditional NumPy types that pandas uses by default.

If you’re working at the command line, you can install both pandas and pyarrow using the single command python -m pip install pandas pyarrow. If you’re working in a Jupyter Notebook, you should use !python -m pip install pandas pyarrow. Regardless, you should do this within a virtual environment to avoid clashes with the libraries you use in your global environment.

Once you have the libraries in place, it’s time to read your data into a DataFrame:

Python
>>> import pandas as pd

>>> beach_boys = pd.read_csv(
...     "band_members.csv"
... ).convert_dtypes(dtype_backend="pyarrow")

First, you used import pandas to make the library available within your code. To construct the DataFrame and read it into the beach_boys variable, you used pandas’ read_csv() function, passing band_members.csv as the file to read. Finally, by passing dtype_backend="pyarrow" to .convert_dtypes() you convert all columns to pyarrow types.

If you want to verify that pyarrow data types are indeed being used, then beach_boys.dtypes will satisfy your curiosity:

Python
>>> beach_boys.dtypes
first_name            string[pyarrow]
last_name             string[pyarrow]
instrument            string[pyarrow]
date_of_birth         string[pyarrow]
dtype: object

As you can see, each data type contains [pyarrow] in its name.

If you wanted to analyze the date information thoroughly, then you would parse the date_of_birth column to make sure dates are read as a suitable pyarrow date type. This would allow you to analyze by specific days, months or years, and so on, as commonly found in pivot tables.

The date_of_birth column is not analyzed in this tutorial, so the string data type it’s being read as will do. Later on, you’ll get the chance to hone your skills with some exercises. The solutions include the date parsing code if you want to see how it’s done.

Now that the file has been loaded into a DataFrame, you’ll probably want to take a look at it:

Python
>>> beach_boys
  first_name last_name instrument date_of_birth
0      Brian    Wilson       Bass   20-Jun-1942
1       Mike      Love  Saxophone   15-Mar-1941
2         Al   Jardine     Guitar   03-Sep-1942
3      Bruce  Johnston       Bass   27-Jun-1942
4       Carl    Wilson     Guitar   21-Dec-1946
5     Dennis    Wilson      Drums   04-Dec-1944
6      David     Marks     Guitar   22-Aug-1948
7      Ricky    Fataar      Drums   05-Sep-1952
8    Blondie   Chaplin     Guitar   07-Jul-1951

DataFrames are two-dimensional data structures similar to spreadsheets or database tables. A pandas DataFrame can be considered a set of columns, with each column being a pandas Series. Each column also has a heading, which is the name property of the Series, and each row has a label, which is referred to as an element of its associated index object.

The DataFrame’s index is shown to the left of the DataFrame. It’s not part of the original band_members.csv source file, but is added as part of the DataFrame creation process. It’s this index object you’re learning to reset.

The index of a DataFrame is an additional column of labels that helps you identify rows. When used in combination with column headings, it allows you to access specific data within your DataFrame. The default index labels are a sequence of integers, but you can use strings to make them more meaningful. You can actually use any hashable type for your index, but integers, strings, and timestamps are the most common.

Locked learning resources

Join us and get access to thousands of tutorials and a community of expert Pythonistas.

Unlock This Article

Already a member? Sign-In

Locked learning resources

The full article is for members only. Join us and get access to thousands of tutorials and a community of expert Pythonistas.

Unlock This Article

Already a member? Sign-In

About Ian Eyre

Ian is an avid Pythonista and Real Python contributor who loves to learn and teach others.

» More about Ian

Each tutorial at Real Python is created by a team of developers so that it meets our high quality standards. The team members who worked on this tutorial are:

What Do You Think?

What’s your #1 takeaway or favorite thing you learned? How are you going to put your newfound skills to use? Leave a comment below and let us know.

Commenting Tips: The most useful comments are those written with the goal of learning from or helping out other students. Get tips for asking good questions and get answers to common questions in our support portal.


Looking for a real-time conversation? Visit the Real Python Community Chat or join the next “Office Hours” Live Q&A Session. Happy Pythoning!

Become a Member to join the conversation.

Keep Learning