Sorting Your DataFrame on Its Index
Before sorting on the index, it’s a good idea to know what an index represents. A DataFrame has an .index property, which by default is a numerical representation of its rows’ locations.
You can think of the index as the row numbers. It helps with quick row lookup and identification. You can sort a DataFrame based on its row index with .sort_index().
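To make the default index concrete, here is a minimal sketch. It uses a small hypothetical DataFrame with made-up values standing in for a few rows of the fuel economy data. It shows that a freshly created DataFrame gets a RangeIndex whose labels match the row positions, and that .loc uses those labels for lookup.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the fuel economy data (made-up values).
df = pd.DataFrame(
    {
        "make": ["Toyota", "Ford", "Dodge", "Ford"],
        "model": ["Corolla", "Mustang", "Charger", "Escort"],
        "city08": [25, 17, 15, 23],
    }
)

# By default, the index is a RangeIndex whose labels match the row positions.
print(df.index)            # RangeIndex(start=0, stop=4, step=1)
print(df.index.tolist())   # [0, 1, 2, 3]

# Index labels support quick row lookup with .loc.
print(df.loc[2, "model"])  # Charger
```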
Sorting by column values like you did in a previous example reorders the rows in your DataFrame, so the index becomes disorganized. This can also happen when you filter a DataFrame or when you drop or add rows.
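The following sketch shows how each of these operations leaves the index out of order. It uses the same kind of hypothetical sample data with made-up values. Every row keeps its original label, so sorting shuffles the labels, while filtering and dropping leave gaps in them.

```python
import pandas as pd

# Hypothetical sample data (made-up values).
df = pd.DataFrame(
    {
        "make": ["Toyota", "Ford", "Dodge", "Ford"],
        "model": ["Corolla", "Mustang", "Charger", "Escort"],
        "city08": [25, 17, 15, 23],
    }
)

# Sorting by a column moves each row together with its index label.
by_city = df.sort_values(by="city08")
print(by_city.index.tolist())    # [2, 1, 3, 0]

# Filtering keeps the surviving rows' original labels, leaving a gap.
efficient = df[df["city08"] > 16]
print(efficient.index.tolist())  # [0, 1, 3]

# Dropping a row also leaves a gap in the labels.
dropped = df.drop(index=1)
print(dropped.index.tolist())    # [0, 2, 3]
```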
To illustrate the use of .sort_index(), start by creating a new sorted DataFrame using .sort_values().
You’ve created a DataFrame that’s sorted by multiple columns. Notice how the row index is in no particular order. To get your DataFrame back to the original order, you can use .sort_index().
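Here is a minimal sketch of that round trip on hypothetical data with made-up values. The data is sorted by two columns with .sort_values(), and .sort_index() then restores the original row order. The final equals() check confirms that the restored DataFrame matches the original, which shows that sorting on the index only reorders rows and leaves the values unchanged.

```python
import pandas as pd

# Hypothetical sample data (made-up values).
df = pd.DataFrame(
    {
        "make": ["Toyota", "Ford", "Dodge", "Ford"],
        "model": ["Corolla", "Mustang", "Charger", "Escort"],
        "city08": [25, 17, 15, 23],
    }
)

# Sort by multiple columns; the index labels end up out of order.
sorted_df = df.sort_values(by=["make", "model"])
print(sorted_df.index.tolist())  # [2, 3, 1, 0]

# Sorting on the index puts the rows back in their original order.
restored = sorted_df.sort_index()
print(restored.index.tolist())   # [0, 1, 2, 3]
print(restored.equals(df))       # True
```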
Now the index is in ascending order. Just like .sort_values(), .sort_index() has an ascending parameter that defaults to True, and you can change to descending order by passing ascending=False.
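The ascending parameter can be checked with a short sketch. The sketch uses a hypothetical DataFrame with a shuffled index, like one left over from an earlier sort. Calling .sort_index() with no arguments gives the same result as ascending=True, and ascending=False reverses the order.

```python
import pandas as pd

# Hypothetical data whose index is shuffled, e.g. after an earlier sort.
shuffled = pd.DataFrame(
    {"model": ["Charger", "Escort", "Mustang", "Corolla"]},
    index=[2, 3, 1, 0],
)

# ascending=True is the default, so these two calls are equivalent.
asc = shuffled.sort_index()
print(asc.index.tolist())                               # [0, 1, 2, 3]
print(asc.equals(shuffled.sort_index(ascending=True)))  # True

# Passing ascending=False sorts the index in descending order.
desc = shuffled.sort_index(ascending=False)
print(desc.index.tolist())      # [3, 2, 1, 0]
print(desc["model"].tolist())   # ['Escort', 'Charger', 'Mustang', 'Corolla']
```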
Sorting on the index has no impact on the data itself: each row keeps its values, and only the row order changes. This is particularly useful when you’ve assigned a custom index with .set_index().
If you want to set a custom index using the make and model columns, then you can pass a list to .set_index().
Using this method, you replace the default integer-based row index with two axis labels. This is known as a multi-index, or hierarchical index. Your DataFrame is now indexed by more than one key, which you can sort on with .sort_index().
First, you assign a new index to your DataFrame using the make and model columns. Then you sort the index using .sort_index().
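Those two steps could look like the sketch below. It uses hypothetical sample data with made-up values, and the name assigned_index_df is just an illustrative variable. Passing a list of column names to .set_index() builds a two-level MultiIndex. Calling .sort_index() on it sorts by the first level and breaks ties with the second.

```python
import pandas as pd

# Hypothetical sample data (made-up values).
df = pd.DataFrame(
    {
        "make": ["Toyota", "Ford", "Dodge", "Ford"],
        "model": ["Corolla", "Mustang", "Charger", "Escort"],
        "city08": [25, 17, 15, 23],
    }
)

# Passing a list of column names builds a two-level MultiIndex.
assigned_index_df = df.set_index(["make", "model"])
print(isinstance(assigned_index_df.index, pd.MultiIndex))  # True
print(list(assigned_index_df.index.names))                 # ['make', 'model']

# sort_index() sorts by make first, then by model within each make.
sorted_multi = assigned_index_df.sort_index()
print(sorted_multi.index.tolist())
# [('Dodge', 'Charger'), ('Ford', 'Escort'), ('Ford', 'Mustang'), ('Toyota', 'Corolla')]
print(sorted_multi["city08"].tolist())  # [15, 23, 17, 25]
```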
You can read more on using
.set_index() in the pandas documentation.
For the next example, you’ll sort your DataFrame by its index in descending order. Remember from sorting your DataFrame with .sort_values() that you can reverse the sort order by setting ascending to False.
This parameter also works with .sort_index(), so you can sort your DataFrame in reverse order by calling .sort_index(ascending=False).
Now your DataFrame is sorted by its index in descending order. One difference between .sort_index() and .sort_values() is that .sort_index() has no by parameter, since it sorts a DataFrame on the row index by default.
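The missing by parameter shows up as a TypeError when you try to pass one. This sketch uses hypothetical sample data with made-up values and demonstrates that behavior. It also shows what .sort_index() offers instead on a MultiIndex: the documented level parameter picks which index level to sort on, and ascending can take one boolean per level.

```python
import pandas as pd

# Hypothetical sample data (made-up values).
df = pd.DataFrame(
    {
        "make": ["Toyota", "Ford", "Dodge", "Ford"],
        "model": ["Corolla", "Mustang", "Charger", "Escort"],
        "city08": [25, 17, 15, 23],
    }
)

# .sort_values() needs a by argument naming the column(s) to sort on.
by_model = df.sort_values(by="model")
print(by_model["model"].tolist())  # ['Charger', 'Corolla', 'Escort', 'Mustang']

# .sort_index() has no by parameter, so passing one raises a TypeError.
try:
    df.sort_index(by="model")
    accepts_by = True
except TypeError:
    accepts_by = False
print(accepts_by)  # False

# On a MultiIndex, level selects which index level to sort on.
multi = df.set_index(["make", "model"])
by_model_level = multi.sort_index(level="model")
print(by_model_level.index.tolist())
# [('Dodge', 'Charger'), ('Toyota', 'Corolla'), ('Ford', 'Escort'), ('Ford', 'Mustang')]

# ascending also accepts one bool per level: make A to Z, then model Z to A.
mixed = multi.sort_index(ascending=[True, False])
print(mixed.index.tolist())
# [('Dodge', 'Charger'), ('Ford', 'Mustang'), ('Ford', 'Escort'), ('Toyota', 'Corolla')]
```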
There are many cases in data analysis when you want to sort on a hierarchical index. You’ve already seen how you can use make and model in a multi-index. For this dataset, you could also use the id column as an index.
Setting the id column as the index could be helpful when linking related datasets. For example, the EPA’s emissions dataset also uses id to represent vehicle record IDs. This links the emissions data to the fuel economy data.
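Using id as the index could look like the following minimal sketch. The id values and the other columns are hypothetical and made up for illustration. After .set_index("id") and .sort_index(), each record can be looked up directly by its ID, and the index is monotonic.

```python
import pandas as pd

# Hypothetical fuel economy rows with made-up id values.
fuel = pd.DataFrame(
    {
        "id": [31, 4, 17, 9],
        "model": ["Corolla", "Mustang", "Charger", "Escort"],
        "city08": [25, 17, 15, 23],
    }
)

# Use the record ID as the index and sort it.
fuel_by_id = fuel.set_index("id").sort_index()
print(fuel_by_id.index.tolist())                 # [4, 9, 17, 31]
print(fuel_by_id.index.is_monotonic_increasing)  # True

# Rows can now be looked up by record ID.
print(fuel_by_id.loc[17, "model"])               # Charger
```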
Sorting the index of both datasets’ DataFrames could speed up other methods, such as .merge(). To learn more about combining data in pandas, check out Real Python’s article on the topic.
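A merge on the sorted id index could be sketched as follows. Both DataFrames are hypothetical, and the co2_gpm column name and all values are invented for illustration. They are not the EPA emissions schema. The sketch only shows the mechanics of joining on the index with left_index=True and right_index=True. It does not measure speed, and any performance gain from sorting will depend on the data and the pandas version.

```python
import pandas as pd

# Hypothetical fuel economy data indexed by a made-up record ID.
fuel = (
    pd.DataFrame({"id": [31, 4, 17, 9], "city08": [25, 17, 15, 23]})
    .set_index("id")
    .sort_index()
)

# Hypothetical emissions data; "co2_gpm" is an invented column name.
emissions = (
    pd.DataFrame({"id": [9, 31, 4, 17], "co2_gpm": [380, 350, 520, 590]})
    .set_index("id")
    .sort_index()
)

# Join the two datasets on their shared (sorted) id index.
combined = pd.merge(fuel, emissions, left_index=True, right_index=True)
print(combined.index.tolist())                      # [4, 9, 17, 31]
print(combined.loc[17, ["city08", "co2_gpm"]].tolist())  # [15, 590]
```

For an index-on-index join like this one, fuel.join(emissions) is a shorter alternative.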
In the next section of the course, you’ll see how to sort a DataFrame’s columns.