Sorting Your DataFrame on Its Index
00:00
Sorting Your DataFrame on Its Index. Before sorting on the index, it’s a good idea to know what an index represents. A DataFrame has an .index property, which by default is a numerical representation of its rows’ locations.
00:16
You can think of the index as the row numbers. It helps in quick row lookup and identification. You can sort a DataFrame based on its row index with .sort_index().
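As a minimal sketch of the default index described above, the following builds a tiny DataFrame and inspects its index. The data and the column names make and city08 are made up for illustration and are not the course's dataset.

```python
import pandas as pd

# Hypothetical sample data, not the dataset used in the course.
df = pd.DataFrame(
    {
        "make": ["Toyota", "Audi", "Ford"],
        "city08": [27, 18, 21],
    }
)

# With no index specified, pandas assigns a RangeIndex: 0, 1, 2, ...
print(df.index)

# The index label identifies a row, so it can be used for lookup.
print(df.loc[1, "make"])
```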
00:30 Sorting by column values like you did in a previous example reorders the rows in your DataFrame, so the index becomes disorganized. This can also happen when you filter a DataFrame or when you drop or add rows.
00:45
To illustrate the use of .sort_index(), start by creating a new sorted DataFrame using .sort_values().
01:00
You’ve created a DataFrame that’s sorted using multiple values. Notice how the row index is in no particular order. To get your DataFrame back to the original order, you can use .sort_index().
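The two steps above can be sketched roughly as follows. The sample rows are invented, and sorting by make and model is an assumption about which columns the on-screen example uses.

```python
import pandas as pd

# Hypothetical sample data, not the dataset used in the course.
df = pd.DataFrame(
    {
        "make": ["Toyota", "Audi", "Ford", "Audi"],
        "model": ["Corolla", "A4", "Focus", "A3"],
        "city08": [27, 18, 21, 20],
    }
)

# Sorting by column values reorders the rows, so the index labels
# travel with their rows and end up out of order.
sorted_df = df.sort_values(by=["make", "model"])
print(list(sorted_df.index))

# Sorting on the index puts the rows back in their original order.
restored_df = sorted_df.sort_index()
print(list(restored_df.index))
```

Both calls return new DataFrames, so df itself is left untouched.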
01:17
Now the index is in ascending order. Just as with .sort_values(), the default value of the ascending parameter in .sort_index() is True, and you can change to descending order by passing False.
01:31
Sorting on the index has no impact on the data itself as the values are unchanged. This is particularly useful when you’ve assigned a custom index with .set_index().
01:42
If you want to set a custom index using the make and model columns, then you can pass a list to .set_index().
01:59
Using this method, you replace the default integer-based row index with two axis labels. This is considered a multi-index or hierarchical index. Your DataFrame is now indexed by more than one key, which you can sort on with .sort_index().
02:19
First, you assign a new index to your DataFrame using the make and model columns. Then you sort the index using .sort_index().
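A minimal sketch of those two steps, again on invented sample rows rather than the course's dataset. The variable names are illustrative.

```python
import pandas as pd

# Hypothetical sample data, not the dataset used in the course.
df = pd.DataFrame(
    {
        "make": ["Toyota", "Audi", "Ford", "Audi"],
        "model": ["Corolla", "A4", "Focus", "A3"],
        "city08": [27, 18, 21, 20],
    }
)

# Passing a list to .set_index() builds a MultiIndex from both columns.
assigned_index_df = df.set_index(["make", "model"])

# .sort_index() sorts by make first, then by model within each make.
sorted_df = assigned_index_df.sort_index()
print(sorted_df)
```

After .set_index(), make and model are no longer regular columns. They live in the index, and city08 is the only data column left in this sample.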
02:28
You can read more on using .set_index() in the pandas documentation.
02:35
For the next example, you’ll sort your DataFrame by its index in descending order. Remember from sorting your DataFrame with .sort_values() that you can reverse the sort order by setting ascending to False.
02:48
The same parameter works with .sort_index(), so you can sort your DataFrame in reverse order like this.
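The on-screen code isn't part of this transcript, so here is a rough sketch of a descending index sort. It assumes a make and model multi-index like the one set earlier, built on invented sample rows.

```python
import pandas as pd

# Hypothetical sample data, not the dataset used in the course.
df = pd.DataFrame(
    {
        "make": ["Toyota", "Audi", "Ford", "Audi"],
        "model": ["Corolla", "A4", "Focus", "A3"],
        "city08": [27, 18, 21, 20],
    }
)

assigned_index_df = df.set_index(["make", "model"])

# ascending=False reverses the order on every level of the index.
sorted_desc_df = assigned_index_df.sort_index(ascending=False)
print(sorted_desc_df)
```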
03:00
Now your DataFrame is sorted by its index in descending order. One difference between using .sort_index() and .sort_values() is that .sort_index() has no by parameter since it sorts a DataFrame on the row index by default.
03:16
There are many cases in data analysis when you want to sort on a hierarchical index. You’ve already seen how you can use make and model in a multi-index. For this dataset, you could also use the id column as an index.
03:33
Setting the id column as the index could be helpful in linking related datasets. For example, the EPA’s emissions dataset also uses id to represent vehicle record IDs. This links the emissions data to the fuel economy data.
03:51
Sorting the index of both DataFrames could speed up other methods, such as .merge(). To learn more about combining data in pandas, check out this Real Python article.
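As an illustration of linking two datasets through a shared id index, here is a small sketch. Both DataFrames are invented stand-ins for the fuel economy and emissions data, and the score column is a hypothetical name. The sketch shows only the mechanics and doesn't measure any speedup.

```python
import pandas as pd

# Hypothetical stand-in for the fuel economy data.
fuel_df = pd.DataFrame(
    {
        "id": [3, 1, 2],
        "make": ["Ford", "Toyota", "Audi"],
        "city08": [21, 27, 18],
    }
).set_index("id").sort_index()

# Hypothetical stand-in for the emissions data, keyed by the same id.
emissions_df = pd.DataFrame(
    {
        "id": [2, 3, 1],
        "score": [5, 7, 6],
    }
).set_index("id").sort_index()

# Join the two DataFrames on their (sorted) indexes.
merged_df = fuel_df.merge(emissions_df, left_index=True, right_index=True)
print(merged_df)
```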
04:06 In the next section of the course, you’ll see how to sort a DataFrame’s columns.