Sorting Your DataFrame on a Single Column
For more information on concepts covered in this lesson, you can check out Introduction to Sorting Algorithms in Python.
00:00
Sorting Your DataFrame on a Single Column. To sort the DataFrame based on the values in a single column, you’ll use .sort_values(). By default, this will return a new DataFrame sorted in ascending order.
00:14
It doesn’t modify the original DataFrame. To use .sort_values(), you pass a single argument to the method containing the name of the column you want to sort by. In this example, you sort the DataFrame by the city08 column, which represents city miles per gallon for fuel-only cars.
00:39
This sorts your DataFrame using the column values from city08, showing the vehicles with the lowest miles per gallon first. By default, .sort_values() sorts your data in ascending order.
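As a rough sketch of this step, here is a small made-up DataFrame standing in for the lesson's fuel economy dataset. Only the city08 column name comes from the lesson; the make column and every value are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's dataset (values are made up).
df = pd.DataFrame(
    {
        "make": ["Alfa Romeo", "Ferrari", "Dodge", "Subaru"],
        "city08": [19, 9, 23, 17],
    }
)

# Pass the column name as the single argument; ascending order is the default.
sorted_df = df.sort_values("city08")
print(sorted_df)

# The original DataFrame keeps its row order because a new DataFrame is returned.
print(df)
```

The row labels in sorted_df travel with their rows, so the index is no longer in 0, 1, 2, 3 order after the sort.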
00:53
Although you didn’t specify a name for the argument you passed to .sort_values(), you actually used the by parameter, which you’ll see in the next example.
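To illustrate that the positional argument and the by keyword do the same thing, here is a short sketch on a made-up DataFrame (the values are hypothetical):

```python
import pandas as pd

# Hypothetical data; only the city08 column name comes from the lesson.
df = pd.DataFrame({"city08": [19, 9, 23, 17]})

positional = df.sort_values("city08")     # first positional argument is `by`
keyword = df.sort_values(by="city08")     # same call with the name spelled out

print(positional.equals(keyword))
```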
01:03
Another parameter of .sort_values() is ascending, which by default is set to True. If you want the DataFrame sorted in descending order, then you can pass False to this parameter, as seen on-screen.
01:24
By passing False to ascending, you reverse the sort order. Now your DataFrame is sorted in descending order by the average miles per gallon measured in city conditions.
01:34
The vehicles with the highest miles-per-gallon values are in the first rows.
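The on-screen code isn't part of this transcript, so the following is only a sketch of what a descending sort might look like, again on a made-up DataFrame with hypothetical values:

```python
import pandas as pd

# Hypothetical stand-in for the lesson's dataset (values are made up).
df = pd.DataFrame(
    {
        "make": ["Alfa Romeo", "Ferrari", "Dodge", "Subaru"],
        "city08": [19, 9, 23, 17],
    }
)

# ascending=False reverses the default order, so the highest values come first.
descending_df = df.sort_values(by="city08", ascending=False)
print(descending_df)
```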
01:40
It’s good to note that pandas allows you to choose different sorting algorithms to use with both .sort_values() and .sort_index().
01:47
The available algorithms are quicksort, mergesort, and heapsort. For more information on these different sorting algorithms, check out the Real Python course Introduction to Sorting Algorithms in Python.
02:01
The algorithm used by default when sorting on a single column is quicksort. To change this to a stable sorting algorithm, use mergesort.
02:11
You can do that with the kind parameter in .sort_values() or .sort_index(), as seen on-screen.
02:26
Using kind, you set the sorting algorithm to mergesort. The previous output used the default quicksort algorithm.
02:35
Looking at the highlighted indices, you can see the rows are in a different order. This is because quicksort is not a stable sorting algorithm, but mergesort is. Note that in pandas, kind is ignored when you sort on more than one column or label.
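Here is a small sketch of the kind parameter, using made-up data with repeated city08 values so the effect of a stable sort is visible in the index. With so few rows, the default quicksort may happen to produce the same order, so this example only shows what mergesort guarantees.

```python
import pandas as pd

# Hypothetical data with tied city08 values (all values are made up).
df = pd.DataFrame(
    {
        "make": ["Audi", "BMW", "Ford", "Kia", "Saab"],
        "city08": [18, 16, 18, 16, 18],
    }
)

# mergesort is stable: rows with equal city08 keep their original relative order.
stable_df = df.sort_values(by="city08", kind="mergesort")
print(stable_df)
```

In the result, the two rows with 16 appear in their original order (index 1, then 3), followed by the three rows with 18 (index 0, 2, 4).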
02:52
When you’re sorting multiple records that have the same key, a stable sorting algorithm will maintain the original order of those records after sorting. For that reason, using a stable sorting algorithm is necessary if you plan to perform multiple sorts.
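To see why stability matters across multiple sorts, here is a sketch that sorts a made-up DataFrame twice, once per column. The data is hypothetical and the two-step approach is only an illustration of the idea.

```python
import pandas as pd

# Hypothetical data (all values are made up).
df = pd.DataFrame(
    {
        "make": ["Ford", "Audi", "Ford", "Audi"],
        "city08": [20, 25, 15, 18],
    }
)

# First sort: order every row by city08.
by_mpg = df.sort_values(by="city08", kind="mergesort")

# Second sort: order by make. Because mergesort is stable, rows that share
# a make keep the city08 order established by the first sort.
by_make = by_mpg.sort_values(by="make", kind="mergesort")
print(by_make)
```

After the second sort, the Audi rows come first and the Ford rows second, and within each make the city08 values are still in ascending order.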
03:08
Now that you’re familiar with sorting a DataFrame on a single column, you’re ready to see how to sort one on multiple columns. And that’s what will be covered in the next section of the course.