Sorting Your DataFrame on Multiple Columns
00:00 Sorting Your DataFrame on Multiple Columns. In data analysis, it’s common to want to sort your data based on the values of multiple columns. Imagine you have a dataset with people’s first and last names.
00:16 It would make sense to sort by last name and then first name so that people with the same last name are arranged alphabetically according to their first names.
00:26 In the first example, you sorted your DataFrame on a single column named city08. From an analysis standpoint, the MPG in city conditions is an important factor that could determine a car’s desirability. In addition to the MPG in city conditions, you may want to look at it for highway conditions.
00:45 To sort by two keys, you can pass a list of column names to by.
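A minimal sketch of this two-key sort follows. The small DataFrame is a hypothetical stand-in for the fuel economy dataset used in the course, with made-up values. Only the column names city08 and highway08 come from the lesson.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({
    "make": ["Toyota", "Ford", "Dodge", "Ford", "Honda", "Toyota"],
    "model": ["Pickup", "Focus", "Pickup", "Fiesta", "Civic", "Corolla"],
    "city08": [18, 26, 15, 27, 30, 30],
    "highway08": [23, 36, 20, 34, 40, 38],
})

# The first list (passed to by) sets the sort keys.
# The second list (in the outer brackets) selects which columns to display.
two_key = df.sort_values(by=["city08", "highway08"])[["city08", "highway08"]]
print(two_key)
```

In this sample, the two rows with a city08 value of 30 are ordered by their highway08 values, which is the tie-breaking role of the second key.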
00:58 Note that the second list passed in this example instructs pandas to only show the columns we’ve listed. This makes it easier to understand what’s happening with this two-level sort.
01:10 By specifying a list of the column names city08 and highway08, you sort the DataFrame on two columns using .sort_values().
01:19 The next example will explain how to specify the sort order and why it’s important to pay attention to the list of column names you use. To sort the DataFrame on multiple columns, you must provide a list of column names. For example, to sort by make and model, you should create the following list and then pass it to .sort_values().
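Here is a short sketch of sorting by make and then model. The sample rows are hypothetical and stand in for the course dataset.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({
    "make": ["Toyota", "Ford", "Dodge", "Ford", "Honda", "Toyota"],
    "model": ["Pickup", "Focus", "Pickup", "Fiesta", "Civic", "Corolla"],
})

# make is the primary key; model breaks ties within the same make.
by_make_model = df.sort_values(by=["make", "model"])[["make", "model"]]
print(by_make_model)
```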
01:46 Now your DataFrame is sorted in ascending order by make. If there are two or more identical makes, then it’s sorted by model. The order in which the column names are specified in your list corresponds to how your DataFrame will be sorted.
02:02 If you want to change the logical sort order from the previous example, then you can change the order of the column names in the list that you pass to the by parameter.
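As a sketch, swapping the two names in the list makes model the primary key. The sample rows are hypothetical and include one model name shared by two makes so that the tie-break is visible.

```python
import pandas as pd

# Hypothetical sample data; "Pickup" appears under two different makes.
df = pd.DataFrame({
    "make": ["Toyota", "Ford", "Dodge", "Ford", "Honda", "Toyota"],
    "model": ["Pickup", "Focus", "Pickup", "Fiesta", "Civic", "Corolla"],
})

# model is now the primary key; make breaks ties within the same model.
by_model_make = df.sort_values(by=["model", "make"])[["make", "model"]]
print(by_model_make)
```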
02:20 The DataFrame is now sorted by the model column in ascending order, then sorted by make if there are two or more of the same model.
02:27 You can see that changing the order of columns also changes the order in which the values get sorted. Up to this point, you’ve sorted only in ascending order on multiple columns.
02:40 In the next example, you’ll sort in descending order based on the make and model columns. To sort in descending order, set ascending to False.
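A minimal sketch of the descending sort, again on hypothetical sample rows. A single False passed to ascending applies to every column listed in by.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({
    "make": ["Toyota", "Ford", "Dodge", "Ford", "Honda", "Toyota"],
    "model": ["Pickup", "Focus", "Pickup", "Fiesta", "Civic", "Corolla"],
})

# ascending=False reverses the order for both make and model.
descending = df.sort_values(by=["make", "model"], ascending=False)[["make", "model"]]
print(descending)
```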
02:59 The values in the make column are in reverse alphabetical order, and the values in the model column are in descending order for any cars with the same make.
03:10 With textual data, the sort is case sensitive, meaning that capitalized text will appear first in ascending order and last in descending order. You might be wondering if it’s possible to sort using multiple columns and to have those columns use different ascending arguments. With pandas, you can do this with a single method call.
03:32 If you want to sort some columns in ascending order and some columns in descending order, then you can pass a list of Booleans to ascending.
03:41 In this example, you sort your DataFrame by the make, model, and city08 columns, with the first two columns sorted in ascending order and city08 sorted in descending order.
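The mixed-direction sort can be sketched as follows. The sample rows are hypothetical and include two entries for the same make and model so that the descending city08 order is visible. The list passed to ascending needs one Boolean per column in by, in the same order.

```python
import pandas as pd

# Hypothetical sample data; two Ford Focus rows differ only in city08.
df = pd.DataFrame({
    "make": ["Toyota", "Ford", "Dodge", "Ford", "Honda", "Ford"],
    "model": ["Pickup", "Focus", "Pickup", "Fiesta", "Civic", "Focus"],
    "city08": [18, 26, 15, 27, 30, 24],
})

# make and model ascending, city08 descending.
mixed = df.sort_values(
    by=["make", "model", "city08"],
    ascending=[True, True, False],
)[["make", "model", "city08"]]
print(mixed)
```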
04:06 Now your DataFrame is sorted by make and model in ascending order, but with the city08 column in descending order.
04:13 This is helpful because it groups the cars in a categorical order and shows the highest-MPG cars first. Now that you know how to sort a DataFrame on multiple columns, in the next section of the course, you’ll see how to sort one on its index.