Sorting Your DataFrame on Multiple Columns
00:00 Sorting Your DataFrame on Multiple Columns. In data analysis, it’s common to want to sort your data based on the values of multiple columns. Imagine you have a dataset with people’s first and last names.
00:16 It would make sense to sort by last name and then first name so that people with the same last name are arranged alphabetically according to their first names.
In the first example, you sorted your DataFrame on a single column named city08. From an analysis standpoint, the MPG in city conditions is an important factor that could determine a car’s desirability. In addition to the MPG in city conditions, you may also want to look at the MPG for highway conditions.
To sort by two keys, you can pass a list of column names to the by parameter of .sort_values().
00:58 Note that the second list passed in this example instructs pandas to only show the columns we’ve listed. This makes it easier to understand what’s happening with this two-level sort.
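As a minimal sketch of the two-key sort described here, the snippet below uses a small, invented DataFrame in place of the course’s fuel economy dataset. The column names come from the lesson, but the values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the fuel economy data; all values are invented.
df = pd.DataFrame(
    {
        "make": ["Toyota", "Honda", "Ford", "Honda"],
        "model": ["Corolla", "Civic", "Focus", "Accord"],
        "city08": [30, 30, 26, 26],
        "highway08": [38, 40, 36, 34],
    }
)

# Sort by city08 first, then use highway08 to order rows with equal city08.
# The trailing [[...]] selects only those two columns for display.
sorted_df = df.sort_values(by=["city08", "highway08"])[["city08", "highway08"]]
print(sorted_df)
```

Rows that share a city08 value end up next to each other, ordered by their highway08 value.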
By specifying a list of the column names city08 and highway08, you sort the DataFrame on two columns using .sort_values().
The next example will explain how to specify the sort order and why it’s important to pay attention to the list of column names you use. To sort the DataFrame on multiple columns, you must provide a list of column names. For example, to sort by make and model, you should create a list containing make and then model, and pass it to the by parameter of .sort_values().
01:46 Now your DataFrame is sorted in ascending order by make. If there are two or more identical makes, then it’s sorted by model. The order in which the column names are specified in your list corresponds to how your DataFrame will be sorted.
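A sort by make and then model might look like the following sketch. The data is a small, invented sample rather than the real dataset.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset has many more columns and rows.
df = pd.DataFrame(
    {
        "make": ["Toyota", "Honda", "Ford", "Honda"],
        "model": ["Corolla", "Civic", "Focus", "Accord"],
    }
)

# make is the primary key; model only decides the order within one make.
by_make_model = df.sort_values(by=["make", "model"])
print(by_make_model)
```

Here the two Honda rows are adjacent, with Accord placed before Civic.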
If you want to change the logical sort order from the previous example, then you can change the order of the column names in the list that you pass to the by parameter.
The DataFrame is now sorted by the model column in ascending order, then sorted by make if there are two or more of the same model.
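To make the effect of the list order concrete, this sketch sorts the same invented rows both ways. The shared model name "Pickup" is a made-up value chosen so that two makes have an identical model.

```python
import pandas as pd

# Invented rows in which two different makes share the model name "Pickup".
df = pd.DataFrame(
    {
        "make": ["Toyota", "Ford", "Honda", "Ford"],
        "model": ["Pickup", "Pickup", "Civic", "Focus"],
    }
)

# model is now the primary key; make orders rows with the same model.
by_model_make = df.sort_values(by=["model", "make"])

# For comparison, the same rows with make as the primary key.
by_make_model = df.sort_values(by=["make", "model"])

print(by_model_make)
print(by_make_model)
```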
02:27 You can see that changing the order of columns also changes the order in which the values get sorted. Up to this point, you’ve sorted only in ascending order on multiple columns.
In the next example, you’ll sort in descending order based on the make and model columns. To sort in descending order, set ascending to False.
The values in the make column are in reverse alphabetical order, and the values in the model column are in descending order for any cars with the same make.
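A descending sort on both columns could be written as in this sketch, again using invented sample rows.

```python
import pandas as pd

# Hypothetical sample rows for illustration only.
df = pd.DataFrame(
    {
        "make": ["Toyota", "Honda", "Ford", "Honda"],
        "model": ["Corolla", "Civic", "Focus", "Accord"],
    }
)

# A single ascending=False applies to every column listed in by.
descending = df.sort_values(by=["make", "model"], ascending=False)
print(descending)
```

Toyota now comes first, and within Honda, Civic comes before Accord.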
With textual data, the sort is case sensitive, meaning that capitalized text will appear first in ascending order and last in descending order. You might be wondering if it’s possible to sort using multiple columns and to have those columns use different ascending arguments. With pandas, you can do this with a single method call.
If you want to sort some columns in ascending order and some columns in descending order, then you can pass a list of Booleans to ascending.
In this example, you sort your DataFrame by the make, model, and city08 columns, with the first two columns sorted in ascending order and city08 sorted in descending order.
Now your DataFrame is sorted by make and model in ascending order, but with the city08 column in descending order.
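The mixed-order sort can be sketched as follows. The rows are invented, with two Honda Civic entries included so that the descending city08 order is visible within a single make and model.

```python
import pandas as pd

# Invented rows; two Honda Civic entries differ only in city08.
df = pd.DataFrame(
    {
        "make": ["Honda", "Honda", "Ford", "Honda"],
        "model": ["Civic", "Civic", "Focus", "Accord"],
        "city08": [28, 31, 26, 24],
    }
)

# Each Boolean in ascending pairs with the column at the same position in by:
# make ascending, model ascending, city08 descending.
mixed = df.sort_values(
    by=["make", "model", "city08"],
    ascending=[True, True, False],
)
print(mixed)
```

The two lists must have the same length, since each Boolean controls the column at the matching position.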
04:13 This is helpful because it groups the cars in a categorical order and shows the highest-MPG cars first. Now that you know how to sort a DataFrame on multiple columns, in the next section of the course, you’ll see how to sort one on its index.