Sorting Your DataFrame on Multiple Columns

00:00 Sorting Your DataFrame on Multiple Columns. In data analysis, it’s common to want to sort your data based on the values of multiple columns. Imagine you have a dataset with people’s first and last names.

00:16 It would make sense to sort by last name and then first name so that people with the same last name are arranged alphabetically according to their first names.

00:26 In the first example, you sorted your DataFrame on a single column named city08. From an analysis standpoint, the MPG in city conditions is an important factor that could determine a car’s desirability. In addition to the MPG in city conditions, you may also want to look at the MPG in highway conditions.

00:45 To sort by two keys, you can pass a list of column names to by.

00:58 Note that the second list passed in this example instructs pandas to show only the columns listed in it. This makes it easier to understand what’s happening with this two-level sort.

01:10 By specifying a list of the column names city08 and highway08, you sort the DataFrame on two columns using .sort_values().
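A minimal sketch of this two-key sort is shown here. The DataFrame is a small made-up sample rather than the lesson's dataset; only the column names city08 and highway08 come from the lesson, and the variable names df and result are chosen for illustration.

```python
import pandas as pd

# Small made-up sample; only the column names follow the lesson.
df = pd.DataFrame(
    {
        "make": ["Audi", "Honda", "Ford", "Toyota"],
        "model": ["A4", "Civic", "Focus", "Corolla"],
        "city08": [20, 25, 20, 25],
        "highway08": [30, 36, 28, 33],
    }
)

# Sort by city08 first, then break ties using highway08.
# The second list selects which columns to display.
result = df.sort_values(by=["city08", "highway08"])[["city08", "highway08"]]
print(result)
```

Rows that share a city08 value end up ordered by their highway08 value, and the original index labels travel with each row.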

01:19 The next example will explain how to specify the sort order and why it’s important to pay attention to the list of column names you use. To sort the DataFrame on multiple columns, you must provide a list of column names. For example, to sort by make and model, you should create the following list and then pass it to .sort_values().

01:46 Now your DataFrame is sorted in ascending order by make. If there are two or more identical makes, then it’s sorted by model. The order in which the column names are specified in your list corresponds to how your DataFrame will be sorted.
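As a rough illustration of sorting by make and then model, the sketch below uses a few made-up rows. The column names match the lesson, while the values and the variable names are assumptions.

```python
import pandas as pd

# Made-up rows for illustration; not the lesson's dataset.
df = pd.DataFrame(
    {
        "make": ["Toyota", "Ford", "Toyota", "Ford"],
        "model": ["Corolla", "Mustang", "Camry", "Focus"],
    }
)

# make is the primary key; model only decides the order within a make.
by_make_model = df.sort_values(by=["make", "model"])
print(by_make_model)
```

Both Ford rows come before both Toyota rows, and within each make the models appear alphabetically.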

02:02 If you want to change the logical sort order from the previous example, then you can change the order of the column names in the list that you pass to the by parameter.

02:20 The DataFrame is now sorted by the model column in ascending order, then sorted by make if there are two or more of the same model.
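To see the effect of swapping the column order, here is a small sketch with made-up rows in which two different makes share a model name. The values are hypothetical and chosen only to produce a tie on model.

```python
import pandas as pd

# Made-up rows; two makes deliberately share the same model name.
df = pd.DataFrame(
    {
        "make": ["Toyota", "Ford", "Dodge", "Ford"],
        "model": ["Truck 2WD", "Ranger", "Truck 2WD", "Focus"],
    }
)

# model is now the primary key; make only breaks ties between equal models.
by_model_make = df.sort_values(by=["model", "make"])
print(by_model_make)
```

The two Truck 2WD rows sit next to each other, with Dodge ahead of Toyota because make is the second key.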

02:27 You can see that changing the order of columns also changes the order in which the values get sorted. Up to this point, you’ve sorted only in ascending order on multiple columns.

02:40 In the next example, you’ll sort in descending order based on the make and model columns. To sort in descending order, set ascending to False.

02:59 The values in the make column are in reverse alphabetical order, and the values in the model column are in descending order for any cars with the same make.
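A descending sort on both columns might look like the following sketch. The rows are made up for illustration, and a single ascending=False applies to every column named in by.

```python
import pandas as pd

# Made-up rows for illustration; not the lesson's dataset.
df = pd.DataFrame(
    {
        "make": ["Toyota", "Ford", "Toyota", "Ford"],
        "model": ["Corolla", "Mustang", "Camry", "Focus"],
    }
)

# ascending=False reverses the order for both make and model.
descending = df.sort_values(by=["make", "model"], ascending=False)
print(descending)
```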

03:10 With textual data, the sort is case sensitive, meaning that capitalized text will appear first in ascending order and last in descending order. You might be wondering if it’s possible to sort using multiple columns and to have those columns use different ascending arguments. With pandas, you can do this with a single method call.
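The case-sensitive behavior can be checked with a short sketch like this one, which uses made-up make values in mixed case. The last step goes beyond the lesson: it assumes pandas 1.1 or later, where .sort_values() accepts a key argument, and lowercases the values for comparison only.

```python
import pandas as pd

# Made-up values that differ only in capitalization.
df = pd.DataFrame({"make": ["bmw", "Audi", "audi", "BMW"]})

# Uppercase letters sort before lowercase ones in ascending order.
asc = df.sort_values(by="make")["make"].tolist()

# In descending order the capitalized values come last.
desc = df.sort_values(by="make", ascending=False)["make"].tolist()

# Assumption: pandas >= 1.1. key transforms values only for comparison,
# so the stored strings keep their original case.
folded = df.sort_values(by="make", key=lambda col: col.str.lower())

print(asc)
print(desc)
print(folded["make"].tolist())
```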

03:32 If you want to sort some columns in ascending order and some columns in descending order, then you can pass a list of Booleans to ascending.

03:41 In this example, you sort your DataFrame by the make, model, and city08 columns, with the first two columns sorted in ascending order and city08 sorted in descending order.

04:06 Now your DataFrame is sorted by make and model in ascending order, but with the city08 column in descending order.
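The mixed-order sort can be sketched as follows, again with made-up rows. The list passed to ascending lines up position by position with the list passed to by.

```python
import pandas as pd

# Made-up rows; two Ford Focus entries differ only in city08.
df = pd.DataFrame(
    {
        "make": ["Ford", "Ford", "Audi", "Ford"],
        "model": ["Focus", "Focus", "A4", "Fiesta"],
        "city08": [24, 27, 21, 29],
    }
)

# make and model ascending, city08 descending within each make/model pair.
mixed = df.sort_values(
    by=["make", "model", "city08"],
    ascending=[True, True, False],
)
print(mixed)
```

Within the Ford Focus group, the row with the higher city08 value is listed first.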

04:13 This is helpful because it groups the cars in a categorical order and shows the highest-MPG cars first. Now that you know how to sort a DataFrame on multiple columns, in the next section of the course, you’ll see how to sort one on its index.
