Working With Missing Data When Sorting in pandas

For more information on concepts covered in this lesson, you can check out Using pandas to Make a Gradebook in Python.

00:00 Working With Missing Data When Sorting in pandas. Real-world data often has many imperfections. While pandas has several methods you can use to clean your data before sorting, sometimes it’s nice to see which data is missing while you’re sorting.

00:18 You can do that with the na_position parameter. The subset of the fuel economy data used for this course doesn’t have missing values. To illustrate the use of na_position, first you’ll need to create some missing data. On-screen, you’ll see code that creates a new column based on the existing mpgData column, mapping True where mpgData equals Y and NaN where it doesn’t.
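The on-screen code is not part of this transcript, so here is a minimal sketch of the step just described. The small DataFrame and its model column are hypothetical stand-ins for the course's fuel economy subset; only the mpgData and mpgData_ column names come from the lesson.

```python
import pandas as pd

# Hypothetical stand-in for the fuel economy subset used in the course.
df = pd.DataFrame(
    {
        "model": ["a", "b", "c", "d"],
        "mpgData": ["Y", "N", "Y", "N"],
    }
)

# .map() with a dict: values that have no matching key (here "N")
# become NaN, which gives us the missing data we need.
df["mpgData_"] = df["mpgData"].map({"Y": True})

print(df)
```

Passing a dictionary with only a "Y" key is one way to get this result: any value without a matching key is mapped to NaN.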

00:55 Now you have a new column named mpgData_ that contains both True and NaN values. You’ll use this column to see what effect na_position has when you use the two sort methods. To find out more about using .map(), check out this Real Python course.

01:15 .sort_values() accepts a parameter named na_position, which helps to organize missing data in the column you’re sorting on. If you sort on a column with missing data, then the rows with the missing values will appear at the end of your DataFrame.

01:29 This happens regardless of whether you’re sorting in ascending or descending order. Here’s what your DataFrame looks like when you sort on the column with missing data.
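As a rough illustration of that default behavior, the sketch below sorts a small, made-up DataFrame (not the course's data) on a column with missing values, in both directions.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a column holding both True and NaN values.
df = pd.DataFrame(
    {
        "model": ["a", "b", "c", "d"],
        "mpgData_": [True, np.nan, True, np.nan],
    }
)

# With the default na_position, NaN rows land at the end
# whether the sort is ascending or descending.
asc = df.sort_values(by="mpgData_")
desc = df.sort_values(by="mpgData_", ascending=False)

print(asc)
print(desc)
```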

01:44 To change this behavior and have the missing data appear first in the DataFrame, you can set na_position to first.

01:58 Now, any missing data from the columns you use to sort on will be shown at the top of your DataFrame. The na_position parameter accepts only two values: last, which is the default, and first.
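A short sketch of this, again on made-up data rather than the course's dataset, might look like the following.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a column holding both True and NaN values.
df = pd.DataFrame(
    {
        "model": ["a", "b", "c", "d"],
        "mpgData_": [True, np.nan, True, np.nan],
    }
)

# na_position="first" moves rows with missing values to the top.
nan_first = df.sort_values(by="mpgData_", na_position="first")

print(nan_first)
```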

02:10 This is most helpful when you’re first starting to analyze your data and are unsure if there are any missing values.

02:18 .sort_index() also accepts na_position. Your DataFrame typically won’t have NaN values as a part of its index, so this parameter is less useful in .sort_index(). However, it’s good to know that if your DataFrame does have NaN in either the row index or a column name, then you can quickly identify this using .sort_index() and na_position.

02:42 By default, this parameter is set to last, which places NaN values at the end of the sorted result. To change that behavior and have the missing data first in your DataFrame, set na_position to first.
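To see this with .sort_index(), here is a minimal sketch that assumes a small DataFrame whose row index contains a NaN label. The data and the value column are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with a NaN label in its row index.
df = pd.DataFrame({"value": [10, 20, 30]}, index=[2.0, np.nan, 1.0])

# Default (na_position="last"): the NaN label is sorted to the end.
nan_last = df.sort_index()

# na_position="first": the NaN label is sorted to the top.
nan_first = df.sort_index(na_position="first")

print(nan_last)
print(nan_first)
```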

02:56 In the next section of the course, you’ll see how you can use sort methods to modify DataFrames.
