For more information on concepts covered in this lesson, you can check out Using Pandas to Make a Gradebook in Python.
Working With Missing Data When Sorting in Pandas
00:00 Working With Missing Data When Sorting in Pandas. Real-world data often has many imperfections. While pandas has several methods you can use to clean your data before sorting, sometimes it’s nice to see which data is missing while you’re sorting.
You can do that with the na_position parameter. The subset of the fuel economy data used for this course doesn’t have missing values. To illustrate the use of na_position, first you’ll need to create some missing data. On-screen, you’ll see code that creates a new column based on the existing mpgData column, mapping its values with .map() and leaving NaN where the mapping doesn’t apply.
Now you have a new column named mpgData_ that contains both mapped values and NaN values. You’ll use this column to see what effect na_position has when you use the two sort methods. To find out more about using .map(), check out this Real Python course.
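The on-screen code isn’t reproduced in this transcript, so here is a minimal sketch of how such a column could be built. The tiny DataFrame, the car column, and the choice to map "Y" to True are all assumptions made for illustration. Only the mpgData and mpgData_ column names come from the lesson.

```python
import pandas as pd

# Hypothetical stand-in for the fuel economy data used in the course.
df = pd.DataFrame(
    {
        "car": ["A", "B", "C", "D"],
        "mpgData": ["Y", "N", "N", "Y"],
    }
)

# Assumption: "Y" maps to True. Any value that is not a key in the
# dictionary has no mapping, so .map() fills it with NaN.
df["mpgData_"] = df["mpgData"].map({"Y": True})

print(df)
```

Passing a dictionary to .map() is a quick way to produce missing values, because every value without a matching key becomes NaN.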
.sort_values() accepts a parameter named na_position, which helps to organize missing data in the column you’re sorting on. If you sort on a column with missing data, then by default the rows with the missing values will appear at the end of your DataFrame.
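To see that default behavior, the sketch below sorts a small, made-up DataFrame on a column that contains NaN. The data and the "Y" to True mapping are assumptions for illustration, not the course’s dataset.

```python
import pandas as pd

# Hypothetical data: mpgData_ holds True for "Y" and NaN otherwise.
df = pd.DataFrame(
    {
        "car": ["A", "B", "C", "D"],
        "mpgData": ["Y", "N", "N", "Y"],
    }
)
df["mpgData_"] = df["mpgData"].map({"Y": True})

# With the default na_position, rows with NaN land at the end
# whether the sort is ascending or descending.
ascending = df.sort_values(by="mpgData_")
descending = df.sort_values(by="mpgData_", ascending=False)

print(ascending)
print(descending)
```

In both results, the rows with missing values sit at the bottom, so the sort direction alone doesn’t move them.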
.sort_index() also accepts na_position. Your DataFrame typically won’t have NaN values as a part of its index, so this parameter is less useful in .sort_index(). However, it’s good to know that if your DataFrame does have NaN in either the row index or a column name, then you can quickly identify this using na_position.
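As a rough illustration of na_position with .sort_index(), the sketch below builds a made-up DataFrame whose row index contains a NaN label. The value column and the index labels are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with a NaN label in the row index.
df = pd.DataFrame({"value": [10, 20, 30]}, index=[2.0, np.nan, 1.0])

# Default: the NaN label is placed after the sorted labels.
sorted_last = df.sort_index()

# na_position="first" moves the NaN label to the top, which makes it easy to spot.
sorted_first = df.sort_index(na_position="first")

print(sorted_last)
print(sorted_first)
```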
By default, this parameter is set to last, which places NaN values at the end of the sorted result. To change that behavior and have the missing data first in your DataFrame, set na_position to first.
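A short sketch of putting the missing data first might look like the following. It passes na_position="first" to .sort_values() on a small, made-up DataFrame. As before, the data and the "Y" to True mapping are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: mpgData_ holds True for "Y" and NaN otherwise.
df = pd.DataFrame(
    {
        "car": ["A", "B", "C", "D"],
        "mpgData": ["Y", "N", "N", "Y"],
    }
)
df["mpgData_"] = df["mpgData"].map({"Y": True})

# na_position="first" moves the rows with NaN to the top of the result.
nan_first = df.sort_values(by="mpgData_", na_position="first")

print(nan_first)
```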