Missing values can derail your analysis. In pandas, you can use the .dropna() method to remove rows or columns containing null values—in other words, missing data—so you can work with clean DataFrames. In this tutorial, you’ll learn how this method’s parameters give you fine-grained control over exactly which data gets removed and how much of your data to clean.
Dealing with null values is essential for keeping datasets clean and avoiding the issues they can cause. Missing entries can lead to misinterpreted column data types, inaccurate conclusions, and errors in calculations. Simply put, nulls can wreak havoc if they find their way into your calculations.
By the end of this tutorial, you’ll understand that:
- You can use .dropna() to remove rows and columns from a pandas DataFrame.
- You can remove rows and columns based on the content of a subset of your DataFrame.
- You can remove rows and columns based on the volume of null values within your DataFrame.
To get the most out of this tutorial, it’s recommended that you already have a basic understanding of how to create pandas DataFrames from files.
You’ll use the Python REPL along with a file named sales_data_with_missing_values.csv, which contains several null values you’ll deal with during the exercises. Before you start, extract this file from the downloadable materials by clicking the link at the end of this section.
The sales_data_with_missing_values.csv file is based on the publicly available and complete sales data file from Kaggle. Understanding the file’s content isn’t essential for this tutorial, but you can explore the Kaggle link above for more details if you’d like.
You’ll also need to install both the pandas and PyArrow libraries to make sure all code examples work in your environment. You can install them with python -m pip install pandas pyarrow.
It’s time to refine your pandas skills by learning how to handle missing data in a variety of ways.
You’ll find all code examples and the sales_data_with_missing_values.csv file in the materials for this tutorial, which you can download by clicking the link below:
Get Your Code: Click here to download the free sample code that you’ll use to learn how to drop null values in pandas.
How to Drop Rows Containing Null Values in pandas
Before you start dropping rows, it’s helpful to know what options .dropna() gives you. This method supports six parameters that let you control exactly what’s removed:
- axis: Specifies whether to remove rows or columns containing null values.
- thresh and how: Define how many missing values to remove or retain.
- subset: Limits the removal of null values to specific parts of your DataFrame.
- inplace: Determines whether the operation modifies the original DataFrame or returns a new copy.
- ignore_index: Resets the DataFrame index after removing rows.
Don’t worry if any of these parameters don’t make sense to you just yet—you’ll learn why each is used during this tutorial. You’ll also get the chance to practice your skills.
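As a quick preview, here’s a minimal sketch using a small, made-up DataFrame (not the tutorial’s sales data) that shows the default behavior, where any row containing at least one null value is dropped:

```python
import pandas as pd

# A small, made-up DataFrame with a couple of missing values
df = pd.DataFrame({
    "name": ["Ann", None, "Cal"],
    "score": [10, 20, None],
})

# By default, .dropna() removes every row containing at least one null
cleaned = df.dropna()
print(cleaned)
```

By default, axis=0 targets rows and how="any" means a single null value is enough to remove one, so only the first row survives here.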
Note: Although this tutorial teaches you how pandas DataFrames use .dropna(), DataFrames aren’t the only pandas objects that use it.

Series objects also have their own .dropna() method. However, the Series version accepts only four parameters—axis, inplace, how, and ignore_index—instead of the six supported by the DataFrame version. Of these, only inplace and ignore_index are used, and they work the same way as in the DataFrame method. The rest are kept for compatibility with the DataFrame version, but have no effect.

Indexes also have a .dropna() method for removing missing index values, and it accepts just one parameter: how.
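As a short illustration of those variants, here’s a sketch using small, made-up objects:

```python
import pandas as pd

# Series.dropna() returns a new Series with the missing entries removed
prices = pd.Series([135.0, None, 78.0])
print(prices.dropna())

# Index.dropna() removes missing index values; how is its only parameter
labels = pd.Index([70041.0, None, 70042.0])
print(labels.dropna(how="any"))
```

In both cases, the original object is left untouched and a cleaned copy is returned.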
Before using .dropna() to drop rows, you should first find out whether your data contains any null values:
>>> import pandas as pd
>>> pd.set_option("display.max_columns", None)
>>> sales_data = pd.read_csv(
... "sales_data_with_missing_values.csv",
... parse_dates=["order_date"],
... date_format="%d/%m/%Y",
... ).convert_dtypes(dtype_backend="pyarrow")
>>> sales_data
order_number order_date customer_name \
0 <NA> 2025-02-09 00:00:00 Skipton Fealty
1 70041 <NA> Carmine Priestnall
2 70042 2025-02-09 00:00:00 <NA>
3 70043 2025-02-10 00:00:00 Lanni D'Ambrogi
4 70044 2025-02-10 00:00:00 Tann Angear
5 70045 2025-02-10 00:00:00 Skipton Fealty
6 70046 2025-02-11 00:00:00 Far Pow
7 70047 2025-02-11 00:00:00 Hill Group
8 70048 2025-02-11 00:00:00 Devlin Nock
9 <NA> <NA> <NA>
10 70049 2025-02-12 00:00:00 Swift Inc
product_purchased discount sale_price
0 Chili Extra Virgin Olive Oil True 135.0
1 <NA> <NA> 150.0
2 Rosemary Olive Oil Candle False 78.0
3 <NA> True 19.5
4 Vanilla and Olive Oil Candle <NA> 13.98
5 Basil Extra Virgin Olive Oil True <NA>
6 Chili Extra Virgin Olive Oil False 150.0
7 Chili Extra Virgin Olive Oil True 135.0
8 Lavender and Olive Oil Lotion False 39.96
9 <NA> <NA> <NA>
10 Garlic Extra Virgin Olive Oil True 936.0
To make sure all columns appear on your screen, you call pd.set_option("display.max_columns", None). Passing None as the second argument removes the limit on how many columns are displayed.
You read the sales_data_with_missing_values.csv file into a DataFrame using the pandas read_csv() function, then view the data. The order dates are in the "%d/%m/%Y" format in the file, so to make sure the order_date data is read correctly, you use both the parse_dates and date_format parameters. The output reveals there are eleven rows and six columns of data in your file.
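If you’d rather get a numeric summary of the missing data than scan the output by eye, one common approach is to chain .isna() with .sum(). Here’s a sketch using a small, made-up stand-in for the sales data rather than the CSV file itself:

```python
import pandas as pd

# A small stand-in for the sales data, with some missing values
df = pd.DataFrame({
    "order_number": [70041, None, 70042],
    "sale_price": [150.0, None, None],
})

# .isna() marks each missing cell as True; .sum() counts them per column
print(df.isna().sum())
```

Each column’s count tells you how many null values it contains, which helps you decide which .dropna() parameters to reach for.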