Data Cleaning With pandas and NumPy

Data scientists spend much of their time cleaning datasets to make them easier to work with. A popular rule of thumb, sometimes called the 80/20 rule, holds that obtaining and cleaning data take up about 80% of the time spent on a typical project.

So, if you’re new to this field or planning to enter it, you need to be able to deal with messy data, whether that means missing values, inconsistent formatting, malformed records, or nonsensical outliers.

In this video course, you’ll leverage Python’s pandas and NumPy libraries to clean data.

Along the way, you’ll learn about:

Dropping unnecessary columns in a DataFrame
Changing the index of a DataFrame
Using .str methods to clean columns
Renaming columns to a more recognizable set of labels
Skipping unnecessary rows in a CSV file
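The five techniques listed above can be sketched in a few lines of pandas. This is a minimal example under stated assumptions: the CSV contents, the column names, and the four-digit year pattern are all hypothetical and were made up for illustration. The course itself may use a different dataset and a different order of steps.

```python
import io

import pandas as pd

# Hypothetical CSV: two note lines sit above the real header row.
raw_csv = io.StringIO(
    "# Hypothetical export generated for this example\n"
    "# Two note lines precede the header\n"
    "id,City Name,Year Published,Internal Notes\n"
    "101,  london ,1879 [1878],n/a\n"
    "102,Oxford,c1868,n/a\n"
    "103, NEW york,,n/a\n"
)

# Skip the unnecessary rows at the top of the file.
df = pd.read_csv(raw_csv, skiprows=2)

# Drop a column that isn't needed for the analysis.
df = df.drop(columns=["Internal Notes"])

# Use the unique identifier column as the index.
df = df.set_index("id")

# Rename columns to shorter, more recognizable labels.
df = df.rename(columns={"City Name": "city", "Year Published": "year"})

# Clean text with .str methods: trim whitespace, then normalize capitalization.
df["city"] = df["city"].str.strip().str.title()

# Keep the first run of four digits in each year string; blanks stay NaN.
df["year"] = pd.to_numeric(df["year"].str.extract(r"(\d{4})", expand=False))

print(df)
```

Each step reassigns the result to `df` instead of passing `inplace=True`, which keeps the sequence easy to read and to reorder.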

To get the most out of this course, you should have a basic understanding of the pandas and NumPy libraries, including pandas’ workhorse Series and DataFrame objects, common methods that can be applied to these objects, and NumPy’s NaN values.
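If you want a quick check of those prerequisites, the sketch below builds a Series and a DataFrame and shows how NaN behaves. The values and variable names are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# A Series is a labeled one-dimensional array; np.nan marks a missing value.
prices = pd.Series([9.99, np.nan, 4.50], index=["a", "b", "c"])

# A DataFrame is a table of columns; the list aligns to the Series' index.
df = pd.DataFrame({"price": prices, "qty": [3, 1, np.nan]})

# NaN is never equal to itself, so use isna() rather than == to find it.
nan_is_equal = np.nan == np.nan       # False
missing_per_column = df.isna().sum()  # price: 1, qty: 1

# Common reductions skip NaN by default: this is the mean of 9.99 and 4.50.
mean_price = df["price"].mean()

# fillna() replaces missing values; here a missing qty becomes 0.
filled = df.fillna({"qty": 0})

print(filled)
```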

What’s Included:

16 Lessons
Video Subtitles and Full Transcripts
2 Downloadable Resources
Accompanying Text-Based Tutorial
Q&A With Python Experts: Ask a Question
Certificate of Completion

About Ian Currie

Ian is a Python nerd who relies on it for work and much enjoyment.

Each tutorial at Real Python is created by a team of developers so that it meets our high-quality standards.