Data Cleaning With pandas and NumPy (Summary)
Congratulations! Now you know how to clean data using pandas and NumPy. Cleaning data can be a major undertaking, but it’s vital to any data science project. You’ve practiced the necessary skills on three different datasets, all while building a reusable data cleaning script.
In this video course, you learned how to:
- Drop unnecessary columns in a DataFrame
- Change the index of a DataFrame
- Use .str accessor methods to clean columns
- Rename columns to a more recognizable set of labels
- Skip unnecessary rows in a CSV file
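As a quick refresher, the five skills above can be sketched in one short script. This is only an illustration: the CSV text, column names, and values below are made up for the example and are not taken from the course datasets.

```python
import io

import pandas as pd

# Hypothetical CSV text: the first line is a junk row, not the header.
raw_csv = io.StringIO(
    "exported by a hypothetical tool\n"
    "Identifier,Title,Pub Date,Shelfmark\n"
    "206,Walter Forbes,1879 [1878],A1\n"
    "216,All for Greed,1868,B2\n"
    "218,Love the Avenger,[1869],C3\n"
)

# Skip the unnecessary first row so the real header line is used.
df = pd.read_csv(raw_csv, skiprows=1)

# Drop a column that is not needed for the analysis.
df = df.drop(columns=["Shelfmark"])

# Use a column with unique values as the index.
df = df.set_index("Identifier")

# Clean a text column with .str accessor methods:
# keep the first four-digit year and convert it to an integer.
df["Pub Date"] = df["Pub Date"].str.extract(r"(\d{4})", expand=False).astype(int)

# Rename columns to a more recognizable set of labels.
df = df.rename(columns={"Title": "title", "Pub Date": "year"})

print(df)
```

Each step returns a new DataFrame that is assigned back to df, so the steps can be reordered or removed without touching the others.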
Check out the links below to find additional resources that’ll help you on your Python data science journey:
- The pandas documentation
- The NumPy documentation
- Python for Data Analysis by Wes McKinney, the creator of pandas
- Pandas Cookbook by Ted Petrou, a data science trainer and consultant
Congratulations, you made it to the end of the course! What’s your #1 takeaway or favorite thing you learned? How are you going to put your newfound skills to use? Leave a comment in the discussion section and let us know.
00:00
Congratulations. You’ve reached the end of the course. In this lesson, you’re just going to recap everything you’ve done so far. Well done. You now have a good framework for how to approach and structure your data cleaning when you’re using pandas. In this course, you’ve followed along with the cleaning of three different datasets. To get there, you first set up your working environment using VS Code, got to grips with the basic usage of pandas, and learned how to structure a reusable data cleaning script. Along the way, you’ve revised some essential techniques, like exploring your data effectively with .loc[].
00:40
You’ve also learned how to use .loc[] in conjunction with powerful methods like .assign(). Additionally, you’ve learned how to rename columns, drop columns, identify and reassign indices, and work with data types.
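To make the recap concrete, here is a minimal sketch of using .loc[] together with .assign(). The DataFrame and its column names are invented for this example and are not one of the course datasets.

```python
import pandas as pd

# Hypothetical data: scores were read in as text and need a numeric type.
df = pd.DataFrame(
    {
        "name": ["Ada", "Grace", "Linus"],
        "score": ["90", "85", "70"],
        "city": ["London", "New York", "Helsinki"],
    }
)

# Explore a subset with .loc[]: filter rows by a condition, pick columns by label.
subset = df.loc[df["city"] != "Helsinki", ["name", "score"]]

# Chain .loc[] with .assign() to fix a data type, then reassign the index.
# .assign() returns a new DataFrame, so df itself is left unchanged.
cleaned = (
    df.loc[:, ["name", "score"]]
    .assign(score=lambda d: d["score"].astype(int))
    .set_index("name")
)

print(subset)
print(cleaned)
```

Because every step in the chain returns a new DataFrame, the original data stays available for comparison while you experiment.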
00:57
Now you’re ready to set out on your own data cleaning adventures. Remember to always keep the pandas documentation close to hand. You can also take another look at the notes posted below each video and check out the complete source code for all the examples that were covered in this course. Finally, on realpython.com, there are many more tutorials and video courses on pandas and data science in general. Again, well done for getting to the end of this course.