Splitting Datasets With scikit-learn and train_test_split()

One of the key aspects of supervised machine learning is model evaluation and validation. When you evaluate the predictive performance of your model, it’s essential that the process be unbiased. Using train_test_split() from the data science library scikit-learn, you can split your dataset into subsets that minimize the potential for bias in your evaluation and validation process.
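As a rough illustration of the basic call, here is a minimal sketch that splits a small toy array. The data and the `test_size` and `random_state` values are arbitrary choices for the example, not values taken from the course.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (illustrative only): 12 samples with 2 features each, plus 12 labels
x = np.arange(24).reshape(12, 2)
y = np.arange(12)

# Hold out 25% of the samples for testing; random_state fixes the shuffle
# so the split is reproducible from run to run
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=4
)

print(x_train.shape, x_test.shape)  # (9, 2) (3, 2)
```

Every sample ends up in exactly one of the two subsets, and the rows of `x` stay paired with their labels in `y`.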

In this course, you’ll learn:

Why you need to split your dataset in supervised machine learning
Which subsets of the dataset you need for an unbiased evaluation of your model
How to use train_test_split() to split your data
How to combine train_test_split() with prediction methods
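To give a feel for how the split combines with a prediction method, here is a sketch that fits scikit-learn's `LinearRegression` on the training subset and scores it on the held-out test subset. The synthetic data (roughly `y = 3x + 2` plus noise) is an assumption made up for this example; the course may use different datasets and models.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression data (assumed for illustration): y ≈ 3x + 2 with noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 1))
y = 3 * x.ravel() + 2 + rng.normal(0, 0.5, size=100)

# Keep 20% of the samples aside as unseen test data
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

# Fit only on the training subset...
model = LinearRegression().fit(x_train, y_train)

# ...then evaluate on data the model never saw during fitting
r2_train = model.score(x_train, y_train)  # R² on training data
r2_test = model.score(x_test, y_test)     # R² on test data
y_pred = model.predict(x_test)
print(f"R² train: {r2_train:.3f}, R² test: {r2_test:.3f}")
```

The test score, not the training score, is the less biased estimate of how the model will perform on new data.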

In addition, you’ll get information on related tools from sklearn.model_selection.
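One example of a related tool in `sklearn.model_selection` is k-fold cross-validation, which repeats the train/test split several times so each sample is tested once. The sketch below uses `KFold` with `cross_val_score` on the same kind of synthetic data as above; which related tools the course actually covers isn't specified here, so treat this choice as an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data (assumed for illustration): y ≈ 3x + 2 with noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 1))
y = 3 * x.ravel() + 2 + rng.normal(0, 0.5, size=100)

# Five shuffled folds: each sample lands in the test fold exactly once
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

# One R² score per fold (the default scorer for regressors)
scores = cross_val_score(LinearRegression(), x, y, cv=kfold)
print(scores.round(3), round(scores.mean(), 3))
```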

What’s Included:

12 Lessons
Video Subtitles and Full Transcripts
1 Downloadable Resource
Accompanying Text-Based Tutorial
Interactive Quiz to Check Your Progress
Q&A With Python Experts
Certificate of Completion

Downloadable Resources:

Course Slides (.pdf)

Related Learning Paths:

Machine Learning With Python

About Darren Jones

With 20 years as a teacher of music technology, Darren is keen to bring his skills to the Python table.


Each tutorial at Real Python is created by a team of developers so that it meets our high quality standards. The team members who worked on this tutorial are: