Splitting Datasets With scikit-learn and train_test_split() (Overview)

Darren Jones 01:04

One of the key aspects of supervised machine learning is model evaluation and validation. When you evaluate the predictive performance of your model, it’s essential that the process be unbiased. Using train_test_split() from the data science library scikit-learn, you can split your dataset into subsets that minimize the potential for bias in your evaluation and validation process.

In this course, you’ll learn:

Why you need to split your dataset in supervised machine learning
Which subsets of the dataset you need for an unbiased evaluation of your model
How to use train_test_split() to split your data
How to combine train_test_split() with prediction methods
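
As a rough preview of the last two points, the following sketch splits a small dataset and then fits and scores a model on the resulting subsets. The synthetic data, the choice of LinearRegression, and the test_size and random_state values are assumptions made for illustration only, not necessarily what the course uses.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y is roughly 2*x + 1.
rng = np.random.default_rng(0)
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0 + rng.normal(scale=0.5, size=20)

# Hold out 25% of the rows for testing.
# random_state makes the shuffle, and so the split, repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit on the training subset only, then score on data the model never saw.
model = LinearRegression().fit(X_train, y_train)
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)

print(X_train.shape, X_test.shape)
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```

The score on the test subset is the less biased estimate of predictive performance, because none of those rows were used during fitting.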

In addition, you’ll get information on related tools from sklearn.model_selection.
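
One such related tool is k-fold cross-validation. The sketch below assumes KFold and cross_val_score as the examples, although the course may cover different tools from the module. The noise-free synthetic data and the five-fold setting are also illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Noise-free synthetic data (an assumption for illustration): y = 2*x + 1.
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

# Five folds: each row is used for testing exactly once
# and for training in the other four rounds.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# One score per fold (R^2, the default scorer for regressors).
scores = cross_val_score(LinearRegression(), X, y, cv=cv)

print(len(scores))
print(scores.mean())
```

Compared with a single call to train_test_split(), this repeats the split several times and averages the results, which tends to give a more stable performance estimate at the cost of fitting the model once per fold.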

00:00 Split your dataset with scikit-learn’s train_test_split(). One of the key aspects of supervised machine learning is model evaluation and validation.

00:11 When you evaluate the predictive performance of your model, it’s essential that the process is unbiased. Using train_test_split() from the data science library scikit-learn, you can split your dataset into subsets that minimize the potential for bias in your evaluation and validation process.

00:30 In this course, you’ll learn why you need to split your dataset in supervised machine learning, which subsets of the dataset you need for an unbiased evaluation of your model, how to use train_test_split() to split your data, and how to combine train_test_split() with prediction methods. In addition, you’ll get information on related tools from scikit-learn’s model_selection module.

00:56 Now that you’ve seen what you’ll learn in this course, let’s start off by taking a look at why data splitting is important.
