Splitting Datasets With scikit-learn and train_test_split() (Summary)

You now know why and how to use train_test_split() from sklearn. You’ve learned that, for an unbiased estimation of the predictive performance of machine learning models, you should use data that hasn’t been used for model fitting. That’s why you need to split your dataset into training, test, and in some cases, validation subsets.
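One way to get all three subsets is to call train_test_split() twice: once to hold out the test set, then again to carve a validation set out of what remains. The sketch below assumes a toy dataset of 100 samples and arbitrary split fractions; none of these values come from the course.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, assumed for illustration: 100 samples with 2 features each.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# First call: hold out 20% of the data as the test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Second call: take 25% of the remaining 80 samples as the validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0
)

print(len(X_train), len(X_val), len(X_test))
```

The model is fitted on the training subset, tuned against the validation subset, and scored once on the test subset at the end.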

In this course, you’ve learned how to:

  • Use train_test_split() to get training and test sets
  • Control the size of the subsets with the parameters train_size and test_size
  • Determine the randomness of your splits with the random_state parameter
  • Obtain stratified splits with the stratify parameter
  • Use train_test_split() as a part of supervised machine learning procedures
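The parameters listed above can be combined in a single call. This is a minimal sketch on made-up, imbalanced labels (8 zeros and 4 ones); the data and the seed value are assumptions chosen only to make the effect of stratify visible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy, imbalanced labels assumed for illustration: 8 zeros and 4 ones.
X = np.arange(24).reshape(12, 2)
y = np.array([0] * 8 + [1] * 4)

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.25,   # 3 of the 12 samples go to the test set
    random_state=42,  # fixed seed so the split is reproducible
    stratify=y,       # keep the 2:1 class ratio in both subsets
)

# Count how many samples of each class landed in each subset.
print(np.bincount(y_train), np.bincount(y_test))
```

Running the same call again with the same random_state returns the same split, and passing train_size instead of test_size sets the size of the training subset directly.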

You’ve also seen that the sklearn.model_selection module offers several other tools for model validation, including cross-validation, learning curves, and hyperparameter tuning.
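As a rough illustration of those tools, the sketch below runs cross_val_score(), GridSearchCV, and learning_curve() from sklearn.model_selection on the bundled iris dataset. The choice of dataset, the LogisticRegression model, and the candidate values for C are assumptions made for this example, not part of the course material.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, learning_curve

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Cross-validation: one accuracy score per fold (5 folds here).
scores = cross_val_score(model, X, y, cv=5)

# Hyperparameter tuning: try a few arbitrary values of C with 5-fold CV.
search = GridSearchCV(model, param_grid={"C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)

# Learning curve: scores at growing training-set sizes.
# shuffle=True avoids small training subsets that contain a single class,
# because the iris samples are ordered by class.
sizes, train_scores, val_scores = learning_curve(
    model, X, y, cv=5, train_sizes=[0.3, 0.6, 1.0], shuffle=True, random_state=0
)

print(scores.mean())
print(search.best_params_)
print(sizes, val_scores.mean(axis=1))
```

Each of these tools repeats the split-fit-score cycle internally, so no single call to train_test_split() is needed for them.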

aniketbarphe on Sept. 4, 2021

Dear Team, thank you very much for such a wonderful session. All topics were explained nicely. My only suggestion concerns the topic “Other Validation Functionalities”: it would be helpful if it were explained with an example, since an example is easier to understand than theory alone. Looking forward to a positive response.
