You now know why and how to use train_test_split() from sklearn. You’ve learned that, to get an unbiased estimate of a machine learning model’s predictive performance, you should evaluate it on data that wasn’t used for model fitting. That’s why you need to split your dataset into training, test, and in some cases, validation subsets.
In this course, you’ve learned how to:
- Use train_test_split() to get training and test sets
- Control the size of the subsets with the parameters train_size and test_size
- Determine the randomness of your splits with the random_state parameter
- Obtain stratified splits with the stratify parameter
- Use train_test_split() as a part of supervised machine learning procedures
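As a quick recap, the points in this list can be sketched in a few lines. The toy arrays below are made up purely for illustration, and the specific values (test_size=0.2, random_state=42) are arbitrary choices rather than recommendations:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, invented for this sketch: 20 samples, 2 features,
# and a balanced binary label (10 zeros, 10 ones).
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 10 + [1] * 10)

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,    # hold out 20% of the samples for testing
    random_state=42,  # fix the seed so the split is reproducible
    stratify=y,       # keep the class proportions equal in both subsets
)

print(X_train.shape, X_test.shape)
print(np.bincount(y_train), np.bincount(y_test))
```

Because the labels are balanced and the split is stratified, both subsets keep the same 50/50 class ratio. Rerunning with the same random_state should give the same split each time.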
You’ve also seen that the sklearn.model_selection module offers several other tools for model validation, including cross-validation, learning curves, and hyperparameter tuning.
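To give a feel for two of those tools, here is a minimal sketch using cross_val_score() and GridSearchCV from sklearn.model_selection. The synthetic dataset, the choice of LogisticRegression, and the candidate values for C are all assumptions made for illustration, not part of the course material:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic classification data, generated only for this sketch.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

model = LogisticRegression(max_iter=1000)

# Cross-validation: fit and score the model on 5 different train/test folds.
scores = cross_val_score(model, X, y, cv=5)
print("Mean CV accuracy:", scores.mean())

# Hyperparameter tuning: try several regularization strengths,
# each one evaluated with 5-fold cross-validation.
search = GridSearchCV(model, param_grid={"C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)
print("Best parameters:", search.best_params_)
```

Both tools repeat the train/test idea internally, so you get several performance estimates instead of one that depends on a single split.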
Congratulations, you made it to the end of the course! What’s your #1 takeaway or favorite thing you learned? How are you going to put your newfound skills to use? Leave a comment in the discussion section and let us know.