You now know why and how to use train_test_split() from sklearn. You’ve learned that, for an unbiased estimation of the predictive performance of machine learning models, you should use data that hasn’t been used for model fitting. That’s why you need to split your dataset into training, test, and in some cases, validation subsets.
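As a reminder of how the three subsets can be obtained, here is a minimal sketch that calls train_test_split() twice. The toy arrays and the 60/20/20 proportions are assumptions chosen for illustration, not values prescribed by the course.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, made up for illustration: 100 samples with 2 features each
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# Step 1: hold out 20% of the data as the test set
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 2: split the remaining 80% into training and validation subsets.
# 25% of the remaining 80 samples gives 20 validation samples.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0
)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```

The test set is split off first so that it plays no part in fitting the model or in choosing between models on the validation set.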
In this course, you’ve learned how to:
- Use train_test_split() to get training and test sets
- Control the size of the subsets with the parameters train_size and test_size
- Determine the randomness of your splits with the random_state parameter
- Obtain stratified splits with the stratify parameter
- Use train_test_split() as a part of supervised machine learning procedures
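The points in this list can be combined in one short sketch. The imbalanced toy dataset, the choice of LogisticRegression, and the specific values for test_size and random_state are assumptions made for this example only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Imbalanced toy data (illustrative): 80 samples of class 0, 20 of class 1
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.array([0] * 80 + [1] * 20)
X[y == 1] += 2.0  # shift class 1 so the two classes differ

# test_size sets the share of the test set, random_state makes the split
# reproducible, and stratify=y keeps the 80/20 class ratio in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print(np.bincount(y_train))  # [60 15]
print(np.bincount(y_test))   # [20  5]

# Fit on the training set only, then evaluate on the unseen test set
model = LogisticRegression().fit(X_train, y_train)
print(model.score(X_test, y_test))
```

Without stratify=y, a random split of such a small, imbalanced dataset could leave the test set with very few samples of the minority class.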
You’ve also seen that the sklearn.model_selection module offers several other tools for model validation, including cross-validation, learning curves, and hyperparameter tuning.
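Two of these tools can be illustrated with a minimal sketch that uses cross_val_score() for cross-validation and GridSearchCV for hyperparameter tuning. The iris dataset, the k-nearest neighbors classifier, and the candidate values for n_neighbors are assumptions picked for this example. Learning curves are left out to keep it short.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Cross-validation: fit and score the model on 5 different train/test folds
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print(scores)         # one accuracy value per fold
print(scores.mean())  # average accuracy across the folds

# Hyperparameter tuning: try each candidate value with 5-fold cross-validation
# and keep the one with the best mean score
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_)
print(search.best_score_)
```

Both tools repeat the train/test idea several times over, so every sample is used for testing exactly once per round of cross-validation.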
Congratulations, you made it to the end of the course! What’s your #1 takeaway or favorite thing you learned? How are you going to put your newfound skills to use? Leave a comment in the discussion section and let us know.
aniketbarphe on Sept. 4, 2021
Dear Team, thank you very much for such a wonderful session. All topics were explained nicely. My only suggestion concerns the topic “Other Validation Functionalities”: it would be helpful if it were explained with an example. An example is easier to understand than theory alone. Looking forward to a positive response.