Other Validation Functionalities
00:00 Other validation functionalities.
00:04 scikit-learn’s model_selection module offers a lot of functionalities related to model selection and validation, including the following: cross-validation, learning curves, and hyperparameter tuning.
00:19 Cross-validation is a set of techniques that combine measures of prediction performance to get more accurate estimates of model performance. One of the most widely used cross-validation methods is k-fold cross-validation. In it, you divide your dataset into k subsets, or folds, of equal size, where k is often five or ten, and then perform the training and test procedures k times. Each time, you use a different fold as the test set and all of the remaining folds as the training set.
00:51 This provides k measures of predictive performance, and you can then analyze their mean and standard deviation.
01:00 You can implement cross-validation with KFold, StratifiedKFold, LeaveOneOut, and a few other classes and functions from scikit-learn’s model_selection module.
01:13 A learning curve, sometimes called a training curve, shows how the prediction scores on the training and validation sets depend on the number of training samples.
01:22 You can use learning_curve() to get this dependency, which can help you find the optimal size of the training set, choose hyperparameters, compare models, and so on.
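Here’s a small sketch of how learning_curve() might be used. The synthetic classification data and the logistic regression model are assumptions made only to keep the example self-contained:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic classification data, just for illustration
x, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Evaluate the model at several training-set sizes using 5-fold cross-validation
train_sizes, train_scores, valid_scores = learning_curve(
    LogisticRegression(max_iter=1000), x, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5,
)

# Average the scores across folds for each training-set size
print(train_sizes)
print(train_scores.mean(axis=1))  # training accuracy per size
print(valid_scores.mean(axis=1))  # validation accuracy per size
```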
01:33 Hyperparameter tuning, also called hyperparameter optimization, is the process of determining the best set of hyperparameters to define your machine learning model. scikit-learn’s model_selection module provides you with several options for this purpose, including GridSearchCV, RandomizedSearchCV, validation_curve(), and others.
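As a rough illustration, the sketch below uses GridSearchCV to try every combination of a small hyperparameter grid with 5-fold cross-validation. The support vector classifier, the parameter values, and the synthetic data are assumptions made for the example:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic classification data, just for illustration
x, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Candidate hyperparameter values for the support vector classifier
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]}

# Exhaustively evaluate every combination with 5-fold cross-validation
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(x, y)

print(search.best_params_)  # the combination with the best mean CV score
print(search.best_score_)   # that best mean cross-validation score
```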
01:56 Splitting your data is also important for hyperparameter tuning. Now that you’ve covered all the elements of this course, let’s take some time to look back at what you’ve learned.