Other Validation Functionalities

00:00 Other validation functionalities.

00:04 scikit-learn’s model_selection module offers many functionalities related to model selection and validation, including cross-validation, learning curves, and hyperparameter tuning.

00:19 Cross-validation is a set of techniques that combine measures of prediction performance to get more accurate estimates of how well a model performs. One of the most widely used cross-validation methods is k-fold cross-validation. In it, you divide your dataset into k subsets, or folds, of equal size (k is often five or ten) and then perform the training and test procedures k times. Each time, you use a different fold as the test set and all of the remaining folds as the training set.

00:51 This provides k measures of predictive performance, and you can then analyze their mean and standard deviation.
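The k-fold procedure can be sketched by hand with KFold. This is a minimal illustration, not code from the course: it assumes the built-in iris dataset, LogisticRegression as the model, and k = 5. Each iteration trains on four folds, scores on the remaining one, and the k scores are then summarized by their mean and standard deviation.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Assumption: iris data and logistic regression are used only for illustration
X, y = load_iris(return_X_y=True)

# k = 5 folds; shuffle because iris is sorted by class
kf = KFold(n_splits=5, shuffle=True, random_state=0)

scores = []
for train_idx, test_idx in kf.split(X):
    # Train on k - 1 folds, test on the held-out fold
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

scores = np.array(scores)
print("Fold scores:", scores)
print(f"Mean: {scores.mean():.3f}, std: {scores.std():.3f}")
```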

01:00 You can implement cross-validation with KFold, StratifiedKFold, LeaveOneOut, and a few other classes and functions from scikit-learn’s model_selection module.
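In practice, you rarely write the loop yourself. A splitter object can be passed to cross_val_score() through its cv parameter. The sketch below is illustrative and again assumes iris and LogisticRegression. It compares three of the splitters mentioned above: StratifiedKFold keeps class proportions roughly equal in every fold, and LeaveOneOut uses each single sample as its own test set, so it produces one score per sample.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    KFold,
    LeaveOneOut,
    StratifiedKFold,
    cross_val_score,
)

# Assumption: iris data and logistic regression are used only for illustration
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

splitters = {
    "KFold": KFold(n_splits=5, shuffle=True, random_state=0),
    "StratifiedKFold": StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    "LeaveOneOut": LeaveOneOut(),  # one fit per sample: 150 fits for iris
}

results = {}
for name, cv in splitters.items():
    # cross_val_score fits and scores the model once per split
    scores = cross_val_score(model, X, y, cv=cv)
    results[name] = scores
    print(f"{name}: {len(scores)} scores, "
          f"mean={scores.mean():.3f}, std={scores.std():.3f}")
```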

01:13 A learning curve, sometimes called a training curve, shows how the prediction score of training and validation sets depends on the number of training samples.

01:22 You can use learning_curve() to get this dependency, which can help you find the optimal size of the training set, choose hyperparameters, compare models, and so on.
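A short learning_curve() sketch follows, under the same illustrative assumptions (iris, LogisticRegression). train_sizes is given as fractions of the largest possible training set (120 samples with five-fold cross-validation on 150 samples). shuffle=True is set because iris is sorted by class, and without it the smallest training subsets could contain a single class.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Assumption: iris data and logistic regression are used only for illustration
X, y = load_iris(return_X_y=True)

train_sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X,
    y,
    train_sizes=np.linspace(0.2, 1.0, 5),  # 20%, 40%, ..., 100% of the max size
    cv=5,
    shuffle=True,  # shuffle before taking training subsets (data is class-sorted)
    random_state=0,
)

# Each row holds the scores for one training size, one column per fold
for n, tr, va in zip(train_sizes, train_scores, val_scores):
    print(f"{n:4d} samples: train={tr.mean():.3f}, validation={va.mean():.3f}")
```

Plotting the two mean curves against train_sizes shows whether adding more data is still likely to improve the validation score.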

01:33 Hyperparameter tuning, also called hyperparameter optimization, is the process of determining the best set of hyperparameters to define your machine learning model. scikit-learn’s model_selection module provides you with several options for this purpose, including GridSearchCV, RandomizedSearchCV, validation_curve(), and others.
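As an illustration, here is a minimal GridSearchCV sketch. The dataset (iris), the model (SVC), and the grid of C and kernel values are assumptions chosen for the example. Note the split: the search runs cross-validation only on the training portion, and the held-out test set is used once at the end to get an unbiased estimate for the chosen hyperparameters.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Assumption: iris data and an SVC are used only for illustration
X, y = load_iris(return_X_y=True)

# Hold out a test set that the hyperparameter search never sees
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid: every combination (3 x 2 = 6) is evaluated with 5-fold CV
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)  # refits the best model on all of X_train

test_score = search.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print(f"Best CV score: {search.best_score_:.3f}")
print(f"Test score: {test_score:.3f}")
```

RandomizedSearchCV follows the same pattern but samples a fixed number of combinations, which scales better when the grid is large.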

01:56 Splitting your data is also important for hyperparameter tuning. Now that you’ve covered all the elements of this course, let’s take some time to look back at what you’ve learned.

