Other Validation Functionalities

00:00 Other validation functionalities.

00:04 scikit-learn’s model_selection module offers a lot of functionalities related to model selection and validation, including the following: cross-validation, learning curves, and hyperparameter tuning.

00:19 Cross-validation is a set of techniques that combine the measures of prediction performance to get more accurate model estimations. One of the most widely used cross-validation methods is k-fold cross-validation. In it, you divide your dataset into k subsets, or folds, of equal size (k is often five or ten) and then perform the training and test procedures k times. Each time, you use a different fold as the test set and all of the remaining folds as the training set.

00:51 This provides k measures of predictive performance, and you can then analyze their mean and standard deviation.
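As a minimal sketch of that idea, the snippet below runs 5-fold cross-validation with `cross_val_score()` and then summarizes the five scores. The synthetic dataset, the `LinearRegression` estimator, and the choice of k = 5 are assumptions made for illustration only. They aren't taken from the lesson.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data, used here only for illustration
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# cv=5 requests 5-fold cross-validation: the model is fitted five times,
# and each score is R^2 measured on the fold that was held out that time
scores = cross_val_score(LinearRegression(), X, y, cv=5)

print(scores)
print(f"mean={scores.mean():.3f}, std={scores.std():.3f}")
```

A small standard deviation relative to the mean suggests that the performance estimate doesn't depend much on which fold was held out.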

01:00 You can implement cross-validation with KFold, StratifiedKFold, LeaveOneOut, and a few other classes and functions from scikit-learn’s model_selection module.
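To see how these splitter classes differ, here's a small sketch that only inspects the indices they produce, without training a model. The tiny arrays `X` and `y` are made up for this example, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut, StratifiedKFold

# Toy data (illustrative only): 10 samples, 2 features, 2 balanced classes
X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# KFold ignores the labels and cuts the samples into 5 folds of equal size
kf = KFold(n_splits=5, shuffle=True, random_state=0)
kf_test_sizes = [len(test_idx) for _, test_idx in kf.split(X)]

# StratifiedKFold keeps the class proportions roughly the same in every fold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
skf_class_counts = [
    np.bincount(y[test_idx], minlength=2).tolist()
    for _, test_idx in skf.split(X, y)
]

# LeaveOneOut uses a single sample as the test set in each split
loo_splits = LeaveOneOut().get_n_splits(X)

print(kf_test_sizes)     # size of each KFold test fold
print(skf_class_counts)  # per-class counts in each stratified test fold
print(loo_splits)        # number of leave-one-out splits
```

Any of these splitter objects can be passed as the `cv` argument to functions such as `cross_val_score()`.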

01:13 A learning curve, sometimes called a training curve, shows how the prediction score of training and validation sets depends on the number of training samples.

01:22 You can use learning_curve() to get this dependency, which can help you find the optimal size of the training set, choose hyperparameters, compare models, and so on.
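The following sketch shows one way you might call `learning_curve()`. The synthetic classification data, the `LogisticRegression` estimator, and the five training-set fractions are assumptions chosen for illustration. The lesson doesn't prescribe them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic classification data, used here only for illustration
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Evaluate the model at five training-set sizes, from 20% to 100% of the
# samples available for training in each of the 5 cross-validation splits
train_sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X,
    y,
    train_sizes=np.linspace(0.2, 1.0, 5),
    cv=5,
)

# Each row holds the scores for one training-set size, one column per split
mean_train = train_scores.mean(axis=1)
mean_val = val_scores.mean(axis=1)
for n, tr, va in zip(train_sizes, mean_train, mean_val):
    print(f"n={n:3d}  train={tr:.3f}  validation={va:.3f}")
```

If the validation score is still rising at the largest size, more training data may help. If the two curves have flattened out close to each other, adding data is less likely to improve the model.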

01:33 Hyperparameter tuning, also called hyperparameter optimization, is the process of determining the best set of hyperparameters to define your machine learning model. scikit-learn’s model_selection module provides you with several options for this purpose, including GridSearchCV, RandomizedSearchCV, validation_curve(), and others.
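As a rough sketch of hyperparameter tuning with `GridSearchCV`, the snippet below tries every combination in a small grid and scores each one with 5-fold cross-validation. The `SVC` estimator, the synthetic data, and the values in `param_grid` are illustrative assumptions, not recommendations from the lesson.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic classification data, used here only for illustration
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Candidate hyperparameter values (hypothetical grid): 3 x 2 = 6 combinations
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# Each combination is scored with 5-fold cross-validation
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(f"best mean CV accuracy: {search.best_score_:.3f}")
```

`RandomizedSearchCV` has a similar interface but samples a fixed number of combinations instead of trying them all, which can be cheaper when the grid is large.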

01:56 Splitting your data is also important for hyperparameter tuning. Now that you’ve covered all the elements of this course, let’s take some time to look back at what you’ve learned.
