The Importance of Data Splitting
00:00 The importance of data splitting. Supervised machine learning is about creating models that precisely map the given inputs—independent variables, or predictors—to the given outputs—dependent variables, or responses. How you measure the precision of your model depends on the type of problem you’re trying to solve.
01:01 This means that you can’t evaluate the predictive performance of a model with the same data that you’ve used for training. You need to evaluate the model with fresh data that hasn’t been seen by the model before. You can accomplish that by splitting your dataset before you use it.
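As a minimal sketch of what splitting before use can look like, the snippet below shuffles row indices with NumPy and holds out part of the rows. The toy arrays `X` and `y`, the 30% test fraction, and the seed are assumptions made for illustration, not values from the lesson.

```python
import numpy as np

# Hypothetical toy dataset: 10 samples with 2 features each, one target per sample.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

rng = np.random.default_rng(seed=0)    # fixed seed so the split is reproducible
indices = rng.permutation(len(X))      # shuffle row indices before splitting

n_test = round(0.3 * len(X))           # hold out 30% of the rows (arbitrary choice)
test_idx, train_idx = indices[:n_test], indices[n_test:]

# Index X and y with the same indices so inputs and outputs stay aligned.
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(X_train.shape, X_test.shape)
```

Shuffling first avoids a split that simply follows the order in which the data was stored. In practice, scikit-learn's `train_test_split()` does this kind of split for you.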
01:30 The training set is applied to train, or fit, your model. For example, you use the training set to find the optimal weights or coefficients for linear regression, logistic regression, or neural networks.
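To make the idea of finding coefficients from the training set concrete, here is a small sketch that fits a straight line by least squares with NumPy. The synthetic data (a line with slope 3 and intercept 2 plus noise) and the 80/20 split are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Synthetic data (assumed for illustration): y = 3x + 2 plus small noise.
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=100)

# Keep the first 80 samples for training and the last 20 for testing.
x_train, x_test = x[:80], x[80:]
y_train, y_test = y[:80], y[80:]

# Fit on the training set only: solve for [slope, intercept] by least squares.
A_train = np.column_stack([x_train, np.ones_like(x_train)])
(slope, intercept), *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

# Evaluate on data the model has not seen during fitting.
test_mse = np.mean((slope * x_test + intercept - y_test) ** 2)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, test MSE={test_mse:.3f}")
```

The test samples play no part in computing `slope` and `intercept`, so the test error is an honest estimate of how the fitted line behaves on new inputs.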
01:45 The validation set is used for unbiased model evaluation during hyperparameter tuning. For example, when you want to find the optimal number of neurons in a neural network or the best kernel for a support vector machine, you experiment with different values. For each considered setting of hyperparameters, you fit the model with the training set and assess its performance with the validation set.
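The loop described above can be sketched as follows. Instead of a neuron count or an SVM kernel, this example tunes a simpler hyperparameter, the degree of a polynomial fitted with `np.polyfit()`. The synthetic data, the split sizes, and the candidate degrees are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Synthetic data (assumed for illustration): a cubic trend plus noise.
x = rng.uniform(-2, 2, size=150)
y = x**3 - x + rng.normal(scale=0.3, size=150)

# Three-way split: 90 training, 30 validation, 30 test samples.
x_train, x_val, x_test = x[:90], x[90:120], x[120:]
y_train, y_val, y_test = y[:90], y[90:120], y[120:]

def mse(coeffs, x_part, y_part):
    """Mean squared error of a polynomial on one part of the data."""
    return float(np.mean((np.polyval(coeffs, x_part) - y_part) ** 2))

best_degree, best_coeffs, best_val_mse = None, None, float("inf")
for degree in range(1, 7):  # candidate hyperparameter values
    coeffs = np.polyfit(x_train, y_train, deg=degree)  # fit on the training set
    val_mse = mse(coeffs, x_val, y_val)                # assess on the validation set
    if val_mse < best_val_mse:
        best_degree, best_coeffs, best_val_mse = degree, coeffs, val_mse

# The test set is used only once, after the degree has been chosen.
test_mse = mse(best_coeffs, x_test, y_test)
print(f"chosen degree={best_degree}, test MSE={test_mse:.3f}")
```

Because the validation set influenced the choice of degree, its error is no longer an unbiased estimate. That is why the final number comes from the separate test set.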
02:19 In less complex cases where you don’t have to tune hyperparameters, it’s okay to work with only the training and test sets. Splitting a dataset can also help you detect whether your model suffers from one of two very common problems: underfitting and overfitting. Underfitting is usually the consequence of a model being unable to capture the relations among data. For example, this can happen when trying to represent nonlinear relations with a linear model. Underfitted models are likely to perform poorly on both the training and test sets.
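The linear-model example can be illustrated with a short sketch. The data below is synthetic and follows a quadratic curve, which is an assumption made for illustration. A straight line is fitted next to a quadratic so the errors can be compared.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# Synthetic nonlinear data (assumed for illustration): y = x^2 plus noise.
x = rng.uniform(-3, 3, size=200)
y = x**2 + rng.normal(scale=0.2, size=200)

x_train, x_test = x[:150], x[150:]
y_train, y_test = y[:150], y[150:]

def mse(coeffs, x_part, y_part):
    """Mean squared error of a polynomial on one part of the data."""
    return float(np.mean((np.polyval(coeffs, x_part) - y_part) ** 2))

linear = np.polyfit(x_train, y_train, deg=1)     # too simple for this data
quadratic = np.polyfit(x_train, y_train, deg=2)  # matches the true shape

linear_train_mse = mse(linear, x_train, y_train)
linear_test_mse = mse(linear, x_test, y_test)
quadratic_test_mse = mse(quadratic, x_test, y_test)

print(f"linear:    train MSE={linear_train_mse:.2f}, test MSE={linear_test_mse:.2f}")
print(f"quadratic: test MSE={quadratic_test_mse:.2f}")
```

The sign to look for is a large error on the training set as well as the test set: the straight line cannot follow the curve even on the data it was fitted to.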
02:56 Overfitting usually takes place when a model has an excessively complex structure and learns both the existing relations among data and the noise. Such models often generalize poorly: they perform well on the training data but much worse on unseen test data.
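A rough way to see this is to fit a model with as many parameters as there are training points. In the sketch below, the sine-shaped data, the noise level, and the polynomial degrees are all assumptions chosen for illustration. The degree-9 polynomial passes through all ten noisy training points, so it has memorized the noise.

```python
import numpy as np

rng = np.random.default_rng(seed=4)

def noisy_sine(x_values):
    # Assumed ground truth for illustration: one sine period plus noise.
    return np.sin(np.pi * x_values) + rng.normal(scale=0.3, size=x_values.shape)

# Only 10 training points, but plenty of fresh test points.
x_train = np.linspace(-1, 1, 10)
y_train = noisy_sine(x_train)
x_test = rng.uniform(-1, 1, size=200)
y_test = noisy_sine(x_test)

def mse(coeffs, x_part, y_part):
    """Mean squared error of a polynomial on one part of the data."""
    return float(np.mean((np.polyval(coeffs, x_part) - y_part) ** 2))

simple = np.polyfit(x_train, y_train, deg=3)    # moderate flexibility
complex_ = np.polyfit(x_train, y_train, deg=9)  # 10 coefficients for 10 points

simple_train_mse = mse(simple, x_train, y_train)
simple_test_mse = mse(simple, x_test, y_test)
complex_train_mse = mse(complex_, x_train, y_train)
complex_test_mse = mse(complex_, x_test, y_test)

print(f"degree 3: train MSE={simple_train_mse:.4f}, test MSE={simple_test_mse:.4f}")
print(f"degree 9: train MSE={complex_train_mse:.4f}, test MSE={complex_test_mse:.4f}")
```

The degree-9 fit reaches a training error of essentially zero, yet its test error is far larger, and usually larger than that of the simpler degree-3 fit. That gap between training and test performance is what a held-out test set lets you detect.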