A Larger Regression Example
When you work with larger datasets, it’s usually more convenient to pass the training or test size as a ratio.
test_size=0.4 means that approximately 40% of samples will be assigned to the test set, and the remaining 60% will be assigned to the training set.
You can use .shape once more on the x_train and x_test arrays to confirm their sizes, showing that the training set has 303 rows and the test set 203. Finally, you can use the training set, x_train and y_train, to fit the model, and the test set, x_test and y_test, for an unbiased evaluation of the model.
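The split described above can be sketched as follows. The data here is a synthetic stand-in (an assumption, not the lesson's actual dataset), sized to match the row counts mentioned: 506 samples, so test_size=0.4 yields 203 test rows and 303 training rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the lesson's dataset (an assumption):
# 506 samples with 13 features each
rng = np.random.default_rng(0)
x = rng.normal(size=(506, 13))
y = rng.normal(size=506)

# Pass the test size as a ratio: roughly 40% goes to the test set
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.4, random_state=0
)

print(x_train.shape)  # (303, 13)
print(x_test.shape)   # (203, 13)
```

Note that scikit-learn rounds the test count up, so 40% of 506 becomes 203 rows, leaving 303 for training.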
The process is pretty much the same as in the previous example. First, import the needed classes. Second, create and fit the model instances using the training set. Third, evaluate the model with .score() using the test set.
You’ll see all three of these onscreen, starting with linear regression. The first step is to import the LinearRegression model. Next, create and train the model with a single line that chains the .fit() method after the model is created.
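Those three steps might look like this. The data is a hypothetical noisy linear relationship (an assumption, since the lesson's dataset isn't reproduced here):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Hypothetical data: a noisy linear relationship (an assumption)
rng = np.random.default_rng(0)
x = rng.normal(size=(506, 13))
y = x @ rng.normal(size=13) + rng.normal(scale=0.1, size=506)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.4, random_state=0
)

# Create and train the model in a single chained line
model = LinearRegression().fit(x_train, y_train)

# Evaluate on the held-out test set (R-squared score)
print(model.score(x_test, y_test))
```

Chaining .fit() works because the method returns the fitted estimator itself.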
As we’ve already seen, RandomForestRegressor uses the random_state parameter for the same reason as train_test_split() does: to deal with randomness in the algorithm and ensure reproducibility.
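A quick sketch of that reproducibility, again on synthetic stand-in data (an assumption): two identically configured forests with the same random_state produce identical predictions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (an assumption)
rng = np.random.default_rng(0)
x = rng.normal(size=(506, 13))
y = x @ rng.normal(size=13) + rng.normal(scale=0.1, size=506)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.4, random_state=0
)

# Fixing random_state makes the bootstrap sampling and feature
# selection reproducible: identically configured forests agree exactly.
forest_a = RandomForestRegressor(n_estimators=50, random_state=0).fit(x_train, y_train)
forest_b = RandomForestRegressor(n_estimators=50, random_state=0).fit(x_train, y_train)

print(np.array_equal(forest_a.predict(x_test), forest_b.predict(x_test)))  # True
```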
You can use train_test_split() to solve classification problems the same way you do for regression analysis. In machine learning, classification problems involve training a model to apply labels to input values, sorting your dataset into categories.
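A minimal classification split, using hypothetical labeled data (an assumption). One option the transcript doesn't cover here but that's worth knowing for classification is the stratify parameter, which keeps class proportions similar in both subsets:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labeled data: 100 samples, binary class labels (an assumption)
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)

# stratify=y keeps the class balance roughly equal across the two subsets
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, stratify=y, random_state=0
)

print(x_train.shape, x_test.shape)  # (75, 4) (25, 4)
```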
06:03 In this Real Python tutorial, you’ll find an example of a handwriting recognition task. The example provides another demonstration of splitting data into training and test sets to avoid bias in the evaluation process.