Reducing Prediction Error
00:12 The mechanism that computes the error is called a cost function or loss function. For this course, you’ll use the mean squared error, or MSE: take the difference between the prediction and the target and square it. (With a single prediction, the mean squared error reduces to just the squared error.) Doing this in Python is not difficult.
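As a minimal sketch, the cost described above might look like this in Python; the function name and sample values are illustrative, not the lesson's exact code:

```python
import numpy as np

def squared_error(prediction, target):
    # Difference between prediction and target, squared.
    return np.square(prediction - target)

# Illustrative values: a prediction of 0.5 against a target of 0.0.
print(squared_error(0.5, 0.0))  # 0.25
```

For a whole dataset, the MSE would average these squared errors over all samples, but a single sample is enough for this lesson.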
00:49 Now you need to reduce the error. Consider only the weights for now; you’ll tackle the bias later. The following graph shows two possible error values, A and B. They lie on a quadratic curve because the error is squared.
01:03 The error value is on the y-axis. To bring the error closer to zero, you need to decrease the value of x for point A and increase it for point B. As with vectors earlier, this is easy to see on a graph, but how do you do it in Python?
01:21 You’ll use the derivative of the cost function to determine which direction to adjust the values. The derivative is also called the gradient (strictly speaking, the gradient generalizes the derivative to functions of several variables). Finally, gradient descent is the name of the algorithm that uses the gradient to adjust the weights.
01:37 To keep things simple, you can use the power rule from calculus to compute the derivative of the cost function. The power rule says that for a formula of the form x to the n power, the derivative is n times x to the n minus 1 power. So the derivative of the cost function is two times the difference between the prediction and the target. The Python code for this is simple, and the result is 1.74. Since the derivative is positive, you should decrease the weights to reduce the error.
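The derivative described above can be sketched as follows; the prediction value of 0.87 is an illustrative input chosen so the result matches the 1.74 mentioned here, not necessarily the lesson's exact numbers:

```python
# Power rule: d/dx (x**2) = 2*x, so for error = (prediction - target)**2,
# the derivative with respect to the prediction is 2 * (prediction - target).
def error_derivative(prediction, target):
    return 2 * (prediction - target)

# Illustrative values: a prediction of 0.87 against a target of 0.0.
print(error_derivative(0.87, 0.0))  # 1.74
```

A positive derivative means the error grows as the prediction grows, so you move the weights in the opposite direction.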
Subtract the derivative from the weights. You can do this easily using broadcasting with NumPy, and then call make_prediction() with the new weights. This time, the prediction is below 0.5, and the error is much smaller.
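A self-contained sketch of this update step, assuming a sigmoid-based make_prediction() similar to the lesson's; the input vector, weights, and target here are illustrative stand-ins:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def make_prediction(input_vector, weights, bias):
    # Dot product of inputs and weights, plus bias, through a sigmoid.
    return sigmoid(np.dot(input_vector, weights) + bias)

# Illustrative values, not the lesson's exact dataset.
input_vector = np.array([1.66, 1.56])
weights = np.array([1.45, -0.66])
bias = 0.0
target = 0.0

prediction = make_prediction(input_vector, weights, bias)
derivative = 2 * (prediction - target)

# Broadcasting: subtracting the scalar derivative adjusts every weight at once.
weights = weights - derivative
new_prediction = make_prediction(input_vector, weights, bias)

print(prediction, new_prediction)  # the new prediction is below 0.5
```

With these numbers, the new prediction lands close to the target of 0, so the squared error shrinks dramatically after a single update.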
02:42 It’s almost zero. Does this mean the problem is solved? Not always. In this case everything worked out, but what if the derivative had been much larger? The update could have skipped over the minimum, and the error would actually have increased.
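A toy sketch of that failure mode, not taken from the lesson: minimizing error = x**2 by stepping with a slightly exaggerated derivative (the 1.1 multiplier is an artificial choice to force overshoot):

```python
# Minimum of error = x**2 is at x = 0. Each oversized step jumps past it.
x = 1.5
error = x ** 2
for step in range(3):
    derivative = 2 * x
    x = x - 1.1 * derivative  # step slightly too large: x is multiplied by -1.2
    print(step, x, x ** 2)    # the error increases on every iteration
```

Each update lands farther from the minimum than the last, so the error grows instead of shrinking; keeping steps small enough is what makes gradient descent converge.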
03:26 The new weights work great with the second input vector. However, applying them to the first input vector gives an incorrect prediction. When a model performs significantly better on the training data than on test data, the model is said to be overfitted.
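The situation above can be sketched with made-up numbers; the vectors, targets, and weights here are purely illustrative (the weights resemble values produced after an update step tuned to the second vector):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def make_prediction(input_vector, weights, bias):
    return sigmoid(np.dot(input_vector, weights) + bias)

bias = 0.0
weights = np.array([-0.15, -2.26])  # illustrative: tuned to vector_2's target
vector_1, target_1 = np.array([2.0, 1.5]), 1.0
vector_2, target_2 = np.array([1.66, 1.56]), 0.0

prediction_2 = make_prediction(vector_2, weights, bias)
prediction_1 = make_prediction(vector_1, weights, bias)
print(prediction_2)  # close to its target of 0
print(prediction_1)  # also near 0 -- far from its target of 1
```

The weights that nail the second sample push every prediction toward 0, so the first sample, whose target is 1, comes out badly wrong.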
03:44 In other words, the model is too comfortable with the training data but will perform poorly with new data. There are several techniques to avoid this. In the next lesson, you’ll begin to implement one.