# Reducing Prediction Error

**00:00**
In this lesson, you’ll complete the training loop by evaluating the error between the prediction and the target and adjusting the weights in the network layers. First, compute the error.

**00:12**
The mechanism that computes the error is called a cost or loss function. For this course, you’ll use the simple mean squared error, or MSE. This is done by taking the difference of the prediction and target and squaring it. And doing this in Python is not difficult.

**00:32**
The error is just over `0.75`.
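As a rough sketch of that computation in NumPy: the input vector, weights, bias, target, and the body of `make_prediction()` below are assumptions chosen for illustration, since this transcript does not show them.

```python
import numpy as np

def sigmoid(x):
    # Squash any real number into the range (0, 1)
    return 1 / (1 + np.exp(-x))

def make_prediction(input_vector, weights, bias):
    # Assumed single-layer network: dot product, add bias, apply sigmoid
    layer_1 = np.dot(input_vector, weights) + bias
    return sigmoid(layer_1)

# Assumed example values (not shown in the transcript)
input_vector = np.array([2, 1.5])
weights_1 = np.array([1.45, -0.66])
bias = np.array([0.0])
target = 0

prediction = make_prediction(input_vector, weights_1, bias)

# Squared difference between the prediction and the target
mse = np.square(prediction - target)

print(f"Prediction: {prediction}; Error: {mse}")
```

With these assumed values, the error comes out just over `0.75`.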

**00:35**
Notice that squaring the error eliminates all negative values. It also emphasizes the larger errors, so the network learns more from them.

**00:49**
Now you need to reduce the error. Consider only the weights for now and then tackle the bias later. The following graph shows two possible errors. They are on the graph of the quadratic function because you are squaring the error.

**01:03**
The error value is on the *y*-axis. To reduce the error and make it closer to zero, you need to reduce the value of *x* for point *A* and you need to increase the value of *x* for point *B*. Like with vectors earlier, it’s easy to see this in a graph, but how do you do it in Python?

**01:21**
You’ll use the derivative of the cost function to determine the direction to adjust the values. The derivative is also called the gradient. And finally, gradient descent is the name of the algorithm used to adjust the weights.

**01:37**
To keep things simple, you’ll be able to use the power rule from calculus to compute the derivative of the cost function. The power rule says that for a formula in the form *x* to the *n* power, the derivative is *n* times *x* to the *n*-1 power.

**01:52**
Thus, the derivative of *x* squared is 2 times *x*.
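Written out, the power rule and the case used in this lesson are:

$$
\frac{d}{dx}\,x^{n} = n\,x^{n-1}
\qquad\Longrightarrow\qquad
\frac{d}{dx}\,x^{2} = 2x
$$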

**01:58**
The error is computed by calling `np.square()` and passing it the prediction minus the target. Therefore, consider the prediction minus the target to be a single variable.

**02:08**
Thus, the derivative of the cost function is two times the difference of the prediction and target. The Python code for this is simple, and you’ll see that the result is `1.74`. Because the derivative is positive, you should decrease the weights to reduce the error.
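A minimal sketch of that derivative, treating the prediction minus the target as a single variable. The prediction value here is an assumed example, hard-coded so the snippet runs on its own.

```python
import numpy as np

# Assumed prediction for illustration; in the lesson it comes from make_prediction()
prediction = np.array([0.87101915])
target = 0

# Power rule applied to (prediction - target) ** 2
derivative = 2 * (prediction - target)

print(f"The derivative is {derivative}")
```

A positive result means the error grows as the value increases, so stepping in the opposite direction reduces it.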

**02:25**
Subtract the derivatives from the weights. You can do this easily using broadcasting with NumPy and then call `make_prediction()` with the new weights. This time, the prediction is below `0.5`. Also, the error is much smaller.
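The update step might look like the following sketch. The starting values and the body of `make_prediction()` are assumptions for illustration; only the function name and the use of `np.square()` come from the lesson.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def make_prediction(input_vector, weights, bias):
    # Assumed single-layer network: dot product, add bias, apply sigmoid
    return sigmoid(np.dot(input_vector, weights) + bias)

# Assumed example values (not shown in the transcript)
input_vector = np.array([2, 1.5])
weights_1 = np.array([1.45, -0.66])
bias = np.array([0.0])
target = 0

prediction = make_prediction(input_vector, weights_1, bias)
error = np.square(prediction - target)
derivative = 2 * (prediction - target)

# Broadcasting subtracts the one-element derivative from every weight
weights_1 = weights_1 - derivative

prediction = make_prediction(input_vector, weights_1, bias)
new_error = np.square(prediction - target)

print(f"Prediction: {prediction}; Error: {new_error}")
```

Because `derivative` has shape `(1,)` and `weights_1` has shape `(2,)`, NumPy stretches the derivative across both weights without an explicit loop.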

**02:42**
It’s almost zero. Does this mean the problem is solved? Not always. In this case, everything worked out, but what if the derivative were a larger value? The error could skip over the minimum and actually increase.
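To see how a step can skip over the minimum, here is a toy one-dimensional sketch. The curve `2 * x ** 2` is a hypothetical stand-in that is steeper than the lesson's squared error, not the network itself.

```python
def error(x):
    # A steeper toy error curve than x ** 2, with its minimum at x = 0
    return 2 * x ** 2

def derivative(x):
    # Power rule: the derivative of 2 * x ** 2 is 4 * x
    return 4 * x

x = 1.0

# Subtract the full derivative, with no scaling of the step
x_new = x - derivative(x)

print(f"Before: x = {x}, error = {error(x)}")
print(f"After:  x = {x_new}, error = {error(x_new)}")
```

The step crosses the minimum at `x = 0` and lands farther up the opposite side of the curve, so the error grows from `2.0` to `18.0` instead of shrinking.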

**02:58**
Therefore, you will often use a fraction of the derivative to take smaller steps in reducing the error. This fraction is called the alpha parameter, or learning rate.

**03:08**
Smaller learning rates take smaller steps and vice versa. Often, values for learning rates start off with `0.1`, `0.01`, or `0.001`, and the only way to find the optimal value is experimentation.
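One way to experiment is to apply a single update with each candidate learning rate and compare the resulting errors. This is only a sketch: the starting values and the body of `make_prediction()` are assumed for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def make_prediction(input_vector, weights, bias):
    # Assumed single-layer network: dot product, add bias, apply sigmoid
    return sigmoid(np.dot(input_vector, weights) + bias)

# Assumed example values (not shown in the transcript)
input_vector = np.array([2, 1.5])
weights_1 = np.array([1.45, -0.66])
bias = np.array([0.0])
target = 0

prediction = make_prediction(input_vector, weights_1, bias)
initial_error = np.square(prediction - target).item()
derivative = 2 * (prediction - target)

errors = {}
for learning_rate in (0.1, 0.01, 0.001):
    # Take only a fraction of the derivative as the step
    new_weights = weights_1 - learning_rate * derivative
    new_prediction = make_prediction(input_vector, new_weights, bias)
    errors[learning_rate] = np.square(new_prediction - target).item()
    print(f"learning_rate={learning_rate}: error={errors[learning_rate]:.4f}")

print(f"initial error: {initial_error:.4f}")
```

After one step, each rate lowers the error a little, and the smaller the rate, the smaller the change. In practice the update is repeated many times, which is where the choice of rate starts to matter.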

**03:26**
The new weights work great with the second input vector. Applying the weights to the first input vector will give you an incorrect prediction. When a model performs significantly better on the training data than it does on test data, the model is said to be overfitted.
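The sketch below illustrates this with assumed data. The two input vectors, their targets (`1` for the first, `0` for the second), the starting weights, and the body of `make_prediction()` are all hypothetical values chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def make_prediction(input_vector, weights, bias):
    # Assumed single-layer network: dot product, add bias, apply sigmoid
    return sigmoid(np.dot(input_vector, weights) + bias)

# Assumed example data (not shown in the transcript)
first_input, first_target = np.array([1.66, 1.56]), 1
second_input, second_target = np.array([2, 1.5]), 0
weights_1 = np.array([1.45, -0.66])
bias = np.array([0.0])

# Update the weights using only the second input vector
prediction = make_prediction(second_input, weights_1, bias)
weights_1 = weights_1 - 2 * (prediction - second_target)

# The second vector is now predicted well, but the first is not
second_prediction = make_prediction(second_input, weights_1, bias)
first_prediction = make_prediction(first_input, weights_1, bias)

print(f"Second vector: prediction={second_prediction}, target={second_target}")
print(f"First vector:  prediction={first_prediction}, target={first_target}")
```

Under these assumptions, both predictions land near zero, which matches the second target but is far from the first target of `1`.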

**03:44**
In other words, the model is too comfortable with the training data but will perform poorly with new data. There are several techniques to avoid this. In the next lesson, you’ll begin to implement one.
