
Applying the Chain Rule

00:00 To review, your network has two layers. In this example, the error function is x squared, but x is the result of another function: the difference between the prediction and the expected value, as seen in the previous lesson.

00:15 When the input of one function is the result of another, it’s called function composition. In the previous lesson, you saw how to use the derivative to reduce the error function.

00:25 But since the error function composes other functions, you must use the chain rule to take the derivative and reduce the error. When using the chain rule, you take the partial derivative of each function and then multiply them.
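In symbols (the symbols here are just shorthand for this explanation, not notation from the lesson): if the error is a composition error(w) = f(g(w)), then the chain rule says

    error'(w) = f'(g(w)) × g'(w)

That is, the derivative of the outer function evaluated at the inner function, multiplied by the derivative of the inner function.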

00:40 This isn’t as complex as it might sound for the simple functions in this course. It’s easier to see in a diagram. To take the derivative of the error function, you’ll need partial derivatives.

00:53 Take the partial derivative of the error with respect to the prediction, then take the partial derivative of the prediction with respect to the first layer, and finally, the partial derivative of the first layer with respect to the weights.

01:08 Then take all the partial derivatives and multiply them together. This product will give you the derivative of the error with respect to the weights. Perhaps you noticed that you were starting at the error and working backwards to the weights. This is called a backward pass and the algorithm is called backpropagation.
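Written out on one line (again using shorthand symbols rather than the lesson's diagram labels), the backward pass for the weights multiplies the three partial derivatives just described:

    ∂error/∂weights = ∂error/∂prediction × ∂prediction/∂layer × ∂layer/∂weights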

01:29 Here’s how to update the bias. It’s the same algorithm as for the weights, just with different variables. So instead of taking the derivative of the error with respect to the weights, you take the derivative of the error with respect to the bias, as seen in this diagram.

01:45 The error function is x squared, and the derivative, as you’ve seen, is 2x. For the next partial derivative, you’ll take a step in reverse and compute the partial derivative of the prediction with respect to the layer.

01:58 This is the derivative of the sigmoid function. For this course, just accept that it is the product of the sigmoid and the difference of 1 and the sigmoid. Finally, you can take the partial derivative of the layer with respect to the bias.

02:14 If you multiply them, you’ll get the derivative of the error with respect to the bias, and you’ll use this value to update the bias to reduce the error.
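Putting those three pieces together (shorthand only; the prediction is sigmoid(layer), and the target is the expected value), the bias derivative works out to:

    ∂error/∂bias = 2 × (prediction − target) × sigmoid(layer) × (1 − sigmoid(layer)) × 1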

02:25 Here’s what it looks like in Python. First, this function computes the derivative of the sigmoid function.

02:33 The partial derivative of the error with respect to the prediction is 2x, but x is the difference of the prediction and the target. The derivative of the prediction with respect to the layer is the derivative of the sigmoid, which accepts the output of the layer. And the derivative of the layer with respect to the bias is the constant 1.

02:54 Multiply them together for the derivative of the error with respect to the bias and subtract that from the bias. And, of course, do the same with the weights and the derivative of the error with respect to the weights. You’ll implement that in the next lesson as you write a class to build a neural network.
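The code itself isn’t shown in this transcript, so here’s a minimal sketch of the backward pass just described. All of the names (sigmoid_deriv, layer_1, derror_dbias, and so on) and the sample numbers are illustrative assumptions, not necessarily the lesson’s exact code:

    import numpy as np

    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_deriv(x):
        # Derivative of the sigmoid: sigmoid(x) * (1 - sigmoid(x))
        return sigmoid(x) * (1 - sigmoid(x))

    # Forward pass with made-up example values
    input_vector = np.array([1.5, 0.5])
    weights = np.array([1.0, -0.5])
    bias = np.array([0.0])
    target = 0.0

    layer_1 = np.dot(input_vector, weights) + bias   # first layer: dot product plus bias
    prediction = sigmoid(layer_1)                    # second layer: sigmoid of the first

    # Backward pass: one partial derivative per step, then multiply them
    derror_dprediction = 2 * (prediction - target)   # derivative of x**2, with x = prediction - target
    dprediction_dlayer1 = sigmoid_deriv(layer_1)     # derivative of the sigmoid at the layer's output
    dlayer1_dbias = 1                                # the bias is simply added on, so this is the constant 1

    derror_dbias = derror_dprediction * dprediction_dlayer1 * dlayer1_dbias

    # Update the bias to reduce the error
    bias = bias - derror_dbias

    # For the weights, the last factor would instead be the derivative of the
    # layer with respect to the weights (the input vector); that update is
    # covered in the next lesson.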

Ahmed Ahmed Elnaghy on Feb. 24, 2023

I didn’t get it. I need more explanation of backpropagation or the chain rule, step by step, by hand or with an example. Thanks for your kindness.
