Creating the NeuralNetwork Class
00:33 A hyperparameter is a constant: it's set before training begins and doesn't change while the network learns. Think of it as a configuration value for the training process. The initializer also sets up the weights and bias.
These start at random values. Because the network in this course is so small, the choice of distribution, or even whether the initial values are random at all, probably won't have much of an effect. In a real-world scenario, however, random initialization is standard practice. The code for the sigmoid and its derivative you've seen before, so it needs no further discussion. The
.predict() method computes the values for each layer and returns a prediction. You've seen this code before as well.
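Putting the pieces described so far together, a minimal sketch of the class might look like this. It assumes a two-input network with a single weight vector and one sigmoid layer; the exact names and shapes in the course's code may differ.

```python
import numpy as np

class NeuralNetwork:
    def __init__(self, learning_rate):
        # The learning rate is a hyperparameter: fixed before
        # training starts and never changed by the training loop.
        self.learning_rate = learning_rate
        # Standard practice: initialize the weights and bias to
        # random values rather than zeros.
        self.weights = np.array([np.random.randn(), np.random.randn()])
        self.bias = np.random.randn()

    def _sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def _sigmoid_deriv(self, x):
        s = self._sigmoid(x)
        return s * (1 - s)

    def predict(self, input_vector):
        # Layer 1: linear combination of inputs, weights, and bias.
        layer_1 = np.dot(input_vector, self.weights) + self.bias
        # Layer 2: squash through the sigmoid to get the prediction.
        layer_2 = self._sigmoid(layer_1)
        return layer_2
```

Because the output passes through a sigmoid, `predict()` always returns a value between 0 and 1.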
01:18 The code to compute the gradients is a little different. Notice that when computing the partial derivative of the layer with respect to the weights, you must take the derivative of a dot product. To do this, use the product rule: take the derivative of the first vector and multiply it by the second, then add the product of the first vector and the derivative of the second.
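Here is a sketch of that gradient computation, assuming the same small single-layer network as above (the method and variable names are illustrative, not necessarily the course's exact source). The dot-product derivative is written out term by term so you can see the product rule at work:

```python
import numpy as np

class NeuralNetwork:
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate
        self.weights = np.array([np.random.randn(), np.random.randn()])
        self.bias = np.random.randn()

    def _sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def _sigmoid_deriv(self, x):
        s = self._sigmoid(x)
        return s * (1 - s)

    def _compute_gradients(self, input_vector, target):
        layer_1 = np.dot(input_vector, self.weights) + self.bias
        prediction = self._sigmoid(layer_1)

        # Chain rule pieces, working back from the squared error.
        derror_dprediction = 2 * (prediction - target)
        dprediction_dlayer1 = self._sigmoid_deriv(layer_1)

        # Derivative of the dot product x . w with respect to w,
        # via the product rule: (dx/dw) * w + x * (dw/dw).
        # The input is constant with respect to w, so this is
        # 0 * w + x * 1 -- i.e., just the input vector.
        dlayer1_dweights = 0 * self.weights + 1 * input_vector
        dlayer1_dbias = 1

        derror_dbias = (
            derror_dprediction * dprediction_dlayer1 * dlayer1_dbias
        )
        derror_dweights = (
            derror_dprediction * dprediction_dlayer1 * dlayer1_dweights
        )
        return derror_dbias, derror_dweights
```

Writing the weight derivative as `0 * self.weights + 1 * input_vector` is redundant arithmetically, but it mirrors the two product-rule terms described above.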