Creating the NeuralNetwork Class
00:00 Math class is over. In this lesson, you’ll write a class to build a neural network and train a model. A lot of the code you’ve already seen, so we don’t need to go through it line by line.
00:12 The class is named NeuralNetwork, and the initializer accepts the learning rate. Again, this value is the fraction of the derivative used to adjust the weights or bias.
00:23 It’s called a hyperparameter. The weights and bias are trainable parameters. In other words, they are variables that are updated as the model is trained.
00:33 A hyperparameter is constant. It is set before training begins and does not change. It’s kind of like a configuration value for the training process. Also in the initializer, the weights and bias are initialized.
00:48 These are set to random values. Since the network in this course is so small, the distribution you use, or even whether the initial values are random at all, probably won’t have much of an effect. However, in a real-world scenario, random initialization is standard practice. You’ve seen the code for the sigmoid and the sigmoid derivative before, so it needs no further discussion. The .predict() method computes the values for each layer and returns a prediction. Again, you’ve seen this code as well.
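As a rough illustration of the pieces described so far, here is a minimal sketch of such a class. It assumes two inputs, a single sigmoid output, and a standard normal distribution for the initial values, none of which the transcript states. It uses plain Python lists and the standard library so that it runs without extra packages; the code shown in the video may differ.

```python
import math
import random


class NeuralNetwork:
    """Sketch of a tiny network: two inputs, one sigmoid output (assumed shape)."""

    def __init__(self, learning_rate):
        # Trainable parameters start at random values. The standard normal
        # distribution is an assumption; the lesson does not name one.
        self.weights = [random.gauss(0, 1), random.gauss(0, 1)]
        self.bias = random.gauss(0, 1)
        # Hyperparameter: set once here and not changed by training.
        self.learning_rate = learning_rate

    def _sigmoid(self, x):
        return 1 / (1 + math.exp(-x))

    def _sigmoid_deriv(self, x):
        s = self._sigmoid(x)
        return s * (1 - s)

    def predict(self, input_vector):
        # Layer 1: dot product of inputs and weights, plus the bias.
        layer_1 = sum(x * w for x, w in zip(input_vector, self.weights)) + self.bias
        # Layer 2: squash the result into the range (0, 1).
        layer_2 = self._sigmoid(layer_1)
        return layer_2


network = NeuralNetwork(learning_rate=0.1)
prediction = network.predict([1.66, 1.56])  # hypothetical input vector
print(prediction)  # a value between 0 and 1 that differs on every run
```

Because the weights and bias are random and nothing has been trained yet, the printed value carries no meaning beyond showing that the forward pass works.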
01:18 The code to compute the gradients is a little different. Notice that when computing the partial derivative of the layer with respect to the weights, you must take the derivative of a dot product. To do this, take the derivative of the first vector and multiply it by the second. Then compute the product of the first vector and the derivative of the second.
01:39 Add the two products to get the derivative of the dot product.
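The product rule described here can be sketched as a standalone function. This is an illustration under assumptions: the error is taken to be the squared difference between prediction and target, the function name and argument order are hypothetical, and plain lists stand in for whatever array type the course code uses.

```python
import math


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


def compute_gradients(weights, bias, input_vector, target):
    """Return (derror_dbias, derror_dweights) for one training example.

    Assumes error = (prediction - target) ** 2, which is an assumption
    made for this sketch.
    """
    # Forward pass, same as in a prediction.
    layer_1 = sum(x * w for x, w in zip(input_vector, weights)) + bias
    prediction = sigmoid(layer_1)

    # Chain rule pieces.
    derror_dprediction = 2 * (prediction - target)
    dprediction_dlayer1 = prediction * (1 - prediction)  # sigmoid derivative
    dlayer1_dbias = 1

    # Product rule on dot(input_vector, weights) with respect to the weights:
    # (derivative of the first vector, 0) * second vector
    # + first vector * (derivative of the second vector, 1).
    dlayer1_dweights = [(0 * w) + (x * 1) for x, w in zip(input_vector, weights)]

    derror_dbias = derror_dprediction * dprediction_dlayer1 * dlayer1_dbias
    derror_dweights = [
        derror_dprediction * dprediction_dlayer1 * d for d in dlayer1_dweights
    ]
    return derror_dbias, derror_dweights
```

The input vector does not depend on the weights, so its derivative is zero and the first product vanishes. What remains is the input vector itself.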
01:43 The ._update_parameters() method is where you update the weights and the bias using the derivatives, or gradients, returned by ._compute_gradients().
01:51 Each derivative is multiplied by the learning rate. The result is then subtracted from the bias and weights to get better predictions.
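The update step can be sketched as follows. This is a standalone function rather than a method, and its name and signature are hypothetical; it only illustrates scaling each gradient by the learning rate and subtracting it.

```python
def update_parameters(weights, bias, derror_dbias, derror_dweights, learning_rate):
    """Return the new (weights, bias) after one gradient descent step."""
    # Step against the gradient, scaled by the learning rate.
    new_bias = bias - (derror_dbias * learning_rate)
    new_weights = [
        w - (dw * learning_rate) for w, dw in zip(weights, derror_dweights)
    ]
    return new_weights, new_bias


# Hypothetical values, chosen only to show the arithmetic.
new_weights, new_bias = update_parameters(
    weights=[0.5, -0.5],
    bias=0.1,
    derror_dbias=2.0,
    derror_dweights=[1.0, -1.0],
    learning_rate=0.1,
)
```

With these numbers the bias moves from 0.1 toward -0.1 and each weight moves by 0.1 in the direction opposite to its gradient, up to floating-point rounding.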
02:02 To use the NeuralNetwork class, initialize it with a learning rate. Again, 0.1 is a common choice. Call the .predict() method with an input vector and you’ll get a prediction.
02:14 But this network merely makes a prediction. It has not been trained. In the next lesson, you’ll actually train a model.