Your First Keras Model
00:00 Before building the neural network in Keras, reconstruct the training and test data sets using the Yelp source.
00:08 You might not have heard of Keras, but you have probably heard of TensorFlow. Originally, Keras was a wrapper around the lower-level API of TensorFlow and also worked with several other backends. Today in TensorFlow 2.0, Keras is part of TensorFlow.
00:25 Many of the APIs are still the same; you just get to them through the TensorFlow modules. TensorFlow 2.0 is pre-installed with Google Colab, so all you need to do to start using it is import Keras from the appropriate modules.
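For reference, the imports might look something like this in TensorFlow 2.x. The exact import lines aren't shown in this transcript, so treat this as one common way to do it:

```python
# A minimal sketch of the imports, assuming TensorFlow 2.x is installed
# (it comes pre-installed in Google Colab).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
```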
00:40 Recall that in the previous video you saw models, which are composed of layers. Keras represents the model and layers as objects in the Keras modules, so you’ll create a model and then you’ll add layers to it.
00:55 The type of model you will create is a sequential model. Its name implies a sequence, in this case a sequence of layers, and that's exactly what you will do: create a sequence of layers, starting with the input layer. First, create a Sequential model.
01:11 Then call the .add() method and create a new Dense layer. A dense layer, also referred to as a fully-connected layer, receives every output from the previous layer and sends those values to every one of its units. The exception, of course, is the input layer, which has no previous layer.
01:28 The layer has a size, or number of units: 10, in this case. So this layer will have 10 values. Don't worry about where the number 10 comes from; just focus on the higher-level concepts.
01:41 Since this is the first layer, you'll need to tell it how many inputs to expect from the vectors.
01:48 Recall the training data for the Yelp data source has 750 vectors, each of length 1,714. The size of the input layer will be the same as the length of the vectors.
02:01 Set the input_dim keyword argument to the second value in the .shape of the training data.
02:07 And finally, set the activation keyword argument to 'relu'.
02:12 Next, add another Dense layer for the output layer. The size of the layer is 1, and the activation function is 'sigmoid'.
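Putting those steps together, the model construction might look like the sketch below. The variable name x_train is an assumption; it stands for the 750 x 1,714 matrix of training vectors built from the Yelp source in the earlier lesson.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()

# First (hidden) layer: 10 units, one input per value in each review vector.
# x_train is assumed to hold the training vectors, so x_train.shape[1] == 1714.
model.add(Dense(10, input_dim=x_train.shape[1], activation='relu'))

# Output layer: a single unit with a sigmoid activation for the
# positive/negative prediction.
model.add(Dense(1, activation='sigmoid'))
```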
02:22 Call the .summary() method on the model to see an overview of the layers. The total number of trainable parameters is 17,161. Now, where do these numbers come from?
02:35 Recall that there are 10 nodes in the first layer. Multiply that by the 1,714 values coming from the input layer for 17,140 weights, then add the 10 bias values for a total of 17,150.
02:51 The second layer receives the output of the previous layer, which is 10 values, plus the single bias value for its one node, for a total of 11.
03:00 So that means there’s going to be 17,161 values for the entire model.
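As a quick sanity check, you can reproduce the arithmetic behind the summary output directly:

```python
# Parameter count for each layer: weights plus biases.
first_layer = 1714 * 10 + 10   # 1,714 inputs to 10 nodes, plus 10 biases -> 17,150
second_layer = 10 * 1 + 1      # 10 nodes to 1 node, plus 1 bias -> 11
print(first_layer + second_layer)  # 17161
```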
03:07 The next step is to call the .compile() method on the model, and this is where you will set the loss function and optimizer. Use the loss keyword argument to set the loss function to 'binary_crossentropy' and the optimizer keyword argument to set the optimizer to 'adam'. Again, don't worry about where these values come from.
03:25 Just focus on the high-level steps. The remaining keyword argument, metrics, is a convenience in Keras that collects statistics during the training process.
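A sketch of the compile step, continuing with the model built above. The exact metrics value isn't spelled out in the transcript; ['accuracy'] is assumed here because accuracy is what gets tracked later in the video.

```python
# model is the Sequential model constructed earlier.
model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy'],  # assumed; collects accuracy during training
)
```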
03:37 At last you can train the model. Call the .fit() method on the model. This method accepts the training vectors and labels. There are a number of keyword arguments that represent hyperparameters, such as epochs.
03:50 An epoch is a complete iteration through the training data. Recall that the first predictions are going to be rather poor, until the model begins to zero in on the correct weights. After the first iteration through the training data, the model deserves a second chance at improving those poor predictions—and a third, and a fourth. In this example, it’ll get 100 chances.
04:11 Now, epochs is a hyperparameter because it is a non-trainable value. You set it, and it doesn't change while training. Also, you can use the test data set as validation data and watch the model's accuracy on it change while training. Execute this cell and enjoy the show.
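The training call might look like this, continuing from the compiled model above. The names x_train, y_train, x_test, and y_test are assumptions for the training/test vectors and labels; the transcript describes them but doesn't show the code that creates them.

```python
# Train for 100 epochs and use the test set as validation data,
# saving the return value so its history can be plotted later.
history = model.fit(
    x_train, y_train,
    epochs=100,
    validation_data=(x_test, y_test),
)
```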
04:31 Actually, things happen rather fast with this data set, but notice that while the model was training the loss decreased and the accuracy increased. The validation loss started to increase about halfway through training, and that could mean that the model started to overfit. More on this in a second.
04:51 However, the accuracy overall is still better than when it started. To test the model, call the .evaluate() method. First test on the training data, and then on the test data.
05:03 The .evaluate() method returns the loss and accuracy scores, but you'll just use the accuracy in this experiment. Notice that the training accuracy is 100%, but the testing accuracy is just under 80%.
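A sketch of the evaluation step, using the same assumed variable names as above:

```python
# Evaluate on the data the model was trained on, then on the held-out test data.
train_loss, train_acc = model.evaluate(x_train, y_train)
test_loss, test_acc = model.evaluate(x_test, y_test)

print(f"Training accuracy: {train_acc:.4f}")  # roughly 1.00 in the video
print(f"Testing accuracy:  {test_acc:.4f}")   # just under 0.80 in the video
```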
05:17 This adds to the argument that the model overfitted, as it performs much better on the data set used to train it but not so well on new data. You can see the results of overfitting using this little function called plot_history().
05:31 Recall that you saved the return value of the .fit() method when training the model. The return value has a .history property, which is a dictionary with keys for the training accuracy and loss and the validation accuracy and loss at the end of each epoch. There were 100 epochs, so you'll have 100 values for each. Pass the history object to the plot_history() function, and you'll get these graphs.
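The body of plot_history() isn't shown in the transcript, so here is one plausible version, assuming Matplotlib and the TensorFlow 2.x history keys ('accuracy', 'val_accuracy', 'loss', 'val_loss'):

```python
import matplotlib.pyplot as plt

def plot_history(history):
    """Plot training/validation accuracy and loss for each epoch."""
    acc = history.history['accuracy']
    val_acc = history.history['val_accuracy']
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    epochs = range(1, len(acc) + 1)

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

    ax1.plot(epochs, acc, label='Training accuracy')
    ax1.plot(epochs, val_acc, label='Validation accuracy')
    ax1.set_title('Accuracy')
    ax1.legend()

    ax2.plot(epochs, loss, label='Training loss')
    ax2.plot(epochs, val_loss, label='Validation loss')
    ax2.set_title('Loss')
    ax2.legend()

    plt.show()
```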
05:55 Notice that the training accuracy reaches 100% long before the hundredth epoch and that the validation loss begins to increase about the same time the training accuracy reaches 100%.
06:07 Therefore, there is no reason to train this model more than about 25 or 30 epochs. More training is not always the way to an accurate model.
06:18 Note that this is a very small data set, only a few thousand observations, and real-world data sets would have millions or even billions of observations, so this example does not really require a neural network. In fact, you could get better results without a neural network, because neural networks are better suited for very large amounts of data.
06:36 The smaller data set also has another disadvantage when it comes to testing and validation. In this example they are the same, but in the real world you would partition your data set into training, testing, and validation data sets. However, you are making the trade-off to work with a model that doesn’t take hours to train. There is more than one way to represent a corpus.
06:58 In the next video, you will see an alternative method to the bag-of-words model.