Your First Keras Model
00:08 You might not have heard of Keras, but you have probably heard of TensorFlow. Originally, Keras was a wrapper around the lower-level API of TensorFlow and also worked with several other backends. Today in TensorFlow 2.0, Keras is part of TensorFlow.
00:25 Many of the APIs are still the same, you just get to them through the TensorFlow modules. TensorFlow 2.0 is pre-installed with Google Colab, so all you need to do to start using it is import Keras from the appropriate modules.
00:40 Recall that in the previous video you saw models, which are composed of layers. Keras represents the model and layers as objects in the Keras modules, so you’ll create a model and then you’ll add layers to it.
The type of model you will create is a sequential model. Its name implies a sequence—in this case, of layers—and that's exactly what you will do: create a sequence of layers, starting with the input layer. First, create a Sequential model. Then call the .add() method and create a new Dense layer. A dense layer—also referred to as a fully-connected layer—feeds every output from the previous layer into each of its units. Every layer can be dense except, of course, the input layer.
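The steps so far can be sketched like this. The layer size, activation, and input shape are placeholder values for illustration, not necessarily the ones used in the video:

```python
# Build a Sequential model, then add layers to it with .add().
# Sizes and activation are placeholders, not the course's actual values.
from tensorflow import keras

model = keras.Sequential()
model.add(keras.Input(shape=(100,)))  # shape of each input vector
model.add(keras.layers.Dense(16, activation="relu"))  # fully-connected layer
```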
The next step is to call the .compile() method on the model, and this is where you will set the loss function and optimizer. Use the loss keyword argument to set the loss function to 'binary_crossentropy' and the optimizer keyword argument to set the optimizer to 'adam'. Again, don't worry about where these values come from.
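Putting the build and compile steps together might look like this. The single-unit sigmoid output layer is an assumption that fits binary cross-entropy, not necessarily the exact architecture from the video:

```python
# Compile the model with the loss function and optimizer named above.
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(100,)),
    keras.layers.Dense(1, activation="sigmoid"),  # assumed output layer
])
# The strings 'binary_crossentropy' and 'adam' are the exact values
# named in the narration; metrics=["accuracy"] adds the accuracy score.
model.compile(loss="binary_crossentropy", optimizer="adam",
              metrics=["accuracy"])
```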
At last you can train the model. Call the .fit() method on the model. This method accepts the training vectors and labels, plus a number of keyword arguments that represent hyperparameters, such as epochs.
03:50 An epoch is a complete iteration through the training data. Recall that the first predictions are going to be rather poor, until the model begins to zero in on the correct weights. After the first iteration through the training data, the model deserves a second chance at improving those poor predictions—and a third, and a fourth. In this example, it’ll get 100 chances.
epochs is a hyperparameter because it is a non-trainable value: you set it, and it doesn't change while training. Also, you can use the testing data set as validation data to watch the model's accuracy on that data improve while training. Execute this cell and enjoy the show.
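Here is the whole flow as a runnable sketch on tiny synthetic data. The data, layer sizes, and the reduced epoch count are stand-ins so the example finishes quickly; the course trains for 100 epochs on its own vectorized text data:

```python
# End-to-end sketch: build, compile, and fit on synthetic data.
import numpy as np
from tensorflow import keras

# Random vectors with a simple rule for the binary labels.
x_train = np.random.rand(64, 20).astype("float32")
y_train = (x_train.sum(axis=1) > 10).astype("float32").reshape(-1, 1)
x_test = np.random.rand(16, 20).astype("float32")
y_test = (x_test.sum(axis=1) > 10).astype("float32").reshape(-1, 1)

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam",
              metrics=["accuracy"])

# validation_data reports out-of-sample loss and accuracy after each epoch;
# epochs is cut to 5 here so the sketch runs fast.
history = model.fit(x_train, y_train, epochs=5,
                    validation_data=(x_test, y_test), verbose=0)
```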
04:31 Actually, things happen rather fast with this data set, but notice that while the model was training the loss decreased and the accuracy increased. The validation loss started to increase about halfway through training, and that could mean that the model started to overfit. More on this in a second.
Calling the .evaluate() method on the model returns the loss and accuracy scores, but you'll just use the accuracy in this experiment. Notice that the training accuracy is 100%, but the testing accuracy is just under 80%.
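The evaluation call looks like this. The data here is random, so the scores themselves are meaningless; only the call pattern matters:

```python
# Evaluate a compiled model against held-out vectors and labels.
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam",
              metrics=["accuracy"])

x_test = np.random.rand(8, 10).astype("float32")
y_test = np.random.randint(0, 2, size=(8, 1)).astype("float32")

# .evaluate() returns the loss followed by each compiled metric.
loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
```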
This adds to the argument that the model overfit: it performs much better on the data set used to train it than on new data. You can see the results of overfitting using a little function called plot_history().
Recall that you saved the return value of the .fit() method when training the model. The return value has a .history property, which is a dictionary with keys for the training accuracy and loss and the validation accuracy and loss at the end of each epoch. There were 100 epochs, so you'll have 100 values for each. Pass the history object to the plot_history() function, and you'll get these graphs.
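The course supplies its own plot_history() helper; this is only a rough approximation of what such a function might do, assuming it is handed the .history dictionary rather than the History object itself:

```python
# Sketch of a plot_history()-style helper (an approximation, not the
# course's actual implementation).
import matplotlib
matplotlib.use("Agg")  # render off-screen so this runs without a display
import matplotlib.pyplot as plt

def plot_history(history):
    """Plot training vs. validation accuracy and loss, one chart each."""
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
    ax_acc.plot(history["accuracy"], label="training")
    ax_acc.plot(history["val_accuracy"], label="validation")
    ax_acc.set_title("Accuracy")
    ax_acc.set_xlabel("epoch")
    ax_acc.legend()
    ax_loss.plot(history["loss"], label="training")
    ax_loss.plot(history["val_loss"], label="validation")
    ax_loss.set_title("Loss")
    ax_loss.set_xlabel("epoch")
    ax_loss.legend()
    return fig
```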
06:18 Note that this is a very small data set of a few thousand observations, while real-world data sets can run to millions or even billions of observations, so this example does not really require a neural network. In fact, you could get better results without one, because neural networks are better suited to very large amounts of data.
06:36 The smaller data set also has another disadvantage when it comes to testing and validation. In this example they are the same, but in the real world you would partition your data set into training, testing, and validation data sets. However, you are making the trade-off to work with a model that doesn’t take hours to train. There is more than one way to represent a corpus.