Convolutional Neural Networks (CNN)
00:00 A better approach might be to utilize a special type of neural network known as a convolutional neural network, or CNN. While CNNs are generally used for image classification and computer vision, they are also handy for text processing, as both image and text data involve sequences. A CNN is distinguished from the neural networks you have built by the addition of a convolutional layer. Inside the convolutional layer, a filter, or kernel, analyzes the data in pieces while still maintaining the spatial relationship between the data. In images, these filters are two-dimensional, but with text you only need a one-dimensional convolutional layer.
00:44 Here’s an illustration of a convolutional layer in action. Your text will be the input features and the filter will be applied to the first group of features. The output will be stored in a convolution.
00:57 The filter will slide a predetermined distance to the next group of features and produce an output. Notice that the groups are overlapping.
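The sliding, overlapping behavior described above can be sketched in a few lines of NumPy. The feature values, filter weights, and stride here are made up purely for illustration:

```python
import numpy as np

# A toy "sentence" of 8 feature values and a 1-D filter of size 5.
features = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
kernel = np.array([0.5, 0.0, -0.5, 0.0, 1.0])  # made-up filter weights
stride = 1  # slide one position at a time, so the groups overlap

outputs = []
for start in range(0, len(features) - len(kernel) + 1, stride):
    window = features[start : start + len(kernel)]  # one overlapping group
    outputs.append(float(np.dot(window, kernel)))   # one output per group

print(outputs)  # each entry came from a window sharing 4 values with its neighbor
```

Each window shares four of its five values with the previous one, which is exactly the overlap shown in the illustration.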
And that’s all you really need to know about how convolutional layers work, because Keras abstracts the math for you. Add a Conv1D layer right before the pooling layer. Set the number of filters to 128 and the size of the filter to 5, and then set the activation function.
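Putting those steps together in Keras might look like the following sketch. The vocabulary size, embedding dimension, dense layer size, and the relu activation are assumptions standing in for the values from the earlier exercises:

```python
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, embedding_dim = 5000, 50  # assumed values from earlier exercises

model = keras.Sequential([
    # Trained from scratch this time, no pretrained GloVe weights.
    layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim),
    # The new layer: 128 filters of size 5, sliding over the embedded text.
    layers.Conv1D(filters=128, kernel_size=5, activation="relu"),  # relu is an assumption
    layers.GlobalMaxPooling1D(),
    layers.Dense(10, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # binary classification output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

The Conv1D layer sits right before the pooling layer, as described, and the rest of the model is unchanged.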
01:27 Notice that the embedding layer is no longer using the GloVe matrix. The rest of the process is unchanged, so train the model and test it.
01:42 You’ll see that 80% is about the best it can do. Keep in mind that this is a small data set and that neural networks perform better with large data sets. However, there is one more technique that you can employ to improve performance.
01:58 You’ve actually already been exposed to this technique, called hyperparameter optimization, though without the optimization part. To review, most of the emphasis so far has been on training the weights of the model.
02:11 You’ve seen those values are changed through the training process, but there are also values such as the vector length and the embedding size that are set before training.
02:21 These values that are not trained are called hyperparameters, and they still have a lot of influence on the performance of the model. Optimizing these values yourself is impractical, so you can ask Keras for help.
The search will try different combinations of hyperparameters and tell you the best one. For this course, you’ll see how to use the scikit-learn utility RandomizedSearchCV, in addition to k-fold cross-validation, to find the best set of values for the hyperparameters. k-fold cross-validation partitions the data into chunks.
02:57 The number of chunks is the value of k. Here, you can see an example of 5-fold cross-validation. On each iteration, a different chunk—or fold—will be used for testing with the remainder used for training, so the iterations will use different combinations of training and testing data.
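The fold rotation is easy to see with scikit-learn's KFold on a tiny made-up dataset of ten samples:

```python
import numpy as np
from sklearn.model_selection import KFold

data = np.arange(10)       # ten samples, purely for illustration
kfold = KFold(n_splits=5)  # k = 5

for i, (train_idx, test_idx) in enumerate(kfold.split(data)):
    # On each iteration, a different fold is held out for testing
    # while the remaining folds are used for training.
    print(f"fold {i}: train={train_idx.tolist()} test={test_idx.tolist()}")
```

Every sample ends up in the test fold exactly once across the five iterations.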
Specifically, you’ll use a grid search, which contains the combinations of the parameters. The grid is defined as a dictionary, as seen here. The parameters in the search will be the number of filters in the convolutional layer, the size of the kernel or window, the size of the vocabulary, the number of embedding dimensions, and the maximum length of the feature vectors, or sentences. To implement the search, the RandomizedSearchCV class requires an instance of KerasClassifier, which wraps the model in a scikit-learn interface. RandomizedSearchCV is from scikit-learn, and KerasClassifier is an adapter. This class requires the model to be returned from a function.
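A model-building function of the kind the wrapper expects might look like this sketch. The parameter names mirror the grid keys from the narration, and the layer sizes and relu activation are assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

def create_model(num_filters, kernel_size, vocab_size, embedding_dim, maxlen):
    """Return a compiled model; the wrapper calls this once per parameter combination.

    maxlen governs how the input sequences are padded before training,
    so it is a search parameter even though it isn't used inside the layers here.
    """
    model = keras.Sequential([
        layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim),
        layers.Conv1D(filters=num_filters, kernel_size=kernel_size, activation="relu"),
        layers.GlobalMaxPooling1D(),
        layers.Dense(10, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Because the function takes the hyperparameters as arguments, the search can rebuild the model fresh for every combination it tries.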
You’ve seen all the code in this function, so I won’t go over it again. You’ve also seen the vast majority of the training code. The first difference is the instance of the KerasClassifier that wraps the model, and this is where you set the keyword arguments that were provided to the .fit() method of the model in the previous exercises. The RandomizedSearchCV class accepts the KerasClassifier and the grid. The cv keyword argument is the number of folds, or the k value.
Call .fit() on the RandomizedSearchCV to start the heavy lifting. The return value from .fit() will include the best score and the parameters producing that score, so you can examine them. This can take a little while, so again, I’ll speed it up through the magic of video. Also, if you run this on Google Colab, you’ll have access to free GPUs for jobs up to 12 hours.
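Since wiring up the Keras wrapper requires TensorFlow and a long training run, the search mechanics can be shown with a plain scikit-learn classifier standing in for the wrapped model; the grid keys and value ranges here are placeholders, but the .fit(), best_score_, and best_params_ pattern is the same:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Placeholder grid: one hyperparameter with four candidate values.
param_grid = dict(C=[0.01, 0.1, 1.0, 10.0])

# Tiny synthetic dataset, purely for illustration.
X = np.random.RandomState(0).randn(100, 4)
y = (X[:, 0] > 0).astype(int)

search = RandomizedSearchCV(
    LogisticRegression(),            # stand-in for the KerasClassifier
    param_distributions=param_grid,
    n_iter=4,                        # number of combinations to try
    cv=5,                            # k = 5 folds of cross-validation
    random_state=0,
)
result = search.fit(X, y)            # .fit() returns the search object itself

print(result.best_score_)            # best mean cross-validated score
print(result.best_params_)           # the hyperparameters that produced it
```

Swapping LogisticRegression for the KerasClassifier instance is the only structural change needed to run the real search from the lesson.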
05:03 These GPUs aren’t the fastest, but they will accelerate this code and take much less time. The final output is stored in a file that I’ve opened on the right side.
05:15 A better approach, but still not much more than 80%. A logical conclusion is that a larger data set would yield better results. Again, convolutional neural networks and neural networks in general are intended for large data sets. Let’s wrap up this course and review what you saw.