Convolutional Neural Networks (CNN)
00:00 A better approach might be to utilize a special type of neural network known as a convolutional neural network, or CNN. While CNNs are generally used for image classification and computer vision, they are also handy for text processing, as both image and text data involve ordered structure. A CNN is distinguished from the neural networks you have built by the addition of a convolutional layer. Inside the convolutional layer, a filter—or kernel—analyzes the data in pieces while still maintaining the spatial relationship between the data. In images, these filters are two-dimensional, but with text you only need a one-dimensional convolutional layer.
00:44 Here’s an illustration of a convolutional layer in action. Your text will be the input features, and the filter will be applied to the first group of features, then slide along to the next group. Each output value is stored in the result, called a convolution.
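To make the sliding-filter idea concrete, here is a minimal one-dimensional convolution computed by hand with NumPy. The sequence and kernel values are made up for illustration; they are not from the course materials.

```python
import numpy as np

# A toy input sequence (e.g., one embedding dimension across 6 time steps)
sequence = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# A 1D filter (kernel) of size 3
kernel = np.array([0.5, 1.0, -0.5])

# Slide the filter across the sequence one step at a time.
# Each output value is the dot product of the kernel with one window.
output = np.array([
    np.dot(sequence[i:i + len(kernel)], kernel)
    for i in range(len(sequence) - len(kernel) + 1)
])

print(output)  # one value per window position: [1. 2. 3. 4.]
```

Notice that the output is shorter than the input: a kernel of size 3 over 6 values produces 4 windows, which is why a `Conv1D` layer's output length depends on the kernel size.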
01:42 You’ll see that 80% is about the best it can do. Keep in mind that this is a small data set and that neural networks perform better with large data sets. However, there is one more technique that you can employ to improve performance.
01:58 You’ve actually already been exposed to this technique, called hyperparameter optimization, but without the optimization. To review, most of the emphasis so far has been on training the weights of the model. Other values, like the number of epochs or the size of the embedding, were simply chosen by hand before training.
02:21 These values that are not trained are called hyperparameters, and they still have a lot of influence on the performance of the model. Optimizing these values yourself is impractical, so you can ask scikit-learn for help.
The search will try different combinations of hyperparameters and tell you the best one. For this course, you’ll see how to use the scikit-learn utility RandomizedSearchCV, in addition to k-fold cross-validation, to find the best set of values for the hyperparameters. k-fold cross-validation partitions the data into chunks.
02:57 The number of chunks is the value of k. Here, you can see an example of 5-fold cross-validation. On each iteration, a different chunk—or fold—will be used for testing with the remainder used for training, so the iterations will use different combinations of training and testing data.
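The rotation of folds can be sketched with scikit-learn's `KFold` class; the ten-sample data set here is made up purely to show the splits.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10)  # ten toy samples

# 5-fold cross-validation: each iteration holds out a different fifth
kf = KFold(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    print(f"Fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")
```

Every sample appears in exactly one test fold, so the five iterations together evaluate the model on the entire data set.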
Specifically, you’ll use a parameter grid, which holds the candidate values for each hyperparameter. The grid is defined as a dictionary, as seen here. The parameters in the search will be the number of filters in the convolutional layer, the size of the kernel or window, the size of the vocabulary, the number of embedding dimensions, and the maximum length of the feature vectors, or sentences. To implement the search, the RandomizedSearchCV class requires an instance of KerasClassifier, which wraps the Keras model so it behaves like a scikit-learn estimator.
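A grid along those lines might look like the following dictionary. The key names must match the arguments of your model-building function, and the candidate values below are illustrative, not the ones from the course.

```python
# Candidate values for each hyperparameter (illustrative values)
param_grid = dict(
    num_filters=[32, 64, 128],   # filters in the Conv1D layer
    kernel_size=[3, 5, 7],       # width of the sliding window
    vocab_size=[5000],           # size of the vocabulary
    embedding_dim=[50],          # dimensions in the embedding layer
    maxlen=[100],                # maximum length of the padded sentences
)
```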
You’ve seen all the code in this function, so I won’t go over it again. You’ve also seen the vast majority of the training code. The first difference is the instance of KerasClassifier that wraps the model, and this is where you set the keyword arguments that were provided to the model’s .fit() method in the previous exercises.
The return value from .fit() will include the best score and the parameters that produced it, so you can examine them. This can take a little while, so again, I’ll speed it up through the magic of video. Also, if you run this on Google Colab, you’ll have access to free GPUs for sessions of up to 12 hours.
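The search mechanics look like the sketch below. To keep the example self-contained and runnable without TensorFlow, a plain scikit-learn estimator stands in for the KerasClassifier wrapper, and the data, parameter values, and variable names are made up; the RandomizedSearchCV calls are the same ones the course applies to the wrapped Keras model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Toy data standing in for the vectorized sentences (made up for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = (X[:, 0] > 0).astype(int)

# Candidate hyperparameter values to sample from
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

search = RandomizedSearchCV(
    LogisticRegression(),        # a KerasClassifier instance would go here
    param_distributions=param_grid,
    n_iter=4,                    # number of random combinations to try
    cv=5,                        # 5-fold cross-validation
    random_state=0,
)
result = search.fit(X, y)

# After fitting, the best score and the parameters that produced it
print(result.best_score_)
print(result.best_params_)
```

RandomizedSearchCV samples combinations from the grid rather than trying all of them, which is what keeps the search affordable when the model is expensive to train.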
05:15 A better approach, but still not much more than 80%. A logical conclusion is that a larger data set would yield better results. Again, convolutional neural networks and neural networks in general are intended for large data sets. Let’s wrap up this course and review what you saw.