Word Vectors and Embedding Layers
00:00 Let’s take a look at another way to represent words. You’ve seen the bag-of-words model, which represents a whole sequence of words as a single vector. Next, you’ll see how to represent each individual word as a vector.
One way to represent a word as a vector is with one-hot encoding. Take a look at the following code. It simply creates a list of strings. The LabelEncoder from scikit-learn will create a unique integer value for each member of the list, and you do this by calling its fit_transform() method.
You can use this to create one-hot encodings for each string with the OneHotEncoder in scikit-learn. Notice that the OneHotEncoder needs the labels to be in a column, not in a row. The result is a NumPy array. Again, there is an entry for each string or label, but this time each label is represented as a vector. The vector is the same length as the vocabulary (3, in this sample). In each vector, all the values are 0 except for a single 1, whose position identifies the label.
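Put together, that might look something like the following sketch. The list of strings here is a hypothetical stand-in for the one in the video, and the sparse_output argument assumes scikit-learn 1.2 or later (older versions call it sparse):

```python
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical sample data: a vocabulary of three distinct strings
cities = ['London', 'Berlin', 'Berlin', 'New York', 'London']

# LabelEncoder assigns a unique integer to each distinct string
encoder = LabelEncoder()
city_labels = encoder.fit_transform(cities)
print(city_labels)  # [1 0 0 2 1]

# OneHotEncoder needs the labels in a column, so reshape to (n, 1)
onehot = OneHotEncoder(sparse_output=False)
city_onehot = onehot.fit_transform(city_labels.reshape(-1, 1))
print(city_onehot)
# [[0. 1. 0.]
#  [1. 0. 0.]
#  [1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```

Each row is one label, and each vector has a single 1 in the position that identifies that label.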
01:25 Next, you’ll look at the concept of an embedding layer for more efficient representations of text. The text needs to be prepared—again, up to 80% of machine learning is data preparation—to work with the embeddings.
Keras provides the pad_sequences() function for this. Give it the text to pad, where to pad ('post' will pad at the end of the text), and the maximum length of the padded sequences.
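As a rough sketch, assuming the standard Keras preprocessing utilities (the sentences here are made up for illustration):

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Hypothetical sentences standing in for the video's dataset
sentences = ['the cat sat on the mat', 'the dog barked']

# Turn each sentence into a sequence of integer word indices
tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)
sequences = tokenizer.texts_to_sequences(sentences)

# padding='post' adds zeros at the end of each sequence,
# and maxlen fixes the length of every padded sequence
padded = pad_sequences(sequences, padding='post', maxlen=8)
print(padded)
# [[1 2 3 4 1 5 0 0]
#  [1 6 7 0 0 0 0 0]]
```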
To create the embedding layer, you can use a pretrained model. You’ll do that later, but first, you’ll train a custom layer. Keras provides more utility classes to help out. Create a new Sequential model and add an Embedding layer.
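The transcript skips over the details of the model here, so the following is only a sketch of what the custom embedding model could look like. The Flatten layer, the Dense layers, the compile settings, and all of the sizes are assumptions, not taken from the video:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Flatten, Dense

vocab_size = 1000   # assumed: len(tokenizer.word_index) + 1
embedding_dim = 50  # assumed embedding dimensionality
maxlen = 8          # must match the padded sequence length

model = Sequential()
model.add(Embedding(input_dim=vocab_size,
                    output_dim=embedding_dim,
                    input_length=maxlen))  # input_length: Keras 2 style
model.add(Flatten())  # maxlen * embedding_dim values per example
model.add(Dense(10, activation='relu'))
model.add(Dense(1, activation='sigmoid'))  # assumed binary classification
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()
```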
03:57 Test the model and graph the history. Again, the accuracy is too low and the loss is too high, so let’s look at a better way to train this model. The problem is that you want to consider the values in each vector together instead of each individual value alone, and one way to handle this is to use a pooling layer. In this case, you’ll add a global pooling layer after the embedding layer in the model.
Inside the pooling layer, the maximum of all the values in each dimension will be selected. There are also average pooling layers. A max pooling layer highlights the large values. Add the GlobalMaxPool1D layer to the model and train it again.
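A sketch of that change, reusing the assumed sizes from the earlier model and swapping the Flatten layer for GlobalMaxPool1D:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GlobalMaxPool1D, Dense

vocab_size = 1000   # assumed values, as before
embedding_dim = 50
maxlen = 8

model = Sequential()
model.add(Embedding(input_dim=vocab_size,
                    output_dim=embedding_dim,
                    input_length=maxlen))
# For each of the embedding_dim dimensions, keep only the largest
# value found anywhere in the sequence
model.add(GlobalMaxPool1D())
model.add(Dense(10, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
```

The pooling step collapses the (maxlen, embedding_dim) output of the embedding layer down to a single embedding_dim vector, so the dense layers see a summary of the whole sequence rather than a position-by-position flattening.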
04:46 Now this is much more accurate, but training a real-world embedding layer would take a lot more time and attention. In the next video, you’ll see how to save time by using a pretrained word embedding.