Implementing Vectors

Resource mentioned in this lesson: NumPy Tutorial: Your First Steps Into Data Science in Python

00:00 Start by importing numpy as np, which is the standard naming convention. And now using numpy, you’ll create two vectors as numpy arrays.

00:08 You do this by creating lists and passing them into the np.array constructor. v1 is based on the list 2, 2, and v2 is based on the list negative 2, 2.
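Typed out as a script, those first steps look like this:

```python
import numpy as np  # standard naming convention

# Create two vectors as NumPy arrays by passing in Python lists.
v1 = np.array([2, 2])
v2 = np.array([-2, 2])
```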

00:20 These two vectors are orthogonal, aka perpendicular, meaning the angle between them is 90 degrees. And why use numpy? Well, it provides loads of numerical operations.

00:31 I mean, it’s numerical Python, after all. You can see the dimensionality of a vector by accessing its .shape attribute. v1.shape returns the tuple (2,), because it’s a vector with two elements.
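In code, that looks like:

```python
import numpy as np

v1 = np.array([2, 2])

# .shape is a tuple with one entry per axis of the array.
# A vector is a one-axis array, so here the tuple has a single entry.
print(v1.shape)  # (2,)
```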

00:44 And you can perform element-wise operations using built-in Python operators. For example, element-wise multiplication between two vectors is as simple as v1 times v2, which returns the array negative 4, 4, which is 2 times negative 2 and 2 times 2.
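Here's that multiplication as a quick sketch:

```python
import numpy as np

v1 = np.array([2, 2])
v2 = np.array([-2, 2])

# The * operator multiplies element by element: 2 * -2 and 2 * 2.
print(v1 * v2)  # [-4  4]
```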

01:00 You can square every element in the array with the power operator.

01:05 v1 to the power of 2 returns 2 squared and 2 squared, i.e. 4, 4. And this means you can get the magnitude of a vector slightly easier.
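The squaring step in code:

```python
import numpy as np

v1 = np.array([2, 2])

# ** applies element-wise too, squaring each entry.
print(v1 ** 2)  # [4 4]
```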

01:15 Use numpy’s square root function as well as numpy’s sum() function to get the square root of the sum of the squares of v1, which is about 2.8, the square root of 8.
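Put together, the manual magnitude calculation looks like this:

```python
import numpy as np

v1 = np.array([2, 2])

# Magnitude by hand: square each element, sum, take the square root.
magnitude = np.sqrt(np.sum(v1 ** 2))
print(magnitude)  # about 2.83, the square root of 8
```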

01:27 But even that’s a bit much, so instead, you can actually use the norm() function from the linalg, linear algebra, submodule of numpy.

01:36 np.linalg.norm() passing in v1 also returns 2.8. And specifically, this calculates the Euclidean norm, which is generally the most commonly used measurement of vector magnitude.
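As a one-liner:

```python
import numpy as np

v1 = np.array([2, 2])

# norm() from the linear algebra submodule computes the Euclidean norm.
print(np.linalg.norm(v1))  # about 2.83
```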

01:48 Now, confirm that these two vectors have the same magnitude by passing v2 into the same function.

01:55 You also get 2.8. Perfect. And since you can use numpy to multiply the elements of two vectors together, like you saw, v1 times v2 returning minus 4, 4, you can get the dot product of two vectors manually, like so.

02:10 np.sum() v1 times v2. Since they’re orthogonal, the result is 0. But naturally, numpy provides a function for this as well, np.dot(). So np.dot() passing in v1, v2 also returns 0.
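Both versions of the dot product, sketched out:

```python
import numpy as np

v1 = np.array([2, 2])
v2 = np.array([-2, 2])

# Dot product by hand: element-wise multiply, then sum.
print(np.sum(v1 * v2))  # 0 -- the vectors are orthogonal

# NumPy's built-in equivalent:
print(np.dot(v1, v2))  # 0
```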

02:25 In fact, the dot product is such a common operation, they even have an operator for it, the @ symbol. v1 @ v2 returns 0. So, do we also know the cosine similarity of these two vectors?
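And with the @ operator:

```python
import numpy as np

v1 = np.array([2, 2])
v2 = np.array([-2, 2])

# @ is Python's matrix-multiplication operator; for 1-D arrays
# it computes the dot product.
print(v1 @ v2)  # 0
```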

02:38 Actually, it would be 0, since the dot product is 0, and that’s the numerator of our cosine similarity formula.

02:44 So, create one more vector, [0, 2]: v3 equals np.array 0, 2. Now you can use numpy to find out the cosine similarity of v1 and v3.

02:56 Remember how? First, v1 @ v3 provides the dot product. Wrap that in parentheses and then divide it by the product of the norms of v1 and v3.
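The full cosine similarity calculation, as a sketch:

```python
import numpy as np

v1 = np.array([2, 2])
v3 = np.array([0, 2])

# Cosine similarity: dot product over the product of the norms.
cos_sim = (v1 @ v3) / (np.linalg.norm(v1) * np.linalg.norm(v3))
print(cos_sim)  # roughly 0.71
```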

03:10 And the result is about 0.71. Alright, the mathy math parts are behind you. Congrats! Even if it feels like some of this went over your head, don’t worry. It takes time to build intuition.

03:23 If you can follow the code, that’s all you need. And you’ll be using the same formula in the cosine similarity function you’ll use to explore embeddings in the coming lessons.

03:31 Next up, text embeddings.