Querying Vector Databases
00:00
And now witness the power of the vector database. Make your first query to the DB: query_results equals collection.query(), passing in query_texts, a list containing the single string "Find me some delicious food!", and n_results equals one, meaning you only want one result for each query.
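Written out as code, the call just described might look like the sketch below. The wrapper name run_first_query is hypothetical, and collection is assumed to be the ChromaDB collection built earlier in the course; only the query() call with query_texts and n_results comes from the lesson.

```python
def run_first_query(collection):
    """Send the lesson's first query to a ChromaDB collection.

    `collection` is assumed to be the collection created earlier in the
    course; this wrapper function is a hypothetical helper.
    """
    return collection.query(
        query_texts=["Find me some delicious food!"],  # one query, in a list
        n_results=1,  # one result per query
    )
```

Assigning the return value to query_results gives you the dictionary-like object explored next.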
00:22
query_results is like a dictionary, so it has a .keys() method you can use. query_results.keys() returns the keys ids, embeddings, documents, uris, included, data, metadatas, and distances.
00:35
A lot of info. documents holds the matching documents. query_results["documents"] returns a document about Italian pizza. The ids key holds the IDs of the results. So query_results["ids"] returns the nested list with the string "id3", which corresponds to the document you just saw.
00:57
And then you can see the distances at query_results["distances"]. You see a cosine distance of 0.76. And this can be a little tricky because now you’re looking at cosine distance, not cosine similarity.
01:11
Cosine distance is defined as one minus the cosine similarity. So smaller numbers mean closer vectors, and the cosine similarity for this result would actually be about 0.24.
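To make the distance and similarity relationship concrete, here is a small pure-Python sketch. It doesn't use ChromaDB at all, and the helper names are hypothetical; it only applies the definition given above, distance equals one minus similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance as defined in the lesson: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


def similarity_from_distance(distance):
    """Recover cosine similarity from a reported cosine distance."""
    return 1.0 - distance


# The distance of 0.76 from the lesson corresponds to a similarity of 0.24.
print(round(similarity_from_distance(0.76), 2))
```

Identical directions give a distance of zero, and perpendicular vectors give a distance of one, which is why smaller numbers mean closer vectors.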
01:23
And you can see the metadata at the "metadatas" key: the genre is food. You can also pass in multiple queries at once as a list and ask for multiple results.
01:36
query_results equals calling collection.query(), passing in query_texts, a list of strings, "Teach me about history" and "What's going on in the world?"
01:48
The include parameter takes a list that tells ChromaDB which keys to include in the result set. And you’ll get two results for each query by setting n_results to two.
01:58
This is why the results you saw before were in nested lists: the outer list has one entry per query, and each inner list holds that query’s results.
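As a sketch, the two-query call could be written like this. The lesson doesn't say which keys were passed to include, so the list below is an assumption, and the wrapper name run_multi_query is hypothetical.

```python
def run_multi_query(collection):
    """Send two queries at once and ask for two results per query.

    `collection` is assumed to be the course's ChromaDB collection.
    """
    return collection.query(
        query_texts=[
            "Teach me about history",
            "What's going on in the world?",
        ],
        # Assumption: the exact include list isn't stated in the lesson.
        include=["documents", "distances"],
        n_results=2,
    )
```

With two queries and n_results set to two, each included key holds two inner lists of two items each.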
02:03
query_results["documents"][0] returns the text about Einstein and then the text about the American Revolution. These are the top two results of the first query. And viewing distances for the same results shows 0.62 and 0.69.
02:21
Remember, these are distances. Smaller numbers, closer vectors.
02:27
query_results["documents"][1] gives you the top two results of the second query. And you can see their distances at query_results["distances"][1].
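The nested indexing is easier to see on a hand-built dictionary with the same shape. In the sketch below, the document strings are placeholders rather than the real texts, the first pair of distances comes from the lesson, and the second pair is made up for illustration. The helper results_for_query is hypothetical.

```python
def results_for_query(query_results, query_index):
    """Pair each document with its distance for one query.

    The outer index selects the query; the inner lists hold its results.
    """
    documents = query_results["documents"][query_index]
    distances = query_results["distances"][query_index]
    return list(zip(documents, distances))


# Hand-built stand-in for a two-query result set (not real ChromaDB output).
sample_results = {
    "documents": [
        ["<text about Einstein>", "<text about the American Revolution>"],
        ["<placeholder result A>", "<placeholder result B>"],
    ],
    "distances": [
        [0.62, 0.69],  # values shown in the lesson for the first query
        [0.80, 0.85],  # hypothetical values, for illustration only
    ],
}

for document, distance in results_for_query(sample_results, 0):
    print(distance, document)
```

Passing index 1 instead of 0 walks through the second query's results in the same way.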
02:38
Of course, vector similarity alone isn’t always enough. Look at this query. collection.query(), query_texts is the string, "Teach me about music history," and set n_results to one.
02:54
Among the returned data, you can see that the result is about Einstein, which is pretty irrelevant. What’s great about ChromaDB is that you can use the metadata as a filter.
03:04
collection.query(), query_texts, "Teach me about music history," and now add a where parameter, a dict with the key "genre" and the value, another dict, with the key "$eq" and the value "music", asking for results where the genre is music.
03:25
This focuses only on the music genre and returns a more relevant result, even though the cosine distance is actually a little bit higher.
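In code, the filtered query might look like the following sketch. The where dictionary is the one read out in the lesson; n_results=1 is assumed to carry over from the unfiltered query, and the wrapper name run_filtered_query is hypothetical.

```python
def run_filtered_query(collection):
    """Query for music history, restricted to records whose genre is music.

    `collection` is assumed to be the course's ChromaDB collection.
    """
    return collection.query(
        query_texts=["Teach me about music history"],
        where={"genre": {"$eq": "music"}},  # metadata filter from the lesson
        n_results=1,  # assumption: same as the unfiltered query
    )
```

Only records that pass the metadata filter are considered, so the nearest match can have a higher distance than the unfiltered one.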
03:34
You can use a variety of filters. Change the filter and try the query again. query_results equals collection.query(),
03:43
again, "Teach me about music history,"
03:46
with a where filter on the key "genre", this time using "$in" with the list "music", "history", and with n_results set to two.
03:57
If you examine the documents returned, you get the text about Beethoven as well as the text about the American Revolution.
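To see what the two filters mean, here is a tiny pure-Python illustration of "$eq" and "$in" applied to metadata dictionaries. This is not how ChromaDB implements filtering; the matches function and the sample records are hypothetical and cover only these two operators.

```python
def matches(metadata, where):
    """Return True if a metadata dict satisfies a simple where filter.

    Supports only the "$eq" and "$in" operators, for illustration.
    """
    for field, condition in where.items():
        value = metadata.get(field)
        for operator, operand in condition.items():
            if operator == "$eq":
                if value != operand:
                    return False
            elif operator == "$in":
                if value not in operand:
                    return False
            else:
                raise ValueError(f"Unsupported operator: {operator}")
    return True


# Hypothetical records; the IDs and genres here are made up.
records = [
    {"id": "a", "metadata": {"genre": "music"}},
    {"id": "b", "metadata": {"genre": "history"}},
    {"id": "c", "metadata": {"genre": "food"}},
]

where_eq = {"genre": {"$eq": "music"}}
where_in = {"genre": {"$in": ["music", "history"]}}

print([r["id"] for r in records if matches(r["metadata"], where_eq)])
print([r["id"] for r in records if matches(r["metadata"], where_in)])
```

The "$in" filter widens the candidate set to both genres, which is why a history document can show up alongside the music one.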
04:06
And you can see their distances are 0.81 and 0.82. And one last thing, you can modify existing collections as well. collection.update(), ids equals the entries you want to update, "id1" and "id2" in this case, documents, the list of new texts that will replace the documents stored under those IDs, "The new iPhone is awesome!" and "Bali has beautiful beaches," and the metadatas will be updated as well.
04:35
First, genre tech and then genre beaches.
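Here is a sketch of that update as code. The IDs, documents, and genres are the ones read out in the lesson; the wrapper name apply_update is hypothetical, and collection is assumed to be the course's ChromaDB collection.

```python
def apply_update(collection):
    """Replace the documents and metadata stored under id1 and id2."""
    collection.update(
        ids=["id1", "id2"],
        documents=[
            "The new iPhone is awesome!",
            "Bali has beautiful beaches",
        ],
        metadatas=[
            {"genre": "tech"},
            {"genre": "beaches"},
        ],
    )
```

The three lists line up by position, so the first document and the first metadata dictionary both go to "id1".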
04:40
Examine the updated records by calling the .get() method of collection. query_results equals collection.get(), ids equals the list "id1", "id2", and see the new documents.
04:53
"The new iPhone is awesome!" and "Bali has beautiful beaches." View the updated metadatas, genres of tech and beaches.
05:02
So that’s creation, reading, updating, and what’s left? Right, deleting. You can delete in the same way. collection.delete(), ids equals the list of strings, "id1", "id2".
05:17
Check the count on the collection by calling collection.count(), which shows eight records, and verify that those records are gone. collection.get() with ids "id1" and "id2" returns no records.
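The delete-and-check sequence could be sketched like this. The wrapper name delete_and_verify is hypothetical, and collection is again assumed to be the course's ChromaDB collection; the delete(), count(), and get() calls are the ones named in the lesson.

```python
def delete_and_verify(collection):
    """Delete id1 and id2, then report what is left.

    Returns the remaining record count and the result of trying to
    fetch the deleted IDs again.
    """
    collection.delete(ids=["id1", "id2"])
    remaining_count = collection.count()
    leftovers = collection.get(ids=["id1", "id2"])
    return remaining_count, leftovers
```

In the lesson, the count comes back as eight and the get() call returns no records.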
05:29
All right, now that you’ve had some ChromaDB practice, it’s project time. Join me in the next lesson to get started.
