Vector Databases and Embeddings With ChromaDB (Overview)

The era of large language models (LLMs) is here, bringing with it rapidly evolving libraries like ChromaDB that help augment LLM applications. You’ve most likely heard of chatbots like OpenAI’s ChatGPT, and perhaps you’ve even experienced their remarkable ability to reason about natural language processing (NLP) problems.

Modern LLMs, while imperfect, can accurately solve a wide range of problems and provide correct answers to many questions. But, due to the limits of their training and the number of text tokens they can process, LLMs aren’t a silver bullet for all tasks.

You wouldn’t expect an LLM to provide relevant responses about topics that don’t appear in its training data. For example, if you asked ChatGPT to summarize information in confidential company documents, then you’d be out of luck. You could show some of these documents to ChatGPT, but there’s a limit to how many documents you can upload before you exceed ChatGPT’s maximum number of tokens. How would you select which documents to show ChatGPT?

To address these shortcomings and scale your LLM applications, one great option is to use a vector database like ChromaDB. A vector database allows you to store unstructured objects, like text, encoded as lists of numbers (vectors) that you can compare to one another. You can, for example, find the documents in a collection that are most relevant to a question that you want an LLM to answer.
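
To make the idea of comparing lists of numbers concrete, here’s a minimal sketch in plain Python. It assumes cosine similarity as the comparison, which is one common choice, and it uses tiny hand-written vectors in place of real embeddings. The document names, the vector values, and the `most_similar()` helper are all hypothetical and aren’t part of ChromaDB’s API.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical three-dimensional "embeddings" written by hand.
# Real embeddings come from a model and have hundreds of dimensions.
documents = {
    "refund policy": [0.9, 0.1, 0.0],
    "office parking": [0.0, 0.2, 0.9],
    "returns and exchanges": [0.8, 0.3, 0.1],
}

# Stands in for the encoded question "How do I get my money back?"
query = [1.0, 0.2, 0.0]

def most_similar(query_vector, docs, n_results=2):
    """Rank document names by similarity to the query, most similar first."""
    ranked = sorted(
        docs,
        key=lambda name: cosine_similarity(query_vector, docs[name]),
        reverse=True,
    )
    return ranked[:n_results]

top = most_similar(query, documents)
print(top)  # ['refund policy', 'returns and exchanges']
```

A vector database does roughly this job at scale: it stores the vectors for you and finds the nearest ones to a query without comparing against every document by hand.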

In this video course, you’ll learn about:

  • Representing unstructured objects with vectors
  • Using word and text embeddings in Python
  • Harnessing the power of vector databases
  • Encoding and querying over documents with ChromaDB
  • Providing context to LLMs like ChatGPT with ChromaDB

After watching, you’ll have the foundational knowledge to use ChromaDB in your NLP or LLM applications. Before watching, you should be comfortable with the basics of Python and high school math.

00:00 Welcome to Vector Databases and Embeddings With ChromaDB. I’m Joseph, and I’ll be your tour guide as we explore this fascinating, multidimensional topic.

00:08 But first, let’s establish some context. Actually, establishing context is a major theme in this course. As I record this in the year 2026, large language models, LLMs for short, have become ubiquitous.

00:21 People in every major sector of society interact with them, whether for work, personal productivity, or even entertainment. Many of the products and services we use have LLMs or LLM-based agents working in the background as well, in various parts of their workflows.

00:35 But ever since their inception, LLMs have had a major flaw. They don’t know everything, and what’s worse, they often don’t know what they don’t know. At their best, when LLMs have knowledge gaps, they’ll give you vague or useless answers.

00:50 At their worst, hallucinations. LLMs will make up facts and outright lie to you. But there is a solution. There’s something you can do as a programmer.

01:00 Provide contextual, relevant information to the LLM. You can even build a system to find documents related to a given query, giving the LLM a concrete knowledge base to work with when answering that query.

01:12 How? It’s in the title. By using embeddings and a vector database like ChromaDB. So in this course, you’ll learn how to explain what vectors are and why they’re important, apply vector operations using Python libraries,

01:27 represent unstructured data with word and text embeddings, store and query embeddings using a vector database, and finally, enhance LLM responses with context using ChromaDB.

01:40 As far as prerequisites for the material we’ll be covering, you should be comfortable with Python basics and some high school math. For Python, things like built-in Python data types and data structures, using Python operators,

01:54 defining and using functions,

01:56 classes, installing and using third-party libraries, and how to structure your Python projects. If you need to revisit any of these topics before getting started, the Python Basics learning path is a great resource.

02:07 There, you can find videos targeting any one of these specific topics. And as for the math part, if it’s been a while since you’ve cracked a math textbook, don’t sweat it. The math in this course only serves to build your intuition behind the techniques you’ll be working with.

02:21 You’ll see a few formulas, but they’ll actually boil down to operations you already know. Addition, multiplication, square roots, that kind of thing. And with that said, let’s get to it.

02:31 Next up, we talk about vectors.