embedding

An embedding is a learned vector representation that maps discrete items, such as words, sentences, documents, images, or users, into a continuous vector space where, ideally, semantic or structural similarity corresponds to geometric proximity.

Trained models produce embeddings that place related items near each other. Common training objectives include context prediction, contrastive learning, supervised labels, and multimodal alignment. Embeddings can be:

Static: for example, classic word vectors that assign each term one fixed vector, regardless of context.
Contextual: as in transformer-based encoders, where the representation of an item depends on its surrounding context.
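
The difference between the two kinds can be sketched with a toy example. The vectors, the function names, and the neighbor-averaging rule below are all invented for illustration; a real contextual encoder such as a transformer learns its representations and does not compute them this way.

```python
# Toy 2-D vectors, made up for this sketch; real embeddings are learned.
STATIC_VECTORS = {
    "river": [0.9, 0.1],
    "bank": [0.5, 0.5],
    "money": [0.1, 0.9],
}


def static_embedding(token):
    """Return the same fixed vector for a token, whatever its context."""
    return STATIC_VECTORS[token]


def toy_contextual_embedding(tokens, index):
    """Blend a token's vector with the mean of its neighbors' vectors.

    A crude stand-in for a contextual encoder, used only to show that
    the output depends on the surrounding tokens.
    """
    target = STATIC_VECTORS[tokens[index]]
    neighbors = [STATIC_VECTORS[t] for i, t in enumerate(tokens) if i != index]
    if not neighbors:
        return list(target)
    dims = len(target)
    mean = [sum(v[d] for v in neighbors) / len(neighbors) for d in range(dims)]
    return [(target[d] + mean[d]) / 2 for d in range(dims)]


# "bank" gets a different vector in each context.
river_bank = toy_contextual_embedding(["river", "bank"], 1)
money_bank = toy_contextual_embedding(["money", "bank"], 1)
```

Here static_embedding("bank") always returns the same vector, while the toy contextual version pulls "bank" toward "river" in one phrase and toward "money" in the other.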

Typical applications include semantic search and retrieval, retrieval-augmented generation (RAG), clustering, classification, recommendation, deduplication, and anomaly detection.

In practice, embedding vectors are compared using metrics such as cosine similarity or dot product and stored in vector (nearest-neighbor) indices or databases. Common quality issues include domain shift, bias, hubness, and anisotropy, which can be mitigated through methods like normalization, domain adaptation, and thorough evaluation.
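
As a minimal sketch of the comparison step, the following pure-Python code computes dot product and cosine similarity, applies L2 normalization, and runs a brute-force nearest-neighbor search. The document names, the three-dimensional vectors, and the helper names are hypothetical; a vector index or database would normally replace the linear scan.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine similarity, computed as the dot product of unit vectors."""
    return dot(l2_normalize(a), l2_normalize(b))


def nearest(query, items, k=1):
    """Brute-force top-k keys by cosine similarity; items maps key -> vector."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in items.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


# Hypothetical document embeddings, invented for this example.
docs = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.1],
    "taxes": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]
top_two = nearest(query, docs, k=2)
```

Once vectors are L2-normalized, dot product and cosine similarity give the same ranking, which is one reason normalization is a common preprocessing step.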

Course

Vector Databases and Embeddings With ChromaDB

Learn how to use ChromaDB, an open-source vector database, to store embeddings and give context to large language models in Python.

By Leodanis Pozo Ramos • Updated June 29, 2026
