embedding
An embedding is a learned vector representation that maps discrete items, such as words, sentences, documents, images or users, into a continuous vector space where, ideally, semantic or structural similarity corresponds to geometric proximity.
Trained models generate embeddings that place related items near each other, using objectives such as context prediction, contrastive learning, supervised labels, or multimodal alignment. Embeddings can be:
- Static: For example, classic word vectors that assign each term a fixed vector
- Contextual: As in transformer-based encoders, where the representation depends on surrounding context
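The static case can be illustrated with a minimal sketch. The toy three-dimensional vectors below are hypothetical, invented for illustration (real static models such as classic word vectors use hundreds of dimensions); the point is that a static model returns the same vector for a term no matter which sentence it appears in:

```python
import numpy as np

# Hypothetical toy "static" word vectors: each term maps to one fixed vector.
static_vectors = {
    "bank":  np.array([0.8, 0.1, 0.3]),
    "river": np.array([0.7, 0.2, 0.4]),
    "money": np.array([0.1, 0.9, 0.2]),
}

# A static model ignores context: "bank" gets the same vector in both
# sentences below, even though the intended sense differs.
sentence_a = "she sat on the river bank"
sentence_b = "he deposited cash at the bank"
vec_a = static_vectors["bank"]
vec_b = static_vectors["bank"]
print(np.array_equal(vec_a, vec_b))  # True: context is ignored
```

A contextual encoder, by contrast, would produce two different vectors for "bank" here, because the representation is computed from the surrounding words.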
Typical applications include semantic search and retrieval, retrieval-augmented generation (RAG), clustering, classification, recommendation, deduplication and anomaly detection.
In practice, one compares embedding vectors using metrics such as cosine similarity or dot product, stores them in vector (nearest-neighbour) indices or databases, and watches for quality issues such as domain shift, bias, hubness, and anisotropy, which can be mitigated through normalisation, domain adaptation, and careful evaluation.
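The comparison step can be sketched as follows. The vectors below are hypothetical four-dimensional embeddings made up for illustration; real models produce much higher-dimensional ones, but the geometry works the same way:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings: one query and two candidate documents.
query   = np.array([0.9, 0.1, 0.2, 0.4])
doc_near = np.array([0.8, 0.2, 0.1, 0.5])  # semantically close to the query
doc_far  = np.array([0.1, 0.9, 0.8, 0.0])  # unrelated topic

print(cosine_similarity(query, doc_near))  # high: related items are close
print(cosine_similarity(query, doc_far))   # low: unrelated items are distant

# After L2-normalisation, cosine similarity reduces to a plain dot product,
# which is one reason many vector stores keep unit-length vectors.
unit = lambda v: v / np.linalg.norm(v)
assert np.isclose(np.dot(unit(query), unit(doc_near)),
                  cosine_similarity(query, doc_near))
```

A nearest-neighbour index or vector database applies the same idea at scale, returning the stored vectors with the highest similarity to a query vector.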
By Leodanis Pozo Ramos • Updated Oct. 31, 2025