token
A token is a minimal unit of text used by natural language processing (NLP) systems and large language models (LLMs), typically produced by a tokenizer that segments text into words, subwords, characters, or bytes.
Tokens are mapped to integer IDs from a fixed vocabulary so that models can process sequences efficiently. Tokens are distinct from words: a single word may be split into several subword tokens, and punctuation or whitespace can form tokens of their own. Practical limits, costs, and context windows for LLMs are measured in tokens.
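A minimal sketch of this text-to-ID mapping, assuming the third-party tiktoken package and its cl100k_base encoding are available:

```python
import tiktoken

# Load a tokenizer with a fixed vocabulary (byte-pair encoding).
enc = tiktoken.get_encoding("cl100k_base")

text = "Tokenization maps text to integer IDs."

# Encode the string into a list of integer token IDs, one per token.
token_ids = enc.encode(text)
print(token_ids)
print(len(token_ids))  # Token count, which usually differs from the word count.

# Decoding the IDs recovers the original text.
print(enc.decode(token_ids))
```

Counting the encoded IDs, as above, is how token-based limits and costs are typically measured in practice.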
By Leodanis Pozo Ramos • Updated Oct. 15, 2025