Glossary

Token

The smallest unit of text a language model reads and predicts, usually a word piece rather than a whole word. Before attention runs, each token ID is looked up in an embedding table and mapped to a dense vector whose length equals the model's hidden size.

What It Is

A token is one entry in the model's discrete, fixed-size vocabulary. Subword tokenizers split rare words into smaller pieces, so the vocabulary stays finite while still covering open-vocabulary text.
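
As a rough illustration of subword splitting, the sketch below uses greedy longest-match against a tiny hand-written vocabulary. The vocabulary, the `split_word` and `encode` helpers, and the `<unk>` fallback are all assumptions made for this example; production tokenizers such as BPE or WordPiece learn their vocabularies from data and differ in detail.

```python
# Hypothetical toy vocabulary; real tokenizers learn theirs from a corpus.
VOCAB = {
    "<unk>": 0, "token": 1, "ization": 2, "s": 3,
    "language": 4, "model": 5, "un": 6, "believ": 7, "able": 8,
}


def split_word(word, vocab=VOCAB):
    """Split one word into the longest known pieces, left to right."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is found in the vocabulary.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # No known piece starts here; give up on the whole word.
            return ["<unk>"]
        pieces.append(word[start:end])
        start = end
    return pieces


def encode(text, vocab=VOCAB):
    """Turn whitespace-separated words into a flat list of token IDs."""
    return [vocab[piece] for word in text.split() for piece in split_word(word, vocab)]


print(split_word("tokenization"))   # a rare word becomes two pieces
print(encode("language models"))    # two words become three token IDs
```

With this vocabulary, a word that is not listed whole can still be represented by pieces that are, which is how a finite vocabulary covers open-ended text.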

Why It Matters

Context length is measured in tokens, so tokenization determines how much text fits in a request, shapes the cost per request, and decides whether two strings look identical to the model. Mismatched tokenizers between training and serving are a common integration bug.
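
The mismatch bug can be sketched with two toy vocabularies that contain the same pieces in a different order. Everything here is hypothetical: `TRAIN_VOCAB`, `SERVE_VOCAB`, and the `fingerprint` helper are illustrative names, and hashing the vocabulary is only one possible safeguard, not a standard API.

```python
import hashlib


def encode(pieces, vocab):
    """Map pieces to IDs; unknown pieces fall back to ID 0 (<unk>)."""
    return [vocab.index(p) if p in vocab else 0 for p in pieces]


def decode(ids, vocab):
    """Map IDs back to pieces using the given vocabulary."""
    return [vocab[i] for i in ids]


def fingerprint(vocab):
    """Hash the vocabulary so two deployments can be compared cheaply."""
    return hashlib.sha256("\n".join(vocab).encode("utf-8")).hexdigest()


# Same pieces, different ID assignment (assumed for illustration).
TRAIN_VOCAB = ["<unk>", "language", "model", "s"]
SERVE_VOCAB = ["<unk>", "model", "language", "s"]

pieces = ["language", "model", "s"]
ids_from_serving = encode(pieces, SERVE_VOCAB)
# The model interprets IDs with the vocabulary it was trained on.
seen_by_model = decode(ids_from_serving, TRAIN_VOCAB)
tokenizers_match = fingerprint(TRAIN_VOCAB) == fingerprint(SERVE_VOCAB)

print(ids_from_serving, seen_by_model, tokenizers_match)
```

The serving side sends valid IDs, yet the model reads them as a different sequence of pieces. Comparing fingerprints at startup would surface the mismatch before any request is served.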

Simple Example

The phrase "language models" might become two or three tokens depending on the tokenizer. Each ID is looked up in an embedding table before attention layers run.

Figure: How raw text becomes token IDs before the transformer stack.
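
The embedding lookup can be sketched in plain Python. The two-token split of "language models", the `HIDDEN_SIZE` of 4, and the random table values are assumptions chosen to keep the example small; a trained model has learned values and a much larger hidden size.

```python
import random

HIDDEN_SIZE = 4  # assumed for readability; real models use far larger sizes

# Hypothetical two-token split of the phrase "language models".
VOCAB = {"language": 0, " models": 1}

# One row per vocabulary entry, each row a vector of length HIDDEN_SIZE.
rng = random.Random(0)
embedding_table = [
    [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)] for _ in VOCAB
]


def embed(token_ids):
    """Look up one dense vector per token ID."""
    return [embedding_table[token_id] for token_id in token_ids]


token_ids = [VOCAB["language"], VOCAB[" models"]]
vectors = embed(token_ids)

print(token_ids)
print(len(vectors), "vectors of length", len(vectors[0]))
```

The lookup is only an index into a table: the same token ID always yields the same row, and the attention layers operate on these rows rather than on the text.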

Common Confusions

Tokens are not always words; byte-level BPE can emit tokens as small as a single byte, which may be less than one character. Token count also differs from word count and from Unicode grapheme count.
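
A short standard-library sketch shows how far the different counts can drift apart for one string. The sample text is an assumption picked to include a combining accent and an emoji with a skin-tone modifier; the UTF-8 byte count is the token count a byte-level tokenizer would start from before applying any merges.

```python
# "cafe" + combining acute accent, a space, thumbs-up + skin-tone modifier.
text = "cafe\u0301 \U0001F44D\U0001F3FD"

word_count = len(text.split())          # whitespace-separated words
code_points = len(text)                 # Unicode code points
utf8_bytes = len(text.encode("utf-8"))  # bytes a byte-level tokenizer sees

print(word_count, code_points, utf8_bytes)  # 2 8 15
```

A reader perceives fewer characters than there are code points here, because the accent and the skin-tone modifier each attach to the preceding character. Counting graphemes precisely requires a Unicode segmentation library, which this sketch does not use.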
