Embedding

A dense vector that represents a token or other discrete item so the model can run continuous math on it.

What It Is

An embedding is a fixed-size list of numbers (a vector) associated with a discrete symbol such as a token ID. In transformer language models, the input embedding layer converts each position's token ID into a vector of model width (hidden size).

Why It Matters

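Embeddings are where text becomes geometry: similar tokens can end up with nearby vectors after training, and every later layer operates on those vectors. They also affect deployment sizing: the embedding table holds one row per vocabulary entry, and every token position in every sequence of a batch carries one embedding vector, so vocabulary size, context length, and batch size all feed into memory estimates.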
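
As a rough illustration of how embedding rows enter a memory estimate, the sketch below multiplies out the two quantities involved: the table itself and the vectors looked up for one batch. The vocabulary size, batch size, sequence length, and 16-bit value width are hypothetical numbers chosen for the arithmetic, not figures from any particular model, and the estimate covers only the embedding step, not the rest of the network.

```python
def embedding_table_bytes(vocab_size: int, hidden_size: int, bytes_per_value: int) -> int:
    """Memory for the table: one row of hidden_size values per vocabulary entry."""
    return vocab_size * hidden_size * bytes_per_value


def embedded_batch_bytes(batch_size: int, seq_len: int, hidden_size: int,
                         bytes_per_value: int) -> int:
    """Memory for the looked-up vectors: one row per token position in the batch."""
    return batch_size * seq_len * hidden_size * bytes_per_value


# Hypothetical configuration, chosen only for illustration.
VOCAB_SIZE = 50_000
HIDDEN_SIZE = 768
BYTES_PER_VALUE = 2  # assumes 16-bit floats

table_bytes = embedding_table_bytes(VOCAB_SIZE, HIDDEN_SIZE, BYTES_PER_VALUE)
batch_bytes = embedded_batch_bytes(8, 2048, HIDDEN_SIZE, BYTES_PER_VALUE)

print(f"table: {table_bytes / 2**20:.1f} MiB")
print(f"embedded batch: {batch_bytes / 2**20:.1f} MiB")
```

The table cost is fixed once the model is chosen, while the per-batch cost grows linearly with both batch size and sequence length.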

Simple Example

Suppose token ID 42 maps to a 768-dimensional vector. A sequence of three tokens becomes a 3×768 tensor of embeddings before the first attention block runs.
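
The example above can be sketched in plain Python as a table lookup. This is a minimal sketch, not how a real framework stores embeddings: the 100-entry vocabulary is a hypothetical toy size, the random values stand in for learned weights, and `embed` is an illustrative helper name.

```python
import random

HIDDEN_SIZE = 768   # model width from the example
VOCAB_SIZE = 100    # hypothetical toy vocabulary; real ones are far larger

rng = random.Random(0)
# One row of HIDDEN_SIZE values per token ID.
# Random values stand in for weights that training would normally learn.
embedding_table = [
    [rng.gauss(0.0, 0.02) for _ in range(HIDDEN_SIZE)]
    for _ in range(VOCAB_SIZE)
]


def embed(token_ids):
    """Return one table row per token ID, in sequence order."""
    return [embedding_table[t] for t in token_ids]


token_ids = [42, 7, 42]  # three positions; ID 42 appears twice
embedded = embed(token_ids)
shape = (len(embedded), len(embedded[0]))
print(shape)  # (3, 768)
```

Because the lookup depends only on the token ID, both occurrences of ID 42 receive the same vector at this stage; any position-dependent differences come from later parts of the model.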

Common Confusions

An embedding is not the same as a one-hot vector: the table stores dense learned weights rather than a single 1 in a huge sparse vector. Embeddings are also not logits or probabilities: they are inputs to the model, not outputs of the vocabulary head.
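
The relationship between the two representations can be made concrete: multiplying a one-hot vector by the embedding table selects exactly one row, which is why the table can be indexed directly instead. The sketch below checks this on a hypothetical 5×3 table with hand-picked values; `one_hot` and `vector_matrix_product` are illustrative helpers, not library functions.

```python
VOCAB_SIZE = 5

# Hypothetical 5x3 table; hand-picked values stand in for learned weights.
table = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
    [1.3, 1.4, 1.5],
]


def one_hot(token_id, size):
    """Sparse representation: a single 1.0 at the token's index."""
    return [1.0 if i == token_id else 0.0 for i in range(size)]


def vector_matrix_product(vec, matrix):
    """Multiply a row vector by a matrix given as a list of rows."""
    width = len(matrix[0])
    return [
        sum(vec[r] * matrix[r][c] for r in range(len(matrix)))
        for c in range(width)
    ]


token_id = 2
via_one_hot = vector_matrix_product(one_hot(token_id, VOCAB_SIZE), table)
via_lookup = table[token_id]

# Both routes yield the same dense vector; the lookup just skips the zeros.
print(via_one_hot == via_lookup)
```

The one-hot vector identifies which token is present, while the embedding is the dense row that the identity selects.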
