Glossary
Logit
A raw, unnormalized score for each vocabulary item before softmax turns scores into probabilities.
What It Is
A logit is a real-valued score assigned to a single vocabulary choice at a position, typically produced by a weight matrix applied to hidden states. The full vector of logits has length equal to vocabulary size; no entry is guaranteed to lie between 0 and 1 until softmax runs.
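As a rough illustration of where logits come from, the sketch below applies a small weight matrix to one hidden state. The sizes, the values, and the names `W`, `hidden`, and `project` are all hypothetical and chosen only to keep the arithmetic easy to follow; a real model would use a learned matrix with one row per vocabulary item.

```python
# Toy sizes (assumed for illustration): hidden size 2, vocabulary size 3.
hidden = [0.5, -1.0]          # hidden state at one position

W = [                         # one row of weights per vocabulary item
    [2.0, 0.0],
    [0.0, 1.0],
    [-1.0, -3.0],
]

def project(weights, h):
    """Return one logit per vocabulary item: the dot product of each row with h."""
    return [sum(w * x for w, x in zip(row, h)) for row in weights]

logits = project(W, hidden)
print(logits)  # one real-valued score per vocabulary item; may be negative or above 1
```

The result has exactly as many entries as `W` has rows, and nothing constrains those entries to the range 0 to 1.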
Why It Matters
Sampling, loss functions, and interpretability all refer to logits or to probabilities derived from them. Temperature scaling divides the logits by a temperature before softmax. Because softmax preserves order, ranking tokens by logit always matches ranking them by probability, for any positive temperature.
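The effect of temperature can be sketched in a few lines. This is a minimal illustration, not a reference implementation; the helper names `softmax` and `ranking` and the example logits are assumptions made for the demonstration.

```python
import math

def softmax(logits, temperature=1.0):
    """Divide logits by the temperature, then normalize with softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ranking(values):
    """Indices sorted from the largest value to the smallest."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)

logits = [2.0, -1.0, 0.5]                 # hypothetical logits for three tokens
sharp = softmax(logits, temperature=0.5)  # low temperature: more peaked
flat = softmax(logits, temperature=2.0)   # high temperature: more uniform

print(ranking(logits), ranking(sharp), ranking(flat))
```

All three rankings agree, while the top probability is larger at the lower temperature. Temperature changes how concentrated the distribution is, not the order of the tokens.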
Simple Example
Suppose the model emits three logits for candidate next tokens. Softmax maps those three numbers to values that sum to 1, and the largest logit always becomes the largest probability. Only the differences between logits matter: adding the same constant to every logit leaves the probabilities unchanged.
$z_i \in \mathbb{R}$ (logit for vocabulary index $i$)
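A worked version of this example, with three made-up logits, might look like the sketch below. The values are arbitrary and the `softmax` helper is written out by hand here rather than taken from any particular library.

```python
import math

def softmax(logits):
    """Map a list of logits to probabilities that sum to 1."""
    m = max(logits)                       # subtracting the max does not change the result
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, -1.0]                 # hypothetical logits for three candidate tokens
probs = softmax(logits)

# Adding the same constant to every logit gives the same probabilities.
shifted = softmax([z + 100.0 for z in logits])

print(probs)
print(shifted)
```

The two printed lists match, which is why only the gaps between logits carry information.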
Common Confusions
Logits are not probabilities: they can be negative, greater than one, and need not sum to 1. Logits are also not log-probabilities: log-probabilities come from log-softmax, which subtracts the log-sum-exp of all logits from each logit, so the two differ by an offset shared across the whole vector. The plural "logits" refers to the whole score vector.
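The distinction between logits and log-probabilities can be checked numerically. The sketch below assumes a hand-written `log_softmax` helper and arbitrary example logits; it is meant only to show the relationship, not to stand in for a library routine.

```python
import math

def log_softmax(logits):
    """Turn logits into log-probabilities by subtracting their log-sum-exp."""
    m = max(logits)                       # max subtraction keeps exp() from overflowing
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]

logits = [3.0, -2.0, 0.5]                 # hypothetical logits; note one exceeds 1
log_probs = log_softmax(logits)
probs = [math.exp(lp) for lp in log_probs]

# Each logit differs from its log-probability by the same offset.
offsets = [z - lp for z, lp in zip(logits, log_probs)]

print(log_probs)
print(offsets)
```

Every log-probability is at most 0 and their exponentials sum to 1, whereas the raw logits satisfy neither property.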