A positive scalar that divides the logits before softmax, making the next-token distribution sharper (T < 1) or flatter (T > 1).
What It Is
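Temperature is a positive scalar T used at inference (and sometimes in analysis) as softmax(z / T), where z is the vector of logits. It does not change model weights; it only rescales logits immediately before normalization. At T = 1 the result is the model's unmodified softmax distribution.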
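The definition above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not any particular library's API; the function name softmax_with_temperature and its parameters are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Return softmax(z / T) for a list of logits z and a positive temperature T."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Rescale the logits; the model weights that produced them are untouched.
    scaled = [z / temperature for z in logits]
    # Subtracting the maximum leaves the result unchanged but avoids
    # overflow in exp() when logits are large.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Calling softmax_with_temperature(logits, 1.0) returns the ordinary softmax; smaller values of temperature concentrate probability on the largest logit, and larger values spread it out.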
Why It Matters
Creative writing APIs often raise temperature for more diverse outputs; factual or code assistants lower it for more predictable, near-deterministic answers. Because temperature acts on logits before softmax, it directly controls how sharply probabilities concentrate on top tokens.
Simple Example
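With logits [2, 1, 0], T = 1 gives probabilities of about [0.665, 0.245, 0.090]. Temperature 0.5 divides by a smaller T, turning the logits into [4, 2, 0] and exaggerating the differences, so softmax favors index 0 even more: about [0.867, 0.117, 0.016]. Temperature 2.0 turns the logits into [1, 0.5, 0] and dampens the gaps, so the three probabilities move closer together: about [0.506, 0.307, 0.186].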
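The example can be reproduced with a short script. This is a sketch under the assumption that plain softmax(z / T) is all that is applied; the helper name softmax_t is hypothetical.

```python
import math

def softmax_t(logits, t):
    """Softmax of logits divided by temperature t (t > 0)."""
    exps = [math.exp(z / t) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]

# Compute the distribution at a low, neutral, and high temperature.
results = {t: softmax_t(logits, t) for t in (0.5, 1.0, 2.0)}

for t, probs in results.items():
    # Index 0 always has the highest probability; only the margin changes.
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

Note that the ranking of the tokens is the same at every temperature; only how much probability mass the top token holds differs.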
Common Confusions
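Temperature is not the same as top-p or top-k filtering: those truncate the distribution to a subset of tokens and renormalize what remains, whereas temperature reshapes the whole distribution without removing any token. Setting temperature to zero is handled in many libraries as the limiting case of greedy (argmax) decoding, not as literal division by zero.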
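Both distinctions can be illustrated in one small function. This is a hedged sketch, not how any specific library implements decoding: the name next_token_probs, the choice to special-case temperature == 0 as a one-hot argmax, and the order of operations (temperature first, then top-k) are all assumptions made for illustration.

```python
import math

def next_token_probs(logits, temperature=1.0, top_k=None):
    """Temperature-scaled softmax with optional top-k truncation."""
    n = len(logits)
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Assumed convention: treat T = 0 as the argmax limit instead of
        # dividing by zero. Ties go to the lowest index.
        best = max(range(n), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(n)]

    # Temperature reshapes the distribution; every token keeps nonzero mass.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    if top_k is not None:
        # Top-k removes tokens outright, then renormalizes the survivors.
        keep = set(sorted(range(n), key=lambda i: probs[i], reverse=True)[:top_k])
        probs = [p if i in keep else 0.0 for i, p in enumerate(probs)]
        total = sum(probs)
        probs = [p / total for p in probs]
    return probs
```

With logits [2, 1, 0], a high temperature still leaves all three tokens possible, while top_k=2 assigns the third token exactly zero probability regardless of temperature.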