Glossary

Parameter

A learnable numeric value stored in the model weights that training updates to fit data.

What It Is

A parameter is any scalar or tensor element whose value is learned from data rather than fixed by hand. In transformers, most parameters live in linear projection weights, embedding tables, and layer-normalization scales.

Why It Matters

Parameter count drives memory footprint and training cost. When you read about fine-tuning or quantization, you are usually talking about how those stored weights are updated or compressed—not about activations, which are recomputed each forward pass.

Simple Example

A single attention projection might store a weight matrix W with shape (hidden, hidden). Every entry in W is a parameter; a gradient later tells each entry whether to increase or decrease.

Common Confusions

Parameters are not activations: parameters persist in the checkpoint, while activations are temporary values flowing through the network for one input. Parameters are also not logits or softmax outputs—those are computed from parameters plus the current input.

Tags

References