Glossary
Backpropagation
The backward pass that applies the chain rule on a computational graph to compute gradients for every differentiable parameter.
What It Is
Backpropagation (often shortened to backprop) is reverse-mode automatic differentiation specialized for neural networks. Starting at the loss node, the framework multiplies local Jacobians along edges of the computational graph so earlier parameters receive credit for how they influenced the final error.Why It Matters
It makes deep networks trainable without hand-derived derivatives for every layer. The same code that computed logits, softmax, and cross-entropy loss can call backward once and populate gradients on embeddings, attention projections, and output weights consistently.Simple Example
Consider logits → softmax → cross-entropy. Backpropagation first yields ∂L/∂logits, then continues through earlier matmuls and activations. Each step applies the chain rule so ∂L/∂W for a weight matrix W combines upstream loss sensitivity with the forward input that multiplied W.loss = cross_entropy(softmax(logits), targets)
loss.backward() # fills parameter .grad via the graph
optimizer.step()