In deep learning, a tensor is a multi-dimensional array of numbers that generalizes vectors and matrices to any number of dimensions. Frameworks such as PyTorch and JAX store model state and activations as tensors and run operations (matmul, softmax, layer norm) that either preserve a tensor's shape or change it.
Why It Matters
Shape mismatches are among the most common bugs: a logits tensor might be [batch, vocab] while hidden states are [batch, seq, hidden], so code that combines the two without accounting for the seq dimension will fail or produce the wrong result. Naming tensors by role (embeddings, keys, values) keeps papers, code, and this reference aligned.
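The kind of mismatch described above can be sketched in a few lines. This is a minimal illustration, assuming NumPy as a stand-in for a PyTorch or JAX tensor; the sizes are small placeholder values and `check_shape` is a hypothetical helper, not part of any library.

```python
import numpy as np

# Small illustrative sizes (assumptions, not taken from any real model).
batch, seq, hidden, vocab = 2, 5, 8, 11

hidden_states = np.zeros((batch, seq, hidden))  # [batch, seq, hidden]
logits = np.zeros((batch, vocab))               # [batch, vocab]


def check_shape(x, expected, name):
    """Hypothetical helper: raise a descriptive error on an unexpected shape."""
    if x.shape != expected:
        raise ValueError(f"{name}: expected shape {expected}, got {x.shape}")
    return x


# Both tensors match the shapes their roles imply.
check_shape(hidden_states, (batch, seq, hidden), "hidden_states")
check_shape(logits, (batch, vocab), "logits")

# Combining them elementwise fails: trailing dimensions 8 and 11 do not match.
try:
    hidden_states + logits
    mismatch_caught = False
except ValueError:
    mismatch_caught = True
```

Checking shapes at function boundaries, as `check_shape` does here, tends to surface the error closer to its cause than waiting for a later operation to fail.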
Simple Example
A minibatch of 4 sequences, each 128 tokens long, with hidden size 768 is often written as a tensor of shape 4×128×768. The language-model head projects the last dimension from the hidden size to the vocabulary size, yielding 4×128×50257 logits before softmax.
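The shape change in this example can be traced in code. The sketch below assumes NumPy as a stand-in for a framework tensor and uses scaled-down sizes to keep memory use small; `lm_head` is a hypothetical random weight matrix, not a trained one.

```python
import numpy as np

# Scaled-down stand-ins for the 4 x 128 x 768 -> 4 x 128 x 50257 example.
batch, seq, hidden, vocab = 4, 16, 32, 100

rng = np.random.default_rng(0)
hidden_states = rng.standard_normal((batch, seq, hidden))  # (batch, seq, hidden)
lm_head = rng.standard_normal((hidden, vocab))             # hypothetical weights

# matmul contracts the last axis of hidden_states with the first axis of
# lm_head, so only the last dimension changes: hidden -> vocab.
logits = hidden_states @ lm_head                           # (batch, seq, vocab)


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis; the shape is preserved."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


probs = softmax(logits)  # same shape as logits; each last-axis slice sums to 1

print(hidden_states.shape, logits.shape, probs.shape)
```

The matmul changes the last dimension while softmax leaves the shape untouched, which is the distinction between shape-changing and shape-preserving operations.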
Common Confusions
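A tensor in ML is a multi-dimensional numeric array; it is not the same as a tensor in general relativity, which is defined by how its components transform under a change of coordinates. Scalars are 0-D tensors, a single vector is a 1-D tensor, and a matrix is a 2-D tensor. Do not confuse tensor with parameter: parameters are the tensors that get updated during training, while other tensors, such as inputs and activations, are not.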
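A short sketch may make the dimension counts and the tensor-versus-parameter distinction concrete. It assumes NumPy as a stand-in for a framework tensor; the values, the squared-error loss, and the learning rate are illustrative choices only.

```python
import numpy as np

# Dimension counts: scalar, vector, matrix.
scalar = np.array(3.0)              # 0-D, shape ()
vector = np.array([1.0, 2.0, 3.0])  # 1-D, shape (3,)
matrix = np.eye(2)                  # 2-D, shape (2, 2)
print(scalar.ndim, vector.ndim, matrix.ndim)  # prints: 0 1 2

# Tensor vs. parameter: both x and w are arrays, but only w is updated.
x = np.array([1.0, 2.0])    # input tensor, left unchanged by training
w = np.array([0.5, -0.5])   # parameter tensor
target = 1.0
lr = 0.1                    # illustrative learning rate

pred = w @ x                          # dot product of w and x
grad_w = 2.0 * (pred - target) * x    # gradient of (pred - target)**2 w.r.t. w
w = w - lr * grad_w                   # one gradient-descent step changes w only
```

In a framework such as PyTorch or JAX the gradient would normally come from automatic differentiation; it is written out by hand here only to keep the sketch dependency-free.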