Decoder

A network stack that turns internal representations into outputs—tokens, pixels, or structured predictions—often one step at a time with attention to prior context.

What It Is

Decoders include causal transformer stacks for language modeling, transposed convolutions or U-Net upsampling paths in image models, and cross-attention layers that query encoder memory in seq2seq systems. They consume representations: a decoder does not patchify or otherwise embed raw inputs itself unless the architecture is decoder-only end to end.
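The "causal" part of a causal transformer stack can be illustrated with a minimal sketch of single-head self-attention in plain Python. It assumes the query, key, and value vectors are already given (a real layer would compute them with learned projections), and the function names are hypothetical. The point is only that position t attends to positions 0..t and never to later ones.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(queries, keys, values):
    """Single-head attention where position t only sees positions 0..t."""
    dim = len(queries[0])
    outputs = []
    for t, query in enumerate(queries):
        # Score the query against the visible prefix only; later positions
        # are excluded, which is what a causal mask achieves.
        scores = [
            sum(q * k for q, k in zip(query, keys[s])) / math.sqrt(dim)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Weighted average of the visible value vectors.
        outputs.append([
            sum(w * values[s][d] for s, w in enumerate(weights))
            for d in range(len(values[0]))
        ])
    return outputs
```

Because each output depends only on its prefix, changing a later input leaves earlier outputs untouched, which is the property that makes left-to-right generation possible.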

Why It Matters

Naming decoders separately from encoders shows where generation happens, how KV caches apply during inference, and why encoder–decoder pairs differ from decoder-only GPT-style models. It also sets up autoregressive generation as the paradigm most decoders follow in language settings.
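To make the KV-cache point concrete, here is a minimal sketch in plain Python, assuming one attention head and keys and values that are supplied directly rather than projected from hidden states. The names `attend`, `KVCache`, and `full_recompute` are illustrative, not taken from any library. The cache stores keys and values from earlier steps so each new token attends over the stored prefix instead of recomputing it.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention of one query over a list of keys/values."""
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [
        sum(e / total * value[d] for e, value in zip(exps, values))
        for d in range(len(values[0]))
    ]


class KVCache:
    """Keeps keys and values from earlier decoding steps."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, query, key, value):
        # Append only the new position, then attend over everything cached.
        self.keys.append(key)
        self.values.append(value)
        return attend(query, self.keys, self.values)


def full_recompute(queries, keys, values):
    """Reference path: rebuild the visible prefix at every position."""
    return [
        attend(queries[t], keys[: t + 1], values[: t + 1])
        for t in range(len(queries))
    ]
```

Feeding tokens one at a time through `KVCache.step` produces the same outputs as `full_recompute`. In a real model the saving comes from not re-running the key and value projections for the prefix at every step; production implementations typically keep one such cache per layer and per head.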

Simple Example

A machine-translation decoder attends to encoder memory while emitting target tokens left to right. A VAE decoder maps a latent vector back to a 64×64 image. A GPT-style stack is decoder-only: the same blocks both represent context and predict the next token.
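The left-to-right emission described above can be sketched as a greedy decoding loop. This is a toy under stated assumptions: `TOY_SCORES` is a hand-written bigram table standing in for a trained decoder, and `toy_next_token_scores` looks only at the last token, whereas a real decoder conditions on the whole prefix (and, in translation, on encoder memory as well).

```python
BOS, EOS = "<bos>", "<eos>"

# Hypothetical next-token scores keyed by the previous token.
TOY_SCORES = {
    "<bos>": {"the": 2.0, "cat": 0.5, "<eos>": 0.1},
    "the": {"cat": 1.5, "the": 0.1, "<eos>": 0.2},
    "cat": {"sat": 1.2, "<eos>": 0.8},
    "sat": {"<eos>": 2.0, "the": 0.3},
}


def toy_next_token_scores(prefix):
    """Stand-in for a decoder forward pass over the prefix."""
    return TOY_SCORES[prefix[-1]]


def greedy_decode(score_fn, max_new_tokens=10):
    """Emit one token per step, feeding each choice back in as context."""
    tokens = [BOS]
    for _ in range(max_new_tokens):
        scores = score_fn(tokens)
        next_token = max(scores, key=scores.get)  # greedy: take the top score
        if next_token == EOS:
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the start marker


print(greedy_decode(toy_next_token_scores))  # ['the', 'cat', 'sat']
```

Greedy selection is only one choice; sampling or beam search would replace the `max` line while keeping the same loop structure.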

Common Confusions

A decoder is not always autoregressive—some decoders reconstruct in one shot—but language-model decoders almost always are. A decoder is also not the softmax head alone: the head sits on top of the decoder stack. Denoising generation is a separate paradigm that may reuse decoder-like U-Nets without autoregressive token steps.
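The distinction between the decoder stack and the softmax head can be shown with a small sketch. It assumes the stack has already produced a hidden vector for the current position; `lm_head` and `vocab_weights` are hypothetical names for the final projection and its weight matrix (one row per vocabulary entry).

```python
import math


def lm_head(hidden, vocab_weights):
    """Project one hidden vector to vocabulary logits, then apply softmax."""
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in vocab_weights]
    peak = max(logits)
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Everything that builds `hidden` from the context (attention, feed-forward blocks, normalization) belongs to the decoder stack; the head only converts that vector into a probability distribution over outputs.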
