Glossary

Encoder

A network stack that maps raw or patched inputs into internal representations—tokens, latents, or context vectors—for downstream decoders or heads.

What It Is

Encoders appear as transformer stacks over patch embeddings, convolutional towers before a latent bottleneck, or BERT-style bidirectional encoders that emit one contextual vector per token. The output is an internal representation rather than logits, generated pixels, or text; producing those requires a decoder or a task head on top of the encoder.
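The split between an encoder's representation and a head's output can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a real architecture: the `encoder` here is a single dense layer with tanh standing in for a full stack, `classification_head` is a hypothetical mean-pool-plus-projection head, and all weights are made up.

```python
import math

def linear(vector, weights, bias):
    """Dense layer: `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def encoder(inputs, weights, bias):
    """Stand-in encoder: maps each input vector to one hidden vector."""
    return [[math.tanh(h) for h in linear(x, weights, bias)] for x in inputs]

def classification_head(hidden, weights, bias):
    """Hypothetical task head: mean-pool the hidden vectors, then project to logits."""
    dim = len(hidden[0])
    pooled = [sum(h[i] for h in hidden) / len(hidden) for i in range(dim)]
    return linear(pooled, weights, bias)

# Two input vectors of width 3, hidden width 4, two output classes (all made up).
inputs = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
enc_w = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.1], [0.5, 0.5, 0.5], [-0.3, 0.2, 0.1]]
enc_b = [0.0, 0.0, 0.0, 0.0]
head_w = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
head_b = [0.0, 0.0]

hidden = encoder(inputs, enc_w, enc_b)                # internal representation
logits = classification_head(hidden, head_w, head_b)  # task output
print(len(hidden), len(hidden[0]), len(logits))       # 2 4 2
```

The encoder returns one hidden vector per input; only the head turns that representation into class scores.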

Why It Matters

Separating encoders from decoders clarifies seq2seq translation, multimodal fusion, and diffusion pipelines, where one network maps inputs into a latent space and another decodes outputs from it. It also explains why the representation, patch, and latent-space glossary entries sit upstream of the architecture family pages.

Simple Example

A ViT encoder turns 196 image patch embeddings (for instance, a 224×224 image split into 16×16 patches) into 196 contextual vectors; a translation encoder turns source-language tokens into memory vectors that a decoder attends to when generating the target sentence.
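The ViT half of this example can be sketched in plain Python. The sketch assumes a 224×224 image with 16×16 patches (which gives 196 patches), a toy embedding width of 8, random vectors standing in for real patch embeddings, and a single self-attention pass with no learned projections; `num_patches` and `self_attention` are illustrative names, not library functions.

```python
import math
import random

def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    per_side = image_size // patch_size
    return per_side * per_side

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """One bidirectional self-attention pass with identity projections.

    Each output is a weighted average of all inputs, so the sequence
    length and width are unchanged: N vectors in, N vectors out.
    """
    dim = len(vectors[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for query in vectors:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in vectors]
        weights = softmax(scores)
        outputs.append([
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(dim)
        ])
    return outputs

random.seed(0)
n = num_patches(224, 16)  # 14 * 14 patches
patch_embeddings = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(n)]
contextual = self_attention(patch_embeddings)
print(n, len(contextual), len(contextual[0]))  # 196 196 8
```

A real ViT stacks many such layers with learned query, key, and value projections, but the shape contract is the same: the sequence length is preserved and each output vector depends on every patch.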

Common Confusions

An encoder is not the same as the full model. Decoder-only language models are also built from transformer blocks, yet they have no separate encoder; the name usually marks the input-side stack in a split architecture. An encoder is also not a tokenizer: tokenization maps raw input to discrete integer IDs, while the encoder operates on the embeddings or patches produced after that step.
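The tokenizer/encoder distinction can be made concrete with a toy pipeline. Everything here is hypothetical: the whitespace `tokenize` function, the four-entry vocabulary, and the hand-written embedding table stand in for a real tokenizer and learned embeddings, and `encode` is simple mean-mixing standing in for a transformer stack.

```python
# Hypothetical vocabulary and embedding table (one 2-d vector per token ID).
VOCAB = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3}
EMBEDDINGS = [
    [0.0, 0.0],  # <unk>
    [1.0, 0.0],  # the
    [0.0, 1.0],  # cat
    [1.0, 1.0],  # sat
]

def tokenize(text):
    """Tokenizer: text -> discrete integer IDs. No vectors involved."""
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in text.lower().split()]

def embed(token_ids):
    """Embedding lookup: integer IDs -> context-free vectors."""
    return [EMBEDDINGS[i] for i in token_ids]

def encode(vectors):
    """Toy encoder: mix each vector with the sequence mean, so every
    output depends on the whole input sequence."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return [[0.5 * v[i] + 0.5 * mean[i] for i in range(dim)] for v in vectors]

ids = tokenize("The cat sat")
print(ids)  # [1, 2, 3]

in_context = encode(embed(ids))[1]            # "cat" inside the sentence
alone = encode(embed(tokenize("cat")))[0]     # "cat" on its own
print(in_context != alone)  # True
```

The token ID for "cat" is the same in both inputs, but the encoder's output vector for it changes with the surrounding sequence, which is the sense in which encoder outputs are contextual.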
