A network stack that maps raw or patched inputs into internal representations—tokens, latents, or context vectors—for downstream decoders or heads.
What It Is
Encoders appear as transformer stacks over patch embeddings, convolutional towers before a latent bottleneck, or BERT-style bidirectional encoders that emit contextual vectors per token. The output is always an internal representation: not logits and not final generated pixels or text unless a head is attached directly.
Why It Matters
Separating encoders from decoders clarifies seq2seq translation, multimodal fusion, and diffusion pipelines where one network fills latent space and another reads it out. It also explains why representation, patch, and latent-space glossary entries sit upstream of architecture family pages.
An encoder is not the same as the full model: decoder-only language models still contain transformer blocks, but the name encoder usually marks the input-side stack in a split architecture. An encoder is also not a tokenizer—tokenization is discrete indexing; the encoder operates on embeddings or patches after that step.