Glossary

Conditioning

Extra inputs (text prompts, class labels, images, or guidance weights) that steer a generative model at inference time, without retraining its parameters for each new request.

What It Is

Conditioning signals are encoded (via text encoders, learned embeddings, or pooled features) and fused into the generative network at attention layers, through FiLM (feature-wise linear modulation) layers, or as prepended tokens. In denoising models, classifier-free guidance combines a conditional and an unconditional forward pass at each step, with a guidance scale setting how far the result is pushed toward the conditional prediction. Autoregressive models condition on prior tokens plus any system or tool context placed in the prefix.
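
The two fusion rules named above reduce to short arithmetic, sketched here in plain Python. This is a toy illustration, not any particular library's implementation: the function names film and cfg_mix are made up for this example, and the numbers stand in for values that learned networks would produce in a real model.

```python
def film(features, gamma, beta):
    """FiLM: scale and shift each feature with condition-derived gamma and beta."""
    return [g * h + b for h, g, b in zip(features, gamma, beta)]


def cfg_mix(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: start from the unconditional prediction
    and move toward (or past) the conditional one."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy values only (assumed for illustration).
modulated = film([1.0, 2.0], gamma=[2.0, 0.5], beta=[0.0, 1.0])
guided = cfg_mix([0.0, 1.0], [1.0, 3.0], guidance_scale=2.0)
print(modulated)  # [2.0, 2.0]
print(guided)     # [2.0, 5.0]
```

A guidance scale of 0 returns the unconditional prediction, 1 returns the conditional prediction, and values above 1 extrapolate beyond it.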

Why It Matters

Conditioning is the bridge between the two major generation paradigms: both autoregressive and denoising loops accept control inputs, but they attach them at different points in the stack. Understanding conditioning clarifies prompt engineering, adapter tuning, and why the same text encoder can steer image diffusion and multimodal chat.

Simple Example

A diffusion model receives a CLIP text embedding at each denoising step, so "a red bicycle" shifts the predicted score toward bicycle-shaped latents. A chat model prepends the tokens of a system message so the autoregressive decoder sees policy instructions before the user turn.
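
The chat case can be sketched as simple list concatenation. The whitespace tokenizer and the <system> and <user> markers below are hypothetical stand-ins; real chat models use their own tokenizers and chat templates.

```python
def build_prefix(system_message, user_message):
    """Return the token sequence the decoder conditions on.

    Whitespace splitting and the <system>/<user> markers are hypothetical
    stand-ins for a real tokenizer and chat template.
    """
    system_tokens = ["<system>"] + system_message.split()
    user_tokens = ["<user>"] + user_message.split()
    # System tokens come first, so every later position can attend to them.
    return system_tokens + user_tokens


prefix = build_prefix("Answer briefly.", "What is conditioning?")
print(prefix)
```

Because the decoder is causal, placing the system tokens first is what lets them influence every later position, including the model's reply.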

Common Confusions

Conditioning is not fine-tuning: weights stay fixed while runtime inputs change behavior. It is also not the same as a hard constraint or a constrained-decoding algorithm: conditioning is learned, soft steering that shifts probabilities without guaranteeing an outcome. Prompt tokens in autoregressive models are conditioning, but not every context window entry is an independent control channel; some are simply previously generated tokens.
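
The soft-versus-hard distinction can be made concrete with a toy next-token distribution. In this sketch, soft_steer is only a stand-in for conditioning (an additive logit shift assumed for illustration, whereas real conditioning acts through learned layers), and hard_constrain mimics a constrained-decoding mask. Both names are hypothetical.

```python
import math


def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_steer(logits, shift):
    """Toy stand-in for conditioning: nudge logits, leaving every token possible."""
    return [x + s for x, s in zip(logits, shift)]


def hard_constrain(logits, allowed):
    """Constrained decoding: disallowed tokens get -inf, hence probability 0.
    Assumes at least one token index is allowed."""
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]


logits = [1.0, 1.0, 1.0]
soft = softmax(soft_steer(logits, [2.0, 0.0, 0.0]))
hard = softmax(hard_constrain(logits, allowed={0, 1}))
print(soft)  # token 0 is favored, but tokens 1 and 2 remain possible
print(hard)  # token 2 has probability exactly 0.0
```

Soft steering changes which outputs are likely; only the hard mask rules an output out entirely.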
