Glossary

Denoising Generation

A generation paradigm that starts from noise, in latent or pixel space, and iteratively removes it (equivalently, adds structure) until a clean sample emerges.

What It Is

Training corrupts data according to a noise schedule and teaches a network to undo that corruption. At inference the model walks backward along the schedule (or integrates a flow) from pure noise to structured output. U-Nets, DiT blocks, and score networks act as the denoisers, operating on latents or pixels. Unlike autoregressive generation, each step updates the whole tensor, across all spatial positions and channels, rather than appending one discrete token.
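
The corruption half of this process can be sketched in a few lines. This is a minimal illustration, not any particular model's recipe. It assumes a linear variance schedule with 1000 steps; the function names and the endpoint values 1e-4 and 0.02 are illustrative choices. It uses the common closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps to jump directly to any noise level.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, increasing linearly (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar[t] = product of (1 - beta) up to and including step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def corrupt(x0, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(alpha_bar) * v + math.sqrt(1.0 - alpha_bar) * e
           for v, e in zip(x0, eps)]
    return x_t, eps

rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
clean = [1.0, -2.0, 0.5, 3.0]
slightly_noisy, _ = corrupt(clean, alpha_bars[0], rng)   # almost all signal
mostly_noise, _ = corrupt(clean, alpha_bars[-1], rng)    # almost all noise
```

Early in the schedule the sample is almost unchanged; by the final step almost none of the original signal remains, which is why sampling can begin from pure Gaussian noise.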

Why It Matters

Denoising generation is the second major output family alongside autoregressive decoding. It underlies diffusion models, score-based samplers, and many image and audio generators. It ties directly to the latent, latent space, and generative-model taxonomy pages, and it contrasts with the token-by-token decoding performed by autoregressive decoders.

Simple Example

A text-to-image diffusion model starts from Gaussian noise in latent space, runs fifty denoising steps conditioned on a text embedding, and decodes the final latent through a VAE to pixels.
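
The loop in this example can be sketched as follows. Everything model-specific is a hypothetical stand-in: `toy_denoiser` pretends to be a trained network that predicts the clean latent (and simply returns the text embedding), `toy_vae_decode` pretends to be a VAE decoder, and the cosine-shaped schedule and deterministic DDIM-style update are assumed choices among several possible ones. The sketch shows only the shape of a fifty-step conditioned sampler.

```python
import math
import random

NUM_STEPS = 50

def alpha_bar(t, num_steps=NUM_STEPS):
    """Fraction of signal kept at step t: 1.0 at t=0, about 0.0 at t=num_steps.

    A cosine-shaped schedule, chosen here only for illustration.
    """
    return math.cos(0.5 * math.pi * t / num_steps) ** 2

def toy_denoiser(latent, t, text_embedding):
    """Hypothetical stand-in for a trained network that predicts the clean latent.

    A real model would infer this from (latent, t, text_embedding); the toy
    simply treats the embedding itself as the clean latent.
    """
    return list(text_embedding)

def toy_vae_decode(latent):
    """Hypothetical stand-in for a VAE decoder: each latent value becomes 2 pixels."""
    return [v for v in latent for _ in range(2)]

def sample(text_embedding, rng, num_steps=NUM_STEPS):
    # Start from pure Gaussian noise in latent space.
    latent = [rng.gauss(0.0, 1.0) for _ in text_embedding]
    for t in range(num_steps, 0, -1):
        a_t, a_prev = alpha_bar(t, num_steps), alpha_bar(t - 1, num_steps)
        x0_hat = toy_denoiser(latent, t, text_embedding)
        # Noise implied by the current latent and the predicted clean latent.
        eps_hat = [(x - math.sqrt(a_t) * x0) / math.sqrt(1.0 - a_t)
                   for x, x0 in zip(latent, x0_hat)]
        # Deterministic (DDIM-style) move to the next, less noisy level.
        latent = [math.sqrt(a_prev) * x0 + math.sqrt(1.0 - a_prev) * e
                  for x0, e in zip(x0_hat, eps_hat)]
    return latent, toy_vae_decode(latent)

final_latent, pixels = sample([0.2, -1.0, 0.7], random.Random(0))
```

Because the toy denoiser always knows the answer, the final latent matches the embedding. With a real network the prediction changes from step to step, which is what makes the iteration necessary.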

Common Confusions

Denoising generation is not the same as a denoising autoencoder used only for representation learning: generation requires a sampling loop that produces new outputs. It is also not autoregressive token decoding, even when implemented with transformer blocks; the state being updated is usually a full latent grid. A single forward pass that reconstructs clean data from a noisy input is denoising, but it is not necessarily the full generative paradigm; it typically becomes generation when used as one step inside a multi-step sampler.
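
The difference in what gets updated can be made concrete with a toy comparison. Both callables below (`toy_next_token`, `toy_refine`) are hypothetical placeholders rather than real models. The sketch only contrasts the two loops: one grows a sequence by one token per step, while the other rewrites every element of a fixed-size state on every step.

```python
import random

def autoregressive_decode(next_token, prompt, num_steps):
    """Append one discrete token per step; earlier tokens never change."""
    tokens = list(prompt)
    for _ in range(num_steps):
        tokens.append(next_token(tokens))
    return tokens

def denoising_decode(refine, state, num_steps):
    """Rewrite every element of a fixed-size state on every step."""
    for step in range(num_steps, 0, -1):
        state = refine(state, step)
    return state

# Toy stand-ins (hypothetical): a counting "language model" and a refiner
# that halves every value in the state.
def toy_next_token(tokens):
    return tokens[-1] + 1

def toy_refine(state, step):
    return [0.5 * v for v in state]

rng = random.Random(0)
tokens = autoregressive_decode(toy_next_token, [0], num_steps=4)
grid = [rng.gauss(0.0, 1.0) for _ in range(6)]
refined = denoising_decode(toy_refine, grid, num_steps=4)
```

After four steps the token list has grown from one entry to five, while the grid still has six entries, all of which changed.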
