Glossary

Emergent Behavior

Qualitative capability jumps that appear only past certain model-scale or training thresholds, as distinct from the smooth metric improvements tracked by scaling laws.

What It Is

Emergent behavior (often called emergent abilities in papers) describes qualitative jumps: a benchmark accuracy curve stays flat, then rises quickly once model size or training compute crosses a threshold. Researchers debate how to measure this, because some jumps shrink or disappear when the metric is rescaled (for example, when all-or-nothing scoring is replaced with partial credit). The term still marks behaviors that were not predictable from small-model trends alone.
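The rescaling point can be sketched with a toy calculation. The snippet below is only an illustration under stated assumptions: the per-token accuracies are made-up numbers, not measurements, and `exact_match` is a hypothetical helper that assumes token errors are independent. It shows how a per-token accuracy that rises in even steps can look like a sudden jump when scored as exact match over a 10-token answer.

```python
def exact_match(per_token_acc: float, answer_len: int) -> float:
    """Chance that every token in the answer is correct.

    Assumes token errors are independent, which is a simplification.
    """
    return per_token_acc ** answer_len


# Hypothetical per-token accuracies for five model sizes (illustrative only).
per_token = [0.50, 0.60, 0.70, 0.80, 0.90]
ANSWER_LEN = 10  # assumed answer length in tokens

exact = [exact_match(p, ANSWER_LEN) for p in per_token]

for p, em in zip(per_token, exact):
    print(f"per-token={p:.2f}  exact-match={em:.4f}")
```

The per-token column rises by the same amount at every step, while the exact-match column stays near zero for the smaller models and then climbs steeply. Both columns describe the same underlying models; only the scoring rule differs.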

Why It Matters

Roadmaps and safety reviews care about thresholds where models gain new skills. Scaling laws explain smooth loss trends; emergent behavior highlights tasks where capability seems discontinuous. Generalization and alignment still determine whether those skills help users safely in production.

Simple Example

A multiple-choice reasoning task may sit near chance for 8B models, then climb sharply at 70B, while perplexity on web text improves smoothly across the same sizes. This suggests that some skills are not visible from loss alone.
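The pattern in this example can be written down as a small check. Everything in the sketch is an assumption made for illustration: the accuracy and perplexity values are invented, the task is assumed to have four answer options (so chance is 0.25), and `near_chance` and `step_changes` are hypothetical helpers rather than part of any evaluation library.

```python
CHANCE = 0.25  # assumed four-option multiple choice

# Invented values for three model sizes (billions of parameters).
sizes_b = [1, 8, 70]
accuracy = [0.25, 0.26, 0.62]
perplexity = [14.0, 11.0, 8.5]


def near_chance(acc: float, chance: float = CHANCE, tol: float = 0.03) -> bool:
    """True when accuracy is within `tol` of random guessing."""
    return abs(acc - chance) <= tol


def step_changes(values: list[float]) -> list[float]:
    """Difference between each value and the one before it."""
    return [b - a for a, b in zip(values, values[1:])]


acc_steps = step_changes(accuracy)
ppl_steps = step_changes(perplexity)

for size, acc in zip(sizes_b, accuracy):
    label = "near chance" if near_chance(acc) else "above chance"
    print(f"{size}B: accuracy={acc:.2f} ({label})")

print("accuracy steps:", [round(s, 2) for s in acc_steps])
print("perplexity steps:", ppl_steps)
```

In this toy data, perplexity falls by a similar amount at each step, while almost all of the accuracy gain arrives in the final step. A real analysis would need more model sizes and confidence intervals before calling a jump emergent.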

Common Confusions

Not every benchmark improvement is emergent; many tasks scale gradually. Emergence is not magic: it can reflect metric nonlinearities or richer pretraining mixtures. It also does not replace alignment, since a newly emergent skill can still be misused without policy controls.

References

  1. Wei, Jason, et al. "Emergent Abilities of Large Language Models." arXiv, 2022, https://arxiv.org/abs/2206.07682.