Generalization

How well a model performs on data it did not see during training. Validation sets, new domains, and live traffic are the practical tests.

What It Is

Generalization measures whether learned patterns transfer beyond the exact training examples—via validation splits, out-of-domain tests, or online A/B metrics. It depends on capacity, data diversity, optimization, and regularization; alignment and filtering change behavior but do not replace statistical generalization.
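As a rough illustration of the difference between a validation split and an out-of-domain test, the sketch below fits a straight line to sin(x) on the interval [0, 1] and then scores it on held-out points from the same interval and on points from [2, 3]. The task, the function names (`fit_line`, `mse`), and the ranges are hypothetical choices made for this example, not part of any particular library or benchmark.

```python
import math


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b


def mse(a, b, xs):
    """Mean squared error of the line against the true target sin(x)."""
    return sum((a * x + b - math.sin(x)) ** 2 for x in xs) / len(xs)


# Assumed toy setup: train and validation points interleave on [0, 1];
# the out-of-domain points come from [2, 3], which training never covered.
train_x = [i / 100 for i in range(0, 101, 2)]
val_x = [i / 100 for i in range(1, 100, 2)]
ood_x = [2 + i / 100 for i in range(101)]

a, b = fit_line(train_x, [math.sin(x) for x in train_x])

train_mse = mse(a, b, train_x)
val_mse = mse(a, b, val_x)
ood_mse = mse(a, b, ood_x)

print(f"train MSE:         {train_mse:.5f}")
print(f"validation MSE:    {val_mse:.5f}")
print(f"out-of-domain MSE: {ood_mse:.5f}")
```

In this toy setting the validation error stays close to the training error, while the out-of-domain error is far larger. A small validation gap therefore says little about behavior under domain shift.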

Why It Matters

Customers care about reliability on new prompts, users, and modalities, not training-set accuracy. Naming generalization separately from overfitting and capacity clarifies why bigger models can generalize better yet still fail on adversarial or out-of-scope inputs.

Simple Example

A language model trained on English helpdesk logs may answer Spanish product questions reasonably well because it learned task structure, not ticket IDs. Once the domain shift grows large enough, even good generalization breaks down.

Common Confusions

Generalization is not guaranteed by scale alone—data and objectives matter. It is also not the same as instruction-following quality, which mixes capability with alignment. Perfect training accuracy with poor test accuracy indicates failed generalization, often via overfitting.
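The pattern of perfect training accuracy with poor test accuracy can be reproduced in a minimal sketch. It assumes a toy task (label an integer 1 if it is at least 50) and two hypothetical models written for this example: a `Memorizer` that stores training pairs in a lookup table and a `ThresholdRule` that learns a single cutoff.

```python
def true_label(x):
    """Assumed ground truth for the toy task."""
    return 1 if x >= 50 else 0


class Memorizer:
    """Stores every training pair; guesses 0 for anything unseen."""

    def fit(self, xs, ys):
        self.table = dict(zip(xs, ys))
        return self

    def predict(self, x):
        return self.table.get(x, 0)


class ThresholdRule:
    """Learns one cutoff halfway between the two classes."""

    def fit(self, xs, ys):
        largest_negative = max(x for x, y in zip(xs, ys) if y == 0)
        smallest_positive = min(x for x, y in zip(xs, ys) if y == 1)
        self.threshold = (largest_negative + smallest_positive) / 2
        return self

    def predict(self, x):
        return 1 if x > self.threshold else 0


def accuracy(model, xs, ys):
    return sum(model.predict(x) == y for x, y in zip(xs, ys)) / len(xs)


# Train on even integers, test on odd integers, so no test input was seen.
train_x = list(range(0, 100, 2))
test_x = list(range(1, 100, 2))
train_y = [true_label(x) for x in train_x]
test_y = [true_label(x) for x in test_x]

results = {}
for model in (Memorizer(), ThresholdRule()):
    model.fit(train_x, train_y)
    results[type(model).__name__] = (
        accuracy(model, train_x, train_y),
        accuracy(model, test_x, test_y),
    )

for name, (train_acc, test_acc) in results.items():
    print(f"{name}: train={train_acc:.2f} test={test_acc:.2f}")
# Memorizer: train=1.00 test=0.50
# ThresholdRule: train=1.00 test=1.00
```

Both models fit the training set perfectly, so training accuracy alone cannot tell them apart. Only the held-out score shows that the lookup table learned nothing transferable.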
