Modules are the mid-level parts of a network: multi-head attention, grouped-query attention, convolution blocks, or cross-attention layers. An architecture describes how modules connect; module pages explain one mechanism in depth.
Why It Matters
Searching by module helps you compare attention variants across models and spot when a release swaps one module for another without changing the overall architecture family.
Simple Example
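A transformer block is often drawn as two modules in sequence: self-attention, then a position-wise feed-forward network. Each can be swapped independently, for example grouped-query attention (GQA) in place of multi-head attention (MHA), while the block keeps the same structure and the same input and output width.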
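The swap described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `AttentionSpec` and `BlockSpec`, the sizes (512-wide model, 8 query heads, 2048-wide feed-forward layer), and the bias-free projections are all hypothetical and not taken from any particular model. It tracks shapes and parameter counts rather than computing attention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionSpec:
    """Hypothetical shape description of a self-attention module."""
    d_model: int
    n_q_heads: int
    n_kv_heads: int  # equal to n_q_heads for MHA, smaller for GQA

    def __post_init__(self):
        if self.d_model % self.n_q_heads:
            raise ValueError("d_model must be divisible by n_q_heads")
        if self.n_q_heads % self.n_kv_heads:
            raise ValueError("n_q_heads must be divisible by n_kv_heads")

    @property
    def head_dim(self) -> int:
        return self.d_model // self.n_q_heads

    def projection_params(self) -> int:
        # Query and output projections: d_model x d_model each (no biases assumed).
        q_and_out = 2 * self.d_model * self.d_model
        # Key and value projections shrink with the number of KV heads.
        k_and_v = 2 * self.d_model * self.n_kv_heads * self.head_dim
        return q_and_out + k_and_v


@dataclass(frozen=True)
class BlockSpec:
    """Two modules in sequence: self-attention, then a position-wise FFN."""
    attention: AttentionSpec
    ffn_hidden: int

    @property
    def io_width(self) -> int:
        # Both modules read and write d_model-wide vectors, so the block's
        # input/output width does not depend on the attention variant.
        return self.attention.d_model


# Same block layout, two different attention modules (illustrative sizes).
mha = AttentionSpec(d_model=512, n_q_heads=8, n_kv_heads=8)
gqa = AttentionSpec(d_model=512, n_q_heads=8, n_kv_heads=2)

block_mha = BlockSpec(attention=mha, ffn_hidden=2048)
block_gqa = BlockSpec(attention=gqa, ffn_hidden=2048)

print(block_mha.io_width, block_gqa.io_width)            # same width either way
print(mha.projection_params(), gqa.projection_params())  # GQA needs fewer K/V weights
```

In this sketch the block's interface is unchanged by the swap; only the key and value projections inside the attention module get smaller, which is why such a change can go unnoticed at the architecture level.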
Common Confusions
A module is larger than a component (heads, projections) but smaller than a full architecture. It is also not a trained model: a module is a design pattern, while a checkpoint stores the trained weights of many module instances.
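One way to picture the three levels is to group a checkpoint's parameter names by the module instance they belong to. The sketch below assumes a made-up naming scheme (`layers.<index>.<module>.<component>`) and made-up shapes; real checkpoints use their own conventions.

```python
from collections import defaultdict

# Hypothetical checkpoint contents: parameter name -> weight shape.
# Names and shapes are invented for illustration only.
checkpoint_shapes = {
    "layers.0.attn.q_proj": (512, 512),
    "layers.0.attn.k_proj": (512, 128),
    "layers.0.ffn.up_proj": (512, 2048),
    "layers.1.attn.q_proj": (512, 512),
    "layers.1.attn.k_proj": (512, 128),
    "layers.1.ffn.up_proj": (512, 2048),
}


def group_by_module(shapes):
    """Map each module instance (e.g. 'layers.0.attn') to its component names."""
    grouped = defaultdict(list)
    for name in shapes:
        # Everything before the last dot is the module path; the rest is the component.
        module_path, component = name.rsplit(".", 1)
        grouped[module_path].append(component)
    return dict(grouped)


modules = group_by_module(checkpoint_shapes)
for module_path, components in modules.items():
    print(module_path, components)
```

Here the attention pattern appears twice, once per layer, each with its own weights: the pattern is the module, the projections are its components, and the checkpoint is the collection of all of them.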