

KV Cache

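Key-value caching for faster autoregressive inference: the attention keys and values of tokens already processed are stored and reused, so each decoding step computes projections only for the newest token instead of recomputing them for the whole prefix.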

Inference
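A minimal NumPy sketch of the idea for a single attention head. The projection matrices `W_q`, `W_k`, `W_v` are randomly initialized stand-ins for trained weights, and the `KVCache` class and decode functions are illustrative names, not any library's API. The sketch checks that decoding with a cache gives the same outputs as recomputing keys and values for the full prefix at every step.

```python
import numpy as np

# Hypothetical single-head layer: random weights stand in for trained ones.
rng = np.random.default_rng(0)
d_model, d_head = 16, 8
W_q = rng.standard_normal((d_model, d_head))
W_k = rng.standard_normal((d_model, d_head))
W_v = rng.standard_normal((d_model, d_head))


def attend(q, K, V):
    """Scaled dot-product attention of one query vector over the rows of K, V."""
    scores = K @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ V


class KVCache:
    """Append-only store of keys and values for the tokens seen so far."""

    def __init__(self, d_head):
        self.K = np.empty((0, d_head))
        self.V = np.empty((0, d_head))

    def append(self, k, v):
        self.K = np.vstack([self.K, k])
        self.V = np.vstack([self.V, v])


def decode_with_cache(xs):
    """Process tokens one at a time, projecting K and V only for the newest token."""
    cache = KVCache(d_head)
    outputs = []
    for x in xs:
        cache.append(x @ W_k, x @ W_v)  # one new K/V row per step
        outputs.append(attend(x @ W_q, cache.K, cache.V))
    return np.array(outputs)


def decode_without_cache(xs):
    """Recompute keys and values for the entire prefix at every step."""
    outputs = []
    for t in range(len(xs)):
        prefix = xs[: t + 1]
        K, V = prefix @ W_k, prefix @ W_v  # t + 1 projections per step
        outputs.append(attend(xs[t] @ W_q, K, V))
    return np.array(outputs)


tokens = rng.standard_normal((5, d_model))
print(np.allclose(decode_with_cache(tokens), decode_without_cache(tokens)))  # True
```

Growing the arrays with `np.vstack` copies them on every step; that keeps the sketch short, whereas a practical cache would typically preallocate a buffer sized for the maximum sequence length.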


Module

Grouped-Query Attention
An attention variant that reduces KV cache memory by having each group of query heads share a single key-value head.
Multi-Head Latent Attention
An attention variant that compresses key-value cache storage into a low-rank latent space while keeping distinct query heads.
Multi-Query Attention
An attention variant that shares one key-value head across all query heads to minimize KV cache memory.
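The three variants differ mainly in what must be kept in the cache per token. The NumPy sketch below estimates cache size for a hypothetical configuration (4,096 tokens, 32 layers, 32 query heads, head dimension 128, 2-byte fp16 elements, 8 KV heads for GQA, a 512-dimensional latent for MLA) and shows how query heads index into shared KV heads. The MLA part is a simplification, not any published implementation: it caches one latent vector per token and up-projects it into per-head keys and values, ignoring details such as separate positional key components. All function names and matrices are illustrative.

```python
import numpy as np


def kv_cache_bytes(seq_len, n_layers, n_kv_heads, d_head, bytes_per_elem=2):
    """K and V tensors: 2 x layers x tokens x KV heads x head dim x bytes."""
    return 2 * n_layers * seq_len * n_kv_heads * d_head * bytes_per_elem


def latent_cache_bytes(seq_len, n_layers, d_latent, bytes_per_elem=2):
    """Simplified MLA-style cache: one shared latent vector per token per layer
    (ignores any extra per-token positional key components)."""
    return n_layers * seq_len * d_latent * bytes_per_elem


def grouped_attention(q, K, V):
    """Attention for one query position when KV heads may be shared.

    q: (n_q_heads, d_head); K, V: (seq_len, n_kv_heads, d_head).
    n_kv_heads == n_q_heads -> multi-head attention,
    1 < n_kv_heads < n_q_heads -> grouped-query attention,
    n_kv_heads == 1 -> multi-query attention.
    """
    n_q_heads, d_head = q.shape
    n_kv_heads = K.shape[1]
    assert n_q_heads % n_kv_heads == 0, "query heads must split evenly into groups"
    group = n_q_heads // n_kv_heads
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group  # consecutive query heads share one KV head
        scores = K[:, kv, :] @ q[h] / np.sqrt(d_head)
        w = np.exp(scores - scores.max())
        out[h] = (w / w.sum()) @ V[:, kv, :]
    return out


def mla_keys_values(c_cache, W_uk, W_uv):
    """Up-project cached latents (seq_len, d_latent) into per-head K and V
    using hypothetical matrices W_uk, W_uv of shape (d_latent, n_heads, d_head)."""
    K = np.einsum("tl,lhd->thd", c_cache, W_uk)
    V = np.einsum("tl,lhd->thd", c_cache, W_uv)
    return K, V


# Cache size for a hypothetical config: 4096 tokens, 32 layers, head dim 128, fp16.
GiB = 2**30
cfg = dict(seq_len=4096, n_layers=32, d_head=128)
for name, n_kv in [("MHA", 32), ("GQA", 8), ("MQA", 1)]:
    print(name, kv_cache_bytes(n_kv_heads=n_kv, **cfg) / GiB, "GiB")
print("MLA", latent_cache_bytes(4096, 32, d_latent=512) / GiB, "GiB")

# MQA equals MHA whose single KV head is copied to every query head.
rng = np.random.default_rng(0)
T, n_q, d = 6, 8, 16
q = rng.standard_normal((n_q, d))
K_mqa, V_mqa = rng.standard_normal((T, 1, d)), rng.standard_normal((T, 1, d))
out = grouped_attention(q, K_mqa, V_mqa)
K_rep, V_rep = np.repeat(K_mqa, n_q, axis=1), np.repeat(V_mqa, n_q, axis=1)
print(np.allclose(out, grouped_attention(q, K_rep, V_rep)))  # True

# MLA: only the (T, d_latent) latents are cached; every query head still
# gets its own reconstructed key and value head.
d_latent = 4
c_cache = rng.standard_normal((T, d_latent))
W_uk = rng.standard_normal((d_latent, n_q, d))
W_uv = rng.standard_normal((d_latent, n_q, d))
K_mla, V_mla = mla_keys_values(c_cache, W_uk, W_uv)
print(grouped_attention(q, K_mla, V_mla).shape)  # (8, 16)
```

With these hypothetical numbers the size loop prints 2.0, 0.5, 0.0625 and 0.125 GiB for MHA, GQA, MQA and MLA respectively; the MLA figure depends entirely on the chosen latent width, so it is an illustration of the arithmetic, not a ranking of the methods.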