KV Cache

Key-value caching for faster autoregressive inference and lower memory bandwidth during decoding.