Allows interweaving arbitrary kinds of 'attention' layers, like sliding window, reusing a previous layer's KV cache, etc. (#1299)
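The feature described in the title (a per-layer schedule mixing full attention, sliding-window attention, and layers that reuse an earlier layer's KV cache) can be sketched roughly as below. This is a minimal illustrative sketch only; the function name, the layer-kind dicts, and the `reuse_kv_layer_idx` field are assumptions for illustration, not the actual llm-foundry API.

```python
# Hypothetical sketch: build a per-layer attention schedule by repeating a
# pattern of layer kinds across n_layers. All names here are illustrative.

def build_attention_schedule(n_layers, pattern):
    """Repeat `pattern` (a list of layer-kind dicts) to cover n_layers,
    validating that KV-cache reuse always points at an earlier layer."""
    schedule = [dict(pattern[i % len(pattern)]) for i in range(n_layers)]
    for i, layer in enumerate(schedule):
        offset = layer.get("reuse_kv_layer_idx")  # relative offset, e.g. -1
        if offset is not None and (offset >= 0 or i + offset < 0):
            raise ValueError(
                f"layer {i}: reuse_kv_layer_idx must reference an earlier layer"
            )
    return schedule

# Example: interleave full attention, sliding-window attention, and a layer
# that reuses the immediately preceding layer's KV cache.
pattern = [
    {"kind": "full"},
    {"kind": "sliding_window", "window_size": 1024},
    {"kind": "reuse_prev_kv", "reuse_kv_layer_idx": -1},
]
schedule = build_attention_schedule(6, pattern)
```

The relative-offset check rejects a reuse layer at position 0 (there is no earlier cache to borrow), which is the kind of invariant such a schedule would need regardless of the concrete implementation.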

Merged
ShashankMosaicML merged 86 commits into mosaicml:main from ShashankMosaicML:mixed_attention_modules on Jun 30, 2024

Commits on Jun 28, 2024