Skip to content

Allows interweaving of arbitrary kinds of 'attention' layers, like sliding window, reuse prev layer kv cache etc. #6979

Allows interweaving of arbitrary kinds of 'attention' layers, like sliding window, reuse prev layer kv cache etc.

Allows interweaving of arbitrary kinds of 'attention' layers, like sliding window, reuse prev layer kv cache etc. #6979

Triggered via pull request June 23, 2024 17:20
@ShashankMosaicMLShashankMosaicML
synchronize #1299
Status Success
Total duration 9m 7s
Artifacts

pr-gpu.yaml

on: pull_request_target
Matrix: pytest-gpu
Fit to window
Zoom out
Zoom in

Annotations

2 warnings
gpu-2.3.1 / pytest-gpu
Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3, actions/setup-python@v4, actions/cache@v3. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
gpu-2.3.0 / pytest-gpu
Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3, actions/setup-python@v4, actions/cache@v3. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.