feat: support attention sinks #1105
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Hello, may I ask whether attention sinks are still rather unstable at the moment? How should I use them?
Some tests are failing. Should the tests be changed, or is this a bug?
The implementation is not correct yet, as we do not cache before RoPE. I need to modify the paged attention kernels, but I have not yet found the time to do it.
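To illustrate the point about caching before RoPE: with attention sinks, keys should be rotated by their position *in the cache* rather than by their original text position, so the cache has to hold unrotated (pre-RoPE) keys and apply RoPE at attention time. A minimal sketch of that layout (hypothetical shapes and names, not TGI's actual paged-attention kernels):

```python
import torch

def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def rope_tables(max_pos, dim, base=10000.0):
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    t = torch.arange(max_pos).float()
    freqs = torch.outer(t, inv_freq)         # [max_pos, dim/2]
    emb = torch.cat((freqs, freqs), dim=-1)  # [max_pos, dim]
    return emb.cos(), emb.sin()

def apply_rope(x, cos, sin):
    # standard rotary embedding: x * cos + rotate_half(x) * sin
    return x * cos + rotate_half(x) * sin

dim, n_sink, window = 64, 4, 8
cos, sin = rope_tables(1024, dim)

# Keys are cached *unrotated*; positions are assigned from each key's index
# in the cache, so they stay dense even after non-sink tokens are evicted.
k_cache = torch.randn(n_sink + window, dim)   # pre-RoPE keys (sinks + window)
q = torch.randn(1, dim)                       # query for the newest token

pos = torch.arange(k_cache.size(0))           # cache positions, not text positions
k = apply_rope(k_cache, cos[pos], sin[pos])   # rotate at attention time
q_rot = apply_rope(q, cos[pos[-1:]], sin[pos[-1:]])
scores = (q_rot @ k.T) / dim ** 0.5           # [1, n_sink + window]
```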
Any update on this one?
Looking forward to this as a better alternative to RoPE interpolation. In my experience, RoPE interpolation techniques such as NTK tend to cause hallucinations.
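(For context, NTK-aware scaling, the technique mentioned above, replaces position interpolation with a rescaled rotary base, so low frequencies stretch more than high ones. A sketch of the commonly used rescaling formula, assuming the standard NTK-aware variant:)

```python
# NTK-aware RoPE scaling: enlarge the rotary base instead of
# interpolating token positions.
def ntk_scaled_base(base: float, scale: float, dim: int) -> float:
    return base * scale ** (dim / (dim - 2))

# e.g. extending a model with head dim 128 to 4x its trained context:
print(ntk_scaled_base(10000.0, 4.0, 128))  # ~40890, the enlarged base
```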
Strange that this is not yet implemented, because I haven't seen or heard of a better attention technique so far. Please prove me wrong.
The reason is simple: this implementation doesn't work exactly like sinks. The problem is KV-cache handling. Attention sinks require sliding everything except the sinks themselves, which, as far as I know, invalidates the KV cache: as soon as you slide once, the attention of the first non-sink token is wrong, since it had previously attended to non-sink tokens that have since been evicted.
@Bec-k No one is saying you're wrong. That doesn't mean we can implement the feature easily (or that it's worth the engineering time or maintenance cost) at the moment.
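For illustration, a minimal sketch of the sink-aware eviction being described (a hypothetical `SinkWindowCache` helper, not TGI's cache): the first `num_sink` entries are pinned and a sliding window covers the rest. This sliding is exactly what invalidates cached keys that were rotated by their original positions.

```python
from collections import deque

class SinkWindowCache:
    """Keep the first `num_sink` entries forever; slide a window over the rest."""

    def __init__(self, num_sink: int, window: int):
        self.num_sink = num_sink
        self.sinks = []                      # never evicted
        self.window = deque(maxlen=window)   # maxlen auto-evicts the oldest entry

    def append(self, kv):
        if len(self.sinks) < self.num_sink:
            self.sinks.append(kv)
        else:
            self.window.append(kv)

    def tokens(self):
        return self.sinks + list(self.window)

cache = SinkWindowCache(num_sink=4, window=4)
for t in range(12):
    cache.append(t)
print(cache.tokens())  # [0, 1, 2, 3, 8, 9, 10, 11] -- tokens 4..7 were evicted
```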
Attention-sink caching has been implemented in transformers. Really looking forward to the release of this feature on Hugging Face.
@OlivierDehaene An attention-sink cache has been added to the Hugging Face transformers package, implemented as a custom cache. Can you refactor the "text-generation-inference" code to adopt this new feature? Does this solve the initial problem?
Possibly you could port that new SinkCache implementation from transformers and request any additional changes needed for it to be properly integrated into your runtime?
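For reference, a sketch of how the transformers `SinkCache` was used at the time of this thread (it shipped around transformers v4.36; the cache API has evolved since, and the model name below is just an example):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, SinkCache

model_id = "meta-llama/Llama-2-7b-chat-hf"  # example; any RoPE-based model works
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tok("Tell me a very long story.", return_tensors="pt")

# SinkCache pins `num_sink_tokens` initial tokens and keeps a sliding
# window of `window_length` tokens, re-rotating cached keys as it slides.
cache = SinkCache(window_length=1024, num_sink_tokens=4)
out = model.generate(**inputs, past_key_values=cache, max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))
```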
See the StreamingLLM paper, "Efficient Streaming Language Models with Attention Sinks": https://arxiv.org/abs/2309.17453