SnapKV_Cache support added #34710
base: main
Conversation
Pinging the cache gang @zucchini-nlp @ArthurZucker @gante
Hey! Super interested in having new KV cache implementations, but we need to make them compatible with the current API! The KV length is contained in cache_position, which is what should be used! No modeling changes should be required!
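For context, here is a minimal sketch of what "use cache_position instead of modeling changes" can look like in practice. It assumes the `Cache.update()` interface from transformers' `cache_utils` and that the model passes `cache_position` through `cache_kwargs` (as the Llama attention layers do); the class name and the compression hook are illustrative, not this PR's implementation.

```python
# A minimal sketch (not the PR's code): a custom cache reads the current KV
# length from `cache_position` via `cache_kwargs`, so the modeling files stay
# untouched. `SeqLenAwareCache` is a hypothetical name.
from transformers.cache_utils import DynamicCache

class SeqLenAwareCache(DynamicCache):
    def update(self, key_states, value_states, layer_idx, cache_kwargs=None):
        cache_kwargs = cache_kwargs or {}
        cache_position = cache_kwargs.get("cache_position")
        if cache_position is not None:
            # The last entry of `cache_position` is the absolute index of the
            # newest token, so the total KV length seen so far is index + 1.
            kv_length = int(cache_position[-1]) + 1
            # ...a SnapKV-style cache could trigger compression here once
            # `kv_length` exceeds its budget, without touching the model code.
        return super().update(key_states, value_states, layer_idx, cache_kwargs)
```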
@ArthurZucker Thanks a lot for your response! So do you think that if I remove the modeling changes then the …
Yes!
Overall we should not need modeling changes!
But with #35235 you can maybe find a better way to support this by creating a new …
What does this PR do?
This PR adds an implementation of the SnapKV cache paper to cache_utils.py. It also updates the FlashAttention-2 path in the Llama modeling code to initialize the SnapKV cache.
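For readers unfamiliar with the paper: SnapKV scores each prefix KV position by how much attention the last `window_size` prompt queries pay to it, pools the scores so neighboring positions survive together, and keeps only the top-scoring positions plus the observation window itself. The sketch below is an illustrative, self-contained version of that selection step, not the code in this PR; all names, shapes, and default values are assumptions.

```python
import torch
import torch.nn.functional as F

def snapkv_select(key_states, value_states, attn_weights,
                  window_size=32, max_capacity=1024, kernel_size=5):
    """Compress the prompt KV cache by keeping the positions the observation
    window attends to most. key/value_states: [batch, heads, seq, head_dim];
    attn_weights: [batch, heads, window_size, seq_len]."""
    bsz, num_heads, seq_len, head_dim = key_states.shape
    if seq_len <= max_capacity:
        return key_states, value_states  # under budget, nothing to do

    prefix_len = seq_len - window_size
    # Vote: total attention each prefix position receives from the window.
    votes = attn_weights[..., :prefix_len].sum(dim=-2)  # [b, h, prefix_len]
    # Pool so that clusters of neighboring positions are kept together.
    votes = F.max_pool1d(votes.reshape(bsz * num_heads, 1, prefix_len),
                         kernel_size=kernel_size, stride=1,
                         padding=kernel_size // 2)
    votes = votes.reshape(bsz, num_heads, prefix_len)

    # Keep the top-scoring prefix positions per head, in their original
    # order, plus the observation window itself.
    top_idx = votes.topk(max_capacity - window_size, dim=-1).indices
    top_idx = top_idx.sort(dim=-1).values
    top_idx = top_idx.unsqueeze(-1).expand(-1, -1, -1, head_dim)
    k_prefix = key_states[..., :prefix_len, :].gather(2, top_idx)
    v_prefix = value_states[..., :prefix_len, :].gather(2, top_idx)
    return (torch.cat([k_prefix, key_states[..., prefix_len:, :]], dim=2),
            torch.cat([v_prefix, value_states[..., prefix_len:, :]], dim=2))
```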
Before submitting
- [ ] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.