enable StaticCache for assisted generation #34797
Open
yao-matrix wants to merge 51 commits into huggingface:main from yao-matrix:main
+139
−19
Changes from 22 commits
Commits
c205b2e
enable StaticCache for assisted generation
yao-matrix 30021dd
update
yao-matrix 620c861
remove warnings import
yao-matrix b5283e9
enable StaticCache for assisted generation
yao-matrix 71b7d22
update
yao-matrix c967bbe
remove warnings import
yao-matrix 980aa08
done
yao-matrix c79411d
done
yao-matrix c717652
fix review comments
yao-matrix 67618e5
Merge branch 'main' of https://github.com/yao-matrix/transformers
yao-matrix c8e2428
Merge branch 'main' into main
yao-matrix fde7ebd
Merge branch 'main' into main
yao-matrix e1169a3
Merge branch 'main' into main
yao-matrix 8a9a753
add static cache ci
b74a7fe
Merge branch 'main' of https://github.com/yao-matrix/transformers
e67a3fd
ship Gemma2 StaticCache CI since it uses HybridCache
177634c
ruff format
45d0410
fix phimoe ci
yao-matrix a33f660
fix mixtral ci
yao-matrix ff07e47
fix ci
yao-matrix fbef806
cont.
yao-matrix 3564a87
ci
yao-matrix b9cf597
fix ci
yao-matrix 803166d
ci
yao-matrix df1594c
ci
yao-matrix 87b7f15
ci
yao-matrix e60b1fe
Merge branch 'main' into main
yao-matrix 5e195c2
ci
yao-matrix 759da36
ci
yao-matrix 093b647
ci
yao-matrix 3775dc2
ci
yao-matrix 817d303
ci
yao-matrix 6e2ad2a
add # Ignore copy
yao-matrix 99b6bc2
using a smarter way, ignore in test_utils
yao-matrix af33391
ci
yao-matrix 587b55f
skip Gemma2, it declars support static cache, but it's hybrid cache a…
yao-matrix 0a49d6f
refine error message
yao-matrix 1deeb55
Merge branch 'main' into main
yao-matrix 62b70e4
Merge branch 'main' into main
yao-matrix 210c2e0
Merge branch 'main' into main
yao-matrix 7b97aa4
add test case test_assisted_decoding_compile
yao-matrix 9cb45da
Merge branch 'main' into main
yao-matrix 93cd7bf
fix bug
yao-matrix 3cc23d7
Merge branch 'main' into main
yao-matrix b45336c
Merge branch 'main' into main
yao-matrix 04f2ea1
Merge branch 'main' into main
yao-matrix b08d1fc
Merge branch 'main' into main
yao-matrix 0904268
Merge branch 'main' into main
yao-matrix dd148a8
Merge branch 'main' into main
yao-matrix 3171476
cohere2 is HybridCache
4e064ec
Merge branch 'main' into main
yao-matrix
Let's not skip entirely, only the `static_cache` test, as we still need to check that assisted generation works in Gemma2 :) Maybe it will be skipped by the `model._support_static_cache` check, as I've commented above, but if not we can skip only `test_assisted_decoding_with_num_logits_to_keep_1_static` (maybe it's called a bit differently).

I switched to `_supports_static_cache` to skip the case. Gemma is a bit different: it uses HybridCache but claims `_supports_static_cache = True`, so I still skip it in the model test file. I will remove this skip after enabling HybridCache for assisted decoding, which I plan to do after this PR (pure StaticCache) is merged. Thanks.
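
For readers following the thread, here is a minimal sketch of the skip decision discussed above. It is not the code from this PR: the dummy classes, the `uses_hybrid_cache` marker, and the `static_cache_skip_reason` helper are hypothetical names introduced for illustration. Only `_supports_static_cache` comes from the discussion. In the PR itself, the Gemma2 skip lives in the model test file rather than behind a marker attribute.

```python
class DummyStaticModel:
    """Stand-in for a model that declares and really uses StaticCache."""

    _supports_static_cache = True


class DummyDynamicOnlyModel:
    """Stand-in for a model that does not declare static cache support."""

    _supports_static_cache = False


class DummyHybridModel:
    """Stand-in for a Gemma2-like model: declares support but uses a hybrid cache."""

    _supports_static_cache = True
    uses_hybrid_cache = True  # hypothetical marker, not a real transformers attribute


def static_cache_skip_reason(model_cls):
    """Return a skip reason for the static-cache test variant, or None to run it."""
    # First gate: the flag mentioned in the review thread.
    if not getattr(model_cls, "_supports_static_cache", False):
        return "model does not declare static cache support"
    # Second gate: models that declare support but actually use a hybrid cache.
    if getattr(model_cls, "uses_hybrid_cache", False):
        return "model uses a hybrid cache; only the static-cache variant is skipped"
    return None


for cls in (DummyStaticModel, DummyDynamicOnlyModel, DummyHybridModel):
    reason = static_cache_skip_reason(cls)
    print(cls.__name__, "->", "run" if reason is None else f"skip ({reason})")
```

The point of the sketch is that only the static-cache variant is gated; the default-cache assisted generation test would still run for every model, which is what the reviewer asked for.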