[Core generation] Adds support for static KV cache #27931
We should have a `dtype` arg for the cache as well. For instance, if a user initializes the model and then casts it with `.to()`, the cache dtype will be misaligned with the model (model dtype != `config.torch_dtype`) 😬
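To illustrate the concern, here is a minimal sketch of the mismatch and of how an explicit `dtype` argument would avoid it. `ToyStaticCache` and its constructor arguments are hypothetical stand-ins, not the cache class from this PR, and `config_dtype` stands in for `config.torch_dtype`.

```python
import torch
from torch import nn


class ToyStaticCache:
    """Hypothetical pre-allocated KV cache (not the class from this PR)."""

    def __init__(self, max_batch_size, num_heads, max_len, head_dim,
                 dtype=torch.float32, device="cpu"):
        shape = (max_batch_size, num_heads, max_len, head_dim)
        # Buffers are allocated once, up front, with the requested dtype.
        self.key_cache = torch.zeros(shape, dtype=dtype, device=device)
        self.value_cache = torch.zeros(shape, dtype=dtype, device=device)


# Stand-in for config.torch_dtype, which is fixed when the model is created.
config_dtype = torch.float32

# The user casts the model after initialization.
model = nn.Linear(8, 8).to(torch.bfloat16)
model_dtype = next(model.parameters()).dtype

# Allocating from the config value no longer matches the model.
stale_cache = ToyStaticCache(1, 2, 16, 4, dtype=config_dtype)

# Passing the dtype explicitly keeps the cache aligned with the model.
fixed_cache = ToyStaticCache(1, 2, 16, 4, dtype=model_dtype)

print(stale_cache.key_cache.dtype == model_dtype)
print(fixed_cache.key_cache.dtype == model_dtype)
```

Reading the dtype from the model's parameters at allocation time is only one possible choice; the point is that the caller, not the config, decides the cache dtype.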
If the cache is an `nn.Module`, it should be moved as well, no? I'll try that.
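The idea can be sketched as follows, assuming the cache stores its tensors as registered buffers and is attached to the model as a submodule. `ToyModuleCache` and `ToyModel` are hypothetical names used only for illustration; whether the PR ends up taking this approach is not stated here.

```python
import torch
from torch import nn


class ToyModuleCache(nn.Module):
    """Hypothetical cache whose tensors are registered buffers."""

    def __init__(self, max_batch_size, num_heads, max_len, head_dim):
        super().__init__()
        shape = (max_batch_size, num_heads, max_len, head_dim)
        # Non-persistent buffers stay out of the state dict but are still
        # visited by Module.to(), which casts floating-point buffers.
        self.register_buffer("key_cache", torch.zeros(shape), persistent=False)
        self.register_buffer("value_cache", torch.zeros(shape), persistent=False)


class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)
        # Assigning a Module as an attribute registers it as a submodule,
        # so casts applied to the parent reach it too.
        self.cache = ToyModuleCache(1, 2, 16, 4)


model = ToyModel().to(torch.float16)

print(model.proj.weight.dtype)
print(model.cache.key_cache.dtype)
```

This only works if the cache is reachable as a submodule. A cache held in a plain Python attribute outside the module tree, or created after the cast, would not be affected by `.to()`.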