New Binaries & Improved Sampling API #223
Merged
Conversation
@AsakusaRinne just checking to make sure I understand the new release process. Since this now has the

@martindevans Yes, that's right. :)
martindevans force-pushed the batch_decoding branch from 54323d6 to c786fb0 (October 28, 2023 20:32)
…he new batched decoding system.
…, instead of always overriding them by default!
martindevans force-pushed the batch_decoding branch from 656c80a to 529b06b (October 29, 2023 23:59)
- Removed all `record struct` uses in native code
- Removed usage of `readonly` in native structs

Minor fix:
- Added sequential layout to `LLamaModelQuantizeParams`

Should fix the extreme speed loss.
@martindevans, regarding testing #225: it seems the performance issue has been resolved; it works at the same speed as v0.5.1.

Thanks for confirming that @lexxsoft
A lot of work has ended up bundled into this one PR!

My initial aim with this PR was to build a batched decoding prototype. That's included in this PR (as example number 15).

Working with batched decoding required updating the binaries again, since some of the batched decoding API has changed since we added support in #185.

Updating those binaries required changes to the sampling API, because some sampling methods were changed in llama.cpp. I moved all the sampling methods over from being static methods in `SamplingAPI` to being instance methods on `LLamaTokenDataArray`. In the process I fixed a bug (properly copying the `sorted` flag back from the C++ side); I think this would have been causing a small performance reduction with sampling.

New Binaries from: this commit