New Binaries & Improved Sampling API #223

Merged
martindevans merged 18 commits into SciSharp:master from batch_decoding on Oct 31, 2023


@martindevans (Member) commented Oct 26, 2023

A lot of work has ended up bundled into this one PR!

My initial aim with the work in this PR was to build a batched decoding prototype. That's included in this PR (as example number 15).

Working with batched decoding required updates to the binaries again, since some of the batched decoding API has changed since we added support in #185.

Updating those binaries required changes to the sampling API because some sampling methods were changed in llama.cpp. I moved all the sampling methods over from being static methods in SamplingAPI to being instance methods on LLamaTokenDataArray. In the process I fixed a bug: the sorted flag was not being properly copied back from the C++ side. I think this would have been causing a small performance reduction with sampling.
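To illustrate why a lost sorted flag could cost performance, here is a minimal sketch in C. It is not LLamaSharp or llama.cpp code: every name in it (`token_data`, `native_array`, `managed_array`, `native_top_k`, `managed_top_k`, `sort_calls`, `copy_flag_back`) is hypothetical, and it only assumes that the native sampler sorts the candidates when the flag says they are unsorted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical candidate token: id, raw logit, probability. */
typedef struct { int id; float logit; float p; } token_data;

/* Hypothetical native-side view of the candidates. */
typedef struct { token_data *data; size_t size; bool sorted; } native_array;

/* Hypothetical wrapper-side object that outlives a single native call. */
typedef struct { token_data *data; size_t size; bool sorted; } managed_array;

/* Counts how often the (relatively expensive) sort actually runs. */
int sort_calls = 0;

/* Orders candidates by logit, highest first. */
int by_logit_desc(const void *a, const void *b) {
    float la = ((const token_data *)a)->logit;
    float lb = ((const token_data *)b)->logit;
    return (la < lb) - (la > lb);
}

/* Assumed native behaviour: sort only when needed, then record that fact. */
void native_top_k(native_array *c, size_t k) {
    if (!c->sorted) {
        qsort(c->data, c->size, sizeof(token_data), by_logit_desc);
        c->sorted = true;
        sort_calls++;
    }
    if (k < c->size) c->size = k;
}

/* Wrapper call: builds a temporary native view, calls the sampler, and
   copies the results back. Passing copy_flag_back = false models the bug. */
void managed_top_k(managed_array *m, size_t k, bool copy_flag_back) {
    native_array n = { m->data, m->size, m->sorted };
    native_top_k(&n, k);
    m->size = n.size;
    if (copy_flag_back) m->sorted = n.sorted;
}
```

In this sketch, two consecutive `managed_top_k` calls with `copy_flag_back` set to `false` sort the candidates twice, because the wrapper never learns that they are already sorted. With `true`, the second call skips the sort.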

New Binaries from: this commit

@AsakusaRinne added the patch-release and enhancement labels Oct 26, 2023
@martindevans
Member Author

@AsakusaRinne just checking to make sure I understand the new release process. Since this now has the patch-release tag that means it will automatically push out a new 0.6.1 package when merged?

@AsakusaRinne
Collaborator

@martindevans Yes, that's right. :)

 - Removed all `record struct` uses in native code
 - Removed usage of `readonly` in native structs

Minor fix:
 - Added sequential layout to `LLamaModelQuantizeParams`
@lexxsoft

@martindevans, regarding testing for #225: the performance issue seems to be resolved. It now runs at the same speed as v0.5.1.

@martindevans
Copy link
Member Author

Thanks for confirming that @lexxsoft

@martindevans martindevans merged commit 5a9e13c into SciSharp:master Oct 31, 2023
4 checks passed
@martindevans martindevans deleted the batch_decoding branch October 31, 2023 13:31
Labels: enhancement, minor-release