New Binaries & Improved Sampling API #223
Merged
Conversation
@AsakusaRinne just checking to make sure I understand the new release process. Since this now has the

@martindevans Yes, that's right. :)
martindevans force-pushed the batch_decoding branch from 54323d6 to c786fb0 (October 28, 2023 20:32)
…he new batched decoding system.
…, instead of always overriding them by default!
martindevans force-pushed the batch_decoding branch from 656c80a to 529b06b (October 29, 2023 23:59)
- Removed all `record struct` uses in native code
- Removed usage of `readonly` in native structs

Minor fix:
- Added sequential layout to `LLamaModelQuantizeParams`

Should fix the extreme speed loss.
@martindevans, regarding testing #225: it seems the performance issue has been resolved; it works at the same speed as v0.5.1.

Thanks for confirming that @lexxsoft
A lot of work has ended up bundled into this one PR!

My initial aim with this PR was to build a batched decoding prototype. That's included in this PR (as example number 15).

Working with batched decoding required updating the binaries again, since some of the batched decoding API has changed since we added support in #185.

Updating those binaries required changes to the sampling API, because some sampling methods were changed in llama.cpp. I moved all the sampling methods over from being static methods in `SamplingAPI` to being instance methods on `LLamaTokenDataArray`. In the process I fixed a bug (properly copying the `sorted` flag back from the C++ side); I think this would have been causing a small performance reduction with sampling.

New Binaries from: this commit