[pull] master from ggerganov:master #18

Closed
wants to merge 15 commits into master from ggerganov:master

Conversation


@pull pull bot commented Jan 13, 2024

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

ggerganov and others added 15 commits January 13, 2024 13:44
* convert : update phi-2 to latest HF repo

ggml-ci

* py : try to fix flake8 lint issues
* fix deadlock

* don't ruin all whitespace
* metal : detect more GPU families

* metal : refactor kernel loading

* metal : set kernel family requirements

* metal : fix kernel init + fix compile options

* metal : take into account simdgroup reduction support (see the sketch after this list)

* metal : print only skipped kernels

* metal : fix check for simdgroup reduction support

* metal : check for Metal 3

* metal : free allocations

* metal : normalize encoder setComputePipelineState: calls

ggml-ci

* metal : fix Metal3 family check

ggml-ci

* metal : check for simdgroup matrix mul. feature

ggml-ci
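The family checks above map onto a small set of MTLDevice queries. Below is a minimal sketch, written against Apple's metal-cpp C++ bindings rather than the Objective-C used in ggml-metal.m; the struct, field, and function names are illustrative assumptions, not the repo's identifiers.

```cpp
#include <Metal/Metal.hpp> // metal-cpp; the *_PRIVATE_IMPLEMENTATION defines
                           // must appear in exactly one translation unit
#include <cstdio>

// Illustrative capability flags (not the names used in ggml-metal.m).
struct metal_caps {
    bool simdgroup_reduction = false; // simd_sum / simd_max style reductions
    bool simdgroup_mm        = false; // simdgroup_matrix multiplication
};

static metal_caps detect_metal_caps(MTL::Device * dev) {
    metal_caps caps;

    // SIMD-group reductions: Apple7 (A14/M1) and newer, or any device
    // advertising the common Metal 3 family.
    caps.simdgroup_reduction = dev->supportsFamily(MTL::GPUFamilyApple7) ||
                               dev->supportsFamily(MTL::GPUFamilyMetal3);

    // SIMD-group matrix multiplication requires Apple7 or newer.
    caps.simdgroup_mm = dev->supportsFamily(MTL::GPUFamilyApple7);

    return caps;
}

// Kernel loading then skips pipelines whose requirements are unmet and
// prints only the skipped ones, as the commits describe.
static bool should_load_kernel(const metal_caps & caps, const char * name,
                               bool needs_reduction, bool needs_mm) {
    if ((needs_reduction && !caps.simdgroup_reduction) ||
        (needs_mm        && !caps.simdgroup_mm)) {
        fprintf(stderr, "skipping kernel %s (missing GPU feature)\n", name);
        return false;
    }
    return true;
}
```

Gating each pipeline on a per-kernel requirement, instead of one global check, is what lets older GPUs still load the kernels they can actually run.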
* add the parameter --no-display-prompt; combined with --log-disable, it displays only the generated tokens (see the sketch below)

* remove empty line

---------

Co-authored-by: Georgi Gerganov <[email protected]>
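A minimal sketch of how such a flag is typically wired up; the struct and field names below are assumptions for illustration, not the identifiers used in the repo's common code.

```cpp
#include <cstring>
#include <cstdio>

// Assumed parameter struct; echoing the prompt stays the default.
struct cli_params {
    bool display_prompt = true;
};

static bool parse_flag(cli_params & params, const char * arg) {
    if (std::strcmp(arg, "--no-display-prompt") == 0) {
        params.display_prompt = false; // emit only generated tokens
        return true;
    }
    return false;
}

// During generation, prompt tokens are echoed only when requested:
//
//     if (params.display_prompt) {
//         fputs(token_text, stdout);
//     }
```

With --log-disable on top, stdout then carries nothing but the generated text, which makes the binary convenient to use in shell pipelines.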
The fix should be just `sudo apt-get update`.
* examples : save-load-state: save only required state

* llama : only reserve n_vocab * n_batch at most for logits

llama_decode asserts that at most n_batch tokens are passed per call, and
n_ctx is expected to be larger than n_batch.

* llama : always reserve n_vocab * n_batch for logits

llama_context de-serialization breaks if the contexts have differing
logits capacity, and llama_decode will resize the buffer to at most
n_vocab * n_batch.

* llama : only save and restore used logits

For a batch size of 512 this reduces the saved state by around 62 MB in
the best case (see the note after this commit list), which adds up
quickly when saving on each message to allow regenerating messages.

* llama : use ostringstream and istringstream for save and load

* llama : serialize the RNG into the minimum amount of space required

* llama : break session version due to serialization changes
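The ~62 MB figure is consistent with a LLaMA-style vocabulary, assuming n_vocab = 32000 and fp32 logits (the exact number depends on the model): a fully reserved buffer for a 512-token batch is 32000 × 512 × 4 bytes = 65,536,000 bytes ≈ 62.5 MiB, while the best case needs only one token's logits, 32000 × 4 bytes = 128 KB. Below is a minimal sketch of the serialization ideas above, using only standard-library facilities; the function names are illustrative.

```cpp
#include <cstddef>
#include <random>
#include <sstream>
#include <string>

// std::mt19937 defines operator<< / operator>>, so the RNG state can be
// round-tripped through a stringstream and stored in exactly as many
// characters as it needs, rather than a fixed-size blob.
static std::string save_rng(const std::mt19937 & rng) {
    std::ostringstream ss;
    ss << rng;
    return ss.str();
}

static void load_rng(std::mt19937 & rng, const std::string & data) {
    std::istringstream ss(data);
    ss >> rng;
}

// Every context reserves the same fixed logits capacity, so state saved
// by one context can always be restored by another...
static size_t logits_capacity(size_t n_vocab, size_t n_batch) {
    return n_vocab * n_batch;
}

// ...but only the logits actually produced need to be written out.
static size_t logits_used(size_t n_vocab, size_t n_tokens_with_logits) {
    return n_vocab * n_tokens_with_logits;
}
```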
@pull pull bot added the ⤵️ pull label Jan 14, 2024