[pull] master from ggerganov:master #30

pull · 2024-02-05T16:09:27Z

See Commits and Changes for more details.


We get slightly better PPL (perplexity) and cut quantization time nearly in half. The trick is to first quantize without forcing points onto the E8-lattice. We can then use a narrower search range around the block scale obtained that way.

Co-authored-by: Iwan Kawrakow <[email protected]>

* Avoid duplicating function calls when using MIN/MAX macros.

Since these macros copy "a" and "b", they ask the compiler to evaluate one of them twice. The compiler has no problem removing the duplication in something like MAX(0, x + 2), but in some cases the arguments are function calls, and those calls simply happen twice. By explicitly evaluating the expression first, we get smaller and faster code without duplicate calls. See ggml_rope_yarn_corr_dims in Compiler Explorer: https://godbolt.org/z/Ee4KMrvKh

Code behaves exactly the same.

* Update ggml.c

---------

Co-authored-by: Georgi Gerganov <[email protected]>
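The double evaluation described in this commit message can be reproduced in isolation. The following is a minimal sketch, not the code from ggml.c: `expensive`, `call_count`, `max_via_macro`, and `max_via_local` are hypothetical names introduced only for illustration, and the `MAX` definition is assumed to be the conventional ternary form.

```c
/* Conventional ternary MAX macro (assumed form, not copied from ggml.c). */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Counts how many times expensive() runs. */
int call_count = 0;

/* Hypothetical stand-in for a costly function with an observable side effect. */
int expensive(int x) {
    call_count++;
    return x * 2;
}

/* Macro form: the argument text is pasted twice, so when the call wins
 * the comparison it is evaluated a second time to produce the result. */
int max_via_macro(int x) {
    call_count = 0;
    int r = MAX(0, expensive(x));
    (void) r;
    return call_count;   /* number of calls the macro caused */
}

/* Explicit form: evaluate once into a local, then compare the local. */
int max_via_local(int x) {
    call_count = 0;
    const int v = expensive(x);
    int r = MAX(0, v);
    (void) r;
    return call_count;   /* number of calls with the explicit temporary */
}
```

With a positive input, the macro form calls `expensive` twice (once for the comparison, once for the selected branch), while the explicit form calls it once. Because the call has a side effect, the compiler cannot merge the two calls on its own, which is the situation the commit message describes.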

AidanBeltonS and others added 12 commits February 5, 2024 12:38

[SYCL] Fix cpy with dims of 3 (#5289)

4833ac2

* Fix cpy with dims of 3
* rm asserts

---------

Co-authored-by: Abhilash Majumder <[email protected]>

readme : add CodeShell models to the supported models list (#5330)

5d55b0c

scripts : add non-interactive server-llm.sh (#5303)

4be04c8

* Update server-llm.sh

Add flag --non-interactive that allows running the script without asking for permission

* Update scripts/server-llm.sh

---------

Co-authored-by: Georgi Gerganov <[email protected]>

scripts : fix typos, cleanup (#5303)

30679d4

common : add dynamic temperature parameters to main example cli (#5295)

e6f8177

* added dynamic temp params in main
* added help text

server : allow to get default generation settings for completion (#5307)

a2d60c9

py : fix internlm2-hf convert to gguf (#5305)

7e1ae37

* py : fix internlm2-hf convert to gguf
* ggml-ci

iq3_xxs: guards for the no-imatrix situation (#5334)

89503dc

Co-authored-by: Iwan Kawrakow <[email protected]>

ggml : make use of ggml-quants.h possible in C++ code (#5338)

c6b3955

* Make use of ggml-quants.h possible in C++ code
* One cannot possibly be defining static_assert in a C++ compilation

---------

Co-authored-by: Iwan Kawrakow <[email protected]>
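The second bullet refers to a common pattern for headers shared between C and C++: a fallback `static_assert` macro is only defined when compiling as C, because `static_assert` is a keyword in C++. The sketch below illustrates that pattern under stated assumptions. It is not the content of ggml-quants.h, and `example_block` and `example_block_size` are hypothetical names used only for this illustration.

```c
#include <stddef.h>

/* In C++, static_assert is a keyword and must not be redefined,
 * so the fallback is only considered in a C compilation. */
#ifndef __cplusplus
#ifndef static_assert
#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L)
/* C11 and later: map onto the _Static_assert keyword. */
#define static_assert(cond, msg) _Static_assert(cond, msg)
#else
/* Older C: expand to a harmless declaration so the line is a no-op. */
#define static_assert(cond, msg) struct static_assert_noop_placeholder
#endif
#endif
#endif

/* Hypothetical block layout: a 2-byte scale followed by 32 quantized values. */
typedef struct {
    unsigned short d;
    signed char    qs[32];
} example_block;

/* Compiles as both C and C++; fails the build if the layout changes. */
static_assert(sizeof(example_block) == 34, "unexpected example_block size");

/* Returns the size that the static_assert above checks at compile time. */
size_t example_block_size(void) {
    return sizeof(example_block);
}
```

Guarding on `__cplusplus` first keeps the macro out of C++ translation units entirely, so the same layout checks work from either language.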

README: updated introduction (#5343)

78b00dd

* README: updated introduction
* readme : update

---------

Co-authored-by: Georgi Gerganov <[email protected]>

pull bot added the ⤵️ pull label Feb 5, 2024

make: Use ccache for faster compilation (#5318)

098f6d7

* make: Use ccache for faster compilation

teleprint-me closed this Feb 5, 2024
