
merge from upstream #14

Merged
merged 30 commits into layla-build on Apr 29, 2024

Conversation

l3utterfly (Owner)

No description provided.

zj040045 and others added 30 commits April 25, 2024 13:29
Implement '--keep-split' to quantize model into several shards (ggerganov#6688)

* Implement '--keep-split' to quantize model into several shards

* Add test script

* Update examples/quantize/quantize.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* Split model correctly even if tensor id is out-of-order

* Update llama_model_quantize_params

* Fix preci failures

---------

Co-authored-by: z5269887 <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
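
A minimal usage sketch of the new option, assuming the llama.h API of the time (the ftype choice is illustrative): with keep_split enabled, a sharded input such as model-00001-of-00003.gguf is quantized into the same number of output shards instead of a single file.

```cpp
#include "llama.h"

// Sketch only: quantize while preserving the input's shard layout.
static uint32_t quantize_keep_split(const char * fname_in, const char * fname_out) {
    llama_model_quantize_params params = llama_model_quantize_default_params();
    params.ftype      = LLAMA_FTYPE_MOSTLY_Q4_K_M; // illustrative target type
    params.keep_split = true;                      // one output shard per input shard
    return llama_model_quantize(fname_in, fname_out, &params); // 0 on success
}
```
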
* tests : minor bash stuff

ggml-ci

* llama : fix build

ggml-ci

* tests : fix CUR_DIR -> ROOT_DIR

ggml-ci

* tests : fix fname

ggml-ci
This commit renames the lerp (linear interpolation) function in clip.cpp
to avoid a conflict with the lerp function in the <cmath> standard C++
library when using c++20.

The motivation for this change is to enable projects that use c++20 to
be able to compile clip.cpp without having to resort to patching it. The
lerp function was added to <cmath> in C++20 (202002L), which is why this is
not causing any issues at the moment, as llama.cpp currently uses
C++11/C++17.

I realize that llama.cpp uses either C++11 (or C++17 in the case for
SYCL) but wanted to ask if this would be an acceptable change just the
same.

Refs: https://en.cppreference.com/w/cpp/numeric/lerp

Signed-off-by: Daniel Bevenius <[email protected]>
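
For context, a minimal sketch of the kind of clash being avoided; the renamed identifier below is illustrative, not necessarily the name chosen in clip.cpp.

```cpp
#include <cmath> // C++20 (202002L) declares std::lerp here

// Before: a file-local `static float lerp(float, float, float)` can become
// ambiguous under C++20 in translation units that pull std names into scope
// (e.g. via `using namespace std;`). Renaming the helper avoids the clash
// while keeping the standard linear-interpolation formula.
static float clip_lerp(float s, float e, float t) {
    return s + (e - s) * t;
}
```
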
llama : check that all the tensor data is in the model file (ggerganov#6885)

* llama : check that all the tensor data is in the model file

* also check for unsigned overflow
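
A hedged sketch of what such a check amounts to (names are illustrative):

```cpp
#include <cstdint>
#include <stdexcept>

// Sketch only: verify that a tensor's [offset, offset + size) range lies
// inside the file. The first comparison catches unsigned wrap-around, where
// offset + size overflows uint64_t and would otherwise pass the second
// comparison while pointing outside the file.
static void check_tensor_bounds(uint64_t offset, uint64_t size, uint64_t file_size) {
    if (offset + size < offset || offset + size > file_size) {
        throw std::runtime_error("tensor data is out of bounds of the model file");
    }
}
```
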
* Update README.md

* missing space

* llama3 !
* add support for moondream vision language model

This required making the following changes to the CLIP model:

1. Support for patch embedding bias.
2. Make class embedding and pre-layernorm optional.
3. Add support for post-layernorm.

* Update examples/llava/clip.cpp

---------

Co-authored-by: Georgi Gerganov <[email protected]>
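
A rough sketch of what "optional" means for these tensors at load time; the helper is hypothetical, not the actual clip.cpp code.

```cpp
#include "ggml.h"

// Hypothetical helper: return nullptr instead of failing when a tensor
// (e.g. the class embedding or pre-layernorm weights) is absent from the
// GGUF file, so the graph builder can simply skip that step.
static struct ggml_tensor * get_tensor_opt(const struct gguf_context * ctx_gguf,
                                           struct ggml_context * ctx_ggml,
                                           const char * name) {
    if (gguf_find_tensor(ctx_gguf, name) < 0) {
        return nullptr;
    }
    return ggml_get_tensor(ctx_ggml, name);
}
```
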
* always use calloc

clamp n_kv on failure to read a kv

* ggml : alternative ctx->header.n_kv update

---------

Co-authored-by: slaren <[email protected]>
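
A simplified sketch of the two ideas together; the structures are stand-ins, not the real gguf types.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for a gguf key-value entry.
struct kv_entry { char * key; char * value; };

// calloc instead of malloc: every pointer starts out NULL, so freeing a
// partially-initialized array is safe. If reading entry i fails, n_kv is
// clamped to i so cleanup only touches the entries that were actually read.
static bool read_kvs(uint64_t & n_kv, kv_entry ** out, bool (*read_one)(kv_entry *)) {
    kv_entry * kv = (kv_entry *) calloc(n_kv, sizeof(*kv));
    for (uint64_t i = 0; i < n_kv; ++i) {
        if (!read_one(&kv[i])) {
            n_kv = i; // clamp on failure
            *out = kv;
            return false;
        }
    }
    *out = kv;
    return true;
}
```
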
server: cap n_predict if not set to n_ctx_train (ggerganov#6638)

* server: cap n_predict if not set to n_ctx_train

* server: fix infinite loop

* server: infinite loop, move in process_token
server: infinite loop: set stop limit to true

* minor: spaces

* minor: spaces

* server: include prompt tokens in the EOS limit
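
The capping logic boils down to something like the sketch below (variable names are illustrative):

```cpp
#include <algorithm>
#include <cstdint>

// Sketch only: when the client leaves n_predict unset (<= 0), fall back to
// the model's training context size, and count the prompt tokens toward the
// budget so generation cannot loop past the context forever.
static int32_t effective_n_predict(int32_t n_predict_req,
                                   int32_t n_ctx_train,
                                   int32_t n_prompt_tokens) {
    const int32_t fallback = n_ctx_train - n_prompt_tokens;
    const int32_t limit    = n_predict_req > 0 ? n_predict_req : fallback;
    return std::max<int32_t>(limit, 0);
}
```
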
* add basic tensor data validation function

* add --check-tensors command line argument

tensor validation is disabled by default and can be enabled by adding
`--check-tensors` to the command line arguments.

quantize always validates tensors.
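
For f32 data, the validation reduces to a scan like this sketch (the real check also has to understand quantized block formats):

```cpp
#include <cmath>
#include <cstddef>

// Sketch of the float case: a tensor fails validation if any element is
// NaN or infinite, which usually indicates a corrupted or truncated file.
static bool validate_f32(const float * data, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (std::isnan(data[i]) || std::isinf(data[i])) {
            return false;
        }
    }
    return true;
}
```
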
* imatrix: save the dataset file used in the output file

* llama: support kv overrides type string string

* common: factorize KV Overrides parsing between common and server

* quantize: add imatrix n entries and dataset KV metadata
quantize: factorize KV Overrides parsing between common
ggerganov#6656

* llama: remove kv override str_value initialization as it does not compile on some toolchain

* quantize: add imatrix m_last_call as `quantize.imatrix.chunks_count`

* quantize: add imatrix filename in KV

* llama: add llama_model_kv_override_free

* common: add llama_model_kv_override_free
common: free kv override if used after model loading

* llama: finally move the string KV override value to the stack

* llama : minor

* no need to add a NUL to the std::vector, std::string can be initialized from a pair of iterators.

Co-authored-by: slaren <[email protected]>

* kv override: ensure string termination

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: slaren <[email protected]>
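
The end state for string overrides can be sketched like this; field sizes are illustrative, and the real struct also carries a type tag and numeric variants.

```cpp
#include <cstdio>

// Sketch: the string value lives in a fixed-size array inside the struct,
// so there is no heap allocation to free, and snprintf guarantees the copy
// is NUL-terminated even when the input is too long.
struct kv_override_sketch {
    char key[128];
    char val_str[128];
};

static void set_str_override(kv_override_sketch & ov, const char * key, const char * val) {
    snprintf(ov.key,     sizeof(ov.key),     "%s", key);
    snprintf(ov.val_str, sizeof(ov.val_str), "%s", val);
}
```
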
Reset schedule earlier to allow overlap with graph computation on device (ggerganov#6933)

* Reset schedule earlier to allow overlap with graph computation on device
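
The change is about ordering rather than new API; roughly as below, where the loop and helpers are illustrative and only the scheduler calls are from ggml-backend.

```cpp
#include "ggml-backend.h"

// Illustrative stubs; only the ordering of the scheduler calls matters here.
extern bool          have_more_batches();
extern ggml_cgraph * build_next_graph();
extern void          read_outputs();

// Sketch: resetting the scheduler as soon as the outputs have been read
// lets host-side preparation of the next graph overlap with work that is
// still finishing on the device, instead of serializing behind it.
static void run(ggml_backend_sched_t sched) {
    while (have_more_batches()) {
        ggml_cgraph * gf = build_next_graph();
        ggml_backend_sched_graph_compute(sched, gf);
        read_outputs();
        ggml_backend_sched_reset(sched); // moved earlier than before
    }
}
```
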
…n_predict (ggerganov#6935)

* ci: server: fix python env

* ci: server: fix server tests after ggerganov#6638

* ci: server: fix windows is not building PR branch
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/5c24cf2f0a12ad855f444c30b2421d044120c66f?narHash=sha256-XtTSSIB2DA6tOv%2Bl0FhvfDMiyCmhoRbNB%2B0SeInZkbk%3D' (2024-04-19)
  → 'github:NixOS/nixpkgs/7bb2ccd8cdc44c91edba16c48d2c8f331fb3d856?narHash=sha256-Drmja/f5MRHZCskS6mvzFqxEaZMeciScCTFxWVLqWEY%3D' (2024-04-25)
* not allow adding duplicated tensor name

* no duplicated tensor while reading gguf

* typo

* throw exception inside llama_model_loader

Co-authored-by: slaren <[email protected]>

---------

Co-authored-by: slaren <[email protected]>
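
The guard amounts to a seen-set over tensor names, sketched here (per the commit, the real code throws from inside llama_model_loader):

```cpp
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <vector>

// Sketch: reject GGUF files (and writer APIs) where the same tensor name
// appears twice, instead of silently letting one copy shadow the other.
static void check_no_duplicate_names(const std::vector<std::string> & names) {
    std::unordered_set<std::string> seen;
    for (const std::string & name : names) {
        if (!seen.insert(name).second) {
            throw std::runtime_error("duplicated tensor name: " + name);
        }
    }
}
```
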
* Fix more int overflow during quant.

* Fix some more int overflow in softmax.

* Revert back to int64_t.
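
The overflow class being fixed looks like this sketch: with 32-bit index math, offsets wrap once a tensor passes roughly 2^31 elements.

```cpp
#include <cstdint>

// Sketch: computing row * n_cols in int wraps for large tensors; doing the
// arithmetic in int64_t keeps the element offset correct. (In the softmax
// case it was intermediate index math that overflowed.)
static float element_at(const float * data, int64_t row, int64_t col, int64_t n_cols) {
    return data[row * n_cols + col];
}
```
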
l3utterfly merged commit c51dc33 into layla-build on Apr 29, 2024
46 of 70 checks passed