
merge upstream #34

Merged
merged 70 commits into layla-build on Aug 28, 2024

Conversation

l3utterfly
Owner

ggerganov and others added 30 commits August 12, 2024 10:21
* py : fix requirements check '==' -> '~='

* cont : fix the fix

* ci : run on all requirements.txt
* readme: introduce gpustack

GPUStack is an open-source GPU cluster manager for running large
language models, which uses llama.cpp as the backend.

Signed-off-by: thxCode <[email protected]>

* readme: introduce gguf-parser

GGUF Parser is a tool to review/check the GGUF file and estimate the
memory usage without downloading the whole model.

Signed-off-by: thxCode <[email protected]>

---------

Signed-off-by: thxCode <[email protected]>
…8970)

* llama : model-based max number of graph nodes calculation

* Update src/llama.cpp

---------

Co-authored-by: slaren <[email protected]>
* ggml : move rope type enum to ggml.h

This commit moves the `llama_rope_type` enum from `llama.h` to
`ggml.h` and changes its name to `ggml_rope_type`.

The motivation for this change is to address the TODO in `llama.h` and
use the enum in ggml.

Note: This commit does not change the `mode` parameter to be of type
`enum ggml_rope_type`. The name `mode` and its usage suggest that it
might be more generic and possibly used as a bit field for multiple
flags. Further investigation/discussion may be needed to determine
if `mode` should be restricted to RoPE types.

* squash! ggml : move rope type enum to ggml.h

This commit removes GGML_ROPE_TYPE_NONE and GGML_ROPE_TYPE_GLM from
ggml.h, and brings back the llama_rope_type enum.

I've kept the assert for GGML_ROPE_TYPE_GLM as I'm not sure if it is
safe to remove it yet.

* squash! ggml : move rope type enum to ggml.h

This commit removes the enum ggml_rope_type from ggml.h and replaces it
with a define (GGML_ROPE_TYPE_NEOX). This define is used in the code to
check if the mode is set to GPT-NeoX. Also the enum llama_rope_type has
been updated to reflect this change.
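
For context, a minimal C sketch of the pattern described here, not the actual ggml code; the value 2 is assumed to match the old enum value:

```c
// Hypothetical excerpt: the define-based check described above.
// The real definition and call sites live in ggml.h and the rope kernels.
#define GGML_ROPE_TYPE_NEOX 2

// `mode` stays a plain int rather than an enum, which keeps open the
// bit-field usage mentioned earlier in this commit series.
static inline int rope_is_neox(int mode) {
    return (mode & GGML_ROPE_TYPE_NEOX) != 0;
}
```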

* squash! ggml : move rope type enum to ggml.h

This commit contains a suggestion to enable the GGML_ROPE_TYPE_NEOX
macro/define to be passed to the shader compiler.

* squash! ggml : move rope type enum to ggml.h

This commit fixes the editorconfig-checker warnings.

* squash! ggml : move rope type enum to ggml.h

Update comment for ggml_rope function.

* Revert "squash! ggml : move rope type enum to ggml.h"

This reverts commit 6261222.

* squash! ggml : move rope type enum to ggml.h

Add GGML_ROPE_TYPE_NEOX to rope_common.comp.

* remove extra line

---------

Co-authored-by: slaren <[email protected]>
* server : fix segfault on long system prompt

* server : fix parallel generation with very small batch sizes

* server : fix typo in comment
* Optimize Vulkan REPEAT performance

* Use Vulkan GLSL fused multiply-add instruction where possible (an illustrative sketch follows this commit list)

* Add GGML_VULKAN_PERF option to output performance data per operator

* Rework and fix Vulkan descriptor set and descriptor pool handling

* Fix float32 concat f16 shader validation error

* Add Vulkan GROUP_NORM eps parameter

* Fix validation error with transfer queue memory barrier flags

* Remove trailing whitespaces
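
As an aside on the fused multiply-add commit above: the actual change is in the Vulkan GLSL shaders (GLSL's built-in fma()), so the following is only an illustrative C sketch of the idea, not the shader code:

```c
#include <math.h>

// Illustrative only: accumulate a dot product with fused multiply-add.
// fmaf(a, b, c) computes a*b + c with a single rounding, analogous to
// what the GLSL fma() built-in gives the Vulkan shaders.
static float dot_fma(const float * x, const float * y, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        acc = fmaf(x[i], y[i], acc);
    }
    return acc;
}
```
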
* retrieval

* Reuse querybatch to reduce frequent memory allocation

* delete unused white space
)

* ggml : Dynamic ggml_sched_max_splits based on graph_size

* Fixed and readded debug code for causes
…8922)

* Add nemotron GGUF conversion & inference support

* Fix formatting issues

* Remove unnecessary write_tensors()

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <[email protected]>

* Update src/llama.cpp

Co-authored-by: compilade <[email protected]>

* Address comments by @compilade

* Replace ggml_mul_mat()->llm_build_lora_mm()

* Remove mutable variable

* Use  for bias tensors

* Cover corner case for rope_scaling not in config.json

---------

Co-authored-by: compilade <[email protected]>
…rganov#8771)

* Add support for cpu_get_num_physical_cores() on Windows

* fix build bug on msys2-clang64 and ucrt64

* avoid adding new function

* add new macros to avoid windows+mingw64

* Add error checking to return default value
* add exaone model support

* add chat template

* fix whitespace

Co-authored-by: Georgi Gerganov <[email protected]>

* add ftype

* add exaone pre-tokenizer in `llama-vocab.cpp`

Co-Authored-By: compilade <[email protected]>

* fix lint

Co-Authored-By: compilade <[email protected]>

* add `EXAONE` to supported models in `README.md`

* fix space

Co-authored-by: compilade <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: compilade <[email protected]>
Co-authored-by: compilade <[email protected]>
* init

* rename

* add instructions for running on Android via Termux in readme

* add android readme

* add instructions in readme

* change name in readme

* Update README.md

* fixed line

* add result in readme

* random pos_embed

* add positions index

* change for ollama

* change for ollama

* better pos_embed in clip

* support ollama

* update cmakelist

* update cmakelist

* rename wrapper

* clear code

* replace and organize code

* add link

* sync master

* fix warnings

* fix warnings

* fix bug in bicubic resize when the image needs to be resized smaller

* address review comments

* address review comments

* put all code into llava dir

* fix quality problem in pr code

* change n_layer

* add space in "-1"

* imitate reshape bug of python code

* fix bug in clip

* fix issues for merging

* fix llama-minicpmv-cli in cmake file

* change pr readme

* fix code review

* remove the directory at line 33 of the top-level /CMakeLists.txt (not in example, in the main dir)

* fix cmakefile

* add warn

* fix KEY_HAS_MINICPMV_PROJ

* move load_image_size into clip_ctx

* remove the extern "C", MINICPMV_API

* fix uhd code for review comment

* delete minicpmv-wrapper in pr

* remove uhd_image_embed

* Modify 2 notes

* support minicpmv2.6

* modify convert script of minicpmv

* modify convert

* modify convert

* add readme

* add resampler of v2.6

* modify clip

* modify readme

* fix type-check

* fix type-check

* fix type-check

* fix type-check

* modify convert script and readme

* fix convert script and readme

* fix convert

* fix num in convert

* fix type-check

---------

Co-authored-by: Hongji Zhu <[email protected]>
Co-authored-by: harvestingmoon <[email protected]>
* server : refactor middleware and /health endpoint

* move "fail_on_no_slot" to /slots

* Update examples/server/server.cpp

Co-authored-by: Georgi Gerganov <[email protected]>

* fix server tests

* fix CI

* update server docs

---------

Co-authored-by: Georgi Gerganov <[email protected]>
slaren and others added 17 commits August 26, 2024 11:03
* ggml : add ggml_ssm_conv metal impl

* ggml : add ssm_scan metal impl

ggml-ci
* metal : separate scale and mask from QKT in FA kernel

* metal : ne01 check no longer necessary

* metal : keep data in local memory
* Update stb_image.h to latest version

Fixes ggerganov#7431

* Update .ecrc
…anov#9141)

* fix: llama3.1 rope_freqs not respecting custom head_dim

* fix: use potential head_dim for Exaone
This should fix THUDM/glm-4-9b-chat-1m and CausalLM/miniG
* server : add some missing env variables

* add LLAMA_ARG_HOST to server dockerfile

* also add LLAMA_ARG_CONT_BATCHING
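
For illustration, a minimal C sketch of the environment-variable fallback pattern these commits refer to. The helper and defaults are hypothetical, not the server's actual argument parser; only the variable names LLAMA_ARG_HOST and LLAMA_ARG_CONT_BATCHING come from the commits above:

```c
#include <stdlib.h>

// Hypothetical helper: prefer an explicit CLI value, then an environment
// variable, then a default.
static const char * arg_or_env(const char * cli_value, const char * env_name, const char * def) {
    if (cli_value != NULL) {
        return cli_value;
    }
    const char * env = getenv(env_name);
    return env != NULL ? env : def;
}

// Example usage (default assumed for illustration):
//   const char * host = arg_or_env(cli_host, "LLAMA_ARG_HOST", "127.0.0.1");
```
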
l3utterfly merged commit 3437c58 into layla-build on Aug 28, 2024
54 of 67 checks passed