forked from ggerganov/llama.cpp
merged from upstream #26
Merged
Conversation
…rompt (ggerganov#7950)
* SimpleChat: Allow for chat req bool options to be user controlled
* SimpleChat: Allow user to control cache_prompt flag in request
* SimpleChat: Add sample GUI images to readme file, showing the chat screen and the settings screen
* SimpleChat: Readme: Add quickstart block, title to image, cleanup
* SimpleChat: Reposition contents of the Info and Settings UI so they are more logically structured and flow better
* SimpleChat: Rename chatRequestOptions to apiRequestOptions, so it is not wrongly assumed that these request options are used only for the chat/completions endpoint; they are used for both endpoints, so the new name matches the semantics better
* SimpleChat: Update image included with readme wrt settings UI
* SimpleChat: Readme: Switch to webp screen image to reduce size
* add chat template support for llama-cli
* add help message
* server: simplify format_chat
* more consistent naming
* improve
* add llama_chat_format_example
* fix server
* code style
* code style
* Update examples/main/main.cpp
  Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>
…8069)
* remove completions file
* fix inverted vector
* add mean method
* code style
* remove inverted pca hotfix
…ggerganov#8054)
* gguf-dump: add --data-offset
* gguf-dump: add tensor data offset table
* gguf-dump: refactor GGUFReader for clarity
* gguf-dump: add --data-alignment
* gguf-dump.py: Rename variables and adjust comments (start_data_offset --> data_offset, _build_tensors_info_fields --> _build_tensor_info)
* added healthcheck
* added healthcheck
* added healthcheck
* added healthcheck
* added healthcheck
* moved curl to base
* moved curl to base
…Maximum (ggerganov#7797)
* json: support minimum for positive integer values
* json: fix min 0
* json: min + max integer constraints
* json: handle negative min / max integer bounds
* json: fix missing paren min/max bug
* json: proper paren fix
* json: integration test for schemas
* json: fix bounds tests
* Update json-schema-to-grammar.cpp
* json: fix negative max
* json: fix negative min (w/ more than 1 digit)
* Update test-grammar-integration.cpp
* json: nit: move string rules together
* json: port min/max integer support to Python & JS
* nit: move + rename _build_min_max_int
* fix min in [1, 9]
* Update test-grammar-integration.cpp
* add C++11-compatible replacement for std::string_view
* add min/max constrained int field to pydantic json schema example
* fix merge
* json: add integration tests for min/max bounds
* reshuffle/merge min/max integ test cases
* nits / cleanups
* defensive code against string out of bounds (apparently different behaviour of libstdc++ vs. clang's libc++, can't read final NULL char w/ former)
* llama : return nullptr from llama_grammar_init

This commit updates llama_grammar_init to return nullptr instead of throwing an exception. The motivation is that this function is declared inside an extern "C" block and is intended to be (or may be) used from C code, which cannot handle thrown exceptions; letting one escape results in undefined behavior. On Windows with MSVC the following warning is currently generated:

```console
C:\llama.cpp\llama.cpp(13998,1): warning C4297: 'llama_grammar_init': function assumed not to throw an exception but does
C:\llama.cpp\llama.cpp(13998,1): message : __declspec(nothrow), throw(), noexcept(true), or noexcept was specified on the function
```

Signed-off-by: Daniel Bevenius <[email protected]>

* squash! llama : return nullptr from llama_grammar_init

Add checks for nullptr when calling llama_grammar_init.

Signed-off-by: Daniel Bevenius <[email protected]>

---------

Signed-off-by: Daniel Bevenius <[email protected]>
Co-authored-by: Clint Herron <[email protected]>
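The underlying issue is general to any C API implemented in C++: exceptions must not propagate across an `extern "C"` boundary. Below is a minimal sketch of the pattern, using hypothetical names rather than the actual llama.cpp internals: exceptions are caught at the boundary and converted into a nullptr return value that the caller checks.

```cpp
#include <cstdio>
#include <stdexcept>

// Hypothetical internal C++ type whose constructor may throw (e.g. on invalid input).
struct grammar {
    explicit grammar(const char * rules) {
        if (rules == nullptr || rules[0] == '\0') {
            throw std::invalid_argument("empty grammar rules");
        }
    }
};

extern "C" {

// C-visible constructor: convert every exception into a nullptr return value
// so that no exception can cross the extern "C" boundary.
grammar * grammar_init(const char * rules) {
    try {
        return new grammar(rules);
    } catch (const std::exception & err) {
        std::fprintf(stderr, "grammar_init failed: %s\n", err.what());
        return nullptr;
    } catch (...) {
        return nullptr;
    }
}

void grammar_free(grammar * g) {
    delete g;
}

} // extern "C"

int main() {
    // Callers now check for nullptr instead of relying on exceptions.
    grammar * g = grammar_init("");
    if (g == nullptr) {
        std::fprintf(stderr, "failed to initialize grammar\n");
    }
    grammar_free(g); // deleting nullptr is a no-op
    return 0;
}
```

This mirrors the follow-up "squash!" commit above: once the function can return nullptr, every call site has to check for it.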
…milies (ggerganov#5763)
* llama : add T5 model architecture, tensors and model header parameters
* llama : add implementation of Unigram tokenizer with SentencePiece-like text normalization using precompiled charsmap

---------

Co-authored-by: Stanisław Szymczyk <[email protected]>
…ions in `llama.cpp` [needs testing] (ggerganov#8060)
* fixes ggerganov#7999: `build_command_r` forgot to add the control vector
* Fixes qwen2 too
* Fixed all models' control vectors
* Removed double calls to `cb(cur, "l_out", il)`
* Moved control vector logic to llama_control_vector:apply_to()
…anov#7840)
* json: default additionalProperty to true
* json: don't force additional props after normal properties!
* json: allow space after enum/const
* json: update pydantic example to set additionalProperties: false
* json: prevent additional props from redefining a typed prop
* port not_strings to python, add trailing space
* fix not_strings & port to js+py
* Update json-schema-to-grammar.cpp
* fix _not_strings for substring overlaps
* json: fix additionalProperties default, uncomment tests
* json: add integ. test case for additionalProperties
* json: nit: simplify condition
* reformat grammar integ tests w/ R"""()""" strings where there are escapes
* update # tokens in server test: consts can now have trailing space
…ed items) (ggerganov#7863)
* json: better support for "type" arrays (e.g. `{"type": ["array", "null"], "items": {"type": "string"}}`)
* json: add test for type: [array, null] fix
* update tests
…#8115)
* Add message about int8 support
* Add suggestions from review
  Co-authored-by: Johannes Gäßler <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>
* scripts : update sync [no ci]
* files : relocate [no ci]
* ci : disable kompute build [no ci]
* cmake : fixes [no ci]
* server : fix mingw build
  ggml-ci
* cmake : minor [no ci]
* cmake : link math library [no ci]
* cmake : build normal ggml library (not object library) [no ci]
* cmake : fix kompute build
  ggml-ci
* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE
  ggml-ci
* move public backend headers to the public include directory (ggerganov#8122)
  * move public backend headers to the public include directory
  * nix test
  * spm : fix metal header
  Co-authored-by: Georgi Gerganov <[email protected]>
* scripts : fix sync paths [no ci]
* scripts : sync ggml-blas.h [no ci]

---------

Co-authored-by: slaren <[email protected]>
* clip : suppress unused variable warnings

This commit suppresses unused variable warnings for the `e` variables in the catch blocks. The motivation for this change is to suppress the warnings that are generated on Windows when using the MSVC compiler. The warnings are not displayed when using GCC because GCC marks all catch parameters as used.

Signed-off-by: Daniel Bevenius <[email protected]>

* squash! clip : suppress unused variable warnings

Remove e (/*e*/) instead of using GGML_UNUSED.

---------

Signed-off-by: Daniel Bevenius <[email protected]>
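The technique here is the standard C++ idiom of leaving a catch parameter unnamed (or commenting out its name) when the handler body never uses it. A minimal, self-contained sketch, not the actual clip.cpp code:

```cpp
#include <cstdio>
#include <stdexcept>

int main() {
    try {
        throw std::runtime_error("example failure");
    } catch (const std::exception & /*e*/) {
        // Before: `catch (const std::exception & e)` with a body that never
        // touches `e`, which MSVC reports as an unused variable.
        // Commenting out the parameter name keeps the type (so this handler
        // still matches std::exception) while removing the unused binding,
        // so no warning and no need for a macro such as GGML_UNUSED(e).
        std::fprintf(stderr, "caught an exception, ignoring it\n");
    }
    return 0;
}
```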
…nov#8145)
The path to the common.h header file in llama-android.cpp seems to be wrong. Fix the path so the Android build doesn't fail with the error "There is no file common/common.h".
* account for space prefix character
* use find instead
Co-authored-by: kustaaya <[email protected]>
This patch replaces the old command name "main" with "llama-cli" in finetune.sh. The part that was fixed is a comment, so it doesn't change the script's behavior.

Signed-off-by: Masanari Iida <[email protected]>
Rename the old command name "finetune" to "llama-finetune" in README.md.

Signed-off-by: Masanari Iida <[email protected]>
* add chatglm3-6b model support
  huggingface model: https://hf-mirror.com/THUDM/chatglm3-6b
  Signed-off-by: XingXing Qiao <[email protected]>
* remove .rotary_pos_emb.inv_freq and unused code for the chatglm3 model
  Signed-off-by: XingXing Qiao <[email protected]>
* fix lint error
  Signed-off-by: XingXing Qiao <[email protected]>
* optimize convert-hf-to-gguf.py for chatglm model
  Signed-off-by: XingXing Qiao <[email protected]>
* support glm-4-9b-chat
  Signed-off-by: XingXing Qiao <[email protected]>
* fix eos tokens to glm4
* remove unused log
* add preprocess to chatglm3 and chatglm4
* add eos_id_list to llama.cpp
* fix code style
* fix code style
* fix conflicts
* fix conflicts
* Revert "add eos_id_list to llama.cpp" (this reverts commit 3a4d579)
* set <|endoftext|> as eos and <|user|> as eot
* fix chat template bug
* add comment to glm prefix and suffix
* fix conflicts and add rope_ratio & ChatGLMForConditionalGeneration
* fix chat template bug
* fix codestyle
* fix conflicts
* modified the general name of glm model
* fix conflicts
* remove prefix and suffix
* use the normal glm4 chat template & use LLM_FFN_SWIGLU in phi3
* fix: resolve Flake8 errors in `convert-hf-to-gguf.py`
  - Fix E302 by adding two blank lines before top-level function definitions
  - Replace print statements to fix NP100
  - Fix E303 by ensuring only one blank line between lines of code
* fix rope ratio to solve incorrect answers
* fix by comments

---------

Signed-off-by: XingXing Qiao <[email protected]>
Co-authored-by: XingXing Qiao <[email protected]>
Co-authored-by: Umpire2018 <[email protected]>
…gerganov#8048)

CLI to hash GGUF files to detect differences on a per-model and per-tensor level. The hash types we support are:
- `--xxh64`: use xxhash 64-bit hash mode (default)
- `--sha1`: use sha1
- `--uuid`: use uuid
- `--sha256`: use sha256

While most POSIX systems already have hash-checking programs like sha256sum, those are designed to check entire files. This is not ideal for our purpose if we want to check the consistency of the tensor data even when the metadata content of the gguf KV store has been updated. This program is designed to hash a gguf tensor payload on a 'per tensor layer' basis in addition to an 'entire tensor model' hash. The intent is that the entire-model hash can be checked first and, if any inconsistency is detected, the per-tensor hashes can be used to narrow down the specific tensor layer that is inconsistent.

Co-authored-by: Georgi Gerganov <[email protected]>
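For illustration, a hypothetical invocation might look like the following; the binary name and model path are assumptions, while the flags are the ones listed above:

```console
# assuming the tool builds as `llama-gguf-hash` and a model exists at this path (both hypothetical)
./llama-gguf-hash --xxh64 models/example-7b-q4_0.gguf    # whole-model plus per-tensor xxh64 hashes
./llama-gguf-hash --sha256 models/example-7b-q4_0.gguf   # same, using sha256
```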
* adding guile_llama_cpp to binding list
* fix formatting
* fix formatting
* Added checks for cmake, make and ctest
* Removed erroneous whitespace
* Update README.md
* Update README.md
* Update README.md
  fixed llama-cli/main, templates on some cmds; added chat template sections and fixed typos in some areas
* Update README.md
* Update README.md
* Update README.md
* py : type-check all Python scripts with Pyright
* server-tests : use trailing slash in openai base_url
* server-tests : add more type annotations
* server-tests : strip "chat" from base_url in oai_chat_completions
* server-tests : model metadata is a dict
* ci : disable pip cache in type-check workflow
  The cache is not shared between branches, and it's 250MB in size, so it would become quite a big part of the 10GB cache limit of the repo.
* py : fix new type errors from master branch
* tests : fix test-tokenizer-random.py
  Apparently, gcc applies optimisations even when pre-processing, which confuses pycparser.
* ci : only show warnings and errors in python type-check
  The "information" level otherwise has entries from 'examples/pydantic_models_to_grammar.py', which could be confusing for someone trying to figure out what failed, considering that these messages can safely be ignored even though they look like errors.
github-actions bot added the documentation (Improvements or additions to documentation), SYCL, Nvidia GPU, Vulkan, testing, build, examples, devops, python, android, server, ggml, Kompute, Apple Metal, script, and nix labels on Jul 8, 2024.