Dry sampler #11
Merged
* Support converting models with multiple chat templates

  Adds the following metadata:
  * tokenizer.chat_templates
  * tokenizer.chat_template.<name1>
  * tokenizer.chat_template.<name2>
  * tokenizer.chat_template.<...>

  Here `tokenizer.chat_templates` is an array of the template names (except `default`); `default` is added to the regular `tokenizer.chat_template`.

* replace filtered characters with underscore

* New script to add/modify/remove metadata

  This script creates a copy of a GGUF file and allows you to add/modify/remove metadata in the process. Most importantly, this allows you to update chat templates, either as a string or directly from an updated tokenizer_config.json file.

* Add files via upload: add new script to project/readme

* flake--
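Not part of this PR's diff, but to make the key layout concrete: a minimal C++ sketch of how a consumer might enumerate these keys with ggml's gguf C API (gguf_init_from_file, gguf_find_key, gguf_get_arr_str). The key names follow the commit message; everything else is illustrative.

```cpp
// Sketch: enumerating multi-chat-template metadata in a GGUF file.
#include <cstdio>
#include <string>
#include "ggml.h" // the gguf_* API was declared in ggml.h at the time

int main(int argc, char ** argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s model.gguf\n", argv[0]); return 1; }

    struct gguf_init_params params = { /*.no_alloc =*/ true, /*.ctx =*/ nullptr };
    struct gguf_context * ctx = gguf_init_from_file(argv[1], params);
    if (!ctx) { fprintf(stderr, "failed to read %s\n", argv[1]); return 1; }

    // the default template stays under the regular key
    if (gguf_find_key(ctx, "tokenizer.chat_template") >= 0) {
        printf("default template present\n");
    }

    // additional templates are listed by name in tokenizer.chat_templates
    const int kid = gguf_find_key(ctx, "tokenizer.chat_templates");
    if (kid >= 0) {
        const int n = gguf_get_arr_n(ctx, kid);
        for (int i = 0; i < n; i++) {
            const std::string name = gguf_get_arr_str(ctx, kid, i);
            const std::string key  = "tokenizer.chat_template." + name;
            printf("template '%s': %s\n", name.c_str(),
                   gguf_find_key(ctx, key.c_str()) >= 0 ? "present" : "missing");
        }
    }

    gguf_free(ctx);
    return 0;
}
```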
* ggml : group all experts in a single ggml_mul_mat_id

* cuda : improve mmid row copy

* cuda : fix bin bcast with non-cont src0

* test-backend-ops : only run all mul mat tests for base types

* llama : disable moe offloading with SYCL

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: jianyuzh <[email protected]>
* llama : make general.name optional

* train: Add 'general.name' to model metadata

  Signed-off-by: teleprint-me <[email protected]>

---------

Signed-off-by: teleprint-me <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
* implement olmo architecture
* remove unused variable
* remove unused moe branch
* remove check for weight
* remove superfluous moe, bias and rope tensors
* clarified comment
* fix clamp_kqv setting
* remove obsolete parameter name filter
* common : disable get_math_cpu_count() until Android CI gets fixed
* common : another try
* Support Llama 3 conversion

  The tokenizer is BPE.

* style

* Accept suggestion

  Co-authored-by: Sourab Mangrulkar <[email protected]>

* llama : add llama_token_is_eog()

  ggml-ci

* llama : auto-detect more EOT tokens when missing in KV data

* convert : replacing EOS token is a hack

* llama : fix codegemma EOT token + add TODOs

* llama : fix model type string for 8B model

---------

Co-authored-by: Sourab Mangrulkar <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
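To make the new call concrete: a minimal sketch of a stop check built on llama_token_is_eog(), assuming the C API shape of this period (model pointer plus token id). The helper name is illustrative, not from the diff.

```cpp
// Sketch only: stop-condition helper around the new API.
#include "llama.h"

static bool should_stop(const struct llama_model * model, llama_token tok) {
    // covers EOS, EOT and other auto-detected end-of-generation tokens,
    // unlike a plain `tok == llama_token_eos(model)` comparison
    return llama_token_is_eog(model, tok);
}
```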
This change removes printf() logging so llava-cli is shell-scriptable.
* added Fedora to the list of distros that may need the package (the packages have the same name on Fedora)
* how to add CLBlast, which is available in the Fedora repos
* Added llama-3 chat template

* Update llama.cpp

  Co-authored-by: Samuel Tallet <[email protected]>

* Update llama.cpp

  Co-authored-by: Samuel Tallet <[email protected]>

* Update tests/test-chat-template.cpp

  Co-authored-by: Samuel Tallet <[email protected]>

* Added EOS stop sequence according to ggerganov#6751 (comment)

* Removed adding of BOS token before first message

* Removed bos token from expected output from llama-3

* Update tests/test-chat-template.cpp

  Co-authored-by: Rene Leonhardt <[email protected]>

* Update tests/test-chat-template.cpp

  Co-authored-by: Rene Leonhardt <[email protected]>

* Added <|end_of_text|> as another stop token

* Reverted last change of adding the end_of_text stop word for llama 3

---------

Co-authored-by: Wouter Tichelaar <[email protected]>
Co-authored-by: Samuel Tallet <[email protected]>
Co-authored-by: Rene Leonhardt <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
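For orientation, a sketch (not from the PR) of exercising the new template through the public llama_chat_apply_template() API. It assumes the "llama3" shortcut name is recognized the way the other curated template names are; the messages and buffer sizing are illustrative.

```cpp
// Sketch: rendering a conversation with the built-in llama-3 template.
#include <cstdio>
#include <vector>
#include "llama.h"

int main() {
    const llama_chat_message chat[] = {
        {"system", "You are a helpful assistant."},
        {"user",   "Hello"},
    };
    std::vector<char> buf(1024);
    const int32_t n = llama_chat_apply_template(
        /*model   =*/ nullptr,   // no model needed when a template is given
        /*tmpl    =*/ "llama3",
        chat, 2,
        /*add_ass =*/ true,      // append the assistant header for generation
        buf.data(), (int32_t) buf.size());
    if (n > 0 && n <= (int32_t) buf.size()) {
        // per the commit notes, no BOS is prepended; the output follows
        // <|start_header_id|>system<|end_header_id|>\n\n...<|eot_id|>...
        printf("%.*s\n", n, buf.data());
    }
    return 0;
}
```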
* make : fix common dep on llama.h

* llama : add option to render special tokens

* readme : add API change notice

  ggml-ci

* swift : fix build
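A sketch of what the "render special tokens" option looks like from the caller's side, assuming it is the new trailing `special` flag on llama_token_to_piece() (the helper name here is illustrative):

```cpp
// Sketch only: detokenizing with the new `special` flag.
#include <string>
#include "llama.h"

static std::string token_to_text(const struct llama_model * model,
                                 llama_token tok, bool render_special) {
    char buf[256];
    // with render_special = true, control tokens such as BOS/EOS are
    // rendered as their text form instead of being dropped
    const int32_t n = llama_token_to_piece(model, tok, buf, sizeof(buf), render_special);
    return n >= 0 ? std::string(buf, (size_t) n) : std::string();
}
```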
* `build`: generate hex dumps of server assets on the fly
* build: workaround lack of -n on gnu xxd
* build: don't use xxd in cmake
* build: don't call xxd from build.zig
* build: more idiomatic hexing
* build: don't use xxd in Makefile (od hackery instead)
* build: avoid exceeding max cmd line limit in makefile hex dump
* build: hex dump assets at cmake build time (not config time)
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/1042fd8b148a9105f3c0aca3a6177fd1d9360ba5?narHash=sha256-3sbWO1mbpWsLepZGbWaMovSO7ndZeFqDSdX0hZ9nVyw%3D' (2024-04-10)
  → 'github:NixOS/nixpkgs/5c24cf2f0a12ad855f444c30b2421d044120c66f?narHash=sha256-XtTSSIB2DA6tOv%2Bl0FhvfDMiyCmhoRbNB%2B0SeInZkbk%3D' (2024-04-19)
Latest gcc complains here:

/home/airlied/devel/llama.cpp/ggml-alloc.c: In function ‘ggml_gallocr_new_n’:
/home/airlied/devel/llama.cpp/ggml-alloc.c:374:59: warning: ‘calloc’ sizes specified with ‘sizeof’ in the earlier argument and not in the later argument [-Wcalloc-transposed-args]
  374 |     ggml_gallocr_t galloc = (ggml_gallocr_t)calloc(sizeof(struct ggml_gallocr), 1);
      |                                                           ^~~~~~
/home/airlied/devel/llama.cpp/ggml-alloc.c:374:59: note: earlier argument should specify number of elements, later size of each element

and a bunch more.

calloc is specified to take nmemb first and then size, so realign the code. In a couple of places the arguments were written as `sizeof(...) * x, 1`, so I fixed those to use calloc properly.
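For clarity, the shape of the fix (illustrative struct name, not the actual diff):

```cpp
// calloc is declared as void *calloc(size_t nmemb, size_t size):
// element count first, then the size of each element.
#include <stdlib.h>

struct gallocr_like { void * bufs; int n; }; // hypothetical stand-in

int main(void) {
    // before (triggers -Wcalloc-transposed-args):
    //     calloc(sizeof(struct gallocr_like), 1)
    // after: one element of sizeof(struct gallocr_like) bytes
    struct gallocr_like * g =
        (struct gallocr_like *) calloc(1, sizeof(struct gallocr_like));
    free(g);
    return 0;
}
```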
* llamafile : improve sgemm.cpp

  - Re-enable by default
  - Fix issue described in ggerganov#6716
  - Make code more abstract, elegant, and maintainable
  - Faster handling of weirdly shaped `m` and `n` edge cases

* Address review comments

* Help clang produce fma instructions

* Address review comments
…ag activated (ggerganov#6767)

* Fix FP32/FP16 build instructions

* Fix typo

* Recommended build instruction

  Co-authored-by: Neo Zhang Jianyu <[email protected]>

* Recommended build instruction

  Co-authored-by: Neo Zhang Jianyu <[email protected]>

* Recommended build instruction

  Co-authored-by: Neo Zhang Jianyu <[email protected]>

* Add comments in Intel GPU linux

---------

Co-authored-by: Anas Ahouzi <[email protected]>
Co-authored-by: Neo Zhang Jianyu <[email protected]>