Dry sampler #11

Merged 33 commits into layla-build on Apr 25, 2024
Conversation

l3utterfly (Owner)

No description provided.

CISC and others added 30 commits April 18, 2024 14:49
* Support converting models with multiple chat templates

Adds the following metadata:
* tokenizer.chat_templates
* tokenizer.chat_template.<name1>
* tokenizer.chat_template.<name2>
* tokenizer.chat_template.<...>

Where `tokenizer.chat_templates` is an array of the template names (except `default`); the `default` template is still written to the regular `tokenizer.chat_template` key.
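
Note (illustration only, not part of the change itself): a consumer could enumerate these keys roughly as below with ggml's gguf C API; the exact declarations should be checked against the gguf header in the reader's tree.

```cpp
#include <cstdio>
#include <string>
#include "ggml.h" // gguf_* API

// Hedged sketch: list the chat templates stored under the key layout described above.
static void list_chat_templates(const char * fname) {
    struct gguf_init_params params = { /*no_alloc =*/ true, /*ctx =*/ nullptr };
    struct gguf_context * ctx = gguf_init_from_file(fname, params);
    if (!ctx) return;

    // the default template lives under the pre-existing key
    const auto kid_default = gguf_find_key(ctx, "tokenizer.chat_template");
    if (kid_default >= 0) {
        printf("default:\n%s\n", gguf_get_val_str(ctx, kid_default));
    }

    // additional templates are listed by name in tokenizer.chat_templates
    const auto kid_names = gguf_find_key(ctx, "tokenizer.chat_templates");
    if (kid_names >= 0) {
        const auto n = gguf_get_arr_n(ctx, kid_names);
        for (int i = 0; i < (int) n; i++) {
            const std::string name = gguf_get_arr_str(ctx, kid_names, i);
            const std::string key  = "tokenizer.chat_template." + name;
            const auto kid = gguf_find_key(ctx, key.c_str());
            if (kid >= 0) {
                printf("%s:\n%s\n", name.c_str(), gguf_get_val_str(ctx, kid));
            }
        }
    }

    gguf_free(ctx);
}
```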

* replace filtered characters with underscore

* New script to add/modify/remove metadata

This script creates a copy of a GGUF file and allows you to add/modify/remove metadata in the process.

Most importantly this allows you to update chat templates, either as a string or directly from an updated tokenizer_config.json file.

* Add files via upload

add new script to project/readme

* flake--
* ggml : group all experts in a single ggml_mul_mat_id
cuda : improve mmid row copy

* cuda : fix bin bcast with non-cont src0

* test-backend-ops : only run all mul mat tests for base types

* llama : disable moe offloading with SYCL

---------

Co-authored-by: Georgi Gerganov <[email protected]>
* llama : make general.name optional

* train: Add 'general.name' to model metadata

Signed-off-by: teleprint-me <[email protected]>

---------

Signed-off-by: teleprint-me <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
* implement olmo architecture

* remove unused variable

* remove unused moe branch

* remove check for weight

* remove superfluous moe, bias and rope tensors

* clarified comment

* fix clamp_kqv setting

* remove obsolete parameter name filter
* common : disable get_math_cpu_count() until Android CI gets fixed

* common : another try
* Support Llama 3 conversion

The tokenizer is BPE.

* style

* Accept suggestion

Co-authored-by: Sourab Mangrulkar <[email protected]>

* llama : add llama_token_is_eog()

ggml-ci

* llama : auto-detect more EOT tokens when missing in KV data

* convert : replacing EOS token is a hack

* llama : fix codegemma EOT token + add TODOs

* llama : fix model type string for 8B model

---------

Co-authored-by: Sourab Mangrulkar <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
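
Note: a minimal sketch of how a generation loop might use the new llama_token_is_eog() helper; `sample_next` and `accept` are placeholders for the caller's own sampling and decode steps, not llama.cpp APIs.

```cpp
#include <functional>
#include "llama.h"

// Generate up to n_max tokens, stopping on any end-of-generation token.
static int generate(llama_model * model, int n_max,
                    const std::function<llama_token()>    & sample_next,
                    const std::function<void(llama_token)> & accept) {
    int n = 0;
    while (n < n_max) {
        const llama_token id = sample_next();
        // Previously callers compared against llama_token_eos() only; the new helper
        // also covers EOT-style tokens (e.g. <|eot_id|>, codegemma's EOT).
        if (llama_token_is_eog(model, id)) {
            break;
        }
        accept(id);
        n++;
    }
    return n;
}
```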
This change removes printf() logging so llava-cli is shell scriptable.
* added fedora to list of distros that may need the package (the packages have the same name on Fedora)

* how to add clblast that is available in the fedora repos
* Added llama-3 chat template

* Update llama.cpp

Co-authored-by: Samuel Tallet <[email protected]>

* Update llama.cpp

Co-authored-by: Samuel Tallet <[email protected]>

* Update tests/test-chat-template.cpp

Co-authored-by: Samuel Tallet <[email protected]>

* Added EOS stop sequence according to ggerganov#6751 (comment)

* Removed adding of BOS token before first message

* Removed bos token from expected output from llama-3

* Update tests/test-chat-template.cpp

Co-authored-by: Rene Leonhardt <[email protected]>

* Update tests/test-chat-template.cpp

Co-authored-by: Rene Leonhardt <[email protected]>

* Added <|end_of_text|> as another stop token

* Reverted last change of adding the end_of_text stop word for llama 3

---------

Co-authored-by: Wouter Tichelaar <[email protected]>
Co-authored-by: Samuel Tallet <[email protected]>
Co-authored-by: Rene Leonhardt <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
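
Note: a minimal sketch of exercising the new template via llama_chat_apply_template; the template name "llama3", the messages, and the buffer size are assumptions made to keep the example short.

```cpp
#include <cstdio>
#include <string>
#include <vector>
#include "llama.h"

int main() {
    // Illustrative conversation.
    std::vector<llama_chat_message> msgs = {
        { "system", "You are a helpful assistant." },
        { "user",   "Hello!" },
    };

    // Format with the built-in llama-3 template; tmpl is given explicitly, so no model is needed.
    std::string buf(4096, '\0');
    const int32_t n = llama_chat_apply_template(nullptr, "llama3", msgs.data(), msgs.size(),
                                                /*add_ass =*/ true, buf.data(), (int32_t) buf.size());
    if (n < 0) {
        fprintf(stderr, "template not supported\n");
        return 1;
    }
    // A real caller would retry with a larger buffer if n exceeded buf.size().
    buf.resize(n);
    printf("%s", buf.c_str());
    // Expected markers in the output: <|start_header_id|>, <|end_header_id|> and <|eot_id|>,
    // with <|eot_id|> also acting as the stop sequence added in this change.
    return 0;
}
```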
* make : fix common dep on llama.h

* llama : add option to render special tokens

* readme : add API change notice

ggml-ci

* swift : fix build
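
Note: the "render special tokens" option refers to a trailing `special` flag on llama_token_to_piece; a hedged usage sketch follows (check llama.h in your tree for the exact signature, which has changed over time).

```cpp
#include <string>
#include "llama.h"

// Convert a token to text, optionally rendering special tokens (e.g. <|eot_id|>)
// instead of dropping them.
static std::string token_to_piece(const llama_model * model, llama_token id, bool special) {
    std::string piece(64, '\0');
    int32_t n = llama_token_to_piece(model, id, piece.data(), (int32_t) piece.size(), special);
    if (n < 0) {
        // a negative return means the buffer was too small; -n is the required size
        piece.resize(-n);
        n = llama_token_to_piece(model, id, piece.data(), (int32_t) piece.size(), special);
    }
    piece.resize(n > 0 ? n : 0);
    return piece;
}
```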

* `build`: generate hex dumps of server assets on the fly

* build: workaround lack of -n on gnu xxd

* build: don't use xxd in cmake

* build: don't call xxd from build.zig

* build: more idiomatic hexing

* build: don't use xxd in Makefile (od hackery instead)

* build: avoid exceeding max cmd line limit in makefile hex dump

* build: hex dump assets at cmake build time (not config time)
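
Note: conceptually, these build changes embed each server asset as a byte array in the binary so it can be served from memory; the generated header looks roughly like the following (symbol names and bytes are illustrative only, not the exact generated output).

```cpp
// Illustrative shape of a generated header, in the spirit of `xxd -i`:
// the asset's bytes become a C array plus a length.
static const unsigned char index_html[] = {
    0x3c, 0x68, 0x74, 0x6d, 0x6c, 0x3e, /* ...rest of the file's bytes... */
};
static const unsigned int index_html_len = sizeof(index_html);
```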
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/1042fd8b148a9105f3c0aca3a6177fd1d9360ba5?narHash=sha256-3sbWO1mbpWsLepZGbWaMovSO7ndZeFqDSdX0hZ9nVyw%3D' (2024-04-10)
  → 'github:NixOS/nixpkgs/5c24cf2f0a12ad855f444c30b2421d044120c66f?narHash=sha256-XtTSSIB2DA6tOv%2Bl0FhvfDMiyCmhoRbNB%2B0SeInZkbk%3D' (2024-04-19)
Latest gcc complains here:
/home/airlied/devel/llama.cpp/ggml-alloc.c: In function ‘ggml_gallocr_new_n’:
/home/airlied/devel/llama.cpp/ggml-alloc.c:374:59: warning: ‘calloc’ sizes specified with ‘sizeof’ in the earlier argument and not in the later argument [-Wcalloc-transposed-args]
  374 |     ggml_gallocr_t galloc = (ggml_gallocr_t)calloc(sizeof(struct ggml_gallocr), 1);
      |                                                           ^~~~~~
/home/airlied/devel/llama.cpp/ggml-alloc.c:374:59: note: earlier argument should specify number of elements, later size of each element

and a bunch more.

calloc is specified to take nmemb first then size, so realign the code.

In a couple of places the call was `calloc(sizeof(...) * x, 1)`, so I fixed those to use calloc properly.
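
Note: a minimal before/after sketch of the argument-order fix; the struct here is only a stand-in for the ggml allocator case quoted above.

```cpp
#include <cstdlib>

struct ggml_gallocr_like { int dummy; }; // stand-in for the real struct

int main() {
    // Before: size first, count second -- works, but recent GCC flags it with
    // -Wcalloc-transposed-args because the sizeof sits in the nmemb slot.
    void * bad  = calloc(sizeof(struct ggml_gallocr_like), 1);

    // After: nmemb first, then the size of each element, as calloc is specified.
    void * good = calloc(1, sizeof(struct ggml_gallocr_like));

    // The "* x, 1" cases become a proper count plus element size.
    const int x = 4;
    void * also_good = calloc(x, sizeof(struct ggml_gallocr_like));

    free(bad); free(good); free(also_good);
    return 0;
}
```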
* llamafile : improve sgemm.cpp

- Re-enable by default
- Fix issue described in ggerganov#6716
- Make code more abstract, elegant, and maintainable
- Faster handling of weirdly shaped `m` and `n` edge cases

* Address review comments

* Help clang produce fma instructions

* Address review comments
…ag activated (ggerganov#6767)

* Fix FP32/FP16 build instructions

* Fix typo

* Recommended build instruction

Co-authored-by: Neo Zhang Jianyu <[email protected]>

* Recommended build instruction

Co-authored-by: Neo Zhang Jianyu <[email protected]>

* Recommended build instruction

Co-authored-by: Neo Zhang Jianyu <[email protected]>

* Add comments in Intel GPU linux

---------

Co-authored-by: Anas Ahouzi <[email protected]>
Co-authored-by: Neo Zhang Jianyu <[email protected]>
l3utterfly merged commit ef64937 into layla-build on Apr 25, 2024
1 of 3 checks passed