[pull] master from ggerganov:master #10

pull · 2024-01-02T22:46:26Z

See Commits and Changes for more details.


* ggml : disable fast-math for Metal (cmake build only) ggml-ci
* metal : fix Metal API debug warnings
* cmake : add -fno-inline for Metal build (#4545)
* metal : fix API debug warnings
* metal : fix compile warnings
* metal : use uint64_t for strides
* cmake : rename option to LLAMA_METAL_SHADER_DEBUG
* metal : fix mat-vec Q8_0 kernel for BS > 1
* metal : normalize mat-vec kernel signatures
* cmake : respect LLAMA_QKK_64 option
* metal : fix mat-vec Q4_K kernel for QK_K == 64 ggml-ci

Signed-off-by: Daniel Bevenius <[email protected]>

* update: awq support llama-7b model
* update: change order
* update: benchmark results for llama2-7b
* update: mistral 7b v1 benchmark
* update: support 4 models
* fix: Readme
* update: ready for PR
* update: readme
* fix: readme
* update: change order import
* black
* format code
* update: work for bot mpt and awqmpt
* update: readme
* Rename to llm_build_ffn_mpt_awq
* Formatted other files
* Fixed params count
* fix: remove code
* update: more detail for mpt
* fix: readme
* fix: readme
* update: change folder architecture
* fix: common.cpp
* fix: readme
* fix: remove ggml_repeat
* update: cicd
* update: cicd
* uppdate: remove use_awq arg
* update: readme
* llama : adapt plamo to new ffn ggml-ci
* fix: update torch version

---------

Co-authored-by: Trần Đức Nam <[email protected]>
Co-authored-by: Le Hoang Anh <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>

* Changes to server to allow metadata override
* documentation
* flake.nix: expose full scope in legacyPackages
* flake.nix: rocm not yet supported on aarch64, so hide the output
* flake.nix: expose checks
* workflows: nix-ci: init; build flake outputs
* workflows: nix-ci: add a job for eval
* workflows: weekly `nix flake update`
* workflows: nix-flakestry: drop tag filters ...and add a job for flakehub.com
* workflows: nix-ci: add a qemu job for jetsons
* flake.nix: suggest the binary caches
* flake.lock: update to a commit recently cached by nixpkgs-cuda-ci

---------

Co-authored-by: John <[email protected]>
Co-authored-by: Someone Serge <[email protected]>

* Add n_key_dim and n_value_dim
  Some models use values that are not derived from `n_embd`. Also remove `n_embd_head` and `n_embd_gqa` because it is not clear which "head" is referred to (key or value). Fix issue #4648.
* Fix `llm_build_kqv` to use `n_value_gqa`
* Rebase
* Rename variables
* Fix llm_build_kqv to be more generic wrt n_embd_head_k
* Update default values for n_embd_head_k and n_embd_head_v
  Co-authored-by: Georgi Gerganov <[email protected]>
* Fix llm_load_tensors: the asserts were not backcompat

---------

Co-authored-by: Georgi Gerganov <[email protected]>
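The idea behind separate key and value head sizes can be sketched as follows. This is a minimal illustration, not the project's actual code: the `HParams` struct and the `make_hparams` helper are hypothetical, and the sketch assumes that each head size defaults to `n_embd / n_head` unless a model supplies its own value.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the model hyperparameters (illustration only).
struct HParams {
    uint32_t n_embd;        // embedding width
    uint32_t n_head;        // number of attention (query) heads
    uint32_t n_head_kv;     // number of key/value heads (grouped-query attention)
    uint32_t n_embd_head_k; // size of one key head
    uint32_t n_embd_head_v; // size of one value head

    // Total width of the K and V projections across all key/value heads.
    uint32_t n_embd_k_gqa() const { return n_embd_head_k * n_head_kv; }
    uint32_t n_embd_v_gqa() const { return n_embd_head_v * n_head_kv; }
};

// Assumption: head sizes fall back to n_embd / n_head, but a model may
// override either one independently of n_embd.
HParams make_hparams(uint32_t n_embd, uint32_t n_head, uint32_t n_head_kv,
                     std::optional<uint32_t> key_len   = std::nullopt,
                     std::optional<uint32_t> value_len = std::nullopt) {
    assert(n_head > 0);
    const uint32_t derived = n_embd / n_head;

    HParams hp{};
    hp.n_embd        = n_embd;
    hp.n_head        = n_head;
    hp.n_head_kv     = n_head_kv;
    hp.n_embd_head_k = key_len.value_or(derived);
    hp.n_embd_head_v = value_len.value_or(derived);
    return hp;
}
```

Keeping the key and value sizes in separate fields avoids the ambiguity the commit message describes: a single `n_embd_head` cannot say whether it refers to the key head or the value head once the two differ.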


* ggml : disable fast-math for Metal (cmake build only) ggml-ci
* metal : fix Metal API debug warnings
* cmake : add -fno-inline for Metal build (#4545)
* metal : fix API debug warnings
* metal : fix compile warnings
* metal : use uint64_t for strides
* cmake : rename option to LLAMA_METAL_SHADER_DEBUG
* metal : fix mat-vec Q8_0 kernel for BS > 1
* metal : normalize mat-vec kernel signatures
* cmake : respect LLAMA_QKK_64 option
* metal : fix mat-vec Q4_K kernel for QK_K == 64
* metal : optimizing ggml_mul_mat_id (wip)
* metal : minor fix
* metal : opt mul_mm_id


ggerganov and others added 10 commits January 2, 2024 10:57

finetune: fix typo in README.md (#4733)

775ac87

Signed-off-by: Daniel Bevenius <[email protected]>

editorconfig : fix whitespace and indentation #4710

32866c5

llama : replace all API facing int's with int32_t (#4577)

0040d42

* replaced all API facing `int`'s with `int32_t`
* formatting and missed `int` in `llama_token_to_piece`
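As a rough illustration of why a public C-style API might prefer `int32_t` over plain `int`, the sketch below pins the integer width so that callers in other languages can bind to it without guessing the platform's `int` size. The function `example_copy_piece` and its return convention are hypothetical and are not taken from the library.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// The width of int32_t is fixed by definition; plain int only guarantees
// at least 16 bits and varies by platform ABI.
static_assert(sizeof(int32_t) == 4, "int32_t is exactly 32 bits wide");

// Hypothetical API function (illustration only, not a real library symbol).
// Copies `piece` into `buf` without a terminating NUL and returns the number
// of bytes written, or the negated required size if `buf` is too small.
extern "C" int32_t example_copy_piece(const char * piece, char * buf, int32_t length) {
    assert(piece != nullptr && buf != nullptr);
    const int32_t n = static_cast<int32_t>(std::strlen(piece));
    if (n > length) {
        return -n; // caller can retry with a buffer of at least n bytes
    }
    std::memcpy(buf, piece, static_cast<std::size_t>(n));
    return n;
}
```

With fixed-width types in the signature, a foreign-function binding can declare the parameters as 32-bit integers on every target rather than depending on each compiler's choice for `int`.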

llama : llama_model_desc print number of experts

540938f

server : add token counts to html footer (#4738)

0ef3ca2

* server: add token counts to stats
* server: generate hpp

---------

Co-authored-by: phiharri <[email protected]>

pull bot added the ⤵️ pull label Jan 2, 2024

jparkerweb and others added 19 commits January 3, 2024 10:43

server : throw an error when slot unavailable (#4741)

f2eb19b

ggml : extend ggml_get_rows, ggml_repeat, ggml_concat (ggml/639)

5f66ebc

* add more int ops
* ggml_compute_forward_dup_bytes
* add tests
* PR comments
* tests : minor indentations

---------

Co-authored-by: Georgi Gerganov <[email protected]>

scripts : fix sync order + metal sed

ab62fc3

metal : add kernel_get_rows_i32

2893137

ggml-ci

sync : ggml

75e3fd8

ggml-ci

cuda : mark I16 and I32 ops as unsupported

d55356d

ggml-ci

cuda : simplify expression

7bed7eb

Co-authored-by: slaren <[email protected]>

swift : update Package.swift to use ggml as dependency (#4691)

ece9a45

* updates the package.swift to use ggml as dependency
* changes the ggml package url src to ggerganov

train : fix typo in overlapping-samples help msg (#4758)

cb1e281

This commit fixes a typo in the help message for the --overlapping-samples option.

Signed-off-by: Daniel Bevenius <[email protected]>

llama.swiftui : fix build of ggml.metallib (#4754)

46cea79

* metal: fix metal backend init failure in swiftui
* metal: build ggml.metallib instead of copy src
* llama.swift : remove debug flags from metallib build

---------

Co-authored-by: Georgi Gerganov <[email protected]>

ggml : include stdlib.h before intrin.h (#4736)

dc891b7

server : fix options in README.md (#4765)

e580431

* fix examples/server/README.md
* minor : fix whitespace

---------

Co-authored-by: Georgi Gerganov <[email protected]>

llama.swiftui : support loading custom model from file picker (#4767)

3c0b585

* swiftui: support load model from file picker
* swiftui: remove trailing whitespace

Print backend name on test-backend-ops failure (#4751)

a919280

server : send token probs for "stream == false" (#4714)

012cf34

finetune : remove unused includes (#4756)

b3a7c20

This commit removes unused includes from finetune.cpp.

Signed-off-by: Daniel Bevenius <[email protected]>

examples : add few-shot translation example (#4783)

3681f22

ggml : do not sched_yield when calling BLAS (#4761)

c1d7cb2

* ggml : do not sched_yield when calling BLAS ggml-ci
* ggml : fix do_yield logic ggml-ci
* ggml : simplify do_yield logic ggml-ci

ggml : add error handling to graph_compute (whisper/1714)

1bf681f

ggerganov and others added 4 commits January 5, 2024 18:02

ggml : fix q2_k bpw in comments (ggml/680)

d061bf9

metal : switch back to default.metallib (ggml/681)

91d3887

ggml-ci

flake.nix : fix typo (#4700)

be36bb9

betwen -> between

cmake : check for openblas64 (#4134)

eec22a1

openblas v0.3.22 64-bit pkg-config file is named openblas64.pc OpenMathLib/OpenBLAS#3790

teleprint-me closed this Jan 6, 2024
