[pull] master from ggerganov:master #161

pull · 2024-12-17T14:34:05Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

* sampling : refactor + optimize penalties sampler ggml-ci
* common : apply ignore_eos as logit bias ggml-ci
* batched : remove penalties sampler
* params : allow penalty_last_n == -1 to be equal to context size ggml-ci
* common : by default, move the penalties at the end of the sampling chain ggml-ci
* common : ignore all EOG tokens
  Co-authored-by: Diego Devesa <[email protected]>
* common : move back the penalties at the front of the sampling chain ggml-ci
* readme : restore hint about --ignore-eos flag [no ci]
* llama : minor ggml-ci
* webui : update

Co-authored-by: Diego Devesa <[email protected]>
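The `penalty_last_n == -1` item above can be read as a small parameter-resolution rule. The following is a minimal sketch in C under stated assumptions: `resolve_penalty_last_n` is a hypothetical helper name (not a llama.cpp function), and the commit message does not say how other values are handled, so this sketch simply passes them through.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper, not part of llama.cpp: maps the sentinel value -1 to
// the context size and passes every other value through unchanged.
static int32_t resolve_penalty_last_n(int32_t penalty_last_n, int32_t n_ctx) {
    assert(n_ctx > 0); // assumption: the caller supplies a valid context size
    if (penalty_last_n == -1) {
        return n_ctx;  // -1 means "penalize over the whole context"
    }
    return penalty_last_n;
}
```

Under these assumptions, a call such as `resolve_penalty_last_n(-1, 4096)` yields the context size, while an explicit window such as 64 is returned as given.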

* rwkv_wkv6 vulkan shader
* RWKV_WKV6 Vulkan op tests passed
  Signed-off-by: Molly Sophia <[email protected]>
* Apply code format changes
  Signed-off-by: Molly Sophia <[email protected]>
* add [[unroll]] and remove unnecessary conditions
* add uma support
* fix errors in EditorConfig Checker

Signed-off-by: Molly Sophia <[email protected]>
Co-authored-by: Molly Sophia <[email protected]>

* ggml : add check for grad_accs

This commit adds a check for grad_accs in ggml_graph_get_grad and ggml_graph_get_grad_acc functions. This is necessary to avoid segfaults when grad_accs is not initialized.

The motivation for this change is that I find it nice to be able to print out a computation graph using ggml_graph_print but this function segfaults when grad_accs is not initialized:

```console
(gdb) p g1
$2 = (ggml_cgraph *) 0x7ffff66004b0
(gdb) p *g1
$3 = {size = 2048, n_nodes = 1, n_leafs = 2, nodes = 0x7ffff6600500, grads = 0x0, grad_accs = 0x0, leafs = 0x7ffff6604500, visited_hash_set = {size = 4099, used = 0x7ffff6610518, keys = 0x7ffff6608500}, order = GGML_CGRAPH_EVAL_ORDER_LEFT_TO_RIGHT}
(gdb) p ggml_graph_print(g1)
=== GRAPH ===
n_nodes = 1

Program received signal SIGSEGV, Segmentation fault.
0x0000555555579775 in ggml_graph_get_grad (cgraph=0x7ffff66004b0,node=0x7ffff6600340) at /ggml/ggml/src/ggml.c:5990
5990 return igrad != GGML_HASHSET_FULL && ggml_bitset_get(cgraph->visited_hash_set.used, igrad) ? cgraph->grads[igrad] : NULL;
```

* squash! ggml : add check for grad_accs

Fix the check in ggml_graph_get_grad. The check was incorrectly using cgraph->grad_accs instead of cgraph->grads.
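The guard described in this commit can be illustrated with a simplified sketch. The types and the function below (`sketch_tensor`, `sketch_cgraph`, `sketch_graph_get_grad`) are hypothetical stand-ins, not the ggml definitions: the real `ggml_graph_get_grad` looks nodes up through a hash set, whereas this sketch uses a linear scan to stay self-contained. Only the idea is taken from the commit message: return NULL when the gradient array was never allocated, instead of indexing it.

```c
#include <assert.h>
#include <stddef.h>

// Simplified stand-ins for illustration only; the real ggml structures have
// more fields and map nodes to slots through a hash set.
struct sketch_tensor { int id; };

struct sketch_cgraph {
    size_t                 size;
    struct sketch_tensor **nodes_by_slot; // slot -> node
    struct sketch_tensor **grads;         // NULL when gradients were never allocated
};

// Returns the gradient stored for `node`, or NULL when the graph has no
// gradient storage or the node is not present in the graph.
static struct sketch_tensor *sketch_graph_get_grad(
        const struct sketch_cgraph *cgraph, const struct sketch_tensor *node) {
    assert(cgraph != NULL);
    if (cgraph->grads == NULL) {
        return NULL; // the guard: never index a NULL array
    }
    for (size_t i = 0; i < cgraph->size; i++) {
        if (cgraph->nodes_by_slot[i] == node) {
            return cgraph->grads[i];
        }
    }
    return NULL;
}
```

With this shape, a graph like the one in the gdb session (`grads = 0x0`) yields NULL for every node rather than a segmentation fault, so a printing routine can treat the gradient as absent.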

ggerganov and others added 6 commits December 16, 2024 12:31

unicode : improve naming style (#10838)

08ea539

* unicode : improve naming style ggml-ci
* cont [no ci]

vulkan: bugfixes for small subgroup size systems + llvmpipe test (#10809)

7b1ec53

* ensure mul mat shaders work on systems with subgroup size less than 32
  more fixes
  add test
* only s_warptile_mmq needs to be run with 32 threads or more

server : (UI) fix missing async generator on safari (#10857)

227d7c5

* server : (UI) fix missing async generator on safari
* fix

readme : update typos (#10863)

4f51968

pull bot added the ⤵️ pull label Dec 17, 2024

github-actions bot added the examples, devops, server, ggml, Vulkan, and testing labels Dec 17, 2024

llama : add Falcon3 support (#10864)

382bc7f

github-actions bot added the python label Dec 17, 2024

krystiancha and others added 7 commits December 17, 2024 18:00

server : fill usage info in embeddings and rerank responses (#10852)

05c3a44

* server : fill usage info in embeddings response
* server : fill usage info in reranking response

ggml : update ggml_backend_cpu_device_supports_op (#10867)

0006f5a

* ggml : fix cpy op for IQ-quants to use reference impl ggml-ci
* ggml : disable tests involving i-matrix quantization
* ggml : update ggml_backend_cpu_device_supports_op ggml-ci

ggml : remove return from ggml_gallocr_allocate_node (ggml/1048)

130d0c9

This commit removes the return statement from ggml_gallocr_allocate_node function. The motivation behind this change is to make the code more readable and consistent.

vulkan : fix soft_max.comp division by zero (whisper/2633)

8dd19a4

This change prevents a division by zero error when p.KY is 0.
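The commit message does not show how `p.KY` is used in the shader, so the following is only a sketch of the general guard pattern, written in C rather than GLSL. It assumes, purely for illustration, that the value acts as a modulus for wrapping a row index; `wrap_row` and its parameters are hypothetical names, not taken from `soft_max.comp`.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: wraps `row` by `ky`, falling back to 0 when `ky` is 0
// so that the modulo is never evaluated with a zero divisor.
static uint32_t wrap_row(uint32_t row, uint32_t ky) {
    const uint32_t wrapped = (ky > 0) ? (row % ky) : 0;
    assert(ky == 0 || wrapped < ky); // result stays inside [0, ky) when ky > 0
    return wrapped;
}
```

The design choice here is to pick a defined fallback value rather than skip the computation, which keeps the calling code branch-free after the guard.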

cmake : fix "amd64" processor string (whisper/2638)

78f7667

sync : ggml

5437d4a

github-actions bot added the script label Dec 17, 2024

tests: add tests for GGUF (#10830)

081b29b

teleprint-me closed this Dec 17, 2024
