[pull] master from ggerganov:master #155

pull · 2024-11-25T22:12:03Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.1)

vulkan-shaders-gen did not parse the --no-clean argument correctly. The previous code only handled arguments that take a value, and --no-clean takes none, so it was skipped. This commit parses arguments that have no value correctly.
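The failure mode described above can be illustrated with a small parser that treats value-taking options and value-less flags differently. This is a minimal sketch, not the code from vulkan-shaders-gen: the function `parse_args`, the `flags` set, and the `--output-dir` option used in the usage note are hypothetical names chosen for illustration. Only `--no-clean` comes from the commit message.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: parses "--key value" pairs and value-less "--flag"
// switches. `flags` lists the option names that take no value. A flag is
// stored with an empty string so callers can test for its presence.
std::map<std::string, std::string> parse_args(
        const std::vector<std::string> & args,
        const std::set<std::string> & flags) {
    std::map<std::string, std::string> out;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string & arg = args[i];
        if (arg.rfind("--", 0) != 0) {
            continue;  // skip anything that is not an option name
        }
        if (flags.count(arg) > 0) {
            out[arg] = "";  // flag: do not consume the next token
        } else if (i + 1 < args.size()) {
            out[arg] = args[i + 1];  // valued option: consume its value
            ++i;
        }
    }
    return out;
}
```

Calling `parse_args` with the tokens `--no-clean --output-dir out` and `--no-clean` registered as a flag yields two entries. A parser that assumed every option has a value would instead have paired `--no-clean` with `--output-dir`.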

* improve inferencing performance for ascend npu.
Co-authored-by: Frank Mai <thxCode@[email protected]>
* some modification after review
* some modifications after review
* restore some modifications
* restore some modifications
---------
Co-authored-by: shanshan shen <[email protected]>
Co-authored-by: Frank Mai <thxCode@[email protected]>

* cmake : enable warnings in llama ggml-ci
* cmake : add llama_get_flags and respect LLAMA_FATAL_WARNINGS
* cmake : get_flags -> ggml_get_flags
* speculative-simple : fix warnings
* cmake : reuse ggml_get_flags ggml-ci
* speculative-simple : fix compile warning ggml-ci

* server : replace behave with pytest
* fix test on windows
* misc
* add more tests
* more tests
* styling
* log less, fix embd test
* added all sequential tests
* fix coding style
* fix save slot test
* add parallel completion test
* fix parallel test
* remove feature files
* update test docs
* no cache_prompt for some tests
* add test_cache_vs_nocache_prompt

There have been reports of failure to compile on systems with <= 32KB of shared memory (e.g. #10037). This change makes the large tile size fall back to a smaller size if necessary, and makes mul_mat_id fall back to CPU if there's only 16KB of shared memory.
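The fallback described above can be sketched as picking the largest tile configuration whose shared-memory requirement fits the device, and reporting that nothing fits so the caller can route the operation elsewhere (for example to the CPU). This is an illustrative sketch only: the type names, the function `pick_tile_size`, and the byte requirements in the usage note are assumptions, not values taken from the Vulkan backend.

```cpp
#include <cstdint>

// Hypothetical tile sizes, ordered from most to least shared memory needed.
enum class tile_size { large, medium, small, none };

// Assumed shared-memory requirement in bytes for each tile size.
// Real requirements depend on the shader and are not given in the source.
struct tile_requirements {
    uint32_t large;
    uint32_t medium;
    uint32_t small;
};

// Returns the largest tile size that fits in `shmem_bytes`, or
// tile_size::none when even the smallest does not fit.
tile_size pick_tile_size(const tile_requirements & req, uint32_t shmem_bytes) {
    if (req.large <= shmem_bytes) {
        return tile_size::large;
    }
    if (req.medium <= shmem_bytes) {
        return tile_size::medium;  // fall back to a smaller tile
    }
    if (req.small <= shmem_bytes) {
        return tile_size::small;
    }
    return tile_size::none;  // caller must use another code path
}
```

With made-up requirements of 48 KB, 24 KB, and 12 KB, a device with 32 KB of shared memory would get the medium tile, and a device with 8 KB would get `tile_size::none`.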

slaren and others added 2 commits November 25, 2024 22:05

ci : build docker images only once daily (#10503)

50d5cec

Introduce llama-run (#10291)

0cc6375

It's like simple-chat, but it uses smart pointers to avoid manual memory cleanup, so there are fewer memory leaks in the code. It avoids printing multiple dots, splits the code into smaller functions, and uses no exception handling. Signed-off-by: Eric Curtin <[email protected]>
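The smart-pointer approach mentioned above can be sketched with `std::unique_ptr` and a custom deleter wrapping a C-style handle. This is a minimal illustration, not code from llama-run: `demo_model`, `demo_model_load`, `demo_model_free`, and `run_once` are hypothetical stand-ins for a library that returns raw pointers, and the live-handle counter exists only to make the cleanup observable.

```cpp
#include <memory>
#include <string>

// Hypothetical C-style API standing in for a library that hands out raw
// handles. None of these names come from llama.cpp.
struct demo_model {
    std::string path;
};

static int g_live_models = 0;  // counts handles that have not been freed yet

demo_model * demo_model_load(const char * path) {
    if (path == nullptr || *path == '\0') {
        return nullptr;  // report failure with a null handle, no exceptions
    }
    ++g_live_models;
    return new demo_model{path};
}

void demo_model_free(demo_model * m) {
    --g_live_models;
    delete m;
}

// Deleter type so std::unique_ptr calls the C-style free function.
struct demo_model_deleter {
    void operator()(demo_model * m) const { demo_model_free(m); }
};

using demo_model_ptr = std::unique_ptr<demo_model, demo_model_deleter>;

// Returns 0 on success and 1 on failure. The handle is released on every
// return path without an explicit call to demo_model_free.
int run_once(const char * path) {
    demo_model_ptr model(demo_model_load(path));
    if (!model) {
        return 1;
    }
    return model->path.empty() ? 1 : 0;
}
```

Because the deleter runs when `model` goes out of scope, early returns do not need their own cleanup code, and errors are reported through return values instead of exceptions.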

pull bot added the ⤵️ pull label Nov 25, 2024

github-actions bot added examples devops build labels Nov 25, 2024

sparkleholic and others added 5 commits November 26, 2024 01:47

CANN: RoPE and CONCAT operator optimization (#10488)

7066b4c

Co-authored-by: noemotiovon <[email protected]>

speculative : simplify the implementation (#10504)

811872a

ggml-ci

server : fix parallel speculative decoding (#10513)

84e1c33

ggml-ci

github-actions bot added the server label Nov 26, 2024

chaxu01 and others added 3 commits November 26, 2024 13:37

ggml-cpu: cmake add arm64 cpu feature check for macos (#10487)

25669aa

* ggml-cpu: cmake add arm64 cpu feature check for macos
* use vmmlaq_s32 for compile option i8mm check

ci : add ubuntu cuda build, build with one arch on windows (#10456)

c6807b3

ci : publish the docker images created during scheduled runs (#10515)

7db3846

github-actions bot added nix ggml Vulkan labels Nov 26, 2024

github-actions bot added the Nvidia GPU label Nov 26, 2024

NeoZhangJianyu and others added 2 commits November 26, 2024 21:43

restore the condition to build & update package when merge (#10507)

0bbd226

Co-authored-by: arthw <[email protected]>

github-actions bot added the python label Nov 26, 2024

vulkan: fix group_norm (#10496)

904109e

Fix bad calculation of the end of the range. Add a backend test that covers the bad case (taken from stable diffusion). Fixes leejet/stable-diffusion.cpp#439.

github-actions bot added the testing label Nov 26, 2024

yeahdongcn and others added 2 commits November 26, 2024 17:00

mtgpu: Add MUSA_DOCKER_ARCH in Dockerfiles && update cmake and make (#10516)

249cd93

Signed-off-by: Xiaodong Ye <[email protected]>

Fix HIP flag inconsistency & build docs (#10524)

be0e350

* Fix inconsistency of HIP flags in cmake & make
* Fix docs regarding GGML_HIP

github-actions bot added the documentation label Nov 26, 2024

slaren added 2 commits November 26, 2024 21:01

llama : disable warnings for 3rd party sha1 dependency (#10527)

30ec398

ci : remove nix workflows (#10526)

5a349f2

2015aroras and others added 9 commits November 26, 2024 21:55

Add OLMo 2 model in docs (#10530)

de50973

* Add link to OLMo 2 model in docs
* Change link to landing page

ci : fix cuda releases (#10532)

c9b00a7

vulkan: optimize Q2_K and Q3_K mul_mat_vec (#10459)

4a57d36

vulkan: skip integer div/mod in get_offsets for batch_idx==0 (#10506)

71a6498

vulkan: further optimize q5_k mul_mat_vec (#10479)

249a790

vulkan: define all quant data structures in types.comp (#10440)

c31ed2a

Do not include arm_neon.h when compiling CUDA code (ggml/1028)

9150f8f

sync : ggml

fee824a

github-actions bot added the script label Nov 27, 2024

metal : fix group_norm support condition (#0)

9e2301f

github-actions bot added the Apple Metal label Nov 27, 2024

slaren and others added 2 commits November 27, 2024 11:03

ci : faster CUDA toolkit installation method and use ccache (#10537)

46c69e0

* ci : faster CUDA toolkit installation method and use ccache
* remove fetch-depth
* only pack CUDA runtime on master

Add some minimal optimizations for CDNA (#10498)

3ad5451

* Add some minimal optimizations for CDNA
* ggml_cuda: set launch bounds also for GCN as it helps there too

teleprint-me closed this Nov 27, 2024
