Releases: teleprint-me/llama.cpp
b2234
llama : fix loading models with shared tok_embd and output (#5651) ggml-ci
b2230
examples : do not assume BOS when shifting context (#5622)
b2217
Server: use llama_chat_apply_template (#5593)

* server: use llama_chat_apply_template
* server: remove trailing space
* server: fix format_chat
* server: fix help message

Co-authored-by: Georgi Gerganov <[email protected]>

* server: fix formatted_chat

---------

Co-authored-by: Georgi Gerganov <[email protected]>
b2181
server : slots monitoring endpoint (#5550)
b2167
cmake : fix VULKAN and ROCm builds (#5525)

* cmake : fix VULKAN and ROCm builds
* cmake : fix (cont)
* vulkan : fix compile warnings

ggml-ci

* cmake : fix

ggml-ci

* cmake : minor

ggml-ci
b2134
llama : fix quantization when tensors are missing (#5423)
b2128
CUDA: mul_mat_vec_q tiling, refactor mul mat logic (#5434)

* CUDA: mul_mat_vec_q tiling, refactor mul mat logic

Co-authored-by: slaren <[email protected]>

---------

Co-authored-by: slaren <[email protected]>
b2116
metal : use autoreleasepool to avoid memory leaks (#5437)

There appears to be a known memory leak when using the `MTLCommandBuffer`. It is suggested to use `@autoreleasepool` in [1,2].

[1] https://developer.apple.com/forums/thread/662721
[2] https://forums.developer.apple.com/forums/thread/120931

This change-set wraps the `ggml_metal_graph_compute` in a `@autoreleasepool`.

This commit addresses https://github.com/ggerganov/llama.cpp/issues/5436
b2112
vulkan: Set limit for task concurrency (#5427)

A common default for the maximum number of open files is 256, which can lead to `asyncio.gather(*tasks)` failing with "Too many open files".

    $ python ggml_vk_generate_shaders.py --glslc=$ANDROID_NDK_PATH/shader-tools/darwin-x86_64/glslc
    ggml_vulkan: Generating and compiling shaders to SPIR-V
    Traceback (most recent call last):
      File "/Users/neuman/Code.noindex/github/llama.cpp/ggml_vk_generate_shaders.py", line 2326, in <module>
        asyncio.run(main())
      File "/Users/neuman/Code.noindex/miniforge3/lib/python3.10/asyncio/runners.py", line 44, in run
        return loop.run_until_complete(main)
      File "/Users/neuman/Code.noindex/miniforge3/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
        return future.result()
      File "/Users/neuman/Code.noindex/github/llama.cpp/ggml_vk_generate_shaders.py", line 2294, in main
        await asyncio.gather(*tasks)
    [...snip...]
    OSError: [Errno 24] Too many open files

This change sets a reasonable concurrency limit for tasks (and therefore open files), without significant impact on run time.
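One common way to cap how many `asyncio` tasks run at once is to guard each task with an `asyncio.Semaphore`. The sketch below illustrates that general technique only; it is not taken from `ggml_vk_generate_shaders.py`. The names `run_limited`, `demo`, and `fake_compile`, and the limit of 16, are hypothetical.

```python
import asyncio


async def run_limited(coros, limit):
    """Await all coroutines, allowing at most `limit` to run at the same time."""
    sem = asyncio.Semaphore(limit)

    async def guarded(coro):
        # Each task waits here until one of the `limit` slots is free.
        async with sem:
            return await coro

    # gather() returns results in the same order as the input coroutines.
    return await asyncio.gather(*(guarded(c) for c in coros))


async def demo(n_tasks=200, limit=16):
    """Hypothetical stand-in for shader compilation; records peak concurrency."""
    stats = {"active": 0, "peak": 0}

    async def fake_compile(i):
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.001)  # stands in for work that holds a file open
        stats["active"] -= 1
        return i

    results = await run_limited([fake_compile(i) for i in range(n_tasks)], limit)
    return results, stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(demo())
    print(f"{len(results)} tasks finished, peak concurrency {peak}")
```

Because every task must hold the semaphore while it runs, the number of simultaneously open files stays bounded by `limit` regardless of how many tasks are queued.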
b2103
Fix f16_sycl cpy call from Arc (#5411)

* fix f16_sycl cpy call
* rm old logic
* add fp16 build CI
* use macro
* format fix