Releases · teleprint-me/llama.cpp
b3164
[SYCL] Update README-sycl.md for Chapter "Recommended release" and "N…
b3159
flake.lock: Update (#7951)
b3154
Vulkan Shader Refactor, Memory Debugging Option (#7947)
* Refactor shaders, extract GLSL code from ggml_vk_generate_shaders.py into a vulkan-shaders directory
* Improve debug log code
* Add memory debug output option
* Fix flake8
* Fix unnecessarily high llama-3 VRAM use
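For context on the memory debug output option mentioned above: such an option typically logs device-memory allocations as they happen. The following is only a minimal sketch of that idea under assumed names (`vk_alloc_logged`, `vk_memory_debug`, the log format); it is not the Vulkan backend code from this repository.

```cpp
// Hypothetical sketch: log Vulkan device-memory allocations when a debug flag is set.
// Not the actual llama.cpp/ggml Vulkan backend implementation.
#include <vulkan/vulkan.h>
#include <atomic>
#include <cstdio>

static bool vk_memory_debug = true;               // would be driven by a build or runtime option
static std::atomic<size_t> vk_total_allocated{0}; // running total of allocated bytes

static VkResult vk_alloc_logged(VkDevice device, const VkMemoryAllocateInfo * info,
                                VkDeviceMemory * memory) {
    VkResult res = vkAllocateMemory(device, info, nullptr, memory);
    if (vk_memory_debug && res == VK_SUCCESS) {
        vk_total_allocated += info->allocationSize;
        fprintf(stderr, "[vk-mem] +%llu bytes (total %llu)\n",
                (unsigned long long) info->allocationSize,
                (unsigned long long) vk_total_allocated.load());
    }
    return res;
}
```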
b3151
ci : fix macos x86 build (#7940)
To keep the behaviour of the old `macos-latest` runner we should pin to `macos-12`.
This may fix: https://github.com/ggerganov/llama.cpp/issues/6975
b3150
CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (#7921)
* CUDA: faster q2_K, q3_K MMQ + int8 tensor cores
* try CI fix
* try CI fix
* try CI fix
* fix data race
* revert q2_K precision-related changes
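For context on the int8 path named above: MMQ-style kernels multiply int8-quantized weights and activations and accumulate the products in 32-bit integers, which hardware executes via int8 tensor cores or packed dot-product instructions. The sketch below is a purely illustrative scalar emulation of that accumulation in C++, with made-up values; it is not taken from the actual CUDA kernels.

```cpp
// Illustrative only: scalar emulation of an int8 dot product with int32 accumulation,
// the basic building block that quantized matrix multiplication (MMQ) kernels rely on.
#include <cstdint>
#include <cstdio>

// Dot product of n int8 values, accumulated in int32
// (what int8 tensor cores / packed dot-product instructions do in hardware).
static int32_t dot_i8(const int8_t * a, const int8_t * b, int n) {
    int32_t acc = 0;
    for (int i = 0; i < n; ++i) {
        acc += (int32_t) a[i] * (int32_t) b[i];
    }
    return acc;
}

int main() {
    const int8_t w[4] = { 12, -3, 7, 100 }; // quantized weights (made-up values)
    const int8_t x[4] = { -5, 20, 1,  2  }; // quantized activations (made-up values)
    // The int32 result is later rescaled with the per-block float scales of the quantized format.
    printf("int32 accumulator = %d\n", dot_i8(w, x, 4));
    return 0;
}
```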
b3149
metal : utilize max shared memory for mul_mat_id (#7935)
b3141
CUDA: fix broken oob check for FA vec f32 kernel (#7904)
b3092
CUDA: refactor mmq, dmmv, mmvq (#7716)
* CUDA: refactor mmq, dmmv, mmvq
* fix out-of-bounds write
* struct for qk, qr, qi
* fix cmake build
* mmq_type_traits
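The "struct for qk, qr, qi" and `mmq_type_traits` items refer to gathering the per-quantization-type constants (values per block, packing ratio, 32-bit ints of quantized data per block) into a traits struct instead of scattering them across macros. Below is a hedged C++ sketch of that pattern; the struct and enum names and the listed values are illustrative, not the exact definitions in the repository.

```cpp
// Sketch of the type-traits idea: one struct per quantization format carries its constants,
// so kernels can be written once and specialized per type at compile time.
// Names and values are illustrative, not the repository's definitions.
#include <cstdio>

enum ggml_type_example { EX_Q4_0, EX_Q8_0 };

template <ggml_type_example T> struct quant_traits;

template <> struct quant_traits<EX_Q4_0> {
    static constexpr int qk = 32;  // values per block
    static constexpr int qr = 2;   // values packed per byte
    static constexpr int qi = 4;   // 32-bit ints of quantized data per block
};

template <> struct quant_traits<EX_Q8_0> {
    static constexpr int qk = 32;
    static constexpr int qr = 1;
    static constexpr int qi = 8;
};

template <ggml_type_example T>
void print_traits(const char * name) {
    printf("%s: qk=%d qr=%d qi=%d\n", name,
           quant_traits<T>::qk, quant_traits<T>::qr, quant_traits<T>::qi);
}

int main() {
    print_traits<EX_Q4_0>("Q4_0");
    print_traits<EX_Q8_0>("Q8_0");
    return 0;
}
```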
b3084
readme : remove obsolete Zig instructions (#7471)
b3078
llama : offload to RPC in addition to other backends (#7640)
* llama : offload to RPC in addition to other backends
* fix copy_tensor being called on the src buffer instead of the dst buffer
* always initialize views in the view_src buffer
* add RPC backend to Makefile build
* add endpoint to all RPC object names
* add rpc-server to Makefile
* Update llama.cpp

Co-authored-by: slaren <[email protected]>
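The entry above lets llama offload work to a remote ggml RPC backend alongside local backends. As a minimal hedged sketch of the client side, the snippet below connects to an endpoint via `ggml_backend_rpc_init`; the endpoint string and error handling are assumptions, and ggml-rpc.h in the tree is the authoritative interface.

```cpp
// Hedged sketch: connect to a running rpc-server and create an RPC backend handle.
// Assumes the ggml RPC backend header/API from this tree; details may differ.
#include "ggml-rpc.h"
#include <cstdio>

int main() {
    // endpoint of a previously started rpc-server instance ("host:port" here is an assumption)
    const char * endpoint = "192.168.1.10:50052";

    ggml_backend_t rpc_backend = ggml_backend_rpc_init(endpoint);
    if (rpc_backend == nullptr) {
        fprintf(stderr, "failed to connect to RPC endpoint %s\n", endpoint);
        return 1;
    }

    // ... build a ggml graph and schedule it across the RPC backend and local backends ...

    ggml_backend_free(rpc_backend);
    return 0;
}
```

On the server side, the `rpc-server` binary added to the Makefile in this release would be started first and listen on that endpoint.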