Releases · teleprint-me/llama.cpp
b2893
Avoid unnecessarily disabling CUDA graphs (#7302). As discussed in PR #6766, CUDA graphs were being disabled in the presence of long prompts. This fixes the issue by preventing the consecutive-update counter from incrementing unnecessarily for tokens where CUDA graphs are disabled due to batch size > 1.
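The mechanism, roughly: graphs get permanently disabled after too many consecutive graph updates, so the counter must not advance while a long prompt is being processed in batches. A minimal sketch of that logic, with hypothetical names (`cuda_graph_state`, `track_graph_update`, and the threshold are illustrative, not the actual ggml-cuda code):

```cpp
// Hypothetical sketch of the consecutive-update counter described above.
struct cuda_graph_state {
    bool disabled_for_batch   = false; // graphs skipped because batch size > 1
    int  consecutive_updates  = 0;     // updates in a row; too many => give up on graphs
    bool permanently_disabled = false;
};

void track_graph_update(cuda_graph_state & g, int batch_size, bool graph_params_changed) {
    g.disabled_for_batch = batch_size > 1;
    if (g.disabled_for_batch) {
        // The fix: tokens processed without CUDA graphs must not advance the
        // counter, otherwise a long prompt (many batched tokens) would
        // permanently disable graphs for the decode phase that follows.
        return;
    }
    if (graph_params_changed) {
        if (++g.consecutive_updates > 4) { // threshold is illustrative
            g.permanently_disabled = true;
        }
    } else {
        g.consecutive_updates = 0;
    }
}
```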
b2886
script : sync ggml-rpc
b2871
llama : less KV padding when FA is off (#7257)
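For context: the KV cache length is rounded up to a backend-dependent multiple, and the FlashAttention kernels need a larger multiple than the non-FA path, so padding to the FA size when FA is off wastes cache memory. A minimal sketch of the rule, with the constants taken as illustrative (the authoritative values are in llama.cpp itself):

```cpp
#include <cstdint>

// Illustrative padding rule: FA kernels require a larger KV multiple.
static uint32_t kv_cache_padding(bool flash_attn) {
    return flash_attn ? 256u : 32u;
}

// Round the number of cached tokens up to the required multiple.
static uint32_t kv_cache_size(uint32_t n_tokens, bool flash_attn) {
    const uint32_t pad = kv_cache_padding(flash_attn);
    return ((n_tokens + pad - 1) / pad) * pad;
}
```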
b2864
[SYCL] Add oneAPI runtime DLL files to the Windows release package (#7241). Co-authored-by: Zhang <[email protected]>
b2862
CUDA: add FP32 FlashAttention vector kernel (#7188)
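A minimal sketch of the dispatch this kernel enables, with hypothetical names (the real selection logic lives in ggml-cuda): GPUs without fast FP16 arithmetic can now take an FP32 vector path instead of having FlashAttention unavailable.

```cpp
// Hypothetical kernel selection; names and threshold are illustrative.
enum class fa_vec_kernel { F16, F32 };

fa_vec_kernel pick_fa_vec_kernel(int compute_capability) {
    const bool fast_fp16 = compute_capability >= 70; // illustrative cutoff (Volta+)
    return fast_fp16 ? fa_vec_kernel::F16   // existing half-precision vector kernel
                     : fa_vec_kernel::F32;  // new single-precision vector kernel
}
```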
b2861
cmake : fix version cmp (#7227)
b2859
metal : fix warnings (skipme) (#0)
b2854
fix system prompt handling (#7153)
b2843
llama-bench : add pp+tg test type (#7199)
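The pp+tg type times a prompt-processing pass followed by token generation and reports one combined throughput. A sketch of that metric under the obvious definition (the helper and variable names are mine, not llama-bench's):

```cpp
// Combined throughput for a pp+tg run: total tokens over total wall time.
// Hypothetical helper; llama-bench's own bookkeeping may differ in detail.
double combined_tps(int n_pp, double t_pp_s, int n_tg, double t_tg_s) {
    return (n_pp + n_tg) / (t_pp_s + t_tg_s);
}
```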
b2836
Minor arithmetic improvement to mmvq wrapper kernel (#7172)
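mmvq is ggml's quantized matrix-vector multiply; arithmetic tweaks to wrapper kernels of this kind are typically strength reductions in the index math. An illustrative before/after, hypothetical and in plain float rather than the actual quantized types:

```cpp
// Hypothetical illustration of the kind of change: hoist a loop-invariant
// row*stride multiply out of the inner loop and advance a pointer instead
// of computing x[row * n_cols + col] per element.
void mat_vec_rows(const float * x, const float * y, float * dst, int n_rows, int n_cols) {
    const float * xr = x; // row pointer, advanced by addition each iteration
    for (int row = 0; row < n_rows; ++row, xr += n_cols) {
        float sum = 0.0f;
        for (int col = 0; col < n_cols; ++col) {
            sum += xr[col] * y[col];
        }
        dst[row] = sum;
    }
}
```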