
Releases: teleprint-me/llama.cpp

b2893

15 May 17:30
dc02098
Avoid unnecessarily disabling CUDA graphs (#7302)

As discussed in PR #6766, CUDA graphs were being disabled in the presence of long prompts.
This fixes the issue by preventing the consecutive-update counter from incrementing unnecessarily
for tokens in which CUDA graphs are disabled due to batch size > 1.
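The counter logic this fix describes can be sketched roughly as follows. All names here (`CudaGraphState`, `note_token`, the threshold of 4) are hypothetical illustrations under the stated assumptions, not the actual llama.cpp implementation:

```cpp
#include <cassert>

// Hedged sketch: track how many times in a row the CUDA graph had to be
// rebuilt; too many rebuilds means graphs cost more than they save.
struct CudaGraphState {
    int  consecutive_updates = 0;  // graph rebuilds seen in a row
    bool graphs_disabled     = false;
};

// Called once per decoded token (hypothetical helper).
void note_token(CudaGraphState &s, int batch_size, bool graph_needs_update) {
    if (batch_size > 1) {
        // Graphs are skipped for batched tokens anyway. Before the fix,
        // these tokens still bumped the counter, so a long prompt could
        // permanently disable graphs; now they are simply ignored.
        return;
    }
    if (graph_needs_update) {
        if (++s.consecutive_updates > 4) {  // illustrative threshold
            s.graphs_disabled = true;
        }
    } else {
        s.consecutive_updates = 0;  // stable graph, reset the streak
    }
}
```

With this change, processing a long prompt in large batches leaves the counter untouched, so graphs stay available for the subsequent single-token generation phase.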

b2886

15 May 01:29
9f77348
script : sync ggml-rpc

b2871

13 May 18:20
614d3b9
llama : less KV padding when FA is off (#7257)

ggml-ci

b2864

13 May 01:40
cbf7589
[SYCL] Add oneapi runtime dll files to win release package (#7241)

* add oneAPI runtime dlls to release package

* fix path

* fix path

* fix path

* fix path

* fix path

---------

Co-authored-by: Zhang <[email protected]>

b2862

12 May 19:20
dc685be
CUDA: add FP32 FlashAttention vector kernel (#7188)

* CUDA: add FP32 FlashAttention vector kernel

* fixup! CUDA: add FP32 FlashAttention vector kernel

* fixup! fixup! CUDA: add FP32 FlashAttention vector kernel

* fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel

b2861

12 May 17:38
6f1b636
cmake : fix version cmp (#7227)

b2859

11 May 21:41
7bd4ffb
metal : fix warnings (skipme) (#0)

b2854

11 May 16:24
72c177c
fix system prompt handling (#7153)

b2843

10 May 18:40
e849648
llama-bench : add pp+tg test type (#7199)

b2836

10 May 02:18
8c570c9
Minor arithmetic improvement to mmvq wrapper kernel (#7172)