Releases · teleprint-me/llama.cpp
b2893
Avoid unnecessarily disabling CUDA graphs (#7302). As discussed in PR #6766, CUDA graphs were being disabled in the presence of long prompts. This fixes the issue by preventing the consecutive-update counter from incrementing unnecessarily for tokens where CUDA graphs are disabled due to batch size > 1.
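The mechanism, roughly: graphs get permanently disabled after too many consecutive graph updates, so the counter must not advance while a long prompt is being processed in batches. A minimal sketch of that logic, with hypothetical names (`cuda_graph_state`, `track_graph_update`, and the threshold are illustrative, not the actual ggml-cuda code):

```cpp
// Hypothetical sketch of the consecutive-update counter described above.
struct cuda_graph_state {
    bool disabled_for_batch   = false; // graphs skipped because batch size > 1
    int  consecutive_updates  = 0;     // updates in a row; too many => give up on graphs
    bool permanently_disabled = false;
};

void track_graph_update(cuda_graph_state & g, int batch_size, bool graph_params_changed) {
    g.disabled_for_batch = batch_size > 1;
    if (g.disabled_for_batch) {
        // The fix: tokens processed without CUDA graphs must not advance the
        // counter, otherwise a long prompt (many batched tokens) would
        // permanently disable graphs for the decode phase that follows.
        return;
    }
    if (graph_params_changed) {
        if (++g.consecutive_updates > 4) { // threshold is illustrative
            g.permanently_disabled = true;
        }
    } else {
        g.consecutive_updates = 0;
    }
}
```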
b2886
script : sync ggml-rpc
b2871
llama : less KV padding when FA is off (#7257)
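For context: the KV cache length is rounded up to a backend-dependent multiple, and the FlashAttention kernels need a larger multiple than the non-FA path, so padding to the FA size when FA is off wastes cache memory. A minimal sketch of the rule, with the constants taken as illustrative (the authoritative values are in llama.cpp itself):

```cpp
#include <cstdint>

// Illustrative padding rule: FA kernels require a larger KV multiple.
static uint32_t kv_cache_padding(bool flash_attn) {
    return flash_attn ? 256u : 32u;
}

// Round the number of cached tokens up to the required multiple.
static uint32_t kv_cache_size(uint32_t n_tokens, bool flash_attn) {
    const uint32_t pad = kv_cache_padding(flash_attn);
    return ((n_tokens + pad - 1) / pad) * pad;
}
```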
b2864
[SYCL] Add oneAPI runtime DLL files to the Windows release package (#7241). Co-authored-by: Zhang <[email protected]>
b2862
CUDA: add FP32 FlashAttention vector kernel (#7188)
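A minimal sketch of the dispatch this kernel enables, with hypothetical names (the real selection logic lives in ggml-cuda): GPUs without fast FP16 arithmetic can now take an FP32 vector path instead of having FlashAttention unavailable.

```cpp
// Hypothetical kernel selection; names and threshold are illustrative.
enum class fa_vec_kernel { F16, F32 };

fa_vec_kernel pick_fa_vec_kernel(int compute_capability) {
    const bool fast_fp16 = compute_capability >= 70; // illustrative cutoff (Volta+)
    return fast_fp16 ? fa_vec_kernel::F16   // existing half-precision vector kernel
                     : fa_vec_kernel::F32;  // new single-precision vector kernel
}
```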
b2861
cmake : fix version cmp (#7227)
b2859
metal : fix warnings (skipme) (#0)
b2854
fix system prompt handling (#7153)
b2843
llama-bench : add pp+tg test type (#7199)
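The pp+tg type times a prompt-processing pass followed by token generation and reports one combined throughput. A sketch of that metric under the obvious definition (the helper and variable names are mine, not llama-bench's):

```cpp
// Combined throughput for a pp+tg run: total tokens over total wall time.
// Hypothetical helper; llama-bench's own bookkeeping may differ in detail.
double combined_tps(int n_pp, double t_pp_s, int n_tg, double t_tg_s) {
    return (n_pp + n_tg) / (t_pp_s + t_tg_s);
}
```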
b2836
Minor arithmetic improvement to mmvq wrapper kernel (#7172)
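mmvq is ggml's quantized matrix-vector multiply; arithmetic tweaks to wrapper kernels of this kind are typically strength reductions in the index math. An illustrative before/after, hypothetical and in plain float rather than the actual quantized types:

```cpp
// Hypothetical illustration of the kind of change: hoist a loop-invariant
// row*stride multiply out of the inner loop and advance a pointer instead
// of computing x[row * n_cols + col] per element.
void mat_vec_rows(const float * x, const float * y, float * dst, int n_rows, int n_cols) {
    const float * xr = x; // row pointer, advanced by addition each iteration
    for (int row = 0; row < n_rows; ++row, xr += n_cols) {
        float sum = 0.0f;
        for (int col = 0; col < n_cols; ++col) {
            sum += xr[col] * y[col];
        }
        dst[row] = sum;
    }
}
```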