Releases: ngxson/llama.cpp

b4393

26 Dec 16:31
d79d8f3
vulkan: multi-row k quants (#10846)

* multi row k quant shaders!

* better row selection

* more row choices

* readjust row selection

* rm_kq=2 by default

b4392

26 Dec 14:42
d283d02
examples, ggml : fix GCC compiler warnings (#10983)

Warning types fixed (observed under MSYS2 GCC 14.2.0):
* format '%ld' expects argument of type 'long int', but argument has type 'size_t'
* llama.cpp/ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp:81:46: warning: missing initializer for member '_STARTUPINFOA::lpDesktop' [-Wmissing-field-initializers]  (emitted for all struct fields except the first)

b4391

24 Dec 21:15
9ba399d
server : add support for "encoding_format": "base64" to the */embeddi…
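The release above adds `"encoding_format": "base64"` to the server's embeddings endpoints. In the OpenAI-style convention this option follows, the embedding vector is returned as a base64 string of little-endian float32 bytes instead of a JSON float array. A minimal decoding sketch (illustrative only, not llama.cpp code; the round-trip encode here stands in for a real server response):

```python
import base64
import struct

def decode_embedding(b64: str) -> list[float]:
    """Decode a base64 string of little-endian float32 bytes into floats."""
    raw = base64.b64decode(b64)
    count = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", raw))

# Round-trip demo: pack three float32 values, base64-encode, decode back.
vec = [0.25, -1.5, 3.0]
encoded = base64.b64encode(struct.pack(f"<{len(vec)}f", *vec)).decode()
print(decode_embedding(encoded))  # -> [0.25, -1.5, 3.0]
```

The base64 form roughly quarters the JSON payload size for large embedding matrices compared with printing each float as decimal text.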

b4390

24 Dec 18:33
2cd43f4
ggml : more performance with llamafile tinyblas on x86_64 (#10714)

* more performance with llamafile tinyblas on x86_64.

- add bf16 support
- change dispatch strategy (thanks:
https://github.com/ikawrakow/ik_llama.cpp/pull/71 )
- reduce memory bandwidth

simpler tinyblas dispatch and more cache friendly

* tinyblas dynamic dispatching

* sgemm: add M blocks.

* - git 2.47 uses short ids of length 9.
- show-progress is not part of GNU Wget2

* remove unstable test

b4389

24 Dec 17:19
09fe2e7
server : allow filtering llama server response fields (#10940)

* llama_server_response_fields

* llama_server_response_fields_fix_issues

* params fixes

* fix

* clarify docs

* change to "response_fields"

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>

b4388

24 Dec 08:22
30caac3
llama : the WPM vocabs use the CLS token as BOS (#10930)

* llama : the WPM vocabs use the CLS token as BOS

ggml-ci

* llama : add comment

b4387

24 Dec 03:47
60cfa72
ggml : use wstring for backend search paths (#10960)

ggml-ci

b4384

23 Dec 12:31
14b699e
server : fix missing model id in /model endpoint (#10957)

* server : fix missing model id in /model endpoint

* fix ci

b4382

23 Dec 10:08
86bf31c
rpc-server : add support for the SYCL backend (#10934)

b4381

23 Dec 02:13
b92a14a
llama : support InfiniAI Megrez 3b (#10893)

* Support InfiniAI Megrez 3b

* Fix tokenizer_clean_spaces for Megrez