Releases: teleprint-me/llama.cpp

b2972

23 May 04:38
cd93a28
CUDA: fix FA out-of-bounds reads (#7479)

b2970

22 May 18:09
197ff91
build : remove zig (#7471)

b2961

21 May 21:56
201cc11
llama : add phi3 128K model support (#7225)

* add phi3 128k support in convert-hf-to-gguf

* add phi3 128k support in cuda

* address build warnings on llama.cpp

* adjust index value in cuda long rope freq factors

* add long rope support in ggml cpu backend

* make freq factors only depend on ctx size

* remove unused rope scaling type 'su' from gguf converter

* fix lint warnings on convert-hf-to-gguf.py

* set the short freq factor when the context size is smaller than the trained context size

* add a one-line comment

* metal : support rope freq_factors

* ggml : update ggml_rope_ext API to support freq. factors

* backends : add dev messages to support rope freq. factors

* minor : style

* tests : update to use new rope API

* backends : fix pragma semicolons

* minor : cleanup

* llama : move rope factors from KV header to tensors

* llama : remove tmp assert

* cuda : fix compile warning

* convert : read/write n_head_kv

* llama : fix uninitialized tensors

---------

Co-authored-by: Georgi Gerganov <[email protected]>

b2958

21 May 18:54
fcf6538
CUDA: fix unused warning in mmq.cu (#7442)

b2953

20 May 19:25
917dc8c
Tokenizer SPM fixes for phi-3 and llama-spm (#7375)

* Update brute force test: special tokens
* Fix added tokens
  - Try to read 'added_tokens.json'.
  - Try to read 'tokenizer_config.json'.
  - Try to read 'tokenizer.json'.
* Fix special tokens rtrim

Co-authored-by: Georgi Gerganov <[email protected]>
* server : fix test regexes

b2941

20 May 04:00
33c8d50
Add provisions for windows support for BF16 code including CMake prov…

b2939

19 May 20:31
1ea2a00
quantize : fix --keep-split check (#7374)

b2929

19 May 03:06
f5bf761
Capture CUDA logging output (#7298)

* logging: output capture in cuda module

* fix compile error

* fix: vsnprintf terminates with 0, string use not correct

* post review

* Update llama.cpp

Co-authored-by: slaren <[email protected]>

* Update llama.cpp

Co-authored-by: slaren <[email protected]>

---------

Co-authored-by: slaren <[email protected]>

b2916

17 May 23:50
b43272a
Unicode codepoint flags for custom regexes (#7245)

* Replace CODEPOINT_TYPE_* with codepoint_flags
* Update and bugfix brute force random test
* Deterministic brute force random test
* Unicode normalization NFD
* Get rid of BOM

b2915

17 May 22:35
0fc1e82
CUDA: faster large batch FA without tensor cores (#7314)