Releases · teleprint-me/llama.cpp

30 Dec 23:54

9fbda71

b1731

clip : refactor + bug fixes (#4696)

* clip : refactor + bug fixes

ggml-ci

* server : add log message

27 Dec 18:23

github-actions

b1708

f679349

llama : add AWQ for llama, llama2, mpt, and mistral models (#4593)

* update: awq support llama-7b model

* update: change order

* update: benchmark results for llama2-7b

* update: mistral 7b v1 benchmark

* update: support 4 models

* fix: Readme

* update: ready for PR

* update: readme

* fix: readme

* update: change order import

* black

* format code

* update: work for both mpt and awqmpt

* update: readme

* Rename to llm_build_ffn_mpt_awq

* Formatted other files

* Fixed params count

* fix: remove code

* update: more detail for mpt

* fix: readme

* fix: readme

* update: change folder architecture

* fix: common.cpp

* fix: readme

* fix: remove ggml_repeat

* update: cicd

* update: cicd

* update: remove use_awq arg

* update: readme

* llama : adapt plamo to new ffn

ggml-ci

---------

Co-authored-by: Trần Đức Nam <[email protected]>
Co-authored-by: Le Hoang Anh <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>

27 Dec 15:35

github-actions

b1707

879b690

finetune : fix output formatting in print_params (#4653)

This commit fixes the output formatting in the print_params function
which currently looks like this:
```console
print_params: n_vocab:   32000
print_params: n_ctx:     128
print_params: n_embd:    4096
print_params: n_ff:      11008
print_params: n_head:    32
print_params: n_head_kv: 32
print_params: n_layer:   32
print_params: norm_rms_eps          : 0.000010
print_params: rope_freq_base        : 10000.000000
print_params: rope_freq_scale       : 1.000000
```
With this commit the output will look like this:
```console
print_params: n_vocab               : 32000
print_params: n_ctx                 : 128
print_params: n_embd                : 4096
print_params: n_ff                  : 11008
print_params: n_head                : 32
print_params: n_head_kv             : 32
print_params: n_layer               : 32
print_params: norm_rms_eps          : 0.000010
print_params: rope_freq_base        : 10000.000000
print_params: rope_freq_scale       : 1.000000
```

Signed-off-by: Daniel Bevenius <[email protected]>
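
The aligned output above can be produced with a left-justified, fixed-width `printf` field. The following is a minimal sketch of that idea, not the code from the commit: the helper names `format_param_int` and `format_param_float` are hypothetical, and the column width of 22 is an assumption chosen to match the spacing shown in the second console block.

```cpp
#include <cstdio>
#include <string>

// Assumed column width for the parameter name (hypothetical constant,
// chosen to match the aligned console output shown above).
constexpr int kNameWidth = 22;

// Hypothetical helper: formats an integer parameter as
// "print_params: <name padded to kNameWidth>: <value>".
std::string format_param_int(const char * name, long value) {
    char buf[128];
    // %-*s left-justifies the name and pads it with spaces to kNameWidth.
    std::snprintf(buf, sizeof(buf), "print_params: %-*s: %ld", kNameWidth, name, value);
    return std::string(buf);
}

// Hypothetical helper: same layout for floating-point parameters,
// printed with the default six decimal places of %f.
std::string format_param_float(const char * name, double value) {
    char buf[128];
    std::snprintf(buf, sizeof(buf), "print_params: %-*s: %f", kNameWidth, name, value);
    return std::string(buf);
}
```

Because every name is padded to the same width, integer and floating-point parameters line up in a single column regardless of name length.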

27 Dec 00:53

github-actions

b1703

dc68f00

cuda : fix vmm pool with multi GPU (#4620)

* cuda : fix vmm pool with multi GPU

* hip

* use recommended granularity instead of minimum

* better error checking

* fix mixtral

* use cudaMemcpy3DPeerAsync

* use cuda_pool_alloc in ggml_cuda_op_mul_mat

* consolidate error checking in ggml_cuda_set_device

* remove unnecessary inlines

ggml-ci

* style fixes

* only use vmm for the main device

* fix scratch buffer size, re-enable vmm pool for all devices

* remove unnecessary check id != g_main_device

26 Dec 18:14

github-actions

b1702

de8e496

Update comment for AdamW implementation reference. (#4604)

Co-authored-by: Will Findley <[email protected]>

24 Dec 20:40

github-actions

b1699

b9f4795

simplify bug issue template (#4623)

23 Dec 16:43

github-actions

b1696

708e179

fallback to CPU buffer if host buffer alloc fails (#4610)
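
The fallback pattern named in this title can be illustrated in isolation. The sketch below is an assumption-laden toy, not the code from the change: `toy_buffer`, `alloc_with_fallback`, and the `host_alloc` callback are hypothetical names, and `std::malloc` stands in for the ordinary CPU buffer.

```cpp
#include <cstddef>
#include <cstdlib>
#include <functional>

// Hypothetical buffer record: the pointer plus a flag recording
// which allocator produced it, so the caller can free it correctly.
struct toy_buffer {
    void * data;
    bool   is_host;
};

// Hypothetical helper: try the preferred host allocator first and
// fall back to a plain CPU allocation if it returns nullptr.
toy_buffer alloc_with_fallback(std::size_t size,
                               const std::function<void *(std::size_t)> & host_alloc) {
    void * ptr = host_alloc ? host_alloc(size) : nullptr;
    if (ptr != nullptr) {
        return { ptr, true };
    }
    // Preferred allocation failed: use ordinary heap memory instead.
    return { std::malloc(size), false };
}
```

Recording which path succeeded matters because the two kinds of memory generally need different release functions.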

23 Dec 12:34

github-actions

b1695

925e558

ci(docker): fix tags in "Build and push docker image (tagged)" (#4603)

22 Dec 20:20

github-actions

b1691

7082d24

lookup : add prompt lookup decoding example (#4484)

* initial commit, going through initializations

* main loop finished, starting to debug

* BUG: generates gibberish/repeating tokens after a while

* kv_cache management

* Added colors to distinguish drafted tokens (--color). Updated README

* lookup : fix token positions in the draft batch

* lookup : use n_draft from CLI params

* lookup : final touches

---------

Co-authored-by: Leon Ericsson <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
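
The general idea behind prompt lookup decoding is to draft tokens by finding an earlier occurrence of the most recent n-gram in the token sequence and proposing the tokens that followed it. The sketch below illustrates only that idea under simplifying assumptions; it is not the example added by this commit. The function name `lookup_draft`, the fixed `ngram_size`, and the choice to take the earliest match are all hypothetical, and verification of the draft by the model is omitted.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: look for an earlier occurrence of the trailing
// n-gram of `tokens` and return up to `n_draft` tokens that followed it.
// Returns an empty vector when no earlier occurrence exists.
std::vector<int> lookup_draft(const std::vector<int> & tokens,
                              std::size_t ngram_size,
                              std::size_t n_draft) {
    std::vector<int> draft;
    const std::size_t n = tokens.size();
    if (ngram_size == 0 || n <= ngram_size) {
        return draft;
    }
    const std::size_t tail = n - ngram_size; // start of the trailing n-gram

    // Try every start position before the trailing n-gram itself.
    for (std::size_t i = 0; i < tail; ++i) {
        bool match = true;
        for (std::size_t j = 0; j < ngram_size; ++j) {
            if (tokens[i + j] != tokens[tail + j]) {
                match = false;
                break;
            }
        }
        if (!match) {
            continue;
        }
        // Copy the continuation of the earliest match as the draft.
        for (std::size_t k = i + ngram_size; k < n && draft.size() < n_draft; ++k) {
            draft.push_back(tokens[k]);
        }
        return draft;
    }
    return draft;
}
```

For example, with tokens `{1, 2, 3, 4, 1, 2}`, an n-gram size of 2, and a draft length of 3, the trailing n-gram `{1, 2}` matches at the start of the sequence, so the draft is `{3, 4, 1}`. In a full decoder the draft would then be checked against the model's own predictions and only the accepted prefix kept.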

21 Dec 23:09

github-actions

b1680

afefa31

ggml : change ggml_scale to take a float instead of tensor (#4573)

* ggml : change ggml_scale to take a float instead of tensor

* ggml : fix CPU implementation

* tests : fix test-grad0

ggml-ci
