Releases: teleprint-me/llama.cpp

b1663

20 Dec 17:48
799fc22
CUDA: Faster Mixtral prompt processing (#4538)

* CUDA: make MoE tensors contiguous for batch size>1

* Update ggml-cuda.cu

Co-authored-by: slaren <[email protected]>

---------

Co-authored-by: slaren <[email protected]>

b1662

19 Dec 19:00
328b83d
ggml : fixed check for _MSC_VER (#4535)

Co-authored-by: Eric Sommerlade <[email protected]>

b1661

19 Dec 11:51
a7aee47
ggml-cuda: Fix HIP build (#4528)

regression of #4490
Adds defines for two new datatypes
cublasComputeType_t, cudaDataType_t.

Currently uses the deprecated hipblasDatatype_t, since the newer replacement types are too recent to rely on.
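The fix above maps the two CUDA type names onto HIP equivalents via defines. A minimal sketch of what such a compatibility mapping looks like, assuming the guard macro and the choice of hipblasDatatype_t as the stand-in (the exact macro names in ggml-cuda.cu's HIP layer may differ):

```c
/* Sketch of a CUDA-to-HIP type-name shim, as described in the note:
 * on HIP builds, alias the CUDA names to the older hipblasDatatype_t,
 * which is deprecated but still shipped by current ROCm releases.
 * GGML_USE_HIPBLAS is assumed as the build guard. */
#if defined(GGML_USE_HIPBLAS)
#define cublasComputeType_t hipblasDatatype_t /* deprecated; newer compute-type enum too recent */
#define cudaDataType_t      hipblasDatatype_t /* deprecated; same reason */
#endif
```

With these aliases in place, code written against the cuBLAS compute-type API compiles unchanged under HIP until the newer hipBLAS types are widely available.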

b1657

18 Dec 16:57
3c04bf6
llama : fix try_override for bool_value, which always returned true (#4519)

b1656

18 Dec 09:11
2994f0c
decode : fix logits_valid for legacy API (#4516)

b1644

15 Dec 22:41
601071c
Merge pull request #2 from ggerganov/master

[pull] master from ggerganov:master

b1641

14 Dec 16:15
9db3b64
Merge branch 'ggerganov:master' into master

b1640

14 Dec 17:04
cafcd4f
ggml : remove n_dims from ggml_tensor (#4469)

ggml-ci

b1635

14 Dec 06:21
d135aec
Merge pull request #1 from ggerganov/master

[pull] master from ggerganov:master