Releases · teleprint-me/llama.cpp
b1663
CUDA: Faster Mixtral prompt processing (#4538). Makes the MoE tensors contiguous for batch size > 1. Co-authored-by: slaren.
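The change keeps the MoE expert tensors contiguous in memory before the batched matrix multiplications. A minimal sketch of that pattern, using the real ggml calls `ggml_is_contiguous` and `ggml_cont`; the helper name and surrounding logic are illustrative, not the #4538 diff:

```c
#include "ggml.h"

// Illustrative helper: return a contiguous version of a tensor,
// inserting a copy only when its current layout is strided.
// ggml_is_contiguous and ggml_cont are real ggml API calls;
// everything else here is a sketch.
static struct ggml_tensor * ensure_contiguous(
        struct ggml_context * ctx, struct ggml_tensor * t) {
    if (ggml_is_contiguous(t)) {
        return t;             // layout already dense, nothing to do
    }
    return ggml_cont(ctx, t); // schedule a copy into contiguous memory
}
```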
b1662
ggml : fixed check for _MSC_VER (#4535). Co-authored-by: Eric Sommerlade.
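For context, the usual pitfall with compiler-version macros is comparing the value of a macro that may not be defined at all. A hedged sketch of the robust form (illustrative of the bug class, not the #4535 diff):

```c
#include <stdio.h>

int main(void) {
    // Guard version comparisons with defined(): on non-MSVC compilers
    // _MSC_VER does not exist, and a bare `#if _MSC_VER >= 1930` relies
    // on the undefined macro evaluating as 0 (and trips -Wundef).
#if defined(_MSC_VER) && _MSC_VER >= 1930
    printf("MSVC 2022 or newer\n");
#elif defined(_MSC_VER)
    printf("older MSVC\n");
#else
    printf("GCC, Clang, or another compiler\n");
#endif
    return 0;
}
```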
b1661
ggml-cuda: Fix HIP build (#4528). Fixes a regression from #4490 by adding defines for two new datatypes, cublasComputeType_t and cudaDataType_t. These currently map to the deprecated hipblasDatatype_t, since the newer replacements are too recent to rely on.
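The note describes aliasing the two CUDA type names onto the one type hipBLAS still exposes. A minimal sketch of such compatibility defines, assuming a HIP build guard like `GGML_USE_HIPBLAS`; this mirrors the description above rather than the exact diff:

```c
// On HIP builds, alias the CUDA compute/data type names to the
// (deprecated) hipblasDatatype_t, since the newer HIP replacements
// are too recent to depend on. Sketch based on the release note;
// the guard macro is an assumption.
#if defined(GGML_USE_HIPBLAS)
#include <hipblas/hipblas.h>
#define cublasComputeType_t hipblasDatatype_t
#define cudaDataType_t      hipblasDatatype_t
#endif
```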
b1657
llama : fix try_override for bool_value, which always returned true (#4519)
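The bug class here is a success flag returned unconditionally. A self-contained sketch of the corrected shape, with all types and names hypothetical (the real code lives in llama.cpp's KV-override handling):

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical stand-ins for the KV-override structures.
enum override_tag { OVERRIDE_NONE, OVERRIDE_BOOL };

struct kv_override {
    enum override_tag tag;
    bool              bool_value;
};

// Return whether an override was actually applied. The bug described
// above is returning true on every path, so callers could never tell
// that no override existed.
static bool try_override_bool(bool * out, const struct kv_override * ovrd) {
    if (ovrd != NULL && ovrd->tag == OVERRIDE_BOOL) {
        *out = ovrd->bool_value;
        return true;
    }
    return false; // previously this path reported success too
}
```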
b1656
decode : fix logits_valid for legacy API (#4516)
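As a rough illustration of the flag being fixed: the legacy API computes logits for every token, so per-position validity flags must be set for all of them, not just the last. Everything below is a hypothetical sketch, not llama.cpp internals:

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical per-position validity flags for a batch of logits.
// With the legacy all-logits path, every position must be marked
// valid; the batched path marks only the final position.
static void mark_logits_valid(bool * valid, int n_tokens, bool logits_all) {
    memset(valid, 0, (size_t) n_tokens * sizeof(bool));
    if (logits_all) {
        for (int i = 0; i < n_tokens; ++i) {
            valid[i] = true;  // legacy API: logits exist for every token
        }
    } else {
        valid[n_tokens - 1] = true; // only the last token's logits
    }
}
```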
b1644
Merge pull request #2 from ggerganov/master ([pull] master from ggerganov:master)
b1641
Merge branch 'ggerganov:master' into master
b1640
ggml : remove n_dims from ggml_tensor (#4469)
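With the field gone, the dimension count is derived from the shape array on demand. A sketch of that derivation, mirroring what the upstream helper `ggml_n_dims` is understood to do (treat trailing size-1 axes as absent); the standalone function here is illustrative:

```c
#include <stdint.h>

#define MAX_DIMS 4

// Count the effective number of dimensions from a ggml-style shape
// array ne[], where unused trailing axes have size 1. Illustrative
// reimplementation; upstream exposes this as ggml_n_dims().
static int n_dims_from_shape(const int64_t ne[MAX_DIMS]) {
    for (int i = MAX_DIMS - 1; i >= 1; --i) {
        if (ne[i] > 1) {
            return i + 1;
        }
    }
    return 1; // scalars and vectors both report at least 1 dim
}
```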
b1635
Merge pull request #1 from ggerganov/master ([pull] master from ggerganov:master)