Releases · teleprint-me/llama.cpp
b1663
CUDA: Faster Mixtral prompt processing (#4538). Makes the MoE tensors contiguous for batch size > 1. Co-authored-by: slaren.
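The change keeps the MoE expert tensors contiguous in memory before the batched matrix multiplications. A minimal sketch of that pattern, using the real ggml calls `ggml_is_contiguous` and `ggml_cont`; the helper name and surrounding logic are illustrative, not the #4538 diff:

```c
#include "ggml.h"

// Illustrative helper: return a contiguous version of a tensor,
// inserting a copy only when its current layout is strided.
// ggml_is_contiguous and ggml_cont are real ggml API calls;
// everything else here is a sketch.
static struct ggml_tensor * ensure_contiguous(
        struct ggml_context * ctx, struct ggml_tensor * t) {
    if (ggml_is_contiguous(t)) {
        return t;             // layout already dense, nothing to do
    }
    return ggml_cont(ctx, t); // schedule a copy into contiguous memory
}
```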
b1662
ggml : fixed check for _MSC_VER (#4535). Co-authored-by: Eric Sommerlade.
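For context, the usual pitfall with compiler-version macros is comparing the value of a macro that may not be defined at all. A hedged sketch of the robust form (illustrative of the bug class, not the #4535 diff):

```c
#include <stdio.h>

int main(void) {
    // Guard version comparisons with defined(): on non-MSVC compilers
    // _MSC_VER does not exist, and a bare `#if _MSC_VER >= 1930` relies
    // on the undefined macro evaluating as 0 (and trips -Wundef).
#if defined(_MSC_VER) && _MSC_VER >= 1930
    printf("MSVC 2022 or newer\n");
#elif defined(_MSC_VER)
    printf("older MSVC\n");
#else
    printf("GCC, Clang, or another compiler\n");
#endif
    return 0;
}
```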
b1661
ggml-cuda: Fix HIP build (#4528). Fixes a regression from #4490 by adding defines for two new datatypes, cublasComputeType_t and cudaDataType_t. These currently map to the deprecated hipblasDatatype_t, since the newer replacements are too recent to rely on.
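The note describes aliasing the two CUDA type names onto the one type hipBLAS still exposes. A minimal sketch of such compatibility defines, assuming a HIP build guard like `GGML_USE_HIPBLAS`; this mirrors the description above rather than the exact diff:

```c
// On HIP builds, alias the CUDA compute/data type names to the
// (deprecated) hipblasDatatype_t, since the newer HIP replacements
// are too recent to depend on. Sketch based on the release note;
// the guard macro is an assumption.
#if defined(GGML_USE_HIPBLAS)
#include <hipblas/hipblas.h>
#define cublasComputeType_t hipblasDatatype_t
#define cudaDataType_t      hipblasDatatype_t
#endif
```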
b1657
llama : fix try_override for bool_value, which always returned true (#4519)
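The bug class here is a success flag returned unconditionally. A self-contained sketch of the corrected shape, with all types and names hypothetical (the real code lives in llama.cpp's KV-override handling):

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical stand-ins for the KV-override structures.
enum override_tag { OVERRIDE_NONE, OVERRIDE_BOOL };

struct kv_override {
    enum override_tag tag;
    bool              bool_value;
};

// Return whether an override was actually applied. The bug described
// above is returning true on every path, so callers could never tell
// that no override existed.
static bool try_override_bool(bool * out, const struct kv_override * ovrd) {
    if (ovrd != NULL && ovrd->tag == OVERRIDE_BOOL) {
        *out = ovrd->bool_value;
        return true;
    }
    return false; // previously this path reported success too
}
```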
b1656
decode : fix logits_valid for legacy API (#4516)
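As a rough illustration of the flag being fixed: the legacy API computes logits for every token, so per-position validity flags must be set for all of them, not just the last. Everything below is a hypothetical sketch, not llama.cpp internals:

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical per-position validity flags for a batch of logits.
// With the legacy all-logits path, every position must be marked
// valid; the batched path marks only the final position.
static void mark_logits_valid(bool * valid, int n_tokens, bool logits_all) {
    memset(valid, 0, (size_t) n_tokens * sizeof(bool));
    if (logits_all) {
        for (int i = 0; i < n_tokens; ++i) {
            valid[i] = true;  // legacy API: logits exist for every token
        }
    } else {
        valid[n_tokens - 1] = true; // only the last token's logits
    }
}
```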
b1644
Merge pull request #2 from ggerganov/master ([pull] master from ggerganov:master)
b1641
Merge branch 'ggerganov:master' into master
b1640
ggml : remove n_dims from ggml_tensor (#4469)
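With the field gone, the dimension count is derived from the shape array on demand. A sketch of that derivation, mirroring what the upstream helper `ggml_n_dims` is understood to do (treat trailing size-1 axes as absent); the standalone function here is illustrative:

```c
#include <stdint.h>

#define MAX_DIMS 4

// Count the effective number of dimensions from a ggml-style shape
// array ne[], where unused trailing axes have size 1. Illustrative
// reimplementation; upstream exposes this as ggml_n_dims().
static int n_dims_from_shape(const int64_t ne[MAX_DIMS]) {
    for (int i = MAX_DIMS - 1; i >= 1; --i) {
        if (ne[i] > 1) {
            return i + 1;
        }
    }
    return 1; // scalars and vectors both report at least 1 dim
}
```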
b1635
Merge pull request #1 from ggerganov/master ([pull] master from ggerganov:master)