b1663

github-actions released this 20 Dec 17:48
799fc22
CUDA: Faster Mixtral prompt processing (#4538)

* CUDA: make MoE tensors contiguous for batch size>1

* Update ggml-cuda.cu

Co-authored-by: slaren <[email protected]>
