llamafile : ppc64le MMA INT8 implementation (#10912) · ggerganov/llama.cpp@8cef75c

This change upstreams llamafile's CPU matrix
multiplication kernels for ppc64le, which use MMA
builtins for the quantised int8 datatype.
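
For readers unfamiliar with this kind of kernel, the following is a minimal portable sketch of the general idea: an int8 matrix multiplication built from 4x4 tiles of int32 accumulators, each updated four K-elements at a time. It is not the code from this patch and does not use the MMA builtins; the names `Acc4x4`, `rank4_update_i8` and `matmul_i8_tiled`, the 4x4 tile size, and the layout (B stored as N x K, row-major) are assumptions made for illustration only.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 4x4 tile of int32 accumulators (illustration only).
using Acc4x4 = std::array<std::array<int32_t, 4>, 4>;

// Scalar stand-in for one accumulate step on a tile:
// acc[i][j] += sum over 4 consecutive k of a[i][k] * b[j][k].
// lda/ldb are the row strides (in elements) of a and b.
inline void rank4_update_i8(Acc4x4 &acc,
                            const int8_t *a, int lda,
                            const int8_t *b, int ldb) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                acc[i][j] += int32_t(a[i * lda + k]) * int32_t(b[j * ldb + k]);
}

// C (M x N, int32) = A (M x K, int8) * B^T, with B stored as N x K.
// Assumes M, N and K are multiples of 4 to keep the sketch short.
inline std::vector<int32_t> matmul_i8_tiled(const std::vector<int8_t> &A,
                                            const std::vector<int8_t> &B,
                                            int M, int N, int K) {
    std::vector<int32_t> C(std::size_t(M) * N, 0);
    for (int i0 = 0; i0 < M; i0 += 4) {
        for (int j0 = 0; j0 < N; j0 += 4) {
            Acc4x4 acc{};  // zero-initialised accumulator tile
            for (int k0 = 0; k0 < K; k0 += 4)
                rank4_update_i8(acc,
                                &A[std::size_t(i0) * K + k0], K,
                                &B[std::size_t(j0) * K + k0], K);
            // Write the finished tile back to C.
            for (int i = 0; i < 4; ++i)
                for (int j = 0; j < 4; ++j)
                    C[std::size_t(i0 + i) * N + (j0 + j)] = acc[i][j];
        }
    }
    return C;
}
```

Accumulating in int32 keeps the int8 products from overflowing; a hardware-accelerated kernel would replace the inner scalar loops with vector or matrix instructions while keeping the same tiling structure.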

This change results in a 10% - 70% improvement
in total speed (i.e. all tokens / total time) across
various batch sizes.

The patch was tested with the Meta-Llama-3-8B,
Mistral-7B and Llama-2-7B-chat-hf models on an
IBM POWER10 machine.

Signed-off-by: Amrita H S <[email protected]>

amritahs-ibm authored Jan 8, 2025

1 parent 0d52a69 commit 8cef75c