llamafile : ppc64le MMA INT8 implementation (#10912)
This change upstreams llamafile's CPU matrix
multiplication kernels for ppc64le, using MMA
builtins for the quantised int8 data type.
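
For readers unfamiliar with this kind of kernel, the following is a portable scalar sketch of an int8 matrix multiplication with int32 accumulation, computed in 4x4 output tiles. It is not the code from this commit: the function name `gemm_int8_ref`, the row-major layout, and the choice to store B as n rows of k values are all assumptions made for illustration. The 4x4 tile is meant to mirror the shape of an MMA accumulator, which holds a 4x4 block of 32-bit values; a real kernel would update that tile with MMA builtins instead of the inner scalar loops, and would also apply the quantisation scales, which are omitted here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference helper (not from the commit).
// Computes C = A * B^T with int32 accumulation, where
//   A is m x k int8, row-major,
//   B is n x k int8, row-major (each row is one column of the logical B),
//   C is m x n int32, row-major.
std::vector<int32_t> gemm_int8_ref(const std::vector<int8_t>& A,
                                   const std::vector<int8_t>& B,
                                   std::size_t m, std::size_t n, std::size_t k) {
    assert(A.size() == m * k && B.size() == n * k);
    std::vector<int32_t> C(m * n, 0);
    constexpr std::size_t T = 4;  // tile edge, chosen to match a 4x4 accumulator

    for (std::size_t i0 = 0; i0 < m; i0 += T) {
        for (std::size_t j0 = 0; j0 < n; j0 += T) {
            // Handle partial tiles at the right and bottom edges.
            const std::size_t im = std::min(T, m - i0);
            const std::size_t jn = std::min(T, n - j0);
            int32_t acc[T][T] = {};  // per-tile accumulator, zero-initialised

            // Accumulate one rank-1 update per step along k.
            for (std::size_t l = 0; l < k; ++l)
                for (std::size_t i = 0; i < im; ++i)
                    for (std::size_t j = 0; j < jn; ++j)
                        acc[i][j] += int32_t(A[(i0 + i) * k + l]) *
                                     int32_t(B[(j0 + j) * k + l]);

            // Write the finished tile back to C.
            for (std::size_t i = 0; i < im; ++i)
                for (std::size_t j = 0; j < jn; ++j)
                    C[(i0 + i) * n + (j0 + j)] = acc[i][j];
        }
    }
    return C;
}
```

Keeping the accumulator tile local until all k steps are done is the design point the sketch is meant to show: each output tile is written to memory once, regardless of how long the dot products are.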

This change results in a 10% - 70% improvement
in total speed (i.e. all tokens / total time)
across various batch sizes.

The patch was tested with the Meta-Llama-3-8B,
Mistral-7B, and Llama-2-7B-chat-hf models on an
IBM POWER10 machine.

Signed-off-by: Amrita H S <[email protected]>
amritahs-ibm authored Jan 8, 2025
1 parent 0d52a69 commit 8cef75c
Showing 1 changed file with 773 additions and 69 deletions.