This repository has been archived by the owner on Aug 30, 2024. It is now read-only.

3bit weight infrastructure in BesTLA #125

Merged
merged 33 commits into main from 3bit_wei_2buf on Feb 20, 2024


zhewang1-intc
Contributor

Type of Change

feature or bug fix or documentation or others: feature
API changed or not: yes
adds support for the S3_CLIP quant type in BesTLA
supported ISAs: AVX512F, AVX512_VNNI, AMX_INT8, AMX_BF16
note: client CPUs without bit_mask instructions (most of them) are not supported.
usage:
auto ptr = kernel.createStorage(n, k, blocksize, BTLA_DTYPE::S3_CLIP, BTLA_DTYPE::F32, BTLA_DTYPE::F32, false);
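To illustrate what storing 3-bit weights can involve, here is a minimal sketch that splits each signed 3-bit value into a 2-bit part and a 1-bit part held in two separate buffers. The two-buffer split is only an assumption suggested by the branch name (3bit_wei_2buf); the actual BesTLA layout is not shown in this PR description. `Packed3Bit`, `pack3bit`, and `unpack3bit` are hypothetical names, not BesTLA APIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container (not a BesTLA type): the low 2 bits and the high bit
// of every 3-bit value live in two separate byte buffers.
struct Packed3Bit {
  std::vector<uint8_t> low2;   // 4 values per byte
  std::vector<uint8_t> high1;  // 8 values per byte
  size_t count = 0;            // number of packed values
};

// Pack signed 3-bit values; each input is assumed to lie in [-4, 3].
Packed3Bit pack3bit(const std::vector<int8_t>& q) {
  Packed3Bit p;
  p.count = q.size();
  p.low2.assign((q.size() + 3) / 4, 0);
  p.high1.assign((q.size() + 7) / 8, 0);
  for (size_t i = 0; i < q.size(); ++i) {
    // Bias to the unsigned range [0, 7] so the bit split is straightforward.
    const uint8_t u = static_cast<uint8_t>(q[i] + 4) & 0x7;
    p.low2[i / 4] |= static_cast<uint8_t>((u & 0x3) << ((i % 4) * 2));
    p.high1[i / 8] |= static_cast<uint8_t>((u >> 2) << (i % 8));
  }
  return p;
}

// Reassemble the two parts and remove the bias to recover the signed values.
std::vector<int8_t> unpack3bit(const Packed3Bit& p) {
  std::vector<int8_t> q(p.count);
  for (size_t i = 0; i < p.count; ++i) {
    const uint8_t lo = (p.low2[i / 4] >> ((i % 4) * 2)) & 0x3;
    const uint8_t hi = (p.high1[i / 8] >> (i % 8)) & 0x1;
    q[i] = static_cast<int8_t>(((hi << 2) | lo) - 4);
  }
  return q;
}
```

Splitting the value this way keeps both buffers byte-aligned (4 and 8 values per byte), which avoids 3-bit fields straddling byte boundaries; whether BesTLA makes the same choice is not confirmed here.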

Description

Nearly 17% better next-token performance than int4 (GPT-J model, gs=128, int8-cmpt).
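As a rough back-of-the-envelope check on the storage side, the sketch below computes the average bits stored per weight when each group of weights shares one scale. It assumes an F32 scale per group and no zero points, read from the `createStorage` arguments above; the scale type of the int4 baseline is not stated in this PR, so treating it the same way is an assumption. `bitsPerWeight` is a hypothetical helper.

```cpp
// Average bits stored per weight: the payload bits plus one scale that is
// shared by every weight in a group (hypothetical helper, not a BesTLA API).
constexpr double bitsPerWeight(int weightBits, int scaleBits, int groupSize) {
  return weightBits + static_cast<double>(scaleBits) / groupSize;
}
```

Under these assumptions, `bitsPerWeight(3, 32, 128)` is 3.25 and `bitsPerWeight(4, 32, 128)` is 4.25, so the 3-bit weights occupy about 76% of the int4 footprint. This only bounds the memory traffic; the measured speedup also depends on the cost of unpacking.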

Expected Behavior & Potential Risk

the expected behavior triggered by this PR

How has this PR been tested?

how to reproduce the test (including hardware information): tested on Intel Xeon Platinum 8480+ and Intel Core i7-1185G7

Dependency Change?

any library dependency introduced or removed: No

@airMeng airMeng merged commit ee40f28 into main Feb 20, 2024
11 checks passed
@airMeng airMeng deleted the 3bit_wei_2buf branch February 20, 2024 08:01