This repository has been archived by the owner on Aug 30, 2024. It is now read-only.
Conversation
Signed-off-by: zhenwei-intel <[email protected]>
* move hidden files
* update readme path

Signed-off-by: zhenwei-intel <[email protected]>
* Unify scripts for converting, quantizing and chatting
* move folder
* update script with subprocess

Signed-off-by: zhenwei-intel <[email protected]>
* initial commit of n_head_kv in MQA
* add attention layernorm
* reorder QKV weight when converting
* fix typo
* cherry-pick ggml MQA
* fix KV cache and reduce hand-managed memory buffer size

Signed-off-by: Yu, Zhentao <[email protected]>
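For background, the n_head_kv parameter enables multi-query attention, where many query heads share a smaller set of KV heads (which is also why the KV cache shrinks). A minimal NumPy sketch of the grouping idea, with illustrative names and shapes rather than the repository's actual kernel code:

```python
import numpy as np

def mqa_attention(q, k, v, n_head, n_head_kv):
    """Grouped attention: n_head query heads share n_head_kv KV heads.

    q: (n_head, seq, head_dim); k, v: (n_head_kv, seq, head_dim).
    Illustrative only (causal mask omitted); the real kernels
    operate on quantized buffers.
    """
    group = n_head // n_head_kv              # query heads per KV head
    head_dim = q.shape[-1]
    out = np.empty_like(q)
    for h in range(n_head):
        kv = h // group                      # map query head -> shared KV head
        scores = q[h] @ k[kv].T / np.sqrt(head_dim)
        scores -= scores.max(axis=-1, keepdims=True)  # stable softmax
        probs = np.exp(scores)
        probs /= probs.sum(axis=-1, keepdims=True)
        out[h] = probs @ v[kv]
    return out
```

With n_head_kv < n_head, only n_head_kv K/V tensors need caching, which matches the reduced buffer size mentioned above.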
* Update README.md: update the readme
* Update README.md
* Update README.md
* Update README.md
* Refine Inference Workflow Readme

Signed-off-by: hshen14 <[email protected]>
Co-authored-by: lvliang-intel <[email protected]>
Co-authored-by: Wang, Chang <[email protected]>
* add s8 per-channel quantization and kernel
* add QKV; add fusion support for s8 per-N
* add amx_int8 per-N GELU fusion
* add GELU-add fusion for VNNI
* split jblas file; add compute type fp32
* add comp_type fp32 for FFN fusion
* add bf16 for s4 and s4 FFN fusion
* add workspace for jblas functions
* keep one jblas codebase
* disable mmap by default; change arg --no_mmap to --use_mmap
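As a rough sketch of what symmetric s8 per-channel quantization computes — one scale per output channel of the weight — here is an illustrative helper (hypothetical names, not the jblas kernel API):

```python
import numpy as np

def quantize_s8_per_channel(w):
    """Symmetric int8 quantization, one scale per output channel (row).

    w: (out_channels, in_channels) fp32 weight. Illustrative sketch only.
    """
    scales = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0                      # avoid division by zero
    q = np.clip(np.round(w / scales), -128, 127).astype(np.int8)
    return q, scales.astype(np.float32)

def dequantize_s8_per_channel(q, scales):
    """Recover an fp32 approximation of the original weight."""
    return q.astype(np.float32) * scales
```

Per-channel scales track each row's dynamic range separately, which typically loses less accuracy than one scale for the whole tensor.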
* refine readme
* refine readme
* refine table
* Refine LLM Runtime readme
* Continue updating the readme
* Simplify the readme
* add back run_llm.py
* change script arg name
* rename arg
* fix
* add description
* add another way to convert the model
* remove additional line
* refine readme
* refine readme (the convert script still needs modification later)
* fix model_maps
* fix convert_gptj
* refine readme
* refine

Signed-off-by: hshen14 <[email protected]>
Signed-off-by: zhenwei-intel <[email protected]>
Co-authored-by: hshen14 <[email protected]>
Co-authored-by: zhenwei-intel <[email protected]>
Signed-off-by: Dong, Bo1 <[email protected]>
* support bloom

Signed-off-by: Dong, Bo1 <[email protected]>
* add length_penalty and min_new_tokens_logits_process
* revert V cache reorder
* refactor beam_search code architecture
* fix n_threads
* make beam_kv_cache_reorder a class
* clean code

Signed-off-by: Yu, Zhentao <[email protected]>
Co-authored-by: Haihao Shen <[email protected]>
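The two generation controls named above presumably follow the standard techniques: min_new_tokens masks the EOS token until enough tokens have been generated, and length_penalty normalizes a beam's cumulative log-probability by its length. A hedged sketch under those assumptions (function and parameter names are illustrative):

```python
import numpy as np

def suppress_eos(logits, n_generated, min_new_tokens, eos_token_id):
    """Forbid EOS until at least min_new_tokens tokens are generated.

    logits: (vocab_size,) next-token scores. Illustrative sketch only.
    """
    if n_generated < min_new_tokens:
        logits[eos_token_id] = -np.inf   # EOS can never be selected
    return logits

def beam_score(sum_logprobs, length, length_penalty=1.0):
    """Length-penalized beam score; penalty > 1 favors longer outputs."""
    return sum_logprobs / (length ** length_penalty)
```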
* fix q8 per-N QKV fusion for VNNI
* add SiLU JIT kernel; add SiLU fusion
* fix the result of LLaMA SiLU fusion
* enable JIT swish for higher performance
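For reference, the swish activation being JIT-compiled here is x * sigmoid(alpha * x), with alpha = 1 giving SiLU; the fused kernel computes it elementwise on the matmul output. A plain reference version for comparison (not the JIT code):

```python
import numpy as np

def swish(x, alpha=1.0):
    """Swish: x * sigmoid(alpha * x); alpha = 1.0 is SiLU."""
    return x / (1.0 + np.exp(-alpha * x))
```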
* rename llm chat application
* rename CI test script

Signed-off-by: Yu, Zhentao <[email protected]>
Co-authored-by: Dong, Bo <[email protected]>
Signed-off-by: zhenwei-intel <[email protected]>
* update jblas to b3c75b2
* MHA refactor changes
* full fp16 MHA draft
* support fp32fp16fp16fp32 jblas MHA with fp16 kernels
* add fp16 MHA fusion
* fix the fp16 issue on older gcc versions
* keep the same permute for bf16 and fp16 MHA
* fix params for fp16 MHA
* MHA amxbf16 supports reo-k
* prepare fwd args for int8 inference
* int8 MHA draft
* draft of bf16 MHA with KV update
* disable fp16 MHA by default
* fix MHA NaN
* fall back to bf16 when unsupported
* check MHA support
* update swish alpha value
* fix fp32 SiLU bug
* disable MHA on compilers without bf16 intrinsics

Signed-off-by: Ding, Yi1 <[email protected]>
Co-authored-by: luoyu-intel <[email protected]>
Signed-off-by: intellinjun <[email protected]>
* add TP and gptj model support
  1. add TP_1D algorithm
  2. add parallel_context for broadcast/reduce
  3. support all data types
  4. support the gptj model

Signed-off-by: Clark Chin <[email protected]>
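Assuming TP_1D means the usual one-dimensional tensor-parallel split — the first weight sliced by columns, the second by rows, with an all-reduce summing the partial outputs — here is a toy single-process sketch of the arithmetic (no real communication; a ReLU FFN stands in for the real layers, and names are illustrative, not the parallel_context API):

```python
import numpy as np

def tp_1d_ffn(x, w1, w2, world_size):
    """Simulate a 1D tensor-parallel FFN: w1 split by columns, w2 by rows.

    Each 'rank' computes a partial output; summing the partials (the
    all-reduce step) reproduces the serial result exactly.
    """
    cols = np.array_split(w1, world_size, axis=1)   # column-parallel slices
    rows = np.array_split(w2, world_size, axis=0)   # row-parallel slices
    partials = [np.maximum(x @ c, 0) @ r for c, r in zip(cols, rows)]
    return sum(partials)                            # stands in for all-reduce

# Sanity check: parallel result matches the serial computation.
x = np.random.randn(4, 16)
w1, w2 = np.random.randn(16, 32), np.random.randn(32, 16)
assert np.allclose(tp_1d_ffn(x, w1, w2, 2), np.maximum(x @ w1, 0) @ w2)
```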
Signed-off-by: Ding, Yi1 <[email protected]>
* chatglm-2 q4_j inference passes with correct accuracy
* unify convert scripts
* specify chatglm2; remove the ambiguous chatglm
* initialize glm1
* initialize glm1
* fix kernel issues for glm1
* adapt to the latest main; chatglm2 inference passes
* add parameters for all convert.py scripts
* add parameters for bloom
* update README and clean code
* disable chatglm1

Signed-off-by: Zhenzhong1 <[email protected]>
Signed-off-by: intellinjun <[email protected]>
Signed-off-by: Haihao Shen <[email protected]>
Signed-off-by: Dong, Bo1 <[email protected]>
…generation (#700)
Signed-off-by: Dong, Bo1 <[email protected]>
Co-authored-by: kevinintel <[email protected]>
* Baichuan13B FP32 inference bug fix
Co-authored-by: luoyu-intel <[email protected]>
Co-authored-by: Ding, Yi1 <[email protected]>
Co-authored-by: zhenwei-intel <[email protected]>
Co-authored-by: yuchengliu1 <[email protected]>
Co-authored-by: Meng, Hengyu <[email protected]>
* add fp8 in llm frontend

Signed-off-by: Yu, Zhentao <[email protected]>
Co-authored-by: luoyu-intel <[email protected]>
migrate CI
refine CI for neuralspeed
add more CI scripts
minor fix
remove runner.name when running on ubuntu-latest
update CI to share the system
rename jblas to bestla; reorganize directories
remove itrex dependency
fix script path; remove python dependency
remove python tests; disable percentage; disable monitor
fix naming; fix threadpool conflict
restore percentage

Signed-off-by: Hengyu Meng <[email protected]>
Signed-off-by: Wenxin Zhang <[email protected]>
add bestla workflow image

Signed-off-by: Hengyu Meng <[email protected]>
VincyZhang approved these changes on Dec 20, 2023.
DDEle pushed a commit to DDEle/neural-speed that referenced this pull request on Feb 15, 2024.
Co-authored-by: Jiaxingla <[email protected]>
Type of Change
feature, bug fix, documentation, or others
API changed or not
Description
detail description
Issues: xxx
Expected Behavior & Potential Risk
the expected behavior triggered by this PR
How has this PR been tested?
how to reproduce the test (including hardware information)
Dependency Change?
any library dependency introduced or removed