This repository has been archived by the owner on Aug 30, 2024. It is now read-only.

init commit #5

Merged
merged 153 commits into main from ns_init on Dec 20, 2023

Conversation

@airMeng airMeng (Contributor) commented Dec 20, 2023

Type of Change

feature or bug fix or documentation or others
API changed or not

Description

detailed description
Issues: xxx

Expected Behavior & Potential Risk

the expected behavior triggered by this PR

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

PenghuiCheng and others added 30 commits December 19, 2023 01:10
* move hidden files

Signed-off-by: zhenwei-intel <[email protected]>

* update readme path

Signed-off-by: zhenwei-intel <[email protected]>

---------

Signed-off-by: zhenwei-intel <[email protected]>
* Unify scripts for converting, quantizing and chatting

Signed-off-by: zhenwei-intel <[email protected]>

* move folder

* update script with subprocess

Signed-off-by: zhenwei-intel <[email protected]>

---------

Signed-off-by: zhenwei-intel <[email protected]>
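
A minimal sketch of how a unified driver might chain the convert, quantize, and chat steps through `subprocess`, in the spirit of the commits above; the script names and flags below are illustrative assumptions, not the project's actual CLI.

```python
# Illustrative sketch only: script names and flags are assumptions,
# not the project's actual command-line interface.
import subprocess
import sys

def run_step(cmd):
    # Run one stage and stop the pipeline if it fails.
    print(f"Running: {' '.join(cmd)}")
    result = subprocess.run(cmd)
    if result.returncode != 0:
        sys.exit(result.returncode)

def main(model_dir):
    # Hypothetical three-stage pipeline: convert -> quantize -> chat.
    run_step([sys.executable, "convert.py", model_dir, "--outfile", "model_fp32.bin"])
    run_step(["./quantize", "model_fp32.bin", "model_q4.bin"])
    run_step(["./chat", "-m", "model_q4.bin"])

if __name__ == "__main__":
    main(sys.argv[1])
```
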
* initial commit of n_head_kv in MQA

Signed-off-by: Yu, Zhentao <[email protected]>

* add attn ln

Signed-off-by: Yu, Zhentao <[email protected]>

* reorder QKV weight when convert

Signed-off-by: Yu, Zhentao <[email protected]>

* fix typo

Signed-off-by: Yu, Zhentao <[email protected]>

* cherry-pick ggml MQA

Signed-off-by: Yu, Zhentao <[email protected]>

* fix kv cache and reduce handmade mem buffer size

Signed-off-by: Yu, Zhentao <[email protected]>

---------

Signed-off-by: Yu, Zhentao <[email protected]>
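
For context on the MQA commits above: when `n_head_kv` is smaller than `n_head`, each K/V head serves a whole group of query heads. A hedged NumPy sketch of just that shape logic (not the runtime's kernel):

```python
# Hedged illustration of MQA/GQA head grouping, not the runtime's kernel.
import numpy as np

def expand_kv_heads(kv, n_head, n_head_kv):
    # kv: [n_head_kv, seq_len, head_dim]; each KV head serves
    # n_head // n_head_kv query heads, so repeat it along axis 0.
    group = n_head // n_head_kv
    return np.repeat(kv, group, axis=0)  # -> [n_head, seq_len, head_dim]

n_head, n_head_kv, seq, dim = 8, 2, 4, 16
k = np.random.randn(n_head_kv, seq, dim)
k_full = expand_kv_heads(k, n_head, n_head_kv)
assert k_full.shape == (n_head, seq, dim)
```
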
* Update README.md

Update the readme

* Update README.md

* Update README.md

* Update README.md
* Refine Inference Workflow Readme

---------

Signed-off-by: hshen14 <[email protected]>
Co-authored-by: lvliang-intel <[email protected]>
Co-authored-by: Wang, Chang <[email protected]>
* add s8 perchannel quant and kernel.

* add QKV, add fusion support for s8 PerN

* add amx_int8 pern gelu fusion

* add gelu add fusion for vnni

* split jblas file. add compute type fp32.

* add comp_type fp32 for ffn fusion

* add bf16 for s4 and s4 ffn fusion

* add workspace for jblas functions

* keep one jblas code

* disable mmap as default. change arg --no_mmap to --use_mmap.
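
As a reference for the s8 per-channel quantization named in the commits above, a minimal NumPy sketch of the scale math under the usual symmetric scheme; the actual jblas kernels are C++ and considerably more involved.

```python
# Minimal symmetric per-channel s8 quantization sketch (NumPy),
# only meant to show the scale math, not the jblas kernel.
import numpy as np

def quant_s8_per_channel(weight):
    # weight: [out_channels, in_channels]; one scale per output channel.
    scales = np.abs(weight).max(axis=1, keepdims=True) / 127.0
    scales = np.where(scales == 0, 1.0, scales)        # avoid divide-by-zero
    q = np.clip(np.round(weight / scales), -127, 127).astype(np.int8)
    return q, scales

def dequant(q, scales):
    return q.astype(np.float32) * scales

w = np.random.randn(4, 8).astype(np.float32)
q, s = quant_s8_per_channel(w)
print(np.max(np.abs(dequant(q, s) - w)))  # small per-channel rounding error
```
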
* refine readme

* refine readme

* refine table

* Refine LLM Runtime readme

Signed-off-by: hshen14 <[email protected]>

* Continue updating the readme

Signed-off-by: hshen14 <[email protected]>

* Simplify the readme

Signed-off-by: hshen14 <[email protected]>

* add back run_llm.py

* change script arg name

* rename arg

* fix

* add description

* add another way to convert model

* remove additional line

* refine readme

* refine readme, but we need to modify convert script later

* fix model_maps

Signed-off-by: zhenwei-intel <[email protected]>

* fix convert_gptj

Signed-off-by: zhenwei-intel <[email protected]>

* refine readme

* refine

---------

Signed-off-by: hshen14 <[email protected]>
Signed-off-by: zhenwei-intel <[email protected]>
Co-authored-by: hshen14 <[email protected]>
Co-authored-by: zhenwei-intel <[email protected]>
Signed-off-by: Dong, Bo1 <[email protected]>
* support bloom

Signed-off-by: Dong, Bo1 <[email protected]>
* add length_penalty and min_new_tokens_logits_process

Signed-off-by: Yu, Zhentao <[email protected]>
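
A hedged Python sketch of the two generation controls named in the commit above: a min_new_tokens processor that masks the EOS logit until enough tokens have been produced, and a conventional length penalty for beam scores. The names and exact formulas are assumptions, not the runtime's API.

```python
# Hedged sketch of the two controls; names and exact formulas are assumptions.
import math

def min_new_tokens_logits_process(logits, cur_new_tokens, min_new_tokens, eos_token_id):
    # Forbid EOS until at least min_new_tokens tokens have been generated.
    if cur_new_tokens < min_new_tokens:
        logits = list(logits)
        logits[eos_token_id] = -math.inf
    return logits

def apply_length_penalty(sum_logprob, length, length_penalty=1.0):
    # Common beam-search normalization: divide by length ** penalty.
    return sum_logprob / (length ** length_penalty)

logits = [0.1, 2.0, -1.0, 0.5]          # toy vocabulary of 4 tokens
print(min_new_tokens_logits_process(logits, cur_new_tokens=1, min_new_tokens=4, eos_token_id=3))
print(apply_length_penalty(-6.0, length=3, length_penalty=1.2))
```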

* revert V cache reorder

Signed-off-by: Yu, Zhentao <[email protected]>

* refactor beam_search code architecture

Signed-off-by: Yu, Zhentao <[email protected]>

* fix n_threads

Signed-off-by: Yu, Zhentao <[email protected]>

* make beam_kv_cache_reorder as a class

Signed-off-by: Yu, Zhentao <[email protected]>
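
Conceptually, a beam KV-cache reorder permutes the cached K/V slots so they follow the surviving beams after each step. A hedged Python sketch of that idea (illustrative only, not the C++ class):

```python
# Illustrative only: permuting KV cache slots to follow surviving beams.
import numpy as np

class BeamKVCacheReorder:
    def __init__(self, n_beams, n_layers, seq_len, head_dim):
        # cache[layer]: [n_beams, seq_len, head_dim] for K (V analogous).
        self.cache = [np.zeros((n_beams, seq_len, head_dim)) for _ in range(n_layers)]

    def update(self, beam_indices):
        # beam_indices[i] = which previous beam slot beam i continues from.
        for layer in range(len(self.cache)):
            self.cache[layer] = self.cache[layer][beam_indices].copy()

reorder = BeamKVCacheReorder(n_beams=4, n_layers=2, seq_len=8, head_dim=16)
reorder.update(beam_indices=[0, 0, 2, 1])  # beams 0 and 1 both continue beam 0
```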

* clean code

Signed-off-by: Yu, Zhentao <[email protected]>

---------

Signed-off-by: Yu, Zhentao <[email protected]>
Co-authored-by: Haihao Shen <[email protected]>
* fix q8 pern QKV fusion of vnni

* add silu jit kernel. add silu fusion.

* fix the result of llama silu fusion

* enable jit swish for higher performance
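
For reference on the SiLU/Swish commits above: SiLU is x * sigmoid(x), and Swish generalizes it with a slope parameter alpha, x * sigmoid(alpha * x). A tiny sketch; the alpha below is only an example value, not necessarily the kernel's.

```python
# SiLU and Swish reference formulas; alpha here is just an example value.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def silu(x):
    return x * sigmoid(x)            # SiLU = Swish with alpha = 1

def swish(x, alpha=1.702):           # example alpha, not the kernel's value
    return x * sigmoid(alpha * x)

print(silu(1.0), swish(1.0))
```
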
* rename llm chat application

Signed-off-by: Yu, Zhentao <[email protected]>

* rename CI test script

Signed-off-by: Yu, Zhentao <[email protected]>

---------

Signed-off-by: Yu, Zhentao <[email protected]>
Co-authored-by: Dong, Bo <[email protected]>
* update jblas to b3c75b2

* mha refactor changes

* full fp16 mha draft

* support fp32fp16fp16fp32 jblas mha with fp16 kernels

* add fp16 mha fusion

* fix the issue of fp16 on low gcc versions

* keep the same permute for bf16 and fp16 MHA

* fix param for fp16 MHA

* mha amxbf16 supports reo-k

* prepare fwd args for int8 inference

* int8 mha draft

* draft of bf16 mha with kv-update

* disable fp16mha by default

* fix mha nan

* fall back to bf16 when unsupported

* check mha support

* update swish alpha value

* fix fp32 silu bug

* disable mha on compilers without bf16 intrinsics

---------
Signed-off-by: Ding, Yi1 <[email protected]>
Co-authored-by: luoyu-intel <[email protected]>
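
The MHA commits above describe fused attention kernels in several precisions, with a capability check and a fallback when a data type is unsupported. A hedged Python sketch of that dispatch pattern; the capability check and reference attention here are invented for illustration, not the runtime's real API.

```python
# Hedged sketch of fused-MHA dispatch with a fallback path; the capability
# check and dtype names are illustrative, not the runtime's real API.
import numpy as np

def attention_reference(q, k, v):
    # Plain scaled-dot-product attention used as the fallback path.
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = (q @ k.transpose(0, 2, 1)) * scale
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ v

def mha_supported(dtype):
    # Stand-in for an ISA/dtype capability check (e.g. AMX-BF16, fp16).
    return dtype in ("fp16", "bf16")

def fused_mha(q, k, v, dtype="fp16"):
    if not mha_supported(dtype):
        return attention_reference(q, k, v)            # fall back
    # A real fused kernel would run here; reuse the reference for the sketch.
    return attention_reference(q, k, v)

q = k = v = np.random.randn(2, 4, 8)
out = fused_mha(q, k, v, dtype="int8")                 # takes the fallback path
```
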
* add TP and gptj model support
1. add TP_1D algo
2. add parallel_context for broadcast/reduce
3. support all data type
4. support gptj model

Signed-off-by: Clark Chin <[email protected]>
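
To make the TP_1D idea in the commit above concrete, a small NumPy sketch: the linear layer's weight rows and the matching input features are split across simulated ranks, each rank computes a partial matmul, and a reduce (sum) reconstructs the full output. The ranks and reduce here are plain NumPy stand-ins for parallel_context, not the project's API.

```python
# Simulated 1D tensor parallelism (row-parallel linear + reduce); the ranks
# and the reduce are plain NumPy stand-ins, not the real runtime.
import numpy as np

def tp_1d_linear(x, w, n_ranks):
    # Split the input features and the matching weight rows across "ranks",
    # compute partial matmuls, then reduce (sum) the partial outputs.
    x_shards = np.array_split(x, n_ranks, axis=1)    # each rank's input slice
    w_shards = np.array_split(w, n_ranks, axis=0)    # each rank's weight rows
    partials = [xs @ ws for xs, ws in zip(x_shards, w_shards)]
    return sum(partials)                              # stands in for all-reduce

x = np.random.randn(2, 8)
w = np.random.randn(8, 4)
assert np.allclose(tp_1d_linear(x, w, n_ranks=2), x @ w)
```
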
* chatglm-2 q4_j inference pass with correct accuracy

* unify convert scripts

* specify chatglm2, remove ambiguous chatglm

* initialize glm1

* initialize glm1

* Fix kernel issues for glm1

* adapt to the latest main and chatglm2 inference pass

* add parameters for all convert.py

Signed-off-by: Zhenzhong1 <[email protected]>

* add parameters for the bloom

* update README and cleancode

* disable chatglm1

---------

Signed-off-by: Zhenzhong1 <[email protected]>
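
As a rough illustration of "add parameters for all convert.py", a hedged argparse sketch; the flag names below (--outfile, --outtype) are assumptions for illustration, not the scripts' documented options.

```python
# Hypothetical convert-script argument parsing; flag names are assumptions.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Convert a HF model to the runtime format")
    parser.add_argument("model", help="path to the source model directory")
    parser.add_argument("--outfile", default="ne-model.bin", help="output file name")
    parser.add_argument("--outtype", choices=["f32", "f16"], default="f32",
                        help="floating-point precision of the converted weights")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```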
Zhenzhong1 and others added 25 commits December 19, 2023 01:10
Signed-off-by: Haihao Shen <[email protected]>
Signed-off-by: Dong, Bo1 <[email protected]>
* Baichuan13B FP32 inference bug fix
Co-authored-by: luoyu-intel <[email protected]>
Co-authored-by: Ding, Yi1 <[email protected]>
Co-authored-by: zhenwei-intel <[email protected]>
Co-authored-by: yuchengliu1 <[email protected]>
Co-authored-by: Meng, Hengyu <[email protected]>
migrate CI

Signed-off-by: Hengyu Meng <[email protected]>

refine CI for neuralspeed

Signed-off-by: Wenxin Zhang <[email protected]>

add more CI scripts

Signed-off-by: Wenxin Zhang <[email protected]>

minor fix

Signed-off-by: Wenxin Zhang <[email protected]>

remove runner.name when running on ubuntu-latest

Signed-off-by: Wenxin Zhang <[email protected]>

update CI to share system

Signed-off-by: Wenxin Zhang <[email protected]>

rename jblas to bestla
directory reorg

Signed-off-by: Hengyu Meng <[email protected]>

remove itrex dependency

Signed-off-by: Hengyu Meng <[email protected]>

fix script path, remove python dependency

Signed-off-by: Hengyu Meng <[email protected]>

-s

remove python tests

disable percentage

disable monitor

Signed-off-by: Hengyu Meng <[email protected]>

fix naming

fix threadpool conflict

Signed-off-by: Hengyu Meng <[email protected]>

restore percentage

Signed-off-by: Hengyu Meng <[email protected]>
add bestla workflow image

Signed-off-by: Hengyu Meng <[email protected]>
@VincyZhang VincyZhang merged commit 32d9267 into main Dec 20, 2023
17 checks passed
@airMeng airMeng deleted the ns_init branch December 21, 2023 02:37
DDEle pushed a commit to DDEle/neural-speed that referenced this pull request Feb 15, 2024