* Refactor folder structure (#170)
* Fix StarCoder quantization bug (#159). Signed-off-by: zhenwei-intel <[email protected]>
* Update readme path and copy hidden files (#185): move hidden files; update readme path. Signed-off-by: zhenwei-intel <[email protected]>
* Unify scripts for converting, quantizing and chatting (#161): move folder; update the script to chain the steps via subprocess (a driver sketch follows this group of entries). Signed-off-by: zhenwei-intel <[email protected]>
* [CPP Graph] Falcon 40B (#175): initial commit of n_head_kv in MQA; add attention layer norm; reorder QKV weights during conversion; fix typo; cherry-pick ggml MQA; fix the KV cache and reduce the handmade memory buffer size. Signed-off-by: Yu, Zhentao <[email protected]>
* Update README.md (#198)
* Refine readme of LLM Runtime (#200)
* Refine Inference Workflow readme (#214). Signed-off-by: hshen14 <[email protected]> Co-authored-by: lvliang-intel <[email protected]> Co-authored-by: Wang, Chang <[email protected]>
* [CPP Graph] Add s8 per-channel quantization and kernel (#181): add QKV and fusion support for s8 per-N; add amx_int8 per-N GELU fusion; add GELU add-fusion for VNNI; split the jblas file and add compute type fp32; add comp_type fp32 for FFN fusion; add bf16 for s4 and s4 FFN fusion; add a workspace for jblas functions; keep a single copy of the jblas code; disable mmap by default and change the --no_mmap argument to --use_mmap.
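
The #161 entry above unifies conversion, quantization and chat behind one Python driver that shells out to each stage. A minimal sketch of that pattern, assuming illustrative script names and flags (the log does not spell out the exact CLI, so convert.py, quantize.py, the chat binary and their arguments are placeholders):

```python
# Hypothetical driver for the convert -> quantize -> chat pipeline (#161).
# Script names and flags are assumptions, not the repository's exact CLI.
import subprocess
import sys

def run(cmd):
    """Echo and execute one pipeline stage, stopping on the first failure."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

model_dir = sys.argv[1]
fp32_bin, q4_bin = "ne-f32.bin", "ne-q4.bin"
run(["python", "convert.py", model_dir, "--outfile", fp32_bin])  # HF model -> fp32 graph file
run(["python", "quantize.py", "--model_file", fp32_bin, "--out_file", q4_bin])  # fp32 -> int4
run(["./build/bin/chat", "-m", q4_bin, "-p", "Hello"])  # interactive chat on the quantized file
```
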
* Force CMake to add --std=cxx/c++xx (#205)
* Fix graph model quantization on AVX2-only platforms (#221)
* Refine readme (#211): refine readme and table; refine, continue updating and simplify the LLM Runtime readme; add back run_llm.py; rename script arguments; add description; add another way to convert the model; remove an extra line; note that the convert script still needs later modification; fix model_maps; fix convert_gptj. Signed-off-by: hshen14 <[email protected]> Signed-off-by: zhenwei-intel <[email protected]> Co-authored-by: hshen14 <[email protected]> Co-authored-by: zhenwei-intel <[email protected]>
* Fix scan (#224). Signed-off-by: Dong, Bo1 <[email protected]>
* Support Bloom for cpp (#207). Signed-off-by: Dong, Bo1 <[email protected]>
* Move Neural Engine to deprecated (#206)
* [CPP Graph] Enhance beam search (length_penalty + min_new_tokens) (#173): add length_penalty and min_new_tokens logits processors (parameter semantics are sketched after this group); revert the V-cache reorder; refactor the beam_search code architecture; fix n_threads; turn beam_kv_cache_reorder into a class; clean up code. Signed-off-by: Yu, Zhentao <[email protected]> Co-authored-by: Haihao Shen <[email protected]>
* [Cpp Graph] Fix q8 per-N QKV fusion for VNNI (#230): add a SiLU JIT kernel and SiLU fusion; fix the result of the LLaMA SiLU fusion; enable JIT Swish for higher performance.
* [CPP Graph] Rename the LLM chat application (#236): rename the CI test script. Signed-off-by: Yu, Zhentao <[email protected]> Co-authored-by: Dong, Bo <[email protected]>
* [CPP Graph] Add OPT cpp graph and chat application (#133)
* Update oneDNN to v3.3-pc (#187). Signed-off-by: zhenwei-intel <[email protected]>
* [CPP Graph] AMX-BF16 MHA with KV update (#179): update jblas to b3c75b2; refactor MHA; full fp16 MHA draft; support fp32-fp16-fp16-fp32 jblas MHA with fp16 kernels; add fp16 MHA fusion; fix fp16 on old GCC versions; keep the same permute for bf16 and fp16 MHA; fix fp16 MHA parameters; AMX-BF16 MHA supports reordered K; prepare forward args for int8 inference; int8 MHA draft; draft of bf16 MHA with KV update; disable fp16 MHA by default; fix MHA NaN; fall back to bf16 when unsupported; check MHA support; update the Swish alpha value; fix an fp32 SiLU bug; disable MHA on compilers without bf16 intrinsics. Signed-off-by: Ding, Yi1 <[email protected]> Co-authored-by: luoyu-intel <[email protected]>
* [CPP Graph] Enable FFN fusion (#160)
* Fix the conversion errors for Bloom and OPT (#254). Signed-off-by: intellinjun <[email protected]>
* Add TP and GPT-J model support (#223): add the TP_1D algorithm; add parallel_context for broadcast/reduce; support all data types; support the GPT-J model. Signed-off-by: Clark Chin <[email protected]>
* Fix models without jblas-based KV-cache support (#260). Signed-off-by: Ding, Yi1 <[email protected]>
* Update transformers version (#259)
* [CPP Graph] ChatGLM-2 enabling (#210): ChatGLM-2 q4_j inference passes with correct accuracy; unify convert scripts; specify chatglm2 and remove the ambiguous chatglm; initialize glm1 and fix its kernel issues; adapt to the latest main, ChatGLM-2 inference passes; add parameters for all convert.py scripts and for Bloom; update README and clean code; disable chatglm1. Signed-off-by: Zhenzhong1 <[email protected]>
* [CPP Graph] Fix broken format (#262)
* Add one-click script for cpp graph running (#203)
* Fix 3rdparty (rebase) (#239)
* Q4 per-channel (#271): add s4 per-channel quantization and inner-product code.
* Add weight_only support for the PyTorch framework (#234)
* Fix q40 GPT-J with MHA fusion enabled and remove logits.txt (#285). Signed-off-by: Ding, Yi1 <[email protected]>
* Revert "Add weight_only support for PyTorch framework (#234)". This reverts commit cea3a582fa6ac7afa0d8e679b80b04389aa18abc.
* Disable building oneDNN examples and tests (#288). Signed-off-by: Ding, Yi1 <[email protected]>
* Fix the Bloom and Dolly FFN fusion error (#284)
* Build wheel from cached local DNNL (#303). Signed-off-by: Ding, Yi1 <[email protected]> Signed-off-by: Wenxin Zhang <[email protected]> Co-authored-by: Ding, Yi1 <[email protected]>
* [CPP Graph] ChatGLM enabling and ChatGLM-2 issue fixes (#278)
* [Graph] Windows build (#312): fix a Windows build error; add a Windows header; update the MD; clang-format 14.
* [CPP Graph] Asym model (#306)
* Add weight_only support for the PyTorch framework (#297)
* Update oneDNN to v3.3-pc (#332). Signed-off-by: zhenwei-intel <[email protected]>
* Add ChatGLM-6B to README.md (#344)
* Python API for cpp model (#252)
* New avx512_vnni kernel (#343): update avx512_vnni kernels. Co-authored-by: ZheWang <[email protected]>
* Refine script and args for Cpp Graph (#320)
* Restrict onnxruntime version (#350)
* Add a dnnl_dim_t cast to fix an executor failure on Windows (#347)
* Update LLM Runtime parameters (#362): rename one-click run to run; rename compute_type to compute_dtype; add a use-ggml flag (store_true); fix run, use-ggml and strcasecmp; update format. Signed-off-by: zhenwei-intel <[email protected]>
* Try aspell spell checking (#368)
* [CPP Graph] KV-update optimization (#369)
* Fix a Python API bug (#382)
* Change mainpage (#340)
* [CPP Graph] Enable llama2-70b (#213)
* Add readme for LLM kernels (#386). Co-authored-by: VincyZhang <[email protected]>
* Update README.md for Llama2 70B (#391)
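
The beam-search options added in #173 mirror the Hugging Face generation parameters of the same names. A reference sketch using the real transformers GenerationConfig to show the intended semantics; the C++ graph's own flags are not spelled out in this log:

```python
# Semantics of the #173 beam-search options, shown via the transformers API.
from transformers import GenerationConfig

cfg = GenerationConfig(
    num_beams=4,         # beam search width
    length_penalty=1.1,  # >1.0 favors longer hypotheses, <1.0 shorter ones
    min_new_tokens=16,   # suppress EOS until at least 16 new tokens are emitted
)
```
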
* Refine LLM Runtime readme (#395). Signed-off-by: hshen14 <[email protected]>
* Use the transformers tokenizer and streamer for the Python API (#388) (usage sketch after this group)
* [Cpp Graph] Align Cpp beam search (#322)
* Do not compile the Python API in cpp graph by default (#401): use AutoModelCausalLM; do not compile the Python API of the cpp model. Signed-off-by: zhenwei-intel <[email protected]> Co-authored-by: Dong, Bo <[email protected]>
* [CPP Graph] Falcon MHA support (#422)
* Reinit cpp model and infinite text generation (#413)
* [CPP Graph] ChatGLM2 MHA support (#435)
* Update post-processing with num_beams and do_sample (#430): use the MPT post-process. Signed-off-by: zhenwei-intel <[email protected]>
* Update jblas (#433): pass compilation before the model test; upgrade QBits. Co-authored-by: ZheWang <[email protected]>
* Fix the transformers version (#437)
* [Cpp Graph] Update Falcon HF parameters and support Falcon-180B (#414)
* [CPP Graph] MPT MHA support (#453)
* [CPP Graph] Baichuan & Baichuan2 enabling (#376): enable Baichuan and Baichuan2 in LLM Runtime.
* GitHub Actions workflow speedup (#456)
* Read special token IDs from the tokenizer (#463). Signed-off-by: zhenwei-intel <[email protected]>
* GELU support (#424). Co-authored-by: intellinjun <[email protected]>
* Fix MSVC compile issues (#477)
* [Cpp Graph] Beam search pybind (model archs: gptj and gptneox) (#449)
* Fix post-processing with top-k/top-p in the Python API (#476)
* [CPP Graph] Optimize QBits dequantization (#465)
* [RUNTIME] Enable StreamingLLM for Runtime (#501): support StreamingLLM on CPU. Signed-off-by: zhenwei-intel <[email protected]>
* Support AVX2 (#493): support Memcpy2D; support GELU fusion. Co-authored-by: luoyu-intel <[email protected]>
* Fix typo in README.md (#516): convertion -> conversion. Signed-off-by: Ikko Eltociear Ashimine <[email protected]>
* Improve AVX2 (#511)
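
Several entries above (#252, #388, #430, #501) build out a Python API that pairs the C++ graph model with a Hugging Face tokenizer and streamer. A hedged usage sketch: the transformers classes are real, but the AutoModelForCausalLM import path and its from_pretrained/generate keywords are assumptions inferred from the commit titles, not verified against the repository:

```python
# Assumed Python-API flow: HF tokenizer + TextStreamer driving the cpp model.
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM  # assumed path

model_name = "EleutherAI/gpt-j-6b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
streamer = TextStreamer(tokenizer, skip_prompt=True)  # prints tokens as they decode
inputs = tokenizer("Once upon a time", return_tensors="pt").input_ids

# load_in_4bit stands in for the weight-only loading described above;
# #688's load_in_nbit is the generalized variant.
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
outputs = model.generate(inputs, streamer=streamer, do_sample=True,
                         top_k=40, top_p=0.95, max_new_tokens=128)
```
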
* Reduce unnecessary tests (#521): update Python API readme (#504) with several README.md updates (Signed-off-by: Haihao Shen <[email protected]>); revert "update python api readme (#504)" (this reverts commit 5f4175ad754fb2e3c1f0f2f49a5a8356c1c3e170); reduce unnecessary tests (Signed-off-by: Wenxin Zhang <[email protected]>). Signed-off-by: Haihao Shen <[email protected]> Signed-off-by: Wenxin Zhang <[email protected]> Co-authored-by: liuzhenwei <[email protected]> Co-authored-by: Haihao Shen <[email protected]>
* [LLM Runtime] Update Python API readme (#525)
* [LLM Runtime] Baichuan FFN & MHA support (#497)
* [CPP Graph] Fused attention doc (#443): add a doc for fused attention.
* [NeuralChat] Add NeuralChat UT for cache and memory (#502). Signed-off-by: Liangyx2 <[email protected]>
* [Documentation] Upload StreamingLLM video (#533). Signed-off-by: zhenwei-intel <[email protected]>
* Support attention-block TP and add gptj/llama models (#361)
* [LLM Runtime] Enable Mistral-7b (#552). Signed-off-by: intellinjun <[email protected]>
* Add ITREX LLM runtime graph int4 notebook (#399)
* [LLM Runtime] Enable interactive mode of the Python API (#548)
* [LLM Runtime] Streaming-LLM based on shift-RoPE (#580)
* [LLM Runtime] Enable MHA fusion for gptneox, dolly, starcoder and llama2-70b (#567)
* [Doc] Change the structure of the LLM Runtime readme (#596): add a warning about graph build; add more info.
* Add a script to merge a PEFT adapter for quantization of LLMs tuned with PEFT (#615). Signed-off-by: Ye, Xinyu <[email protected]>
* Fix Bloom FFN fusion (#620)
* [DOC] Add LLM Runtime developer document (#609). Signed-off-by: intellinjun <[email protected]>
* [Document] Update LLM Runtime readme (#623)
* [LLM Runtime] Fix LLaMA after discarding KV-cache (#625)
* [LLM Runtime] Shift-RoPE-based Streaming-LLM for fused attention (#608): sync jblas 6656837; shift-RoPE with MHA.
* Restrict the transformers version (#627)
* [LLM Runtime] Integrate AVX_VNNI (#565)
* [LLM Runtime] Multi-round chat with ChatGLM2 (#646)
* [LLM Runtime] Unify KV_cache and support batch-dim processing in beam search (#583)
* [LLM Runtime] Allow CompileBF16 on GCC11 (#655)
* Fix a bf16 error in convert_llama.py (#661)
* [Doc] Add readme (#663): add a support matrix.
* Disable bf16 scale for jblas (#662). Signed-off-by: Hengyu Meng <[email protected]>
* [LLM Runtime] Fix gptneox bug (#671). Signed-off-by: intellinjun <[email protected]>
* [LLM Runtime] Refine Python API (#665)
* [LLM Runtime] Add Python API for Mistral (#684). Signed-off-by: intellinjun <[email protected]>
* Fix typo: the graph_developer_document branch no longer exists (#686). Signed-off-by: sangjune.park <[email protected]>
* [LLM Runtime] Support load_in_nbit in LLM Runtime (#688). Signed-off-by: zhenwei-intel <[email protected]>
* [LLM Runtime] Update README (#696)
* Update readme (#708): update the LLM Runtime readme.
* [LLM Runtime] Add script for PPL evaluation (#685)
* [LLM Runtime] Optimize tests of LLM Runtime (#718)
* Separate optimize UT and improve UT infra (#729)
* [LLM Runtime] Enable Qwen graph (#669). Signed-off-by: intellinjun <[email protected]>
* [LLM Runtime] Enable GPTQ models (#611): enable GPTQ for the Bloom model. Signed-off-by: zhenwei-intel <[email protected]>
* [LLM Runtime] Add jblas split-weight interface and support jblas models (#639). Signed-off-by: Clark Chin <[email protected]>
* [LLM Runtime] Beam search support for fused attention (#734)
* Update GPTQ in README (#781). Signed-off-by: Dong, Bo <[email protected]>
* Fix: max output tokens (#788). Signed-off-by: sangjune.park <[email protected]>
* Docs: reinforce the LLM runtime graph developer guide (#786). Signed-off-by: sangjune.park <[email protected]>
* [LLM Runtime] Check weight dtype and compute dtype (#778) (a configuration sketch follows the trailer block below)
* [LLM Runtime] Fix develop doc and convert.py (#794). Signed-off-by: Yu, Zhentao <[email protected]>
* Fix: init_from_bin example (#789)
* [LLM Runtime] Enable new Whisper app (#682)
* [Engine] Apply the STS task to BGE models (#673)
* [LLM Runtime] Fix format (#812)
* [LLM Runtime] Fix added_tokens error (#793). Signed-off-by: intellinjun <[email protected]>
* Update README.md. Signed-off-by: Haihao Shen <[email protected]>
* Update (#823). Signed-off-by: Dong, Bo1 <[email protected]>
* [Doc] Update README for Qwen chat (#808)
* [LLM Runtime] ChatGLM-V1 multi-batch inference and batched greedy-search generation (#700)
* [LLM Runtime] Remove use_cache in WOQ (#818)
* Cast void to char to avoid the unknown size (#856). Signed-off-by: Dong, Bo1 <[email protected]>
* [Infra] Enhance CI scan (#834)
* Fix kernels softmax in int8 MHA (#869). Co-authored-by: kevinintel <[email protected]>
* [LLM Runtime] Baichuan-13B inference bug fix (#891): fix Baichuan-13B FP32 inference.
* [LLM Runtime] Remove the identical branch (#894)
* [LLM Runtime] Make rms_norm_eps and freq_base parameters (#903)
* [LLM Runtime] Refactor the ITREX backend based on the latest jblas (#769). Co-authored-by: luoyu-intel <[email protected]> Co-authored-by: Ding, Yi1 <[email protected]> Co-authored-by: zhenwei-intel <[email protected]> Co-authored-by: yuchengliu1 <[email protected]> Co-authored-by: Meng, Hengyu <[email protected]>
* [Doc] Add Gaudi2 to the doc (#799)
* [LLM Runtime] Add MX formats (FP8_E5M2, FP8_E4M3, FP4_E2M1, NF4) (#872): add fp8 to the LLM frontend. Signed-off-by: Yu, Zhentao <[email protected]>
* [LLM Runtime] Fix PPL test (#937)
* [LLM Runtime] Add MatMul data-type combinations table (#945)
* [LLM Runtime] Decouple weight_type and scale_type in QBits (#940)
* [LLM Runtime] Convert Hugging Face GPTQ models to jblas (#927). Co-authored-by: luoyu-intel <[email protected]>
* Reorganize directory and migrate CI (Signed-off-by: Hengyu Meng <[email protected]>): refine CI for Neural Speed; add more CI scripts; minor fixes; remove runner.name when running on ubuntu-latest; update CI to the shared system (Signed-off-by: Wenxin Zhang <[email protected]>); rename jblas to bestla, directory reorg; remove the itrex dependency; fix script paths and remove the Python dependency; remove Python tests, disable percentage, disable monitor; fix naming; fix a threadpool conflict; restore percentage.
* Fix bestla typo and add bestla workflow image. Signed-off-by: Hengyu Meng <[email protected]>
* Fix scripts path
* Fix pylint and cpplint

Signed-off-by: zhenwei-intel <[email protected]>
Signed-off-by: Yu, Zhentao <[email protected]>
Signed-off-by: hshen14 <[email protected]>
Signed-off-by: Dong, Bo1 <[email protected]>
Signed-off-by: intellinjun <[email protected]>
Signed-off-by: Clark Chin <[email protected]>
Signed-off-by: Ding, Yi1 <[email protected]>
Signed-off-by: Zhenzhong1 <[email protected]>
Signed-off-by: Wenxin Zhang <[email protected]>
Signed-off-by: Ikko Eltociear Ashimine <[email protected]>
Signed-off-by: Haihao Shen <[email protected]>
Signed-off-by: Liangyx2 <[email protected]>
Signed-off-by: Ye, Xinyu <[email protected]>
Signed-off-by: Hengyu Meng <[email protected]>
Signed-off-by: sangjune.park <[email protected]>
Signed-off-by: Dong, Bo <[email protected]>
Co-authored-by: Cheng, Penghui <[email protected]>
Co-authored-by: liuzhenwei <[email protected]>
Co-authored-by: zhentaoyu <[email protected]>
Co-authored-by: Dong, Bo <[email protected]>
Co-authored-by: kevinintel <[email protected]>
Co-authored-by: Haihao Shen <[email protected]>
Co-authored-by: lvliang-intel <[email protected]>
Co-authored-by: Wang, Chang <[email protected]>
Co-authored-by: luoyu-intel <[email protected]>
Co-authored-by: Yi DING <[email protected]>
Co-authored-by: zhenwei-intel <[email protected]>
Co-authored-by: intellinjun <[email protected]>
Co-authored-by: Chen Xi <[email protected]>
Co-authored-by: Zhenzhong1 <[email protected]>
Co-authored-by: CeciliaWwq <[email protected]>
Co-authored-by: Wenxin Zhang <[email protected]>
Co-authored-by: ZheWang <[email protected]>
Co-authored-by: yuchengliu1 <[email protected]>
Co-authored-by: Ikko Eltociear Ashimine <[email protected]>
Co-authored-by: Liangyx2 <[email protected]>
Co-authored-by: XinyuYe-Intel <[email protected]>
Co-authored-by: akarX23 <[email protected]>
Co-authored-by: sangjune.park <[email protected]>
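
The MX-format (#872), dtype-check (#778) and combination-table (#945) entries describe choosing a stored weight format, a compute dtype and a scale dtype independently. A hedged sketch of one such combination; WeightOnlyQuantConfig and its field names are assumptions drawn from the weight-only-quantization entries above, not a verified API:

```python
# Assumed weight-only quantization config illustrating one dtype combination.
from intel_extension_for_transformers.transformers import (  # assumed path
    AutoModelForCausalLM,
    WeightOnlyQuantConfig,
)

config = WeightOnlyQuantConfig(
    weight_dtype="nf4",    # stored format: int4 / nf4 / fp4_e2m1 / fp8_e4m3 / fp8_e5m2 ...
    compute_dtype="bf16",  # matmul math: int8 / bf16 / fp32
    scale_dtype="fp32",    # storage type of the quantization scales
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", quantization_config=config)
```

Per #778, not every pairing is valid; the weight dtype and compute dtype are checked against each other, and #945's table in the README enumerates the supported combinations.
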