This repository has been archived by the owner on Aug 30, 2024. It is now read-only.

init commit (#4)
* refactor folder structure (#170)

* fix starcoder quantization bug (#159)

Signed-off-by: zhenwei-intel <[email protected]>

* update readme path and copy hidden files (#185)

* move hidden files

Signed-off-by: zhenwei-intel <[email protected]>

* update readme path

Signed-off-by: zhenwei-intel <[email protected]>

---------

Signed-off-by: zhenwei-intel <[email protected]>

* Unify scripts for converting, quantizing and chatting (#161)

* Unify scripts for converting, quantizing and chatting

Signed-off-by: zhenwei-intel <[email protected]>

* move folder

* update script with subprocess

Signed-off-by: zhenwei-intel <[email protected]>

---------

Signed-off-by: zhenwei-intel <[email protected]>

* [CPP Graph] Falcon 40B (#175)

* initial commit of n_head_kv in MQA

Signed-off-by: Yu, Zhentao <[email protected]>

* add attn ln

Signed-off-by: Yu, Zhentao <[email protected]>

* reorder QKV weights during conversion

Signed-off-by: Yu, Zhentao <[email protected]>

* fix typo

Signed-off-by: Yu, Zhentao <[email protected]>

* cherry-pick ggml MQA

Signed-off-by: Yu, Zhentao <[email protected]>

* fix kv cache and reduce handmade mem buffer size

Signed-off-by: Yu, Zhentao <[email protected]>

---------

Signed-off-by: Yu, Zhentao <[email protected]>

* Update README.md (#198)

* Update README.md

Update the readme

* Update README.md

* Update README.md

* Update README.md

* Refine readme of llm runtime (#200)

* Refine Inference Workflow Readme (#214)

* Refine Inference Workflow Readme

---------

Signed-off-by: hshen14 <[email protected]>
Co-authored-by: lvliang-intel <[email protected]>
Co-authored-by: Wang, Chang <[email protected]>

* [CPP Graph] add s8 perchannel quant and kernel. (#181)

* add s8 perchannel quant and kernel.

* add QKV, add fusion support for s8 PerN

* add amx_int8 pern gelu fusion

* add gelu add fusion for vnni

* split jblas file. add compute type fp32.

* add comp_type fp32 for ffn fusion

* add bf16 for s4 and s4 ffn fusion

* add workspace for jblas functions

* keep one jblas code

* disable mmap by default; change arg --no_mmap to --use_mmap.
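
For reference, the s8 per-channel scheme this commit series adds generally works as below: one scale per output channel, symmetric around zero. A minimal NumPy sketch of the idea, not the jblas kernel itself:

    import numpy as np

    def quantize_s8_perchannel(w):
        """Symmetric per-channel int8 quantization of a (out_ch, in_ch) weight."""
        scale = np.abs(w).max(axis=1, keepdims=True) / 127.0  # one scale per row
        scale = np.maximum(scale, 1e-12)                      # guard all-zero channels
        q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
        return q, scale

    def dequantize_s8_perchannel(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(8, 16).astype(np.float32)
    q, s = quantize_s8_perchannel(w)
    print(np.abs(dequantize_s8_perchannel(q, s) - w).max())  # small reconstruction error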

* Force CMake to add --std=cxx/c++xx (#205)

* Fix graph model quantization with AVX2-only platforms (#221)

* refine readme (#211)

* refine readme

* refine readme

* refine table

* Refine LLM Runtime readme

Signed-off-by: hshen14 <[email protected]>

* Continue updating the readme

Signed-off-by: hshen14 <[email protected]>

* Simplify the readme

Signed-off-by: hshen14 <[email protected]>

* add back run_llm.py

* change script arg name

* rename arg

* fix

* add description

* add another way to convert model

* remove additional line

* refine readme

* refine readme, but we need to modify convert script later

* fix model_maps

Signed-off-by: zhenwei-intel <[email protected]>

* fix convert_gptj

Signed-off-by: zhenwei-intel <[email protected]>

* refine readme

* refine

---------

Signed-off-by: hshen14 <[email protected]>
Signed-off-by: zhenwei-intel <[email protected]>
Co-authored-by: hshen14 <[email protected]>
Co-authored-by: zhenwei-intel <[email protected]>

* fix scan (#224)

Signed-off-by: Dong, Bo1 <[email protected]>

* support bloom for cpp (#207)

* support bloom

Signed-off-by: Dong, Bo1 <[email protected]>

* move neural engine to deprecated (#206)

* [CPP Graph] Enhance beam search (length_penalty + min_new_tokens) (#173)

* add length_penalty and min_new_tokens_logits_process

Signed-off-by: Yu, Zhentao <[email protected]>

* revert V cache reorder

Signed-off-by: Yu, Zhentao <[email protected]>

* refactor beam_search code arch

Signed-off-by: Yu, Zhentao <[email protected]>

* fix n_threads

Signed-off-by: Yu, Zhentao <[email protected]>

* make beam_kv_cache_reorder as a class

Signed-off-by: Yu, Zhentao <[email protected]>

* clean code

Signed-off-by: Yu, Zhentao <[email protected]>

---------

Signed-off-by: Yu, Zhentao <[email protected]>
Co-authored-by: Haihao Shen <[email protected]>
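
The two generation controls named above follow the usual semantics; a minimal Python sketch of the logic (illustrative, not the C++ implementation):

    import numpy as np

    def min_new_tokens_process(logits, n_generated, min_new_tokens, eos_id):
        # forbid EOS until at least min_new_tokens tokens have been generated
        if n_generated < min_new_tokens:
            logits[eos_id] = -np.inf
        return logits

    def beam_score(sum_logprobs, length, length_penalty=1.0):
        # length-penalized beam score: penalty > 1 favors longer hypotheses
        return sum_logprobs / (length ** length_penalty)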

* [Cpp Graph] fix q8 pern QKV fusion of vnni (#230)

* fix q8 pern QKV fusion of vnni

* add silu jit kernel. add silu fusion.

* fix the result of llama silu fusion

* enable jit swish for higher performance

* [CPP Graph] Rename LLM chat application (#236)

* rename llm chat application

Signed-off-by: Yu, Zhentao <[email protected]>

* rename CI test script

Signed-off-by: Yu, Zhentao <[email protected]>

---------

Signed-off-by: Yu, Zhentao <[email protected]>
Co-authored-by: Dong, Bo <[email protected]>

* [CPP Graph] add opt cpp graph and chat application (#133)

* update onednn to v3.3-pc (#187)

Signed-off-by: zhenwei-intel <[email protected]>

* [CPP Graph] AMX-BF16 MHA with KV update (#179)

* update jblas to b3c75b2

* mha refactor changes

* full fp16 mha draft

* support fp32fp16fp16fp32 jblas mha with fp16 kernels

* add fp16 mha fusion

* fix the issue of fp16 on low gcc versions

* keep the same permute for bf16 and fp16 MHA

* fix param for fp16 MHA

* mha amxbf16 supports reo-k

* prepare fwd args for int8 inference

* int8 mha draft

* draft of bf16 mha with kv-update

* disable fp16mha by default

* fix mha nan

* fall back to bf16 when unsupported

* check mha support

* update swish alpha value

* fix fp32 silu bug

* disable mha on compilers without bf16 intrinsics

---------
Signed-off-by: Ding, Yi1 <[email protected]>
Co-authored-by: luoyu-intel <[email protected]>

* [CPP Graph]Enable FFN fusion (#160)

* fix the conversion errors for bloom and opt (#254)

Signed-off-by: intellinjun <[email protected]>

* add TP and gptj model support (#223)

* add TP and gptj model support
1. add TP_1D algo
2. add parallel_context for broadcast/reduce
3. support all data type
4. support gptj model

Signed-off-by: Clark Chin <[email protected]>

* Fix models without jblas-based kvcache support (#260)

Signed-off-by: Ding, Yi1 <[email protected]>

* Update transformers version (#259)

* [CPP Graph] ChatGLM-2 Enabling (#210)

* chatglm-2 q4_j inference pass with correct accuracy

* unify convert scripts

* specify chatglm2, remove ambiguous chatglm

* initialize glm1

* initialize glm1

* Fix kernel issues for glm1

* adapt to the latest main and chatglm2 inference pass

* add parameters for all convert.py

Signed-off-by: Zhenzhong1 <[email protected]>

* add parameters for the bloom

* update README and cleancode

* disable chatglm1

---------

Signed-off-by: Zhenzhong1 <[email protected]>

* [CPP Graph] fix broken format (#262)

* add one-click script for cpp graph running (#203)

* fix 3rdparty rebase (#239)

* Q4 perchannel (#271)

* add s4 perchannel quant and inner product code.

* Add weight_only support for PyTorch framework (#234)

* Fix q40 gptj with MHA fusion enabled & remove logits.txt (#285)

Signed-off-by: Ding, Yi1 <[email protected]>

* Revert "Add weight_only support for PyTorch framework (#234)"

This reverts commit cea3a582fa6ac7afa0d8e679b80b04389aa18abc.

* Disable building OneDNN examples & tests (#288)

Signed-off-by: Ding, Yi1 <[email protected]>

* fix the bloom and dolly ffn fusion error (#284)

* Build wheel from cached dnnl local (#303)

Signed-off-by: Ding, Yi1 <[email protected]>
Signed-off-by: Wenxin Zhang <[email protected]>
Co-authored-by: Ding, Yi1 <[email protected]>

* [CPP Graph] ChatGLM Enabling and ChatGLM-2 Issues Fix (#278)

* [Graph] windows build (#312)

* fix win build error

* add win header

* modify MD

* clang-format 14

* [CPP Graph] Asym model (#306)

* Add weight_only support for PyTorch framework (#297)

* update onednn to v3.3-pc (#332)

Signed-off-by: zhenwei-intel <[email protected]>

* Update ChatGLM-6B to README.md (#344)

* Python api for cpp model (#252)

* New avx512_vnni kernel (#343)


* update avx512_vnni kernels
---------

Co-authored-by: ZheWang <[email protected]>

* Refine Script and args for Cpp Graph (#320)

* Restrict onnxruntime version (#350)

* Add dnnl_dim_t cast to fix executor windows failure (#347)

* update llm runtime parameters (#362)

* update param

Signed-off-by: zhenwei-intel <[email protected]>

* update llm runtime parameters

Signed-off-by: zhenwei-intel <[email protected]>

* rename one click run to run

Signed-off-by: zhenwei-intel <[email protected]>

* rename compute_type to compute_dtype

Signed-off-by: zhenwei-intel <[email protected]>

* use ggml

Signed-off-by: zhenwei-intel <[email protected]>

* update

Signed-off-by: zhenwei-intel <[email protected]>

* update

Signed-off-by: zhenwei-intel <[email protected]>

* update parameters

Signed-off-by: zhenwei-intel <[email protected]>

* fix run

Signed-off-by: zhenwei-intel <[email protected]>

* fix use-ggml

Signed-off-by: zhenwei-intel <[email protected]>

* fix

Signed-off-by: zhenwei-intel <[email protected]>

* fix strcasecmp

Signed-off-by: zhenwei-intel <[email protected]>

* store true for use-ggml

Signed-off-by: zhenwei-intel <[email protected]>

* update format

Signed-off-by: zhenwei-intel <[email protected]>

---------

Signed-off-by: zhenwei-intel <[email protected]>

* try aspell spelling check (#368)

* [CPP Graph] KV-Update Optimization (#369)

* fix python api bug (#382)

* change mainpage (#340)

* [CPP Graph] Enable llama2-70b (#213)

* add readme for llm kernels (#386)

* add readme for llm kernels

Co-authored-by: VincyZhang <[email protected]>

* Update README.md for llama2 70B (#391)

* Refine LLM runtime readme (#395)

* Refine LLM runtime readme

Signed-off-by: hshen14 <[email protected]>

* Use transformers tokenizer and streamer for python api (#388)
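
A rough shape of that Python API, pairing a Hugging Face tokenizer with TextStreamer; the neural-speed import path and Model.init arguments are assumptions for illustration, not verified against this revision:

    from transformers import AutoTokenizer, TextStreamer
    from neural_speed import Model  # hypothetical import path

    model_name = "EleutherAI/gpt-j-6b"
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    streamer = TextStreamer(tokenizer)

    model = Model()
    model.init(model_name, weight_dtype="int4", compute_dtype="int8")  # assumed args
    inputs = tokenizer("Once upon a time", return_tensors="pt").input_ids
    outputs = model.generate(inputs, streamer=streamer, max_new_tokens=32)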

* [Cpp Graph] Align Cpp Beam Search (#322)

* not compiling python api in cpp graph by default (#401)

* using AutoModelCausalLM

Signed-off-by: zhenwei-intel <[email protected]>

* not compiling python api of cpp model

Signed-off-by: zhenwei-intel <[email protected]>

---------

Signed-off-by: zhenwei-intel <[email protected]>
Co-authored-by: Dong, Bo <[email protected]>

* [CPP Graph] Falcon MHA support (#422)

* reinit cpp model and infinite text generation (#413)

* [CPP Graph] ChatGLM2 MHA support (#435)

* update post process with num_beams and do_sample (#430)

* use mpt post process

Signed-off-by: zhenwei-intel <[email protected]>

* update jblas (#433)

* pass compilation, before model test.

* upgrade QBits

* update jblas

Co-authored-by: ZheWang <[email protected]>

* fixed the version of transformers (#437)

* [Cpp Graph] Update Falcon HF para and support Falcon-180B (#414)

* [CPP Graph] MPT MHA support (#453)

* [CPP Graph] Baichuan & Baichuan2 Enabling (#376)

* Enable Baichuan and Baichuan2 in LLM Runtime

* GitHub Action Workflows speedup (#456)

* workflow speedup

* read special token id from tokenizer (#463)

* read special token id from tokenizer

Signed-off-by: zhenwei-intel <[email protected]>

* gelu support (#424)

Co-authored-by: intellinjun <[email protected]>

* Fix msvc compile issues (#477)

* [Cpp Graph] Beam Search Pybind (model archs: gptj and gptneox) (#449)

* fix post process with topk topp of python api (#476)

* [CPP Graph] Opt qbits dequant (#465)

* [RUNTIME] Enabling streaming llm for Runtime (#501)

* Support StreamingLLM on CPU

Signed-off-by: zhenwei-intel <[email protected]>
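
StreamingLLM keeps a few initial "attention sink" tokens plus a recent window in the KV cache and evicts everything in between, which is what lets generation run past the context budget. A minimal sketch of the eviction policy (illustrative):

    def streaming_kv_keep(seq_len, n_keep, n_ctx):
        """Return the KV-cache positions retained under StreamingLLM eviction."""
        if seq_len <= n_ctx:
            return list(range(seq_len))
        n_recent = n_ctx - n_keep
        # first n_keep tokens act as attention sinks; the rest is a sliding window
        return list(range(n_keep)) + list(range(seq_len - n_recent, seq_len))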

* support Avx2 (#493)

* support Memcpy2D

* support gelu fusion

---------

Co-authored-by: luoyu-intel <[email protected]>

* Fix typo in README.md (#516)

convertion -> conversion

Signed-off-by: Ikko Eltociear Ashimine <[email protected]>

* improve Avx2  (#511)

* reduce unnecessary tests (#521)

* update python api readme (#504)

* Update README.md

Signed-off-by: Haihao Shen <[email protected]>

* Update README.md

Signed-off-by: Haihao Shen <[email protected]>

* Update README.md

Signed-off-by: Haihao Shen <[email protected]>

* Update README.md

Signed-off-by: Haihao Shen <[email protected]>

* Update README.md

Signed-off-by: Haihao Shen <[email protected]>

* Revert "update python api readme (#504)"

This reverts commit 5f4175ad754fb2e3c1f0f2f49a5a8356c1c3e170.

* reduce unnecessary tests

Signed-off-by: Wenxin Zhang <[email protected]>

* reduce unnecessary tests

Signed-off-by: Wenxin Zhang <[email protected]>

* reduce unnecessary tests

Signed-off-by: Wenxin Zhang <[email protected]>

* reduce unnecessary tests

Signed-off-by: Wenxin Zhang <[email protected]>

---------

Signed-off-by: Haihao Shen <[email protected]>
Signed-off-by: Wenxin Zhang <[email protected]>
Co-authored-by: liuzhenwei <[email protected]>
Co-authored-by: Haihao Shen <[email protected]>

* [LLM Runtime] update python api readme (#525)

* [LLM Runtime] Baichuan FFN & MHA support (#497)

* [CPP Graph] Fused Attention Doc (#443)

* Add doc for fused attn

* [NeuralChat] Add neuralchat UT for cache and memory (#502)

Add neuralchat UT for cache and memory

Signed-off-by: Liangyx2 <[email protected]>

* [Documentation] upload streaming llm video (#533)

* upload streaming llm video

Signed-off-by: zhenwei-intel <[email protected]>

* support attention block TP and add gptj llama model (#361)

* [LLM Runtime] Enable Mistral-7b (#552)

* [LLM Runtime] Enable Mistral-7b

Signed-off-by: intellinjun <[email protected]>

* Add itrex llm runtime graph int4 notebook (#399)

* [LLM Runtime] Enable interactive mode of python api (#548)

* [LLM Runtime] Streaming-LLM based on shift RoPE (#580)
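
The "shift" exploits RoPE being additive in position: rotating an already-encoded key by delta more positions equals encoding it at position p + delta, so cached keys can be re-rotated in place after eviction instead of being recomputed. A small NumPy check of that identity (illustrative):

    import numpy as np

    def rope_rotate(x, pos, theta):
        # rotate one (even, odd) channel pair by pos * theta, as RoPE does per frequency
        c, s = np.cos(pos * theta), np.sin(pos * theta)
        return np.array([c * x[0] - s * x[1], s * x[0] + c * x[1]])

    x = np.array([0.3, -1.2])
    theta, p, delta = 0.01, 7, 5
    a = rope_rotate(rope_rotate(x, p, theta), delta, theta)  # shift a cached key
    b = rope_rotate(x, p + delta, theta)                     # encode directly at p + delta
    print(np.allclose(a, b))  # True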

* [LLM Runtime] enable MHA fusion for gptneox&dolly&starcoder&llama2-70b (#567)

* [Doc] change the structure of llm runtime readme (#596)

* add warning in graph build

* add more info

* Add script to merge PEFT adapter for quantization of LLM with PEFT (#615)

* add script to merge PEFT adapter for quantization of LLM with PEFT.

Signed-off-by: Ye, Xinyu <[email protected]>

* Fix bloom ffn fusion (#620)

* [DOC] add LLM Runtime developer document (#609)

* add developer document

Signed-off-by: intellinjun <[email protected]>

* [Document] update llm runtime readme (#623)

* [LLM Runtime] Fix LLaMA after discarding KV-cache (#625)

* [LLM Runtime] Shift-RoPE-based Streaming-LLM for Fused-Attention  (#608)

* sync jblas 6656837

* shift-RoPE with mha

* restrain transformers version (#627)

* [LLM Runtime] integrate AVX_VNNI (#565)

* [LLM Runtime] Multi-Round chat with chatglm2 (#646)

* [LLM Runtime] Unify KV_cache and Support Batch-dim Process in Beam Search (#583)

* [LLM Runtime] Allow CompileBF16 on GCC11 (#655)

* Allow CompileBF16 on GCC11

* fixed bf16 error in convert_llama.py (#661)

* [Doc]add readme (#663)

* add support matrix

* disable bf16 scale for jblas (#662)

Signed-off-by: Hengyu Meng <[email protected]>

* [LLM Runtime]Fix gptneox bug (#671)

Signed-off-by: intellinjun <[email protected]>

* [LLM Runtime] Refine Python API (#665)

* [LLM Runtime] add python api for mistral (#684)

Signed-off-by: intellinjun <[email protected]>

* fix typo: graph_developer_document branch no longer exists (#686)

Signed-off-by: sangjune.park <[email protected]>

* [LLM Runtime] Support load_in_nbit in llm runtime (#688)

* support load_in_nbit in llm runtime

Signed-off-by: zhenwei-intel <[email protected]>

* [LLM Runtime] Update README (#696)

* update readme (#708)

Update LLM runtime readme

* [LLM Runtime] Add Script for PPL Evaluation (#685)
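
Perplexity is the exponential of the mean token negative log-likelihood; a minimal sketch of what such a script computes (illustrative):

    import math

    def perplexity(token_logprobs):
        """token_logprobs: log p(token_i | tokens_<i) for each evaluated token."""
        nll = -sum(token_logprobs) / len(token_logprobs)
        return math.exp(nll)

    print(perplexity([-2.1, -0.3, -1.7, -0.9]))  # exp(1.25) ~= 3.49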

* [LLM Runtime] Optimize tests of llm runtime (#718)

* separate optimize UT and improve UT infra (#729)

* [LLM Runtime] enable qwen graph (#669)

* [LLM Runtime] enable qwen graph

Signed-off-by: intellinjun <[email protected]>

* [LLM Runtime] Enable GPTQ models (#611)

* Enable GPTQ for bloom model

Signed-off-by: zhenwei-intel <[email protected]>
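
GPTQ checkpoints store group-wise quantized weights, so conversion has to reproduce w ~= scale * (q - zero_point) per group when mapping them onto the runtime's format. A generic NumPy sketch of group-wise dequantization (illustrative; layout details vary by checkpoint):

    import numpy as np

    def dequant_gptq_groups(qweight, scales, zeros, group_size=128):
        """qweight: (out, in) ints; scales/zeros: (out, in // group_size)."""
        out_ch, in_ch = qweight.shape
        w = np.empty((out_ch, in_ch), dtype=np.float32)
        for g in range(in_ch // group_size):
            cols = slice(g * group_size, (g + 1) * group_size)
            w[:, cols] = scales[:, g:g + 1] * (qweight[:, cols] - zeros[:, g:g + 1])
        return w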

* [LLM Runtime] Add jblas split weight interface and support jblas models (#639)

* [LLM Runtime] Add jblas split weight interface and support jblas models

Signed-off-by: Clark Chin <[email protected]>

* [LLM Runtime] Beam Search Support of Fused Attention (#734)

* Update GPTQ into README (#781)

* Update GPTQ into README

Signed-off-by: Dong, Bo <[email protected]>

* Update README.md

Signed-off-by: Dong, Bo <[email protected]>

---------

Signed-off-by: Dong, Bo <[email protected]>

* fix: max output token (#788)

Signed-off-by: sangjune.park <[email protected]>

* docs: reinforce llm runtime graph developer guide (#786)

Signed-off-by: sangjune.park <[email protected]>

* [LLM Runtime] Check weight dtype and compute dtype (#778)

* [LLM Runtime] Fix develop doc and convert.py (#794)

* fix develop doc and convert.py

Signed-off-by: Yu, Zhentao <[email protected]>

* fix: init_from_bin example (#789)

* [LLM Runtime] Enable whisper new app (#682)

* [Engine] Apply the STS task to bge models (#673)

* [LLM Runtime]fix format (#812)

* [LLM Runtime]  fix added_tokens error (#793)

Signed-off-by: intellinjun <[email protected]>

* Update README.md

Signed-off-by: Haihao Shen <[email protected]>

* update (#823)

Signed-off-by: Dong, Bo1 <[email protected]>

* [Doc] update README for Qwen chat (#808)

* [LLM Runtime] ChatGLM-V1 multi-batch infer and batched greedy search generation (#700)

* [LLM Runtime] Remove use_cache in WOQ (#818)

* cast void* to char* to avoid the unknown size (#856)

Signed-off-by: Dong, Bo1 <[email protected]>

* [Infra] enhance CI scan (#834)

* Fix kernels softmax in int8 mha (#869)

Co-authored-by: kevinintel <[email protected]>

* [LLM Runtime] Baichuan13B inference bug fix (#891)

* Baichuan13B FP32 inference bug fix

* [LLM Runtime] Remove the identical branch (#894)

* [LLM Runtime] make rms_norm_eps and freq_base parameters (#903)
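
Both parameters carry their standard meanings: rms_norm_eps is the epsilon inside RMSNorm, and freq_base is the base of the RoPE frequency spectrum. A minimal sketch (illustrative):

    import numpy as np

    def rms_norm(x, weight, eps=1e-6):  # eps == rms_norm_eps
        return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * weight

    def rope_freqs(head_dim, freq_base=10000.0):  # freq_base == RoPE theta base
        return freq_base ** (-np.arange(0, head_dim, 2) / head_dim)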

* [LLM Runtime] refactor itrex backend based on the latest Jblas (#769)

Co-authored-by: luoyu-intel <[email protected]>
Co-authored-by: Ding, Yi1 <[email protected]>
Co-authored-by: zhenwei-intel <[email protected]>
Co-authored-by: yuchengliu1 <[email protected]>
Co-authored-by: Meng, Hengyu <[email protected]>

* [Doc] add gaudi2 in doc (#799)

* [LLM Runtime] Add MX-Format (FP8_E5M2, FP8_E4M3, FP4_E2M1, NF4) (#872)

* add fp8 in llm frontend

Signed-off-by: Yu, Zhentao <[email protected]>
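
For reference, these are standard small-float layouts: FP8_E5M2 is 1 sign / 5 exponent / 2 mantissa bits with bias 15, and FP8_E4M3 is 1/4/3 with bias 7. A decoder sketch for E5M2 normals and subnormals (illustrative; NaN/Inf handling omitted):

    def decode_fp8_e5m2(byte):
        """Decode one FP8 E5M2 byte (1 sign, 5 exponent, 2 mantissa bits, bias 15)."""
        s = (byte >> 7) & 0x1
        e = (byte >> 2) & 0x1F
        m = byte & 0x3
        sign = -1.0 if s else 1.0
        if e == 0:                                    # subnormal
            return sign * (m / 4.0) * 2.0 ** (1 - 15)
        return sign * (1.0 + m / 4.0) * 2.0 ** (e - 15)

    print(decode_fp8_e5m2(0b0_01111_00))  # 1.0 (e = 15, m = 0)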

* [LLM Runtime] Fix PPL Test (#937)

* [LLM Runtime] Add MatMul data types combinations table (#945)

* [LLM Runtime] decoupling weight_type and scale_type in Qbits (#940)

* [LLM Runtime] Convert huggingface gptq model to jblas (#927)

Co-authored-by: luoyu-intel <[email protected]>

* reorg directory

migrate CI

Signed-off-by: Hengyu Meng <[email protected]>

refine CI for neuralspeed

Signed-off-by: Wenxin Zhang <[email protected]>

add more CI scripts

Signed-off-by: Wenxin Zhang <[email protected]>

minor fix

Signed-off-by: Wenxin Zhang <[email protected]>

remove runner.name when running on ubuntu-latest

Signed-off-by: Wenxin Zhang <[email protected]>

update CI to share system

Signed-off-by: Wenxin Zhang <[email protected]>

rename jblas to bestla
directory reorg

Signed-off-by: Hengyu Meng <[email protected]>

remove itrex dependency

Signed-off-by: Hengyu Meng <[email protected]>

fix script path
remove python dependency

Signed-off-by: Hengyu Meng <[email protected]>

-s

remove python tests

disable percentage

disable monitor

Signed-off-by: Hengyu Meng <[email protected]>

fix naming

fix threadpool conflict

Signed-off-by: Hengyu Meng <[email protected]>

restore percentage

Signed-off-by: Hengyu Meng <[email protected]>

* fix bestla typo
add bestla workflow image

Signed-off-by: Hengyu Meng <[email protected]>

* fix scripts path

* fix pylint and cpplint

---------

Signed-off-by: zhenwei-intel <[email protected]>
Signed-off-by: Yu, Zhentao <[email protected]>
Signed-off-by: hshen14 <[email protected]>
Signed-off-by: Dong, Bo1 <[email protected]>
Signed-off-by: intellinjun <[email protected]>
Signed-off-by: Clark Chin <[email protected]>
Signed-off-by: Ding, Yi1 <[email protected]>
Signed-off-by: Zhenzhong1 <[email protected]>
Signed-off-by: Wenxin Zhang <[email protected]>
Signed-off-by: Ikko Eltociear Ashimine <[email protected]>
Signed-off-by: Haihao Shen <[email protected]>
Signed-off-by: Liangyx2 <[email protected]>
Signed-off-by: Ye, Xinyu <[email protected]>
Signed-off-by: Hengyu Meng <[email protected]>
Signed-off-by: sangjune.park <[email protected]>
Signed-off-by: Dong, Bo <[email protected]>
Co-authored-by: Cheng, Penghui <[email protected]>
Co-authored-by: liuzhenwei <[email protected]>
Co-authored-by: zhentaoyu <[email protected]>
Co-authored-by: Dong, Bo <[email protected]>
Co-authored-by: kevinintel <[email protected]>
Co-authored-by: Haihao Shen <[email protected]>
Co-authored-by: lvliang-intel <[email protected]>
Co-authored-by: Wang, Chang <[email protected]>
Co-authored-by: luoyu-intel <[email protected]>
Co-authored-by: Yi DING <[email protected]>
Co-authored-by: zhenwei-intel <[email protected]>
Co-authored-by: intellinjun <[email protected]>
Co-authored-by: Chen Xi <[email protected]>
Co-authored-by: Zhenzhong1 <[email protected]>
Co-authored-by: CeciliaWwq <[email protected]>
Co-authored-by: Wenxin Zhang <[email protected]>
Co-authored-by: ZheWang <[email protected]>
Co-authored-by: yuchengliu1 <[email protected]>
Co-authored-by: Ikko Eltociear Ashimine <[email protected]>
Co-authored-by: Liangyx2 <[email protected]>
Co-authored-by: XinyuYe-Intel <[email protected]>
Co-authored-by: akarX23 <[email protected]>
Co-authored-by: sangjune.park <[email protected]>
1 parent 49e4e8c commit 09ec939
Showing 245 changed files with 98,593 additions and 0 deletions.
7 changes: 7 additions & 0 deletions .clang-format
@@ -0,0 +1,7 @@
Language: Cpp
BasedOnStyle: Google
DerivePointerAlignment: false
ColumnLimit: 120
SpaceBeforeParens: ControlStatements
SpaceBeforeRangeBasedForLoopColon: true
SortIncludes: false
12 changes: 12 additions & 0 deletions .editorconfig
@@ -0,0 +1,12 @@
root = true

[*]
charset = utf-8
indent_style = space
indent_size = 2
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true

[*.py]
indent_size = 4
21 changes: 21 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,21 @@
## Type of Change

Feature, bug fix, documentation, or others
API changed or not

## Description

Detailed description
Issues: xxx

## Expected Behavior & Potential Risk

The expected behavior triggered by this PR

## How has this PR been tested?

How to reproduce the test (including hardware information)

## Dependency Change?

Any library dependency introduced or removed
72 changes: 72 additions & 0 deletions .github/workflows/copyright_check.yml
@@ -0,0 +1,72 @@
name: Copyright Check

on:
  pull_request:
    branches: [main]
    paths:
      - neural_speed/**
      - setup.py
      - .github/workflows/format_scan.yml
  workflow_dispatch:

# If there is a new commit, the previous jobs will be canceled
concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

env:
  CODE_SCAN_LOG_PATH: "${{ github.workspace }}/log"
  CONTAINER_NAME: "codeScan"

jobs:
  format-scan:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        job_name: ["copyright"]
      fail-fast: false
    steps:
      - name: Checkout Repo
        uses: actions/checkout@v3

      - name: CopyRight check
        run: |
          source ${{ github.workspace }}/.github/workflows/scripts/change_color.sh
          set -e
          mkdir -p ${{ env.CODE_SCAN_LOG_PATH }}
          # bash arrays are space-separated; these are the extensions to check
          supported_extensions=(py sh yaml)
          git fetch
          git --no-pager diff --name-only remotes/origin/${{ github.base_ref }} ${{ github.workspace }}/neural_speed > ${{ env.CODE_SCAN_LOG_PATH }}/diff.log
          files=$(cat ${{ env.CODE_SCAN_LOG_PATH }}/diff.log | awk '!a[$0]++')
          $LIGHT_PURPLE && echo " ----------------- checking ... --------------------------" && $RESET
          if [[ -f ${{ env.CODE_SCAN_LOG_PATH }}/copyright_issue_summary.log ]]; then
            rm -f ${{ env.CODE_SCAN_LOG_PATH }}/copyright_issue_summary.log
          fi
          for file in ${files}; do
            if [[ "${supported_extensions[@]}" =~ "${file##*.}" ]]; then
              if [ $(grep -E -c "Copyright \\(c\\) ([0-9]{4})(-[0-9]{4})? Intel Corporation" ${file}) = 0 ]; then
                echo ${file} >> ${{ env.CODE_SCAN_LOG_PATH }}/copyright_issue_summary.log
                $BOLD_YELLOW && echo " ----------------- Current log file output start --------------------------"
                cat ${{ env.CODE_SCAN_LOG_PATH }}/copyright_issue_summary.log
                $BOLD_YELLOW && echo " ----------------- Current log file output end --------------------------" && $RESET
                $BOLD_RED && echo "CopyRight has something wrong! Please click on the artifact button to download and view the error log!" && $RESET
              fi
            else
              $LIGHT_PURPLE && echo "Skipping ${file}" && $RESET
            fi
          done
          if [[ -f ${{ env.CODE_SCAN_LOG_PATH }}/copyright_issue_summary.log ]]; then
            $BOLD_YELLOW && echo " ----------------- Current log file output start --------------------------"
            cat ${{ env.CODE_SCAN_LOG_PATH }}/copyright_issue_summary.log
            $BOLD_YELLOW && echo " ----------------- Current log file output end --------------------------" && $RESET
            $BOLD_RED && echo "CopyRight has something wrong! Please click on the artifact button to download and view the error log!" && $RESET && exit 1
          fi

      - name: Publish pipeline artifact
        if: ${{ failure() }}
        uses: actions/upload-artifact@v3
        with:
          name: ${{ matrix.job_name }}
          path: ${{ env.CODE_SCAN_LOG_PATH }}.*
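
The core of the workflow above is a single regex applied to each changed file; an equivalent standalone Python check, for illustration:

    import re
    import sys

    COPYRIGHT_RE = re.compile(r"Copyright \(c\) ([0-9]{4})(-[0-9]{4})? Intel Corporation")

    def has_copyright(path):
        with open(path, encoding="utf-8", errors="ignore") as f:
            return bool(COPYRIGHT_RE.search(f.read()))

    if __name__ == "__main__":
        missing = [p for p in sys.argv[1:]
                   if p.endswith((".py", ".sh", ".yaml")) and not has_copyright(p)]
        if missing:
            print("missing copyright header:", *missing, sep="\n  ")
            sys.exit(1)
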
156 changes: 156 additions & 0 deletions .github/workflows/cpp-graph-test.yml
@@ -0,0 +1,156 @@
name: CPP Graph Test

on:
  pull_request:
    branches: [main]
    paths:
      - '.github/workflows/cpp-graph-test.yml'
      - '.github/workflows/scripts/models/cpp_graph_inference.sh'
      - 'neural_speed/**'
      - 'bestla/**'
  workflow_dispatch:
    inputs:
      compiler_version:
        description: 'compiler_version'
        required: false
        type: string
        default: '13.1.0'
      models:
        description: 'models (in json)'
        required: false
        type: string
        default: '["llama-2-7b-chat", "gptj-6b"]'
      runner:
        description: 'runner'
        required: false
        type: string
        default: 'spr'

# If there is a new commit, the previous jobs will be canceled
concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

env:
  OUT_SCRIPT_PATH: ${{ github.workspace }}/.github/workflows/scripts/models
  SCRIPT_PATH: ${{ github.workspace }}/.github/workflows/scripts
  WORKING_DIR: ${{ github.workspace }}
  CONTAINER_NAME: "codeScan"
  INPUT_COMPILER_VERSION: ${{ inputs.compiler_version || '13.1.0' }}

jobs:
  CPP-Graph-Workflow:
    runs-on: ${{ inputs.runner || 'spr' }}
    strategy:
      matrix:
        modelName: ${{ fromJson(inputs.models || '["llama-2-7b-chat", "gptj-6b"]') }}
    steps:
      - name: Checkout Repo
        uses: actions/checkout@v3
        with:
          submodules: "recursive"
          fetch-tags: true

      - name: Env build
        run: |
          bash ${{ github.workspace }}/.github/workflows/scripts/prepare_env_with_conda.sh "cpp-graph-test" "3.8"

      - name: Binary build
        if: 0 == 1  # always false, so this step is skipped
        run: |
          cd ${{ github.workspace }}
          conda activate cpp-graph-test || source activate cpp-graph-test
          pip install build --upgrade
          pip install -r requirements.txt
          python setup.py sdist bdist_wheel
          pip install dist/neuralspeed*.whl
          pip list

      - name: BF16 Benchmark
        run: |
          cd ${{ github.workspace }}/.github/workflows/scripts/models
          bash cpp_graph_inference.sh cpp-graph-test ${{ matrix.modelName }} ${{ env.INPUT_COMPILER_VERSION }}

      - name: Rename summary
        run: |
          cd ${{ github.workspace }}
          cp cpp_graph_summary.log cpp_graph_summary_${{ matrix.modelName }}.log

      - name: Publish pipeline artifact
        uses: actions/upload-artifact@v3
        if: ${{ !cancelled() }}
        with:
          name: cpp_graph
          path: ${{ github.workspace }}/cpp_graph_summary_${{ matrix.modelName }}.log
          if-no-files-found: ignore # 'warn' or 'ignore' are also available, defaults to `warn`
          retention-days: 60 # 1 <= retention-days <= 90

  Generate-Report:
    runs-on: ubuntu-latest
    needs: [CPP-Graph-Workflow]
    steps:
      - name: Docker Clean Up
        run: |
          docker ps -a
          if [[ $(docker ps -a | grep -i '${{ env.CONTAINER_NAME }}-${{ runner.name }}'$) ]]; then
            docker start ${{ env.CONTAINER_NAME }}-${{ runner.name }}
            echo "remove left files through container ..."
            docker exec ${{ env.CONTAINER_NAME }}-${{ runner.name }} bash -c "ls -a /neural-speed && rm -fr /neural-speed/* && rm -fr /neural-speed/.* || true"
          fi

      - name: Checkout Repo
        uses: actions/checkout@v3

      - name: Download Summary Log
        uses: actions/download-artifact@v3
        with:
          path: ${{ env.OUT_SCRIPT_PATH }}/generated/log

      - name: Merge CPP Graph Summary Log
        run: |
          cd ${{ env.OUT_SCRIPT_PATH }}/generated/log/cpp_graph
          for summary in $(find . -name "cpp_graph_summary_*.log"); do cat $summary >> cpp_graph_summary.log; done

      - name: Download Reference Artifact
        id: download-artifact
        uses: dawidd6/action-download-artifact@v2
        with:
          workflow: cpp-graph-test.yml
          name: FinalReport
          run_id: ${{ vars.GRAPH_REF_ID }}
          path: ${{ env.OUT_SCRIPT_PATH }}
          name_is_regexp: true
          repo: ${{ github.repository }}
          check_artifacts: false
          search_artifacts: false
          skip_unpack: false
          if_no_artifact_found: warn

      - name: Display structure of downloaded files
        run: cd ${{ env.OUT_SCRIPT_PATH }} && ls -R

      - name: Generate report
        run: |
          echo "------ Generating final report.html ------"
          cd ${{ env.OUT_SCRIPT_PATH }}
          /usr/bin/bash generate_report.sh --workflow=deploy
          sed -n '/<body>/,/<\/body>/p' generated/report.html | sed -r '/^$/d' | sed -r 's/^ +//g' >> $GITHUB_STEP_SUMMARY
        env:
          RUN_DISPLAY_URL: https://github.com/neural-speed/actions/runs/${{ github.run_id }}
          BUILD_NUMBER: ${{ github.run_id }}
          JOB_STATUS: succeed
          MR_source_branch: ${{ github.head_ref }}
          ghprbActualCommit: ${{ github.event.pull_request.head.sha }}

      - name: Publish Report
        uses: actions/upload-artifact@v3
        if: ${{ !cancelled() }}
        with:
          name: FinalReport
          path: ${{ env.OUT_SCRIPT_PATH }}/generated

      - name: Specify performance regression
        run: |
          if [ $(is_perf_reg) == 'true' ]; then
            echo "[Performance Regression] Some model performance regression occurred, please check artifacts and reports."
            exit 1
          fi
41 changes: 41 additions & 0 deletions .github/workflows/docker/codeScan.dockerfile
@@ -0,0 +1,41 @@
#
# Copyright (c) 2022 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

ARG UBUNTU_VER=22.04
FROM ubuntu:${UBUNTU_VER} as devel

# See http://bugs.python.org/issue19846
ENV LANG C.UTF-8

RUN apt-get update && apt-get install -y --no-install-recommends --fix-missing \
    aspell \
    aspell-en \
    python3 \
    python3-pip \
    python3-dev \
    python3-distutils \
    build-essential \
    cloc \
    python3.10-venv \
    git

RUN ln -sf $(which python3) /usr/bin/python

RUN python -m pip install --no-cache-dir pylint==2.17.5 \
    bandit==1.7.4 \
    pyspelling \
    pydocstyle

WORKDIR /
48 changes: 48 additions & 0 deletions .github/workflows/docker/devel.dockerfile
@@ -0,0 +1,48 @@
#
# Copyright (c) 2022 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
ARG UBUNTU_VER=22.04
FROM ubuntu:${UBUNTU_VER} as devel

# See http://bugs.python.org/issue19846
ENV LANG C.UTF-8

RUN apt-get update && apt-get install -y --no-install-recommends --fix-missing \
    python3 \
    python3-pip \
    python3-dev \
    python3-distutils \
    autoconf \
    build-essential \
    git \
    libgl1-mesa-glx \
    libglib2.0-0 \
    numactl \
    time \
    wget \
    bc \
    gawk \
    jq \
    python3.10-venv \
    vim

RUN ln -sf $(which python3) /usr/bin/python

RUN python -m pip --no-cache-dir install --upgrade pip
RUN python -m pip install --no-cache-dir setuptools

RUN pip list

WORKDIR /
