Commit
Refactor CPU llama inference code (huggingface#728)
* ipex 2.3 released
* refactor IPEXLlamaAttention
* change to Ref
* remove Ref
* skip tests
* skip tests
* skip testing without pkv
* add tests skip
* only llama2 with at least 64 head size supports IAKV
* cannot assert same outputs because do_sample=True
* rm tiny-llama model testing because it does not work with IAKV
* fix code style
* refine docstring
* fix duplicated code
* refactor attention forward
* add use_cache for rope
* use with and without cache
* refine code
* add reference link
* bug fix
* use reshape
* Apply suggestions from code review
  Co-authored-by: Ella Charlaix <[email protected]>
* fix

---------

Co-authored-by: jiqing-feng <[email protected]>
Co-authored-by: Ella Charlaix <[email protected]>