Commit
separate prefill into a process (intel-analytics#11787)
* separate prefill into a process
* using model.share_memory()
* might work
* worked
* use long prompt
* refactor
* cleanup
* fix bug
* clean up
* changeable inter and intra process stages
* refactor
* add max output len
* fix npu_model changes that may cause generate to fail
* fix npu_model generate import error
* fix generate forward error

---------

Co-authored-by: sgwhat <[email protected]>