This repository has been archived by the owner on Aug 30, 2024. It is now read-only.

[Neural Speed] Support continuous batching + beam search inference in LLAMA #428

Job	Run time
CPP-Graph-Workflow (llama-2-7b-chat)	1s
CPP-Graph-Workflow (gptj-6b)	1s
Genreate-Report	1s
Total	3s
