This repository has been archived by the owner on Aug 30, 2024. It is now read-only.
[Neural Speed] Support continuous batching + beam search inference in LLAMA #413
| Job | Run time |
|---|---|
| | 2m 10s |
| | 1m 33s |
| | 2m 18s |
| | 1m 12s |
| | 3m 32s |
| Total | 10m 45s |