This repository has been archived by the owner on Aug 30, 2024. It is now read-only.

[Neural Speed] Fix ret when ignore_prompt #278

Merged
merged 1 commit into from
Jun 3, 2024

Conversation

zhentaoyu
Contributor

Type of Change

feature or bug fix or documentation or others
API changed or not

Description

detail description
Issues: xxx

  • Fix the return value (ret) when ignore_prompt is set
  • Add continuous-batching (cont-batching) benchmark throughput to the docs

Expected Behavior & Potential Risk

the expected behavior that is triggered by this PR

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

Signed-off-by: Yu, Zhentao <[email protected]>
@zhentaoyu added the "ready to review" label Jun 3, 2024
Contributor

@a32543254 left a comment

LGTM

@a32543254
Contributor

We may want to add some recommended configs for the benchmark to reach maximum throughput, such as the number of instances and the batch size.

@zhentaoyu
Contributor Author

> We may want to add some recommended configs for the benchmark to reach maximum throughput, such as the number of instances and the batch size.

It depends. We can maintain a table once we have run more experiments across different machines and settings (SPR, client, generation methods, first-token length, etc.).

@a32543254 merged commit acfbc40 into main on Jun 3, 2024
14 of 15 checks passed