
[python] add rolling batch as auto for neuron smart default #2606

Merged
merged 1 commit into from
Nov 27, 2024

Conversation

sindhuvahinis
Contributor

Description

This PR fixes the Neo failures for this model configuration.

"llama-3.1-8b": {

  • During partitioning, since no rolling_batch option is specified, the default setting of disable is used, and the model is compiled accordingly.
  • During inference, the Java side has an LMI smart-default capability that assigns the rolling batch option to auto.
    Since the compiled configuration differs from the inference configuration, the hash values of the NEFF files differ, and loading fails.
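The failure mode above can be illustrated with a minimal sketch. This is not the actual LMI or Neuron code; the function and key names are hypothetical, assuming only that compiled NEFF artifacts are looked up by a hash derived from the model configuration, so any partition-time vs. inference-time mismatch produces different hashes:

```python
import hashlib
import json

def config_hash(config: dict) -> str:
    # Hypothetical stand-in for the Neuron artifact cache key: serialize the
    # config deterministically (sorted keys) and hash it, so identical configs
    # always map to the same compiled artifact.
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()

# Partition time: no rolling_batch specified, so the default "disable" is used.
partition_config = {"model": "llama-3.1-8b", "option.rolling_batch": "disable"}
# Inference time: the Java-side smart default assigns "auto".
inference_config = {"model": "llama-3.1-8b", "option.rolling_batch": "auto"}

# The single differing option changes the hash, so the compiled artifact
# produced during partitioning is not found at inference time.
assert config_hash(partition_config) != config_hash(inference_config)
```

The same mechanism shows why aligning the default on both sides resolves the failure: once both configs carry `"auto"`, the hashes agree.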

Fix:

  • In this PR, we set rolling batch to auto in the smart defaults, so during partitioning the default for option.rolling_batch also becomes auto.
  • We recently added validation that the rolling batch input must be a string. In our test client, if the batch size is 1, we now send the input as a string.
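A rough sketch of the two fixes described above (function and property names here are illustrative, not the actual LMI API):

```python
def apply_smart_defaults(properties: dict) -> dict:
    # Hypothetical smart-default step: if the user did not specify a
    # rolling-batch option, default it to "auto" so the partition-time
    # and inference-time configurations agree.
    properties.setdefault("option.rolling_batch", "auto")
    return properties

def build_payload(inputs, batch_size: int) -> dict:
    # Test-client change: rolling-batch validation expects a string input,
    # so with batch size 1 send a plain string rather than a one-element list.
    if batch_size == 1:
        return {"inputs": inputs if isinstance(inputs, str) else inputs[0]}
    return {"inputs": list(inputs)}
```

For example, `apply_smart_defaults({})` yields `{"option.rolling_batch": "auto"}`, while an explicit user setting such as `"disable"` is left untouched by `setdefault`.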

Testing:

  • Tested the Neo use case manually on my EC2 machine.

Next:

@sindhuvahinis sindhuvahinis requested review from zachgk and a team as code owners November 22, 2024 20:22
@sindhuvahinis sindhuvahinis merged commit 5a6562a into deepjavalibrary:master Nov 27, 2024
9 checks passed