
[python] add rolling batch as auto for neuron smart default #2606

Merged
merged 1 commit into from
Nov 27, 2024

Conversation

sindhuvahinis
Contributor

Description

This PR fixes the Neo failures for this model configuration.

"llama-3.1-8b": {

  • During partitioning, since no rolling_batch option is specified, the default setting of disable is used, and the model is compiled accordingly.
  • During inference, the Java side has an LMI smart-default capability that assigns the rolling batch option to auto.
    Since the compiled configuration differs from the inference configuration, the hash values of the NEFF files differ, and loading fails.
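The failure mode above can be illustrated with a minimal sketch. This is not the actual LMI or Neuron code; the function and key names are hypothetical, assuming only that compiled NEFF artifacts are looked up by a hash derived from the model configuration, so any partition-time vs. inference-time mismatch produces different hashes:

```python
import hashlib
import json

def config_hash(config: dict) -> str:
    # Hypothetical stand-in for the Neuron artifact cache key: serialize the
    # config deterministically (sorted keys) and hash it, so identical configs
    # always map to the same compiled artifact.
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()

# Partition time: no rolling_batch specified, so the default "disable" is used.
partition_config = {"model": "llama-3.1-8b", "option.rolling_batch": "disable"}
# Inference time: the Java-side smart default assigns "auto".
inference_config = {"model": "llama-3.1-8b", "option.rolling_batch": "auto"}

# The single differing option changes the hash, so the compiled artifact
# produced during partitioning is not found at inference time.
assert config_hash(partition_config) != config_hash(inference_config)
```

The same mechanism shows why aligning the default on both sides resolves the failure: once both configs carry `"auto"`, the hashes agree.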

Fix:

  • In this PR, we set rolling batch to auto in the smart defaults, so during partitioning the default for option.rolling_batch also becomes auto.
  • We recently added validation that the rolling batch input must be a string. In our test client, if the batch size is 1, we now send the input as a string.
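A rough sketch of the two fixes described above (function and property names here are illustrative, not the actual LMI API):

```python
def apply_smart_defaults(properties: dict) -> dict:
    # Hypothetical smart-default step: if the user did not specify a
    # rolling-batch option, default it to "auto" so the partition-time
    # and inference-time configurations agree.
    properties.setdefault("option.rolling_batch", "auto")
    return properties

def build_payload(inputs, batch_size: int) -> dict:
    # Test-client change: rolling-batch validation expects a string input,
    # so with batch size 1 send a plain string rather than a one-element list.
    if batch_size == 1:
        return {"inputs": inputs if isinstance(inputs, str) else inputs[0]}
    return {"inputs": list(inputs)}
```

For example, `apply_smart_defaults({})` yields `{"option.rolling_batch": "auto"}`, while an explicit user setting such as `"disable"` is left untouched by `setdefault`.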

Testing:

  • Tested the Neo use case manually on my EC2 machine.

Next:

@sindhuvahinis sindhuvahinis requested review from zachgk and a team as code owners November 22, 2024 20:22
@sindhuvahinis sindhuvahinis merged commit 5a6562a into deepjavalibrary:master Nov 27, 2024
9 checks passed