v1.3.x release: HTTP 424 when requesting top_n_tokens > 0 #1340
Comments
@CaveSven, @shivanandmn did you find any way to solve it? I am also getting this error.
while calling the API just add
Hi @Narsil,
I would be happy to contribute a fix, but after looking through the code I wonder why 9ecfa16#diff-b1fa6727513f9184386a8371f182d51c9ccd6d131d61be6218170a6e2444146aR527 is laid out to handle multiple tokens in a single generation: as far as I can see, the Python backend always sends exactly one token per generation step (see, e.g., 9ecfa16#diff-f16065c0ba908e84102217a2c5d97bf724feed593985d7cbed77f6778a12a772R705).
Fix issue huggingface#1340
I think I found a way to solve this problem. In PR #1308, proto generate.v2 was introduced: token details were replaced by the Tokens message, and TopTokens was replaced by a repeated Tokens message. The outputs of the server models evolved accordingly, but only partially: even when a single token is generated, it is returned as a list inside a Tokens message, whereas top_tokens was not adapted, as seen here. I have proposed a simple fix that returns a list of top_tokens from generation, and thus a repeated Tokens message for protobuf.
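To make the shape mismatch concrete, here is a minimal sketch in Python; the class, field, and function names are illustrative assumptions modeled on the description above, not definitions copied from generate.v2.proto or the repository:

```python
# Minimal sketch of the v2 shape mismatch described above. All names here
# (Tokens, Generation, field names) are illustrative assumptions, not the
# actual text-generation-inference definitions.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Tokens:
    # Under generate.v2, token details are batched into one message, so
    # even a single generated token travels as lists of length one.
    ids: List[int] = field(default_factory=list)
    logprobs: List[float] = field(default_factory=list)
    texts: List[str] = field(default_factory=list)


@dataclass
class Generation:
    tokens: Tokens            # one Tokens message per generation step
    top_tokens: List[Tokens]  # proto "repeated Tokens": must be a list


def build_generation(token_id: int, logprob: float, text: str,
                     top: Tokens) -> Generation:
    step = Tokens(ids=[token_id], logprobs=[logprob], texts=[text])
    # Buggy shape: Generation(tokens=step, top_tokens=top) hands the
    # router a bare Tokens where it expects a repeated field, which
    # surfaces as HTTP 424. The proposed fix wraps it in a list:
    return Generation(tokens=step, top_tokens=[top])
```

Since the backend emits one token per generation step (as noted above), the repeated field ends up carrying lists of length one, which is why wrapping the single Tokens value in a list is enough to satisfy the router.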
System Info
Official Docker container, v1.3.x
Reproduction
Steps to reproduce:
POST /generate with top_n_tokens > 0 and observe HTTP 424.
Expected behavior
HTTP 200
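For reference, a minimal reproduction sketch using the public /generate payload schema; the host, port, and prompt are assumptions for illustration:

```python
import requests

# Assumes a v1.3.x text-generation-inference container is listening on
# localhost:8080 (host and port are assumptions for illustration).
resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is Deep Learning?",
        "parameters": {"max_new_tokens": 10, "top_n_tokens": 2},
    },
)

# Observed on v1.3.x when top_n_tokens > 0: HTTP 424. Expected: HTTP 200.
print(resp.status_code)
```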