v1.3.x release: HTTP 424 when requesting top_n_tokens > 0 #1340

Closed
CaveSven opened this issue Dec 13, 2023 · 6 comments

Comments

@CaveSven

System Info

Official docker container, v1.3.x

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Steps to reproduce:

  1. Launch docker container
  2. Navigate to swagger api
  3. Send the vanilla example request for POST /generate with top_n_tokens > 0 (see the request sketch below) and observe HTTP 424:
{
  "error": "Request failed during generation: Server error: Argument for field generate.v2.Generation.top_tokens is not iterable",
  "error_type": "generation"
}
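
A minimal sketch of such a request, assuming the default 8080:80 port mapping and an illustrative prompt (not the exact Swagger example payload):

import requests

resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is Deep Learning?",
        "parameters": {"max_new_tokens": 20, "top_n_tokens": 5},
    },
)
print(resp.status_code)  # 424 on v1.3.x instead of the expected 200
print(resp.json())       # error: "... Generation.top_tokens is not iterable"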

Expected behavior

HTTP 200

@CaveSven CaveSven changed the title v1.3.x release: 424 when requesting top_n_tokens > 0 v1.3.x release: HTTP 424 when requesting top_n_tokens > 0 Dec 13, 2023
@shivanandmn

I am still facing this issue: top_n_tokens=None works fine, but passing any int value throws the error.

@Soumendraprasad

@CaveSven, @shivanandmn did you find any way to solve it? I am also getting this error. The command I use is docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.3 --model-id $model. Where should I make changes?

@shivanandmn

While calling the API, just set top_n_tokens=None.
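
As a raw HTTP call, the workaround looks roughly like this (route and port as in the reproduction sketch above; sending null should behave the same as omitting the field, though that is an assumption):

import requests

payload = {
    "inputs": "What is Deep Learning?",
    "parameters": {"max_new_tokens": 20, "top_n_tokens": None},  # or drop the key entirely
}
resp = requests.post("http://localhost:8080/generate", json=payload)
print(resp.status_code)  # 200 once top-token details are no longer requested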

@SvenRohrTNG

Hi @Narsil,
I think the offending change is the newly added "repeated" keyword in generate.proto, see 9ecfa16#r135971436.
If I remove it, everything works as expected.
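
To see the failure mode in isolation, here is a self-contained sketch using a stock protobuf message (not TGI's generate.proto, whose generated module is not reproduced here): a repeated message field only accepts an iterable, so handing it a single message raises the same kind of TypeError as in the report.

from google.protobuf import descriptor_pb2

# message_type is a repeated message field, analogous to the repeated
# top_tokens field introduced in generate.v2.
tokens_like = descriptor_pb2.DescriptorProto(name="Tokens")

# A repeated field accepts an iterable of messages ...
descriptor_pb2.FileDescriptorProto(message_type=[tokens_like])

# ... but a bare message is rejected, mirroring
# "Argument for field generate.v2.Generation.top_tokens is not iterable".
try:
    descriptor_pb2.FileDescriptorProto(message_type=tokens_like)
except TypeError as err:
    print(err)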

@SvenRohrTNG

I would be happy to contribute a fix, but after looking through the code I wonder why 9ecfa16#diff-b1fa6727513f9184386a8371f182d51c9ccd6d131d61be6218170a6e2444146aR527 is laid out to handle multiple tokens in a single generation: as far as I can see, the Python backend always sends just a single token per generation (e.g. see 9ecfa16#diff-f16065c0ba908e84102217a2c5d97bf724feed593985d7cbed77f6778a12a772R705).

greg-us pushed a commit to greg-us/tgi-fix-1340 that referenced this issue Jan 19, 2024
@greg-us

greg-us commented Jan 20, 2024

I think I found a way to solve this problem.

In PR #1308, the generate.v2 proto was introduced. Token details were replaced by the Tokens message, and TopTokens was replaced by a repeated Tokens message.

The outputs of the server models evolved accordingly, but only partially. Even when a single token is generated, a list is returned inside a Tokens message, whereas top_tokens was not adapted, as seen here.

I have proposed a simple fix: returning top_tokens as a list from the generation step, and thus a repeated Tokens message for protobuf.
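
In code, the fix direction described above amounts to wrapping the per-step top-tokens message in a list before it reaches protobuf. A rough sketch only, with the Generation field names assumed from the error message rather than copied from the PR:

def build_generation(generate_pb2, request_id, tokens, top_tokens):
    """top_tokens: the single per-step Tokens message the model produced, or None."""
    return generate_pb2.Generation(
        request_id=request_id,
        tokens=tokens,
        # top_tokens is a repeated field in generate.v2, so it must receive an
        # iterable; passing the bare message is what triggered the HTTP 424.
        top_tokens=[top_tokens] if top_tokens is not None else [],
    )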
