v1.3.x release: HTTP 424 when requesting top_n_tokens > 0 #1340

Closed
CaveSven opened this issue Dec 13, 2023 · 6 comments

Comments

@CaveSven

System Info

Official docker container, v1.3.x

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Steps to reproduce:

  1. Launch docker container
  2. Navigate to swagger api
  3. Send the vanilla example request for POST /generate with top_n_tokens > 0 (see the request sketch below) and observe HTTP 424:
{
  "error": "Request failed during generation: Server error: Argument for field generate.v2.Generation.top_tokens is not iterable",
  "error_type": "generation"
}
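
A minimal sketch of such a request, assuming the default 8080:80 port mapping and an illustrative prompt (not the exact Swagger example payload):

import requests

resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is Deep Learning?",
        "parameters": {"max_new_tokens": 20, "top_n_tokens": 5},
    },
)
print(resp.status_code)  # 424 on v1.3.x instead of the expected 200
print(resp.json())       # error: "... Generation.top_tokens is not iterable"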

Expected behavior

HTTP 200

@CaveSven CaveSven changed the title v1.3.x release: 424 when requesting top_n_tokens > 0 v1.3.x release: HTTP 424 when requesting top_n_tokens > 0 Dec 13, 2023
@shivanandmn

I am still facing this issue: top_n_tokens=None works fine, but passing any int value throws the error.

@Soumendraprasad

@CaveSven, @shivanandmn did you find any way to solve it? I am also getting this error. The command I use is docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.3 --model-id $model. Where should I make changes?

@shivanandmn

While calling the API, just set top_n_tokens=None.
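
As a raw HTTP call, the workaround looks roughly like this (route and port as in the reproduction sketch above; sending null should behave the same as omitting the field, though that is an assumption):

import requests

payload = {
    "inputs": "What is Deep Learning?",
    "parameters": {"max_new_tokens": 20, "top_n_tokens": None},  # or drop the key entirely
}
resp = requests.post("http://localhost:8080/generate", json=payload)
print(resp.status_code)  # 200 once top-token details are no longer requested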

@SvenRohrTNG

Hi @Narsil,
I think the offending change is the newly added "repeated" keyword in generate.proto, see 9ecfa16#r135971436.
If I remove it, everything works as expected.
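
To see the failure mode in isolation, here is a self-contained sketch using a stock protobuf message (not TGI's generate.proto, whose generated module is not reproduced here): a repeated message field only accepts an iterable, so handing it a single message raises the same kind of TypeError as in the report.

from google.protobuf import descriptor_pb2

# message_type is a repeated message field, analogous to the repeated
# top_tokens field introduced in generate.v2.
tokens_like = descriptor_pb2.DescriptorProto(name="Tokens")

# A repeated field accepts an iterable of messages ...
descriptor_pb2.FileDescriptorProto(message_type=[tokens_like])

# ... but a bare message is rejected, mirroring
# "Argument for field generate.v2.Generation.top_tokens is not iterable".
try:
    descriptor_pb2.FileDescriptorProto(message_type=tokens_like)
except TypeError as err:
    print(err)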

@SvenRohrTNG

I would be happy to contribute a fix, but after looking through the code I wonder why 9ecfa16#diff-b1fa6727513f9184386a8371f182d51c9ccd6d131d61be6218170a6e2444146aR527 is laid out to handle multiple tokens in a single generation: as far as I can see, the Python backend always sends just a single token per generation (e.g. see 9ecfa16#diff-f16065c0ba908e84102217a2c5d97bf724feed593985d7cbed77f6778a12a772R705).

greg-us pushed a commit to greg-us/tgi-fix-1340 that referenced this issue Jan 19, 2024
@greg-us

greg-us commented Jan 20, 2024

I think I found a way to solve this problem.

In PR #1308, the generate.v2 proto was introduced. Token details were replaced by the Tokens message, and TopTokens was replaced by a repeated Tokens message.

The outputs of the server models evolved accordingly, but only partially. Even when a single token is generated, a list is returned inside a Tokens message, whereas top_tokens was not adapted, as seen here.

I have proposed a simple fix: returning top_tokens as a list from the generation step, and thus a repeated Tokens message for protobuf.
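
In code, the fix direction described above amounts to wrapping the per-step top-tokens message in a list before it reaches protobuf. A rough sketch only, with the Generation field names assumed from the error message rather than copied from the PR:

def build_generation(generate_pb2, request_id, tokens, top_tokens):
    """top_tokens: the single per-step Tokens message the model produced, or None."""
    return generate_pb2.Generation(
        request_id=request_id,
        tokens=tokens,
        # top_tokens is a repeated field in generate.v2, so it must receive an
        # iterable; passing the bare message is what triggered the HTTP 424.
        top_tokens=[top_tokens] if top_tokens is not None else [],
    )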
