
Error "Failed to buffer the request body: length limit exceeded" when supplying base64 encoded images greater than 1MB in prompt #1802

Closed
akowalsk opened this issue Apr 24, 2024 · 11 comments


@akowalsk
Contributor

System Info

text-generation-launcher --env

Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.75.0
Commit sha: 2d0a7173d4891e7cd5f9b77f8e0987b82a339e51
Docker label: sha-2d0a717
nvidia-smi:
Wed Apr 24 19:58:49 2024
   +-----------------------------------------------------------------------------------------+
   | NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
   |-----------------------------------------+------------------------+----------------------+
   | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
   | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
   |                                         |                        |               MIG M. |
   |=========================================+========================+======================|
   |   0  NVIDIA GeForce RTX 3090        On  |   00000000:01:00.0 Off |                  N/A |
   |  0%   23C    P8             16W /  350W |    8450MiB /  24576MiB |      0%      Default |
   |                                         |                        |                  N/A |
   +-----------------------------------------+------------------------+----------------------+
   |   1  NVIDIA GeForce RTX 3090        On  |   00000000:21:00.0 Off |                  N/A |
   |  0%   25C    P8             20W /  350W |    8418MiB /  24576MiB |      0%      Default |
   |                                         |                        |                  N/A |
   +-----------------------------------------+------------------------+----------------------+

model info

{
  "model_id": "/opt/ml/checkpoint/llava-v1.6-mistral-7b-hf",
  "model_sha": null,
  "model_dtype": "torch.float16",
  "model_device_type": "cuda",
  "model_pipeline_tag": null,
  "max_concurrent_requests": 128,
  "max_best_of": 2,
  "max_stop_sequences": 4,
  "max_input_length": 24576,
  "max_total_tokens": 32768,
  "waiting_served_ratio": 1.2,
  "max_batch_total_tokens": 65536,
  "max_waiting_tokens": 20,
  "max_batch_size": null,
  "validation_workers": 2,
  "max_client_batch_size": 4,
  "version": "2.0.1",
  "sha": "2d0a7173d4891e7cd5f9b77f8e0987b82a339e51",
  "docker_label": "sha-2d0a717"
}

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Use an image that is larger than 1 MB, and set IMAGE_PATH and API_ENDPOINT appropriately:

from PIL import Image
import requests
import base64
from io import BytesIO

# load the image from disk
image = Image.open(IMAGE_PATH)

# convert the image to a base64 string
buffer = BytesIO()
image.save(buffer, format="PNG")  # use the appropriate format (e.g. JPEG, PNG)
base64_image = base64.b64encode(buffer.getvalue()).decode("utf-8")

# embed the image in the prompt as a markdown image with a data URI
image_string = f"data:image/png;base64,{base64_image}"
query = "Describe the image?"
prompt = f"[INST] ![]({image_string})\n{query} [/INST]"

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
}

payload = {"inputs": prompt}
response = requests.post(f"{API_ENDPOINT}/generate", headers=headers, json=payload)
try:
    print(response.json())
except ValueError:  # response body was not JSON (e.g. a plain-text error)
    print(response.text)

This will print: Failed to buffer the request body: length limit exceeded

With an image smaller than 1 MB, it generates correctly.
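Note that the request body is noticeably larger than the image file on disk: the repro re-encodes the image as PNG (which can grow the file) and base64 then inflates it by roughly 4/3. A quick check of the actual body size, reusing the payload dict from the snippet above, makes this visible:

import json

body_bytes = json.dumps(payload).encode("utf-8")
print(f"request body size: {len(body_bytes) / (1024 * 1024):.2f} MiB")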

Expected behavior

It should generate text for the image as long as it fits within the model's context. Judging by the error text and its similarity to tokio-rs/axum#1652, this appears to be related to Axum's default request body size limit.
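Until the server-side limit is configurable, a possible client-side workaround is to shrink the image before embedding it. The sketch below is only an illustration, not project code: the 1,000,000-byte target and the JPEG quality/resize steps are arbitrary assumptions, and IMAGE_PATH is the same placeholder as in the repro above.

from io import BytesIO
import base64

from PIL import Image

def image_to_base64_under_limit(path, max_bytes=1_000_000):
    """Shrink and re-encode an image until its base64 form fits under max_bytes."""
    image = Image.open(path).convert("RGB")
    quality = 90
    while True:
        buffer = BytesIO()
        image.save(buffer, format="JPEG", quality=quality)
        encoded = base64.b64encode(buffer.getvalue()).decode("utf-8")
        if len(encoded) <= max_bytes:
            return encoded
        if quality > 40:
            quality -= 10  # try stronger JPEG compression first
        else:
            # then halve the resolution and start over at high quality
            image = image.resize((image.width // 2, image.height // 2))
            quality = 90

image_string = f"data:image/jpeg;base64,{image_to_base64_under_limit(IMAGE_PATH)}"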

@ktrapeznikov

Likely related to #1777.

@akowalsk
Contributor Author

I've also encountered that problem, but the length-limit-exceeded error also occurs with the idefics-9b-instruct model. That model works with images of varying dimensions, but it still fails when the image is large (over 1 MB).


This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label May 26, 2024
@akowalsk
Contributor Author

I will revalidate on the latest TGI version shortly.

@github-actions github-actions bot removed the Stale label May 31, 2024
@akowalsk
Contributor Author

I tried this again with the latest version, using the idefics2-8b-chatty model instead of the llava model, and the issue persists.

Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.78.0
Commit sha: f426a3398d12808f20c101487329e563d32bfbaf
Docker label: sha-f426a33
nvidia-smi:
Fri Jun 21 20:35:18 2024
   +-----------------------------------------------------------------------------------------+
   | NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
   |-----------------------------------------+------------------------+----------------------+
   | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
   | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
   |                                         |                        |               MIG M. |
   |=========================================+========================+======================|
   |   0  NVIDIA GeForce RTX 3090        On  |   00000000:01:00.0 Off |                  N/A |
   |  0%   30C    P8             18W /  350W |   15380MiB /  24576MiB |      0%      Default |
   |                                         |                        |                  N/A |
   +-----------------------------------------+------------------------+----------------------+
   |   1  NVIDIA GeForce RTX 3090        On  |   00000000:21:00.0 Off |                  N/A |
   |  0%   30C    P8             22W /  350W |   15380MiB /  24576MiB |      0%      Default |
   |                                         |                        |                  N/A |
   +-----------------------------------------+------------------------+----------------------+

   +-----------------------------------------------------------------------------------------+
   | Processes:                                                                              |
   |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
   |        ID   ID                                                               Usage      |
   |=========================================================================================|
   +-----------------------------------------------------------------------------------------+

model info

{
	"model_id": "/opt/ml/checkpoint/idefics2-8b-chatty",
	"model_sha": null,
	"model_dtype": "torch.float16",
	"model_device_type": "cuda",
	"model_pipeline_tag": null,
	"max_concurrent_requests": 128,
	"max_best_of": 2,
	"max_stop_sequences": 4,
	"max_input_length": 24576,
	"max_total_tokens": 32768,
	"waiting_served_ratio": 0.3,
	"max_batch_total_tokens": 192080,
	"max_waiting_tokens": 20,
	"max_batch_size": null,
	"validation_workers": 2,
	"max_client_batch_size": 4,
	"router": "text-generation-router",
	"version": "2.0.4",
	"sha": "f426a3398d12808f20c101487329e563d32bfbaf",
	"docker_label": "sha-f426a33"
}


This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Jul 22, 2024
@akowalsk
Contributor Author

I tried to replicate this on the latest TGI version (2.2) and ended up with a different error:

{"timestamp":"2024-07-25T17:50:30.156102Z","level":"ERROR","message":"Server error: 'Tensor' object has no attribute 'input_lengths'","target":"text_generation_client","filename":"router/client/src/lib.rs","line_number":46,"span":{"size":1,"name":"decode"},"spans":[{"batch_size":1,"name":"batch"},{"name":"decode"},{"size":1,"name":"decode"},{"size":1,"name":"decode"}]}
{"timestamp":"2024-07-25T17:50:30.149213Z","level":"ERROR","fields":{"message":"Method Decode encountered an error.\nTraceback (most recent call last):\n File \"/opt/conda/bin/text-generation-server\", line 8, in <module>\n sys.exit(app())\n File \"/opt/conda/lib/python3.10/site-packages/typer/main.py\", line 309, in __call__\n return get_command(self)(*args, **kwargs)\n File \"/opt/conda/lib/python3.10/site-packages/click/core.py\", line 1157, in __call__\n return self.main(*args, **kwargs)\n File \"/opt/conda/lib/python3.10/site-packages/typer/core.py\", line 723, in main\n return _main(\n File \"/opt/conda/lib/python3.10/site-packages/typer/core.py\", line 193, in _main\n rv = self.invoke(ctx)\n File \"/opt/conda/lib/python3.10/site-packages/click/core.py\", line 1688, in invoke\n return _process_result(sub_ctx.command.invoke(sub_ctx))\n File \"/opt/conda/lib/python3.10/site-packages/click/core.py\", line 1434, in invoke\n return ctx.invoke(self.callback, **ctx.params)\n File \"/opt/conda/lib/python3.10/site-packages/click/core.py\", line 783, in invoke\n return __callback(*args, **kwargs)\n File \"/opt/conda/lib/python3.10/site-packages/typer/main.py\", line 692, in wrapper\n return callback(**use_params)\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py\", line 118, in serve\n server.serve(\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py\", line 297, in serve\n asyncio.run(\n File \"/opt/conda/lib/python3.10/asyncio/runners.py\", line 44, in run\n return loop.run_until_complete(main)\n File \"/opt/conda/lib/python3.10/asyncio/base_events.py\", line 636, in run_until_complete\n self.run_forever()\n File \"/opt/conda/lib/python3.10/asyncio/base_events.py\", line 603, in run_forever\n self._run_once()\n File \"/opt/conda/lib/python3.10/asyncio/base_events.py\", line 1909, in _run_once\n handle._run()\n File \"/opt/conda/lib/python3.10/asyncio/events.py\", line 80, in _run\n self._context.run(self._callback, *self._args)\n File \"/opt/conda/lib/python3.10/site-packages/grpc_interceptor/server.py\", line 165, in invoke_intercept_method\n return await self.intercept(\n> File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/interceptor.py\", line 21, in intercept\n return await response\n File \"/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py\", line 120, in _unary_interceptor\n raise error\n File \"/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py\", line 111, in _unary_interceptor\n return await behavior(request_or_iterator, context)\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py\", line 183, in Decode\n generations, next_batch, timings = self.model.generate_token(batch)\n File \"/opt/conda/lib/python3.10/contextlib.py\", line 79, in inner\n return func(*args, **kwds)\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py\", line 1376, in generate_token\n out, speculative_logits = self.forward(batch, adapter_data)\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/vlm_causal_lm.py\", line 351, in forward\n logits, speculative_logits = self.model.forward(\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/idefics2.py\", line 824, in forward\n hidden_states = self.text_model.model(\n File \"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py\", line 1532, in 
_wrapped_call_impl\n return self._call_impl(*args, **kwargs)\n File \"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py\", line 1541, in _call_impl\n return forward_call(*args, **kwargs)\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_mistral_modeling.py\", line 447, in forward\n hidden_states, residual = layer(\n File \"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py\", line 1532, in _wrapped_call_impl\n return self._call_impl(*args, **kwargs)\n File \"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py\", line 1541, in _call_impl\n return forward_call(*args, **kwargs)\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_mistral_modeling.py\", line 372, in forward\n attn_output = self.self_attn(\n File \"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py\", line 1532, in _wrapped_call_impl\n return self._call_impl(*args, **kwargs)\n File \"/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py\", line 1541, in _call_impl\n return forward_call(*args, **kwargs)\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_mistral_modeling.py\", line 235, in forward\n attn_output = paged_attention(\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/layers/attention/cuda.py\", line 116, in paged_attention\n input_lengths = seqlen.input_lengths\nAttributeError: 'Tensor' object has no attribute 'input_lengths'"},"target":"text_generation_launcher"}

@github-actions github-actions bot removed the Stale label Jul 26, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Aug 25, 2024
@akowalsk
Contributor Author

Still experiencing the issue.

@github-actions github-actions bot removed the Stale label Aug 27, 2024
@giladd123

giladd123 commented Sep 19, 2024

Also experiencing this issue when running with this model.

@akowalsk
Contributor Author

The addition of the --payload-limit option in TGI 3.0 fixed this issue.
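For anyone landing here later: my understanding is that the flag takes a byte count on the launcher command line (and is passed through the same way when appended to the Docker run command). The value below is only an example, not a recommended setting:

text-generation-launcher --payload-limit 10000000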
