
Qwen2-VL Batch Bug #2495

Open
2 of 4 tasks
LugerW-A opened this issue Nov 25, 2024 · 1 comment
Labels
bug Something isn't working triaged Issue has been triaged by maintainers

Comments


LugerW-A commented Nov 25, 2024

System Info

x86
TensorRT-LLM 0.16.0

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Qwen2-VL examples

Expected behavior

Does Qwen2-VL support batched prompts?
When the input is a batch, only the first result is returned correctly; the rest are all empty.
print(input_ids.shape)
print(prompt_table.shape)
print(prompt_tasks)
outputs = self.model.generate(
    input_ids,
    input_position_ids=None,
    mrope_params=mrope_params,
    sampling_config=None,
    prompt_table=prompt_table,
    prompt_tasks=prompt_tasks,
    max_new_tokens=max_new_tokens,
    end_id=end_id,
    pad_id=self.model.tokenizer.pad_token_id
    if self.model.tokenizer.pad_token_id is not None
    else self.model.tokenizer.all_special_ids[0],
    top_k=self.args.top_k,
    top_p=self.args.top_p,
    temperature=self.args.temperature,
    repetition_penalty=self.args.repetition_penalty,
    num_beams=self.args.num_beams,
    output_sequence_lengths=True,
    return_dict=True,
)
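One batching detail worth checking before the generate call is how variable-length prompts are padded to a common length with pad_id, since uneven rows are a common source of empty results in batched runs. A minimal plain-Python sketch (the helper name is hypothetical, for illustration only; real pipelines usually rely on the tokenizer's own padding support):

```python
def pad_batch(sequences, pad_id):
    """Right-pad variable-length token-id lists to a common length.

    Hypothetical helper for illustration; in practice the tokenizer's
    padding support normally handles this.
    """
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]

# Every row now has the same length, so the batch can be stacked into a tensor.
batch = pad_batch([[101, 7, 8, 102], [101, 9, 102]], pad_id=0)
```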

Actual behavior

The input_ids only differ in the first (batch) dimension, but the results are incorrect (empty).

Additional notes

None

@LugerW-A LugerW-A added the bug Something isn't working label Nov 25, 2024
@hello-11 hello-11 added the triaged Issue has been triaged by maintainers label Nov 25, 2024
@sunnyqgg
Copy link
Collaborator

Hi @LugerW-A, it supports batch inference, but you need to follow the batching process provided by the official Qwen2-VL repo. See https://github.com/QwenLM/Qwen2-VL?tab=readme-ov-file for more info, e.g.:

messages1 = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "xxx/image1.jpg"},
            {"type": "text", "text": "Describe this picture?"},
        ],
    }
]
messages2 = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "xxxx/image2.jpg"},
            {"type": "text", "text": "Describe this picture? And what kind of colour does it contain?"},
        ],
    }
]
messages = [messages1, messages2]
texts = [
    processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True)
    for msg in messages
]
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=texts,
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")
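The snippet above stops after preprocessing. In the batched flow from the Qwen2-VL README, generation is followed by stripping the echoed prompt tokens from each sequence before decoding. A sketch of that trimming step (the helper name is hypothetical; model and processor are assumed to be an already-loaded Qwen2-VL model and its processor):

```python
def strip_prompt_tokens(input_ids, generated_ids):
    """Drop the echoed prompt prefix from each generated sequence in a batch."""
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]

# Intended usage after generation (model/processor assumed loaded):
#   generated_ids = model.generate(**inputs, max_new_tokens=128)
#   trimmed = strip_prompt_tokens(inputs.input_ids, generated_ids)
#   texts_out = processor.batch_decode(trimmed, skip_special_tokens=True)
```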
