
BugFix: Fixed input to llama_vision processor #431

Merged: 1 commit merged into EvolvingLMMs-Lab:main on Nov 29, 2024

Conversation

@Danielohayon (Contributor) commented Nov 28, 2024

BugFix: the llama_vision processor expects a "text" key, not a "content" key.

Problem Description

In the current implementation, the text is appended to the messages variable under a "content" field:
messages[-1]["content"].append({"type": "text", "content": contexts})
But this is not the correct format for Llama Vision. For example, given this messages variable:

(Pdb) messages
[{'role': 'user', 'content': [{'type': 'image'}, {'type': 'text', 'content': "<image 1> Baxter Company has a relevant range of production between 15,000 and 30,000 units. The following cost data represents average variable costs per unit for 25,000 units of production. If 30,000 units are produced, what are the per unit manufacturing overhead costs incurred?\nA. $6\nB. $7\nC. $8\nD. $9\n\nAnswer with the option's letter from the given choices directly."}]}]

The resulting prompt after running prompt = self.processor.apply_chat_template(messages, add_generation_prompt=True) does not contain the contexts:

(Pdb) prompt
'<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n<|image|><|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n'
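
For reference, the dropped text can be reproduced outside lmms-eval with a minimal sketch (this assumes access to the gated meta-llama/Llama-3.2-11B-Vision-Instruct checkpoint and the chat-template behavior at the time of this PR):

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")

# The question text is stored under a "content" key instead of "text",
# so the chat template silently drops it from the rendered prompt.
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "content": "Describe this image."},
]}]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
print(prompt)  # only the <|image|> token and role headers, no question text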

Solution

The fix is simple: following the Llama Vision example from https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct, the input to the apply_chat_template function should use a "text" key instead of a "content" key.
From the model documentation:

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "If I had to write a haiku for this one, it would be: "}
    ]}
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
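
Applied to the line quoted in the problem description, this amounts to a one-line change (a sketch; the surrounding lmms-eval code is assumed to match the snippet above):

# Before: the chat template ignores text stored under "content"
messages[-1]["content"].append({"type": "text", "content": contexts})

# After: the chat template reads the text from the "text" key
messages[-1]["content"].append({"type": "text", "text": contexts})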

After this fix, the resulting prompt for the same messages as before is:

(Pdb) prompt
"<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n<|image|><image 1> Baxter Company has a relevant range of production between 15,000 and 30,000 units. The following cost data represents average variable costs per unit for 25,000 units of production. If 30,000 units are produced, what are the per unit manufacturing overhead costs incurred?\nA. $6\nB. $7\nC. $8\nD. $9\n\nAnswer with the option's letter from the given choices directly.<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"

BugFix: llama_vision processor accepts "text" key and not "content" key.
@Luodian Luodian merged commit dd2839e into EvolvingLMMs-Lab:main Nov 29, 2024
1 check passed
ZhaoCinyu pushed a commit to ZhaoCinyu/lmms-eval that referenced this pull request Dec 9, 2024
BugFix: llama_vision processor accepts "text" key and not "content" key.