BugFix: Fixed input to llama_vision processor #431
Merged
BugFix: the llama_vision processor expects a "text" key, not a "content" key.
Problem Description
In the current implementation, the processor builds the messages variable with a "content" field for the text part:
messages[-1]["content"].append({"type": "text", "content": contexts})
But this is not the correct format for Llama Vision.
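For example, the messages variable built this way looks something like the following (a hypothetical sketch; the actual text and image entries depend on the caller):

```python
# Hypothetical example of messages as built by the current code: the text item
# stores its string under "content", which the Llama 3.2 Vision chat template
# does not recognize.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "content": "Describe what is shown in the image."},
        ],
    }
]
```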
The resulting prompt after running
prompt = self.processor.apply_chat_template(messages, add_generation_prompt=True)
does not contain the contexts.

Solution
The fix is easy: following the Llama Vision example from https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct, the text items passed to apply_chat_template should use a "text" key instead of a "content" key. From the model documentation:
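The relevant part of that example looks roughly like this (paraphrased from the model card, so treat the exact prompt text as approximate):

```python
# Paraphrased from the Llama-3.2-11B-Vision-Instruct model card: the text part
# of each content item is stored under the "text" key.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "If I had to write a haiku for this one, it would be: "},
        ],
    }
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
```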
After fixing this, the resulting prompt for the same messages from before contains the contexts as expected.
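Concretely, the fix changes the key used when appending the text item (a minimal sketch; self.processor and contexts are the same objects as in the buggy line above):

```python
# Fixed: store the string under "text" so apply_chat_template renders it
# into the prompt instead of dropping it.
messages[-1]["content"].append({"type": "text", "text": contexts})

prompt = self.processor.apply_chat_template(messages, add_generation_prompt=True)
# prompt now contains the contexts string in the rendered user turn
# (the exact token layout depends on the processor's chat template).
```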