[SFT VLM] Add support for Molmo models #2136
Comments
I'd like to contribute if you can give me some guidance about the requirements! 😄
Great! I would start by looking at the inference code from one of the models (example) and seeing how the inputs need to be provided to the model. Once you've understood that, it should be reasonably straightforward to extend the training script to include these models. @edbeeching can also provide some guidance, as he made the original implementation :)
Hi @sergiopaniego, I had a look at the modelling code of Molmo and the processor is not quite the same as llama-vision and llava, so you may find it challenging to have a script that works for all these models. If you would like to make a standalone script that works just for Molmo, adapted from our sft_vlm script, that would be a great first step; we can then iterate together to see if we can generalize the scripts.
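For reference, loading and prompting Molmo through its Hub remote code looks roughly like the sketch below, adapted from the model card (the repo id, image URL, and generation settings follow that card and may change; treat this as a sketch rather than the canonical API):

```python
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

# Molmo ships its modeling/processing code on the Hub, hence trust_remote_code=True.
processor = AutoProcessor.from_pretrained(
    "allenai/Molmo-7B-D-0924", trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
model = AutoModelForCausalLM.from_pretrained(
    "allenai/Molmo-7B-D-0924", trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

# Unlike LLaVA-style processors, Molmo's processor exposes `process` for a single example.
image = Image.open(requests.get("https://picsum.photos/id/237/536/354", stream=True).raw)
inputs = processor.process(images=[image], text="Describe this image.")
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

# Generation also goes through a custom method rather than plain `generate`.
output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)
generated_text = processor.tokenizer.decode(
    output[0, inputs["input_ids"].size(1):], skip_special_tokens=True
)
print(generated_text)
```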
It might also be good to track the upstream transformers PR (huggingface/transformers#33962).
Thanks a lot for the details! I'm currently running the script as it is while trying to understand the differences compared to Molmo.
Yes, that is the one.
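For context, the heart of the current sft_vlm.py script is a collator along these lines (a rough paraphrase, not the exact source; it assumes a `processor` created earlier in the script, in the LLaVA style that accepts batched text and images in one call):

```python
# Rough paraphrase of the collator pattern in sft_vlm.py; `processor` is the
# LLaVA-style processor created earlier in the script.
def collate_fn(examples):
    # Render each conversation with the chat template and gather its images.
    texts = [processor.apply_chat_template(ex["messages"], tokenize=False) for ex in examples]
    images = [ex["images"] for ex in examples]

    # LLaVA-style processors tokenize text and preprocess images in one batched call.
    batch = processor(text=texts, images=images, return_tensors="pt", padding=True)

    # Mask padding tokens out of the language-modeling loss.
    labels = batch["input_ids"].clone()
    labels[labels == processor.tokenizer.pad_token_id] = -100
    batch["labels"] = labels
    return batch
```

The batched `processor(text=..., images=...)` call is exactly the step that does not carry over to Molmo, whose processor handles one example at a time.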
Thanks for the confirmation, @edbeeching! I've created a reproducible example on Google Colab to share the code. Currently, I'm encountering an error when loading the model.
I'm actively investigating the issue. Do you have any suggestions on how to resolve it?
I could load the model without the error by first importing
Could you share your reproducible example?
Sure, I've added those changes to your Colab here, and the rest should be the same.
Hello, do you have a timeline for this?
I attempted to extend the notebook, but I encountered the same exception. I'm continuing to investigate the root cause.
Try this Colab: https://colab.research.google.com/drive/1RICZvuxLJ0g6dCIkOIf0HC5J9fJGqNTU?usp=sharing
Hi, let me know if you would like me to take a look.
Hi @edbeeching! Sorry for the delay. I was busy last week, but I have some additional time to dedicate this week. I've reproduced @smellslikeml's idea (https://colab.research.google.com/drive/1doT9u811J-WNCnsT6-rP9-OxnDv52M6W?usp=sharing), and I'll try to generate the PR this week. Should we wait until huggingface/transformers#33962 is completed?
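To make the discussion above concrete, a standalone Molmo collator might look something like the sketch below. This is a hypothetical adaptation, not code from the eventual PR: the name `molmo_collate_fn`, the example field names, and the padding strategy are all assumptions. It reflects the point made earlier that Molmo's processor exposes a per-example `process` method rather than a batched call:

```python
import torch

# Hypothetical Molmo collator (the function name and example fields are
# illustrative). Molmo's remote-code processor exposes
# `process(images=..., text=...)` for one example at a time, so each example
# is processed individually and then padded into a batch.
def molmo_collate_fn(examples, processor):
    processed = [
        processor.process(images=ex["images"], text=ex["text"]) for ex in examples
    ]

    # Assumption: pad with the tokenizer's pad id, falling back to 0 if unset.
    pad_id = processor.tokenizer.pad_token_id or 0
    input_ids = torch.nn.utils.rnn.pad_sequence(
        [p["input_ids"] for p in processed], batch_first=True, padding_value=pad_id
    )

    # Ignore padding positions in the loss.
    labels = input_ids.clone()
    labels[labels == pad_id] = -100

    batch = {"input_ids": input_ids, "labels": labels}
    # The image tensors (e.g. `images`, `image_input_idx`) also need batching;
    # their exact shapes depend on Molmo's preprocessor, so they are left
    # schematic here.
    return batch
```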
Feature request
Extend the `sft_vlm.py` script to support the new Molmo models from AllenAI: https://huggingface.co/collections/allenai/molmo-66f379e6fe3b8ef090a8ca19
Paper: https://arxiv.org/abs/2409.17146
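As a sketch of the intended end state, wiring Molmo into a TRL SFT run might look like the following. This is illustrative only: the dataset, the config values, and the reuse of the `molmo_collate_fn` sketched earlier are assumptions, and the real script wiring may differ.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoProcessor
from trl import SFTConfig, SFTTrainer

model_id = "allenai/Molmo-7B-D-0924"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, torch_dtype="auto")

# Placeholder dataset; its fields would need mapping to whatever the collator
# expects (e.g. flattening `messages` into the plain `text` used above).
dataset = load_dataset("HuggingFaceH4/llava-instruct-mix-vsft", split="train")

config = SFTConfig(
    output_dir="molmo-sft",
    remove_unused_columns=False,                    # keep image columns for the collator
    dataset_kwargs={"skip_prepare_dataset": True},  # the collator does all preprocessing
)

trainer = SFTTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    data_collator=lambda examples: molmo_collate_fn(examples, processor),
    tokenizer=processor.tokenizer,
)
trainer.train()
```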
Motivation
The Molmo models are super strong VLMs across all model scales, in some cases matching or exceeding the performance of GPT-4V.
Having the ability to tune these models on custom datasets would be quite exciting for many vision-language applications (e.g. agents).
Your contribution
Open to the community!