Auto model & pipeline for image-text-to-image-text models #32926
Comments
FYI @NielsRogge and @merveenoyan, you've recently been discussing tags for these kinds of models on the hub.
@GargDivanshu You might also wanna take a look at #32013. You can start by adding some of the missing tests and such to gain familiarity with the code there. Once you're ready, I can help you implement multimodal in-and-out for the other models.
perfect, moving to #32013
@zucchini-nlp I think this falls under
Yes, I agree it should be any-to-any. I was just adding you to the loop since some contributors are working on adding these types of models :)
Feature request
This is a tracker issue for work on interleaved in-and-out image-text generation.
There are now at least five open-source models that can do interleaved image-text generation, and many more are expected to be released. It would therefore be practical and useful for us to (1) add native support for such models and (2) standardize the logic flow of data through processors and pipelines, as done in #31911 and #32472.
For reference, initial work on Chameleon & Anole can be found in #32013.
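To make the "interleaved in-and-out" idea concrete, here is a rough sketch of how a pipeline could turn one flat sequence of generated token ids into an ordered list of text and image segments. Everything in it is hypothetical and is not existing transformers API: the names `TextSegment`, `ImageSegment`, and `split_interleaved`, and the assumption that image spans are wrapped in begin/end sentinel token ids, are for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class TextSegment:
    # Decoded text for one contiguous run of text tokens.
    text: str


@dataclass
class ImageSegment:
    # Raw image token ids; a real pipeline would decode these into pixels.
    token_ids: List[int]


Segment = Union[TextSegment, ImageSegment]


def split_interleaved(
    token_ids: List[int],
    image_start_id: int,  # hypothetical begin-of-image sentinel id
    image_end_id: int,    # hypothetical end-of-image sentinel id
    decode_text: Callable[[List[int]], str],
) -> List[Segment]:
    """Split a flat token sequence into ordered text and image segments."""
    segments: List[Segment] = []
    text_buffer: List[int] = []
    image_buffer = None  # None means we are currently outside an image span

    for tok in token_ids:
        if image_buffer is None:
            if tok == image_start_id:
                # Flush pending text before opening an image span.
                if text_buffer:
                    segments.append(TextSegment(decode_text(text_buffer)))
                    text_buffer = []
                image_buffer = []
            else:
                text_buffer.append(tok)
        elif tok == image_end_id:
            segments.append(ImageSegment(image_buffer))
            image_buffer = None
        else:
            image_buffer.append(tok)

    # Flush trailing text; an unterminated image span is dropped in this sketch.
    if text_buffer:
        segments.append(TextSegment(decode_text(text_buffer)))
    return segments
```

A standardized pipeline output could then be a list of such segments in generation order, regardless of which model produced the tokens.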
Notes:
TODOs:
Motivation
Your contribution
I've already started work on Chameleon & Anole here: #32013
But I'm currently blocked by (1) not having enough time due to other responsibilities and (2) not having enough compute resources.
Any help would be appreciated!