Add LLaVA support, modify generate function #820

Open · wants to merge 3 commits into dev

Conversation

zazamrykh

This is a continuation of the #818 topic; I'm now making the PR against the dev branch.
Also, I've decided it would be a good idea for the generate function to work with any input type and any return type. I decided it would be more readable and simpler if generate builds the sequence using embeddings as the input to the forward function, because strings and tokens can be cast to embeddings, but embeddings cannot be cast back to tokens or strings. So I made it possible to pass any input (a string, a list of strings, tokens, or embeddings) and to request any output type, as sketched below. There may be something I didn't get right; for example, I'm not sure whether my implementation handles positional embeddings correctly.
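To make the design concrete, here is a minimal sketch of that idea. The helper and parameter names are mine, not the PR's; it assumes TransformerLens's HookedTransformer API (to_tokens, embed, to_string, forward with start_at_layer=0) and deliberately ignores positional embeddings, which is exactly the open question above.

```python
import torch

def to_embeddings(model, inp):
    """Normalize any supported input (str, list of str, tokens, embeddings)
    into a pair (embeddings, tokens-or-None)."""
    if isinstance(inp, (str, list)):
        tokens = model.to_tokens(inp)               # [batch, pos]
        return model.embed(tokens), tokens
    if inp.dtype in (torch.int32, torch.int64):     # already tokens
        return model.embed(inp), inp
    return inp, None                                # already embeddings; original tokens unknown

def generate_any(model, inp, max_new_tokens=10, return_type="tokens"):
    embeds, tokens = to_embeddings(model, inp)
    for _ in range(max_new_tokens):
        # start_at_layer=0 feeds the residual stream directly; positional
        # embeddings are not added here (see the caveat above).
        logits = model.forward(embeds, start_at_layer=0)
        next_tok = logits[:, -1].argmax(dim=-1, keepdim=True)       # greedy for brevity
        embeds = torch.cat([embeds, model.embed(next_tok)], dim=1)
        if tokens is not None:
            tokens = torch.cat([tokens, next_tok], dim=1)
    if return_type == "str" and tokens is not None:
        return model.to_string(tokens)
    if return_type == "embeds":
        return embeds
    return tokens
```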

I've run into a problem: the LLaVA processor AutoProcessor.from_pretrained(model_id, revision="a272c74") does not have the apply_chat_template method at the transformers version pinned by the project, so the transformers library needs to be updated for this method to work.
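For reference, a hedged sketch of the check I mean; the model id and conversation format below are illustrative, only the revision comes from the call above.

```python
from transformers import AutoProcessor

model_id = "llava-hf/llava-1.5-7b-hf"   # illustrative checkpoint; revision as quoted above
processor = AutoProcessor.from_pretrained(model_id, revision="a272c74")

conversation = [
    {"role": "user",
     "content": [{"type": "image"},
                 {"type": "text", "text": "What is shown in this image?"}]},
]

if hasattr(processor, "apply_chat_template"):
    prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
else:
    # Older transformers releases predate processor-level chat templates;
    # upgrading fixes it: pip install -U transformers
    raise RuntimeError("apply_chat_template is unavailable in this transformers version")
```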

I've written a test covering every combination of input and return type with the gpt-2 model, and it passes. The gpt-2 outputs look quite poor, though maybe that's normal for gpt-2.
[screenshot: sample gpt-2 generation output]
I've also rerun the whole LLaVA.ipynb notebook and tested the other output types of the generate function, and everything worked pretty well; a rough shape of those calls is sketched below.
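For context, the tests roughly exercise combinations like the following. Accepting embeddings as input and the exact return_type names are part of what this PR adds, so they may differ from the final diff.

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
prompt = "The capital of France is"

str_out = model.generate(prompt, max_new_tokens=5, return_type="str")       # str in, str out
tokens = model.to_tokens(prompt)
tok_out = model.generate(tokens, max_new_tokens=5, return_type="tokens")    # tokens in, tokens out
embeds = model.embed(tokens)
emb_out = model.generate(embeds, max_new_tokens=5, return_type="embeds")    # embeddings in/out (new in this PR)
```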

  • New feature (non-breaking change which adds functionality)

  • I have commented my code, particularly in hard-to-understand areas

  • I have made corresponding changes to the documentation

  • My changes generate no new warnings

  • I have added tests that prove my fix is effective or that my feature works

@jojje

jojje commented Dec 27, 2024

@zazamrykh I saw the initial struggles with the branch selection (dev-3) and appreciate the effort you're making to get this PR in shape for dev. Will be incredibly useful to have transformerlens work also with llava (vision) models. Just wanted to show some early appreciation for the effort in the hope you'll stick it out and not abandon the effort midway due to "pulling your hair" in frustration, while not knowing if anyone else really cares. I care :)

@zazamrykh
Author

> @zazamrykh I saw the initial struggles with the branch selection (dev-3) and appreciate the effort you're making to get this PR in shape for dev. Will be incredibly useful to have transformerlens work also with llava (vision) models. Just wanted to show some early appreciation for the effort in the hope you'll stick it out and not abandon the effort midway due to "pulling your hair" in frustration, while not knowing if anyone else really cares. I care :)

I'm very glad to hear such words of support :) Changes in one function forced changes in others, so I've ended up changing some functions I hadn't initially planned to touch. Then I broke the generate function; I thought I had fixed it, but now some tests are not passing! I'll try to work out why they're failing. If you want to investigate the VLM now, you can take the last commit of the feature-llava-support branch from the forked repository, but I hope LLaVA support appears in transformerlens soon!

@bryce13950
Collaborator

@zazamrykh I merged dev into your PR earlier today, and took a look at the changes. This is a pretty big change, so it may require a bit of testing. As for the current test failure, it very well might have something to do with a float being rounded down at some point. There are quite a few places where ints are now accepted as params in this change, so if you need some ideas of where to start looking, that may be a good place (see the illustration below).
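For illustration only (not taken from the PR diff), this is the kind of silent truncation I mean: a parameter that now accepts ints and is converted eagerly will round a sampling setting like temperature down to zero.

```python
import torch

def sample_next_token(logits, temperature: int = 1):
    # Hypothetical failure mode: eagerly casting to int truncates 0.7 -> 0,
    # silently turning sampling into greedy decoding (or risking divide-by-zero).
    temperature = int(temperature)
    scaled = logits / max(temperature, 1e-6)
    return scaled.argmax()

logits = torch.randn(50257)
sample_next_token(logits, temperature=0.7)   # effectively runs with temperature=0
```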

Also, I really like splitting out the insides of the generate function. Just make sure to add proper docstrings before this is merged, so that those functions are documented properly in the public docs.

If you need help wrapping anything up, don't hesitate to ask. I took a quick glance just to get an idea of the size of the change, but I have not done a detailed look at the changes. I normally wait until the tests are passing to do that, but I am happy to help resolve any outstanding tasks if need be.
