Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wirthual/fix vision #12

Open
wants to merge 4 commits into
base: xsn/vision
Choose a base branch
from

Conversation

wirthual
Copy link

@wirthual wirthual commented Oct 2, 2024

@wirthual
Copy link
Author

wirthual commented Oct 10, 2024

Hi @ngxson ,

I saw you started working on the support for mllama. In this branch I added the new tensor names from Llama 3.2. I came accross 2 problems at this point:

add_tensor_info in gguf writer raises a Duplicated tensor name error for blk.{bid}.ffn_up. Is this layer doubled for vision and language parts and I need to prefix the name somewhere?

Another problem is the tokenizer loaded seems to have 1 token more than the specified in the vocab_size parameter so it errors out here:

        vocab_size = self.hparams["text_config"].get("vocab_size", len(tokenizer.vocab))
        assert max(tokenizer.vocab.values()) < vocab_size

Is the additonal token related to the image so this check needs to be changed?

Skipping those two check, I am able to produce a GGUF file.

Best,
wirthual

@ngxson
Copy link
Owner

ngxson commented Oct 10, 2024

Thanks for your suggestion. However, my PR ggerganov#9687 targets llava (please read the descriptions for more info). This is to reduce the complexity and focus on developing a framework that can support multiple archs in the future.

llama-3.2 vision will be added at some point, so I'll keep this PR open.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants