Is the llama adapter v2 code not available yet? #126

Open
yeonju7kim opened this issue Nov 7, 2023 · 2 comments

Comments

@yeonju7kim

Thanks for your wonderful work. I've recently been working with this repository and have learned a lot so far.

I have two questions about the code.

  1. Is the llama adapter v2 available now?
    https://github.com/OpenGVLab/LLaMA-Adapter/blob/a50befee3fdde8a08ca346b2ec70407e59ff6536/llama_adapter_v2_multimodal7b/llama/llama_adapter.py#L152C8-L172C46

The LLaMA-Adapter V2 code differs from the paper: the forward function doesn't do early fusion.

I think the code at the link below should be changed as follows.
https://github.com/OpenGVLab/LLaMA-Adapter/blob/a50befee3fdde8a08ca346b2ec70407e59ff6536/llama_adapter_v2_multimodal7b/llama/llama_adapter.py#L163C9-L164C45

# Before change
# for layer in self.llama.layers[:-1 * self.query_layer]:
#     h = layer(h, 0, freqs_cis, mask)
# After change
for layer in self.llama.layers[:-1 * self.query_layer]:
    h = layer(h, 0, freqs_cis, mask, visual_query)
  2. When extracting the visual prompt, I assumed that the visual encoder would produce it directly. However, I observed that the visual embedding is first passed through a self-attention-like module, and then only the first 10 elements of the attended visual embedding are used as the visual prompt.
    Could you please explain the reason for this approach?

https://github.com/OpenGVLab/LLaMA-Adapter/blob/a50befee3fdde8a08ca346b2ec70407e59ff6536/llama_adapter_v2_multimodal7b/llama/llama_adapter.py#L135C1-L149C28
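
For question 1, it may help to see which layers the quoted slice actually covers. This is a minimal sketch on a toy list of layer names, not the repository's code: `split_layers`, the 8-layer model, and `query_layer=6` are hypothetical values chosen for illustration. It only shows that `layers[:-1 * query_layer]` selects everything except the last `query_layer` layers, which are the layers the proposed change would pass `visual_query` to.

```python
def split_layers(layers, query_layer):
    """Split a layer list the way the quoted loop slices it.

    Assumes query_layer >= 1 (a value of 0 would make both slices degenerate).
    """
    early = layers[:-1 * query_layer]  # layers the quoted loop iterates over
    late = layers[-1 * query_layer:]   # the remaining last `query_layer` layers
    return early, late


# Hypothetical 8-layer model, for illustration only.
layers = [f"layer_{i}" for i in range(8)]
early, late = split_layers(layers, query_layer=6)

print(early)  # ['layer_0', 'layer_1']
print(late)   # ['layer_2', 'layer_3', 'layer_4', 'layer_5', 'layer_6', 'layer_7']
```

So with these toy numbers, the loop in question touches only the first two layers, and the other six are handled elsewhere.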

@icrto

icrto commented Feb 6, 2024

I was also wondering the same things. Could you please explain?

@waybarrios

waybarrios commented Apr 5, 2024

The first 10 elements are supposed to be the learnable queries. But you are right, the code seems incomplete.
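
One way to read this comment: if the learnable queries are placed in front of the image tokens before the attention module, then slicing the first 10 outputs returns the queries after they have attended to the image. The sketch below is a toy, pure-Python illustration of that idea, not the repository's implementation. The function names, the embedding width `DIM`, the identity projections, and the "queries first" ordering are all assumptions made for the example.

```python
import math
import random

QUERY_LEN = 10  # number of learnable queries kept as the visual prompt
DIM = 4         # toy embedding width (hypothetical, for illustration only)


def self_attention(tokens):
    """Single-head self-attention with identity projections (toy stand-in)."""
    scale = 1.0 / math.sqrt(len(tokens[0]))
    out = []
    for q in tokens:
        # Scaled dot-product scores of this token against every token.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        # Weighted average of all tokens (values are the tokens themselves).
        out.append([
            sum(w * v[d] for w, v in zip(weights, tokens)) / total
            for d in range(len(q))
        ])
    return out


def extract_visual_prompt(learnable_queries, image_tokens):
    # Assumption: queries come first, so the first len(learnable_queries)
    # outputs are the queries after attending to the image tokens.
    sequence = learnable_queries + image_tokens
    attended = self_attention(sequence)
    return attended[:len(learnable_queries)]


rng = random.Random(0)
queries = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(QUERY_LEN)]
image_tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]

prompt = extract_visual_prompt(queries, image_tokens)
print(len(prompt), len(prompt[0]))  # 10 4
```

Under this reading, the slice does not discard image information: each of the 10 kept vectors is a mixture that depends on the image tokens, so the prompt has a fixed length regardless of how many image tokens the encoder produces.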


3 participants