Add CogVLM (cleaner) #28196
Conversation
Thanks for adding the model!
The model logic is a bit unclear to me in some parts; I've left a few comments. Maybe we can add more tests to make it clearer.
    return output

class CogvlmRotaryEmbedding(torch.nn.Module):
Copied from Mistral.
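For context, a minimal sketch (my own simplification, not the PR's actual code) of the Mistral-style rotary embedding this comment refers to; the real CogvlmRotaryEmbedding may differ in its details:

```python
import torch

class MistralStyleRotaryEmbedding(torch.nn.Module):
    """Simplified sketch of a Mistral-style rotary embedding (names are illustrative)."""

    def __init__(self, dim, base=10000):
        super().__init__()
        # Inverse frequencies for each pair of channels in the head dimension.
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)

    def forward(self, x, position_ids):
        # position_ids: (batch, seq_len) -> cos/sin of shape (batch, seq_len, dim)
        freqs = position_ids[..., None].float() * self.inv_freq
        emb = torch.cat((freqs, freqs), dim=-1)
        return emb.cos().to(x.dtype), emb.sin().to(x.dtype)
```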
# and we need to add 1 to take into account the one extra token that is going to
# be sent through the model
position_ids = build_position_ids(token_type_ids, attention_mask) + 2 + 1
position_ids = position_ids[:, -1:]
It would be nice to make this dependent on the input's sequence length instead of assuming 1, for cases like speculative decoding once it starts working for VLMs.
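A rough sketch of that suggestion, assuming the same build_position_ids helper and the surrounding prepare-inputs context (variable names are hypothetical, and the fixed offsets would need to be re-checked against the model's position scheme):

```python
# Instead of hardcoding a single new token, use the number of tokens actually fed at
# this decoding step, so speculative decoding (several candidate tokens at once) can
# reuse this path.
num_new_tokens = input_ids.shape[1]
# The original "+ 1" accounted for one extra token; generalize it to the step length.
position_ids = build_position_ids(token_type_ids, attention_mask) + 2 + num_new_tokens
position_ids = position_ids[:, -num_new_tokens:]
```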
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi @amyeroberts, it would be great if you could review this PR. Quite a few updates were made, making it ready for review.
Let's have a first round of reviews from the people already familiar with the model and PR :) cc @zucchini-nlp
Thanks for working on it, I've left a few comments. I guess this also supports the new CogVLM?
# import xformers.ops as xops

# out = xops.memory_efficient_attention(
#     queries,
#     keys,
#     values,
#     scale=self.scale,
# )

# output = self.dense(out.view(batch_size, sequence_length, -1))
# output = self.output_dropout(output)
# return output
These commented-out lines should be removed.
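For reference, a rough standalone sketch of how the same computation can be expressed with PyTorch's built-in scaled_dot_product_attention instead of xformers (the scale argument needs PyTorch >= 2.1; the tensor layout is assumed to match what memory_efficient_attention expects):

```python
import torch
import torch.nn.functional as F

def sdpa_attention(queries, keys, values, scale):
    # queries/keys/values: (batch, seq_len, num_heads, head_dim), the layout that
    # xformers' memory_efficient_attention takes; SDPA wants (batch, num_heads, seq_len, head_dim),
    # hence the transposes in and out.
    out = F.scaled_dot_product_attention(
        queries.transpose(1, 2),
        keys.transpose(1, 2),
        values.transpose(1, 2),
        scale=scale,  # requires PyTorch >= 2.1
    ).transpose(1, 2)
    batch_size, seq_len = queries.shape[:2]
    # Merge heads back, mirroring out.view(batch_size, sequence_length, -1) above.
    return out.reshape(batch_size, seq_len, -1)
```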
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
What does this PR do?
This PR adds CogVLM in a cleaner way. Follow-up of #27718.
Debugging logits (for the branch with matching logits, see add_cogvlm_cleaner_with_matching_logits):
- self.weight * hidden_states.to(input_dtype) instead of (self.weight * hidden_states).to(input_dtype) (see the sketch below)
- EagerAttention besides SdpaAttention (only the latter gives matching logits)
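To make the first bullet concrete, a minimal RMSNorm-style sketch (illustrative only, not the PR's exact class) showing the two cast orderings; per the description, only the first variant matched the original logits:

```python
import torch
import torch.nn as nn

class RMSNormSketch(nn.Module):
    """Illustrative RMSNorm-style layer to show the dtype ordering from the description."""

    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        input_dtype = hidden_states.dtype
        hidden_states = hidden_states.to(torch.float32)
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        # Matching variant: cast back to the input dtype first, then scale by the weight.
        return self.weight * hidden_states.to(input_dtype)
        # Non-matching variant: scale in float32, then cast the product back.
        # return (self.weight * hidden_states).to(input_dtype)
```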