
Add CogVLM (cleaner) #28196

Closed (wants to merge 129 commits)

Conversation

@NielsRogge (Contributor) commented Dec 22, 2023

What does this PR do?

This PR adds CogVLM in a cleaner way. Follow-up of #27718.

Debugging logits (for the branch with matching logits, see add_cogvlm_cleaner_with_matching_logits):

  • image features from the CLIP backbone match when xops.memory_efficient_attention is used, so this needs to be replaced by an equivalent native PyTorch implementation
  • debug the LLM forward pass: the mismatch came from RMSNorm being copied from Llama, which uses self.weight * hidden_states.to(input_dtype) instead of (self.weight * hidden_states).to(input_dtype) (see the sketch after this list)
  • support EagerAttention besides SdpaAttention (only the latter currently gives matching logits)
  • support key/value caching
  • Make sure the logits match within a tolerance of 1e-4.
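
To make the RMSNorm point above concrete, here is a minimal sketch of the two cast orders (illustrative only; the class and variable names are not taken from this PR):

```python
import torch
from torch import nn


class CogvlmRMSNorm(nn.Module):
    """Minimal RMSNorm sketch, only to show where the cast back to the input dtype happens."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        input_dtype = hidden_states.dtype
        hidden_states = hidden_states.to(torch.float32)
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        # Llama-style: cast back to the input dtype first, then multiply by the weight
        # return self.weight * hidden_states.to(input_dtype)
        # CogVLM-style: multiply in float32 first, then cast the product back
        return (self.weight * hidden_states).to(input_dtype)
```

In half precision the two orders can diverge enough to break logits matching, since the Llama-style version casts to fp16/bf16 before multiplying by the weight.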

@NielsRogge requested a review from amyeroberts on April 29, 2024 07:55

@zucchini-nlp (Member) left a comment:

Thanks for adding the model!

The model logic is a bit unclear to me in some parts; I left a few comments. Maybe we can add more tests to make it clearer.

docs/source/en/perf_infer_gpu_one.md (outdated)
return output


class CogvlmRotaryEmbedding(torch.nn.Module):

Review comment (Member):

copied from mistral

src/transformers/models/cogvlm/modeling_cogvlm.py (outdated)
src/transformers/models/cogvlm/modeling_cogvlm.py (outdated)
# and we need to add 1 to take into account the one extra token that is going to
# be sent through the model
position_ids = build_position_ids(token_type_ids, attention_mask) + 2 + 1
position_ids = position_ids[:, -1:]

Review comment (Member):

It would be nice to make this dependent on the input's sequence length instead of assuming 1, for cases like speculative decoding once it starts working for VLMs.
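
A rough sketch of that suggestion, assuming input_ids at this point holds only the tokens being sent through the model and that token_type_ids/attention_mask already cover them (so the hard-coded + 1 offset is no longer needed); this is illustrative, not the code in the PR:

```python
# Illustrative sketch: derive the slice length from the number of tokens
# actually being sent through the model instead of assuming a single token.
num_new_tokens = input_ids.shape[1]  # assumed: only the new tokens are passed here
position_ids = build_position_ids(token_type_ids, attention_mask) + 2
position_ids = position_ids[:, -num_new_tokens:]
```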

src/transformers/models/cogvlm/modeling_cogvlm.py (outdated)
src/transformers/models/cogvlm/modeling_cogvlm.py (outdated)
src/transformers/models/cogvlm/test_pipeline.py (outdated)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@NielsRogge (Contributor, Author) commented:

Hi @amyeroberts, it would be great if you could review this PR; quite a few updates were made, making it ready for review.

@amyeroberts (Collaborator) commented:

Let's have a first round of reviews from the people already familiar with the model and PR :) cc @zucchini-nlp

@zucchini-nlp (Member) left a comment:

Thanks for working on it; I left a few comments. I guess this supports the new CogVLM as well?

src/transformers/models/cogvlm/configuration_cogvlm.py (outdated)
Comment on lines +151 to +163
# import xformers.ops as xops

# out = xops.memory_efficient_attention(
# queries,
# keys,
# values,
# scale=self.scale,
# )

# output = self.dense(out.view(batch_size, sequence_length, -1))
# output = self.output_dropout(output)
# return output

Review comment (Member):

comments to be removed
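
For reference, a possible native PyTorch replacement for the commented-out xformers call (written as the body of the attention forward), assuming queries/keys/values are laid out as (batch, seq_len, num_heads, head_dim) the way xops.memory_efficient_attention expects; the scale= keyword requires PyTorch >= 2.1:

```python
import torch.nn.functional as F

# SDPA expects (batch, num_heads, seq_len, head_dim), so transpose in and out.
out = F.scaled_dot_product_attention(
    queries.transpose(1, 2),
    keys.transpose(1, 2),
    values.transpose(1, 2),
    scale=self.scale,
)
out = out.transpose(1, 2)

output = self.dense(out.reshape(batch_size, sequence_length, -1))
output = self.output_dropout(output)
return output
```

F.scaled_dot_product_attention dispatches to a memory-efficient or flash kernel when one is available, so this should not regress memory use compared to xformers.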

src/transformers/models/cogvlm/modeling_cogvlm.py (outdated)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

The github-actions bot closed this on Jul 10, 2024.