feat: adding mplugdocowl #31059
base: main
Conversation
        self.vocab_size = config.text_config.vocab_size
        # initialize LlamaAttention
        # replace_llama_modality_adaptive()
        transformers.models.llama.modeling_llama.LlamaAttention = MultiwayAttention
I am not sure whether this changes the model architecture, because it still doesn't have the modules that I need, like MultiwayAttention or the new decoder.
Hi, I was looking at this PR, so I might be able to help here :)
Usually we avoid using inheritance or classes from different models. Each model is typically implemented in a single modeling_xxx.py script which is self-contained and does not depend on any other modeling files.
This means that we'll need to define a MPlugDocOwlMultiwayAttention class, a MPlugDocOwlTextDecoderLayer class, and so on. One can leverage # Copied from statements above a class or method in case it is the same as another one, e.g. MistralAttention copies from LlamaAttention since they're the same.
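To make the pattern concrete, here is a minimal sketch (the MPLUGDocOwl class name is illustrative, not something from this PR): the class is re-declared inside the model's own modeling file and annotated with a # Copied from statement instead of being imported from another model's file, so the check_copies utility can keep the bodies in sync.

```python
import torch
from torch import nn


# Copied from transformers.models.llama.modeling_llama.LlamaRMSNorm with Llama->MPLUGDocOwl
class MPLUGDocOwlRMSNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        # RMSNorm: normalize by the root mean square of the hidden states, then rescale
        input_dtype = hidden_states.dtype
        hidden_states = hidden_states.to(torch.float32)
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        return self.weight * hidden_states.to(input_dtype)
```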
Hello Niels! @molbap provided similar feedback :) Yes, I just created a separate file for the language model and defined separate classes. This indeed helps! Thank you!
Great! No worries :)
…ngs in VisionModel
src/transformers/models/mplugdocowl/convert_mplugdocowl_weights_to_hf.py
src/transformers/models/mplugdocowl/image_processing_mplugdocowl.py
Nice progress!
src/transformers/models/mplugdocowl/image_processing_mplugdocowl.py
…wl.py Co-authored-by: Pablo Montalvo <[email protected]>
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
…classname to inits
…sue with default_to_square
Small comment: this file should be named with ...modeling... somewhere, for glance value.
Some comments on RoPE, will add another review later
Need a checkout here to clear the diff.
src/transformers/models/mplugdocowl/language_modeling_mplugdocowl.py
class MPLUGDocOwlRotaryEmbedding(nn.Module):
    def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None, scaling_factor=1.0):
        super().__init__()
        self.scaling_factor = scaling_factor
        self.dim = dim
        self.max_position_embeddings = max_position_embeddings
        self.base = base
        inv_freq = 1.0 / (self.base ** (torch.arange(0, self.dim, 2, dtype=torch.int64).float().to(device) / self.dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)
        # For BC we register cos and sin cached
        self.max_seq_len_cached = max_position_embeddings
        t = torch.arange(self.max_seq_len_cached, device=device, dtype=torch.int64).type_as(self.inv_freq)
        t = t / self.scaling_factor
        freqs = torch.outer(t, self.inv_freq)
        # Different from paper, but it uses a different permutation in order to obtain the same calculation
        emb = torch.cat((freqs, freqs), dim=-1)
        self.register_buffer("_cos_cached", emb.cos().to(torch.get_default_dtype()), persistent=False)
        self.register_buffer("_sin_cached", emb.sin().to(torch.get_default_dtype()), persistent=False)
This seems to be an older implementation of RoPE. The class name suggests this is basic RoPE, but it applies linear scaling:
t = torch.arange(self.max_seq_len_cached, device=device, dtype=torch.int64).type_as(self.inv_freq)
t = t / self.scaling_factor
This is now done on the position_ids, and in the correct (scaling) class. Better to use the more recent one from the Llama codebase, since this model is Llama-based.
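For reference, here is a sketch (paraphrased, not a verbatim copy) of the newer Llama-style layout: the base rotary class builds cos/sin from position_ids at call time with no scaling, and linear scaling lives in a dedicated subclass that rescales the position ids in forward().

```python
import torch
from torch import nn


# Sketch: base class, no scaling -- cos/sin are computed from position_ids at call time
class MPLUGDocOwlRotaryEmbedding(nn.Module):
    def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None, scaling_factor=1.0):
        super().__init__()
        self.scaling_factor = scaling_factor
        self.dim = dim
        self.max_position_embeddings = max_position_embeddings
        self.base = base
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.int64).float().to(device) / dim))
        self.register_buffer("inv_freq", inv_freq, persistent=False)

    @torch.no_grad()
    def forward(self, x, position_ids):
        # x: [batch, num_heads, seq_len, head_dim]; position_ids: [batch, seq_len]
        inv_freq = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
        pos = position_ids[:, None, :].float()
        freqs = (inv_freq @ pos).transpose(1, 2)  # [batch, seq_len, dim // 2]
        emb = torch.cat((freqs, freqs), dim=-1)   # [batch, seq_len, dim]
        return emb.cos().to(x.dtype), emb.sin().to(x.dtype)


# Sketch: linear scaling is isolated in a subclass and applied to the position ids only
class MPLUGDocOwlLinearScalingRotaryEmbedding(MPLUGDocOwlRotaryEmbedding):
    def forward(self, x, position_ids):
        position_ids = position_ids.float() / self.scaling_factor
        return super().forward(x, position_ids)
```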
And from the Omni configuration, it looks like the default RoPE scaling is None, which in this case would instantiate this class, which has linear scaling.
Make sure this is not impacting generation; it could change a few things if the positional embeddings are messed up.
The mPLUGDocOwl model was proposed in [<INSERT PAPER NAME HERE>](<INSERT PAPER LINK HERE>) by <INSERT AUTHORS HERE>.
<INSERT SHORT SUMMARY HERE>

The abstract from the paper is the following:

*<INSERT PAPER ABSTRACT HERE>*

Tips:

<INSERT TIPS ABOUT MODEL HERE>

This model was contributed by [INSERT YOUR HF USERNAME HERE](https://huggingface.co/<INSERT YOUR HF USERNAME HERE>).
The original code can be found [here](<INSERT LINK TO GITHUB REPO HERE>).
Todo
Still todo: add the paper authors, abstract, tips, your contributor HF handle, and the original GitHub repo.
Almost done; tips still need to be added.
src/transformers/models/mplugdocowl/language_modeling_mplugdocowl.py
…owl.py fix: removed cos, sin cached Co-authored-by: Pablo Montalvo <[email protected]>
My comments weren't published as I had poor connectivity... most are outdated but submitting in case something slips by
"past_key_values": past_key_values, | ||
"use_cache": kwargs.get("use_cache"), | ||
"attention_mask": attention_mask, | ||
"pixel_values": pixel_values, |
A bit confused here: why are pixel_values passed? Images are probably not needed, since inputs_embeds are created from images and text and are then passed to the model forward() call.
We need to keep them because, to get the image_features, pixel_values needs to be not None. The same thing is happening in the llava code.
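For context, a hedged sketch of the llava-style pattern being referenced (names follow the diff above; the exact signature may differ in the PR): pixel_values are carried through model_inputs so forward() can build image_features on the prefill step, while later decoding steps only feed the newly generated token and skip the image-merge branch.

```python
def prepare_inputs_for_generation(self, input_ids, past_key_values=None, attention_mask=None, pixel_values=None, **kwargs):
    if past_key_values is not None:
        # once the KV cache holds the prompt (and image) states, only the last token is needed
        input_ids = input_ids[:, -1:]
    return {
        "input_ids": input_ids,
        "past_key_values": past_key_values,
        "use_cache": kwargs.get("use_cache"),
        "attention_mask": attention_mask,
        "pixel_values": pixel_values,  # kept so forward() can compute image_features when needed
    }
```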
src/transformers/models/mplugdocowl/language_modeling_mplugdocowl.py
    def _init_rope(self):
        if self.config.rope_scaling is None:
            self.rotary_emb = MPLUGDocOwlRotaryEmbedding(
Here, IIUC, this is not the correct RoPE class; it adds scaling.
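A sketch of what the dispatch could look like if it mirrors the Llama _init_rope of that era (the MPLUGDocOwlLinearScalingRotaryEmbedding name is hypothetical here): the unscaled class when rope_scaling is None, the linear-scaling subclass only when the config asks for it.

```python
    def _init_rope(self):
        # rope_scaling is None -> plain (unscaled) rotary embedding
        if self.config.rope_scaling is None:
            self.rotary_emb = MPLUGDocOwlRotaryEmbedding(
                self.head_dim,
                max_position_embeddings=self.max_position_embeddings,
                base=self.rope_theta,
            )
        else:
            scaling_type = self.config.rope_scaling["type"]
            scaling_factor = self.config.rope_scaling["factor"]
            if scaling_type == "linear":
                # linear scaling only when explicitly requested by the config
                self.rotary_emb = MPLUGDocOwlLinearScalingRotaryEmbedding(
                    self.head_dim,
                    max_position_embeddings=self.max_position_embeddings,
                    scaling_factor=scaling_factor,
                    base=self.rope_theta,
                )
            else:
                raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
```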
def _get_unpad_data(attention_mask):
    seqlens_in_batch = attention_mask.sum(dim=-1, dtype=torch.int32)
    indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
    max_seqlen_in_batch = seqlens_in_batch.max().item()
    cu_seqlens = F.pad(torch.cumsum(seqlens_in_batch, dim=0, dtype=torch.int32), (1, 0))
    return (
        indices,
        cu_seqlens,
        max_seqlen_in_batch,
    )
This is only useful for FA2 (FlashAttention-2).
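For anyone double-checking, here is a small self-contained example (pure PyTorch, assuming the _get_unpad_data from the diff above is in scope) of what it returns on a padded mask; its outputs are only consumed by the FlashAttention-2 varlen kernels, which need the flat token indices and cumulative sequence lengths.

```python
import torch

# two sequences of lengths 3 and 2, right-padded to length 4
attention_mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]])
indices, cu_seqlens, max_seqlen_in_batch = _get_unpad_data(attention_mask)
print(indices)              # tensor([0, 1, 2, 4, 5]) -> flat positions of real tokens
print(cu_seqlens)           # tensor([0, 3, 5], dtype=torch.int32) -> cumulative sequence lengths
print(max_seqlen_in_batch)  # 3
```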
src/transformers/models/mplugdocowl/modelling_vision_mplugdocowl.py
Co-authored-by: Pablo Montalvo <[email protected]>
…wl.py Co-authored-by: Pablo Montalvo <[email protected]>
…s_to_hf.py Co-authored-by: Pablo Montalvo <[email protected]>
…owl.py Co-authored-by: Pablo Montalvo <[email protected]>
What does this PR do?
Fixes # (issue)

Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.

Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.