Add Florence2 support #31506
Conversation
Command to check the diff from the original implementation:

```shell
FILE=configuration_florence2.py
git diff --no-index <(curl -s https://huggingface.co/microsoft/Florence-2-large/raw/main/$FILE) src/transformers/models/florence/$FILE
```
@D4ve-R Let us know when it's ready for review!
hey @amyeroberts, I'm almost done; I just have to finish writing the test cases.
Can you take a look at the code in `src/transformers/models/florence`?
I'm stuck in one place, see the comment in this review; maybe you have some tips or suggestions on how to fix it. Thank you!
Edit: I resolved all my questions
I left some comments since I'm very interested in this PR! Overall it seems to be going in the right direction :)
We can also add some `# Copied from` logic for function and class definitions, e.g.:

`# Copied from transformers.models.deformable_detr.modeling_deformable_detr.MultiScaleDeformableAttentionFunction`
I linked GroundingDino since it handles a similar task.
I'm not quite sure what you mean. Could you point out where you want to include this in the code?
Some function and class definitions can be copied directly from another model in the repo. For example:

`# Copied from transformers.models.beit.modeling_beit.BeitDropPath with Beit->GroundingDino`

We can use DropPath the same way in this file with the `# Copied from` logic. This annotation lets `make fix-copies` reuse the definition of an existing method.
BTW, all the functions are named independently of Florence2. I'm not sure this is okay? (e.g. DropPath -> Florence2DropPath)
> We can use DropPath the same way in this file with the `# Copied from` logic. This annotation lets `make fix-copies` reuse the definition of an existing method.

Ok, now I know what you mean. Thank you for the tip! I changed it for now, but looking at the code in timm, it's slightly different, so I don't know if this is the right thing to do. It might introduce (or persist) already-fixed bugs: https://github.com/huggingface/pytorch-image-models/blob/main/timm/layers/drop.py#L150

> BTW, all the functions are named independently of Florence2. I'm not sure this is okay? (e.g. DropPath -> Florence2DropPath)

Reading a bit more of the transformers library, I think the naming conventions only apply to classes that are meant to be imported and/or when they are copied. Imo prefixing everything with Florence2 is not necessary.
> Ok, now I know what you mean. Thank you for the tip! I changed it for now, but looking at the code in timm, it's slightly different, so I don't know if this is the right thing to do. It might introduce (or persist) already-fixed bugs: https://github.com/huggingface/pytorch-image-models/blob/main/timm/layers/drop.py#L150

The best way is probably to use `is_timm_available` and import DropPath from timm, same as in the original implementation.
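A guarded import along those lines might look like the sketch below. This is not the PR's actual code: `is_timm_available` is a real `transformers` utility, but the fallback stub is a hypothetical placeholder so the module still imports when timm is missing.

```python
import torch.nn as nn

from transformers.utils import is_timm_available

if is_timm_available():
    # Reuse timm's implementation, as the original Florence-2 repo does.
    # (Recent timm versions expose DropPath under timm.layers.)
    from timm.layers import DropPath
else:
    class DropPath(nn.Module):
        # Hypothetical minimal fallback: acts as the identity, so code paths
        # that construct a DropPath still run without timm installed.
        def __init__(self, drop_prob: float = 0.0):
            super().__init__()
            self.drop_prob = drop_prob

        def forward(self, hidden_states):
            return hidden_states
```

Note the maintainers' later comment in this thread argues against this approach in favour of an in-repo class.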
> BTW, all the functions are named independently of Florence2. I'm not sure this is okay? (e.g. DropPath -> Florence2DropPath)

Maybe @amyeroberts or some other maintainers can give us some hints on how to handle this.
Yeah, I was just explaining one of the patterns the repo tries to keep! You don't always have to use the `# Copied from` method if it's not applicable. :)
A few things to unpack here :)
> I think the naming conventions only apply to classes that are meant to be imported and/or when they are copied. Imo prefixing everything with Florence2 is not necessary.
This isn't quite right.
There are certain objects which we import and don't add a prefix or reimplement. If we're implementing any custom class/layer/submodule within the modeling file, then it should be prefixed with the model's name. For example in CLIP, BaseModelOutputWithPooling is used directly as it's a common class. Whereas there are also model-specific outputs which are defined and prefixed with CLIP in their name.
In the case of drop path, we want to define our own class with the Florence2 prefix, instead of importing from timm (we should try to avoid dependencies on other libraries as much as possible).
If the functionality is the same, we should copy from another model like Beit. If it's different we should implement from scratch in this modeling file.
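Concretely, the `# Copied from` route described here could look like the following sketch. The body mirrors Beit's stochastic-depth implementation in transformers; the `Florence2`-prefixed names are the hypothetical renames, since this PR was never merged.

```python
import torch
from torch import nn


# Copied from transformers.models.beit.modeling_beit.drop_path
def drop_path(input: torch.Tensor, drop_prob: float = 0.0, training: bool = False) -> torch.Tensor:
    """Drop paths (stochastic depth) per sample, applied in the main path of residual blocks."""
    if drop_prob == 0.0 or not training:
        return input
    keep_prob = 1 - drop_prob
    # Broadcastable shape (batch, 1, 1, ...) so this works for tensors of any rank.
    shape = (input.shape[0],) + (1,) * (input.ndim - 1)
    random_tensor = keep_prob + torch.rand(shape, dtype=input.dtype, device=input.device)
    random_tensor.floor_()  # binarize: keep (1) or drop (0) each sample's path
    return input.div(keep_prob) * random_tensor


# Copied from transformers.models.beit.modeling_beit.BeitDropPath with Beit->Florence2
class Florence2DropPath(nn.Module):
    """Drop paths (stochastic depth) per sample (when applied in main path of residual blocks)."""

    def __init__(self, drop_prob: float = 0.0) -> None:
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return drop_path(hidden_states, self.drop_prob, self.training)

    def extra_repr(self) -> str:
        return "p={}".format(self.drop_prob)
```

With the `with Beit->Florence2` suffix on the annotation, `make fix-copies` can regenerate the class body from Beit's and keep the two in sync.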
With regards to the `# Copied from` pattern, this is the classical way to utilise modules from other parts of the library.
We've recently introduced a new, improved way of adding models which are very similar to other models in the library, using the "diff converter", which replaces `# Copied from`. This involves defining a `diff_model_name.py` file, which defines the modules that are copied, overridden and newly implemented. The modeling file is then automatically generated.
- PR introducing this, with examples: Diff converter v2 #30868
- Example PR using this: Add LLaVa NeXT Video #31252
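As a rough illustration of what such a diff file might contain (hypothetical: the `diff_florence2.py` name and the choice of `BartEncoderLayer` as a parent are assumptions for this sketch; see the linked PRs for the real mechanics):

```python
# Hypothetical diff_florence2.py sketch. In the diff-converter workflow,
# classes inherit from an existing model's implementation, and a converter
# script expands them into a full standalone modeling file.
from transformers.models.bart.modeling_bart import BartEncoderLayer


class Florence2EncoderLayer(BartEncoderLayer):
    # Empty body: the generated modeling file would contain a copy of
    # BartEncoderLayer's code with the class renamed for Florence2.
    pass
```

Overridden or newly implemented modules are written out in full in the same file; only unchanged ones are left as bare subclasses.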
Thanks for the detailed feedback, this was really helpful!
I'll look into the "diff converter" method. Appreciate the guidance!
It's pretty new, so let us know if it's unclear or there's any unexpected behaviour!
@D4ve-R I'm sharing a script for fine-tuning Florence-2. It might not belong in this PR since it's unofficial, but I'm sharing it for your interest.
@D4ve-R Thanks for all your work on this so far! I think we can close this PR. The Florence 2 model is available directly on the hub to be used within transformers, e.g. https://huggingface.co/microsoft/Florence-2-large. Apologies for not noticing this sooner!
@amyeroberts no worries! It was really fun and a great experience to learn something.
@D4ve-R Hi, even though the PR is closed, I would recommend keeping this branch alive in your repo. It might be valuable when we also consider fine-tuning the Florence2 model! Nice work :)
What does this PR do?
Add support for microsoft/Florence2
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.