-
Notifications
You must be signed in to change notification settings - Fork 27.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
uniformize kwargs for SAM #34578
uniformize kwargs for SAM #34578
Conversation
5c2efda
to
197e079
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate this PR since I am working on SAM2. It seems nice :)
*args, # to be deprecated | ||
text: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]] = None, | ||
audio=None, | ||
video=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also define the input time in audio
and video
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what you mean here, could you please be more specific?
The signature of __call__
is given, it should not be extended as far as I understand. However, it can be added as part of VideosKwargs
and/or AudioKwargs
. Can you point me where this attribute/variable is defined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*time -> *type, I think it is okay since this does not requires audio and video
@@ -33,6 +34,23 @@ | |||
import tensorflow as tf | |||
|
|||
|
|||
class SamImagesKwargs(ImagesKwargs): | |||
segmentation_maps: Optional[ImageInput] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it this segmentation_maps
is not required for the inference of SAM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed for backwards compatibility - even if it is not used. E.g. take the following call: processor(images, None, input_points)
. If I would remove it from _add_args_for_backward_compatibility
, such calls would break.
If I understood correctly, #31911 is about uniform kwargs, and then at a later step the API would be cleaned up further. In SAM's case, _add_args_for_backward_compatibility
will be removed, and then segmentation_maps
can be removed as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually have an existing way to handle such positional args, you can take a look at udop processor for example :)
input_boxes=None, | ||
return_tensors: Optional[Union[str, TensorType]] = None, | ||
images: ImageInput = None, | ||
*args, # to be deprecated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there reason for the existance of this *args?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, see comment above. Purely for bc and should be removed at a later stage.
cc @yonigozlan can you take a look! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for working on this!
Looks great to me overall, only thing is to use the existing methods to handle positional args for backward compatibility.
On a separate note, and I know this precedes this PR, but it looks to me that there is only an image processor in this processor? Why was all this logic created as a processor at all and not just added to an image processor?
If anyone has any explanation for that, I’d really appreciate it. :)
segmentation_maps=None, | ||
input_points=None, | ||
input_labels=None, | ||
input_boxes=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be added to the optional_call_args
attribute (see udop processor)
@staticmethod | ||
def _add_args_for_backward_compatibility(args): | ||
""" | ||
Remove this function once support for args is dropped in __call__ | ||
""" | ||
if len(args) > 4: | ||
raise ValueError("Too many positional arguments") | ||
return dict(zip(("segmentation_maps", "input_points", "input_labels", "input_boxes"), args)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't be needed as we now have prepare_and_validate_optional_call_args
input_boxes=None, | ||
return_tensors: Optional[Union[str, TensorType]] = None, | ||
images: Optional[ImageInput] = None, | ||
*args, # to be deprecated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*args, # to be deprecated | |
# The following is to capture `segmentation_maps`, `input_points`, `input_labels` and `input_boxes` arguments that may be passed as a positional argument. | |
# See transformers.processing_utils.ProcessorMixin.prepare_and_validate_optional_call_args for more details, | |
# or this conversation for more context: https://github.com/huggingface/transformers/pull/32544#discussion_r1720208116 | |
# This behavior is only needed for backward compatibility and will be removed in future versions. | |
# | |
*args, |
SamProcessorKwargs, | ||
tokenizer_init_kwargs={}, | ||
**kwargs, | ||
**self._add_args_for_backward_compatibility(args), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
**self._add_args_for_backward_compatibility(args), | |
**self.prepare_and_validate_optional_call_args(*args), |
@yonigozlan Thanks for the review, I have implemented your suggestions. Could you check again please? |
Friendly reminder @yonigozlan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks for iterating!
Last step is to add the ProcessorTesterMixin
to SamProcessorTest
in test_processor_sam.py
, and make sure all the tests pass. Since the processor really only process images, there will probably be quite a few tests to override.
And also you'' have to rebase on main!
67d3d6b
to
3dc8b92
Compare
3dc8b92
to
d92cd4b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left two small comments and after these modifications LGTM! Thanks for working on this
@@ -152,7 +164,9 @@ def test_post_process_masks(self): | |||
|
|||
@require_vision | |||
@require_tf | |||
class TFSamProcessorTest(unittest.TestCase): | |||
class TFSamProcessorTest(ProcessorTesterMixin, unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say you only need ProcessorTesterMixin
for SamProcessorTest
, so you can remove this here and the skipped tests as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. However, I had to then add back the prepare_image_inputs
methods. With this, I updated the comment.
@@ -235,7 +259,9 @@ def test_post_process_masks(self): | |||
|
|||
@require_vision | |||
@require_torchvision | |||
class SamProcessorEquivalenceTest(unittest.TestCase): | |||
class SamProcessorEquivalenceTest(ProcessorTesterMixin, unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
Pinging @ArthurZucker for a core maintainer review |
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
a7c143d
to
6f24d6c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @yonigozlan for the reviews! 🤗
Adds uniformized processors for SAM following #31911.
@qubvel @molbap