Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uniformize kwargs for SAM #34578

Merged
merged 7 commits into from
Dec 23, 2024
Merged

Conversation

tibor-reiss
Copy link
Contributor

Adds uniformized processors for SAM following #31911.

@qubvel @molbap

Copy link
Contributor

@SangbumChoi SangbumChoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate this PR since I am working on SAM2. It seems nice :)

*args, # to be deprecated
text: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]] = None,
audio=None,
video=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also define the input time in audio and video?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what you mean here, could you please be more specific?

The signature of __call__ is given, it should not be extended as far as I understand. However, it can be added as part of VideosKwargs and/or AudioKwargs. Can you point me where this attribute/variable is defined?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*time -> *type, I think it is okay since this does not requires audio and video

@@ -33,6 +34,23 @@
import tensorflow as tf


class SamImagesKwargs(ImagesKwargs):
segmentation_maps: Optional[ImageInput]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it this segmentation_maps is not required for the inference of SAM.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed for backwards compatibility - even if it is not used. E.g. take the following call: processor(images, None, input_points). If I would remove it from _add_args_for_backward_compatibility, such calls would break.
If I understood correctly, #31911 is about uniform kwargs, and then at a later step the API would be cleaned up further. In SAM's case, _add_args_for_backward_compatibility will be removed, and then segmentation_maps can be removed as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually have an existing way to handle such positional args, you can take a look at udop processor for example :)

input_boxes=None,
return_tensors: Optional[Union[str, TensorType]] = None,
images: ImageInput = None,
*args, # to be deprecated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there reason for the existance of this *args?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, see comment above. Purely for bc and should be removed at a later stage.

@molbap molbap self-requested a review November 5, 2024 12:49
@ArthurZucker
Copy link
Collaborator

cc @yonigozlan can you take a look!

Copy link
Member

@yonigozlan yonigozlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for working on this!
Looks great to me overall, only thing is to use the existing methods to handle positional args for backward compatibility.

On a separate note, and I know this precedes this PR, but it looks to me that there is only an image processor in this processor? Why was all this logic created as a processor at all and not just added to an image processor?

If anyone has any explanation for that, I’d really appreciate it. :)

Comment on lines -61 to -64
segmentation_maps=None,
input_points=None,
input_labels=None,
input_boxes=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be added to the optional_call_args attribute (see udop processor)

Comment on lines 74 to 82
@staticmethod
def _add_args_for_backward_compatibility(args):
"""
Remove this function once support for args is dropped in __call__
"""
if len(args) > 4:
raise ValueError("Too many positional arguments")
return dict(zip(("segmentation_maps", "input_points", "input_labels", "input_boxes"), args))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't be needed as we now have prepare_and_validate_optional_call_args

input_boxes=None,
return_tensors: Optional[Union[str, TensorType]] = None,
images: Optional[ImageInput] = None,
*args, # to be deprecated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
*args, # to be deprecated
# The following is to capture `segmentation_maps`, `input_points`, `input_labels` and `input_boxes` arguments that may be passed as a positional argument.
# See transformers.processing_utils.ProcessorMixin.prepare_and_validate_optional_call_args for more details,
# or this conversation for more context: https://github.com/huggingface/transformers/pull/32544#discussion_r1720208116
# This behavior is only needed for backward compatibility and will be removed in future versions.
#
*args,

SamProcessorKwargs,
tokenizer_init_kwargs={},
**kwargs,
**self._add_args_for_backward_compatibility(args),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
**self._add_args_for_backward_compatibility(args),
**self.prepare_and_validate_optional_call_args(*args),

@tibor-reiss
Copy link
Contributor Author

@yonigozlan Thanks for the review, I have implemented your suggestions. Could you check again please?

@tibor-reiss
Copy link
Contributor Author

Friendly reminder @yonigozlan

Copy link
Member

@yonigozlan yonigozlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks for iterating!
Last step is to add the ProcessorTesterMixin to SamProcessorTest in test_processor_sam.py, and make sure all the tests pass. Since the processor really only process images, there will probably be quite a few tests to override.
And also you'' have to rebase on main!

Copy link
Member

@yonigozlan yonigozlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left two small comments and after these modifications LGTM! Thanks for working on this

@@ -152,7 +164,9 @@ def test_post_process_masks(self):

@require_vision
@require_tf
class TFSamProcessorTest(unittest.TestCase):
class TFSamProcessorTest(ProcessorTesterMixin, unittest.TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say you only need ProcessorTesterMixin for SamProcessorTest, so you can remove this here and the skipped tests as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. However, I had to then add back the prepare_image_inputs methods. With this, I updated the comment.

@@ -235,7 +259,9 @@ def test_post_process_masks(self):

@require_vision
@require_torchvision
class SamProcessorEquivalenceTest(unittest.TestCase):
class SamProcessorEquivalenceTest(ProcessorTesterMixin, unittest.TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@yonigozlan
Copy link
Member

Pinging @ArthurZucker for a core maintainer review

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @yonigozlan for the reviews! 🤗

@ArthurZucker ArthurZucker merged commit e10be82 into huggingface:main Dec 23, 2024
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants