uniformize kwargs for OneFormer #34547

Open · tibor-reiss wants to merge 12 commits into base: main from fix-31811-oneformer

Conversation

Contributor

@tibor-reiss tibor-reiss commented Oct 31, 2024

Adds uniformized processors for OneFormer following #31911.

Small changes to simplify code.

Additional check via test_processor_oneformer.py that the inputs are the same before and after the changes (with fixed random seeds).

@qubvel @molbap

@tibor-reiss tibor-reiss force-pushed the fix-31811-oneformer branch 2 times, most recently from cd15539 to fad4111 Compare October 31, 2024 21:12
Contributor

@molbap molbap left a comment

Thanks for the initiative! Left a couple of initial comments.

Contributor

Cool! But we should also check that the previous signature works as intended so as not to break backwards compatibility - in this case, the previous behaviour where arguments were passed positionally should still be supported.

Contributor Author

@tibor-reiss tibor-reiss Nov 1, 2024

I added *args. I had to add it after images, so the signature will only match after another iteration of deprecation. WDYT?

Comment on lines 111 to 130
        if isinstance(task_inputs, str):
            task_inputs = [task_inputs]

        if not isinstance(task_inputs, List) or not task_inputs:
            raise TypeError("task_inputs should be a string or a list of strings.")
Contributor

Since this does more than checking the types, I suggest moving the conversion to a list out of this function and renaming it explicitly, to _validate_input_types for instance.
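A minimal sketch of how that split could look (the helper name _validate_input_types comes from this comment; moving the str-to-list conversion into the caller is an assumption about where it would end up):

def _validate_input_types(self, task_inputs):
    # Pure validation: no conversion side effects in here.
    if not isinstance(task_inputs, list) or not task_inputs:
        raise TypeError("task_inputs should be a string or a list of strings.")

# In the caller (e.g. __call__), convert first, then validate:
#     if isinstance(task_inputs, str):
#         task_inputs = [task_inputs]
#     self._validate_input_types(task_inputs)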

Contributor Author

Implemented.


def _preprocess_text(self, text_list: PreTokenizedInput, max_length: int = 77):
Contributor

So this tokenizes, which should be done with self.tokenizer(..., **output_kwargs['text_kwargs']), and it seems that it puts pad tokens where the attention mask is 0 - this should already be covered by the tokenizer. Would be a nice refactor.
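For illustration, a self-contained sketch of that direction, letting the tokenizer handle padding itself (the CLIP checkpoint here is only an illustrative stand-in for the CLIP-style tokenizer OneFormer's processor uses):

from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_list = ["a photo of a cat", "the task is semantic"]

# padding="max_length" already fills input_ids with the pad token up to max_length,
# and attention_mask marks those positions with 0, so no manual re-padding is needed.
tokens = tokenizer(
    text_list,
    max_length=77,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)
print(tokens["input_ids"].shape)       # torch.Size([2, 77])
print(tokens["attention_mask"].shape)  # torch.Size([2, 77])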

Contributor Author

I will look into this...

Contributor Author

Atm I don't see how this can be simplified because the function is called with different arguments inside __call__, and it is also called in encode_inputs. Any suggestions?

Member

Agreed that it would be best to use self.tokenizer directly, and pass in the other text kwargs as well. For max_length, you can do the following:

  • if the max_length kwarg is not set, set it to max_seq_length or task_seq_length in the two different calls.
  • if it is set, use its value for both task and seq.
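A minimal sketch of that fallback, assuming max_length arrives via the merged text kwargs (the helper name is illustrative, not part of the PR):

def resolve_max_length(text_kwargs: dict, default_length: int) -> int:
    """Return the user-supplied max_length if set, otherwise the per-call default
    (max_seq_length for the text lists, task_seq_length for the task inputs)."""
    max_length = text_kwargs.get("max_length")
    return max_length if max_length is not None else default_length

# e.g. resolve_max_length(output_kwargs["text_kwargs"], 77) in the two tokenizer calls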

Contributor Author

I tried this (commit 3c7724d), but I don't yet see how it is simpler/more readable :)
A couple of observations:

  • The function _preprocess_text is called with both task_seq_length and max_seq_length, so if I add "max_length" to the text_kwargs dict, I would need to add it and then replace it in __call__.
  • In order to do something similar to what the attention mask is doing (padding with 0s), I would need to change self.tokenizer.pad_token_id (and then change it back to what it was before, e.g. 49407, endoftext), because I can't pass pad_token_id to the tokenizer. I am most probably missing something here, so suggestions are welcome.

Comment on lines +75 to +82
        max_seq_length: int = 77,
        task_seq_length: int = 77,
Contributor

So the way it should be is that in OneFormerProcessorKwargs, these defaults should be passed to the relevant dictionary, which you can then get back. Say

_defaults = {
    "max_seq_length": 77,
    "task_seq_length": 77,
}

and you can inform the types with a Kwargs typed class

class OneFormerTextKwargs(TextKwargs, total=False):
    max_seq_length: Optional[int]
    task_seq_length: Optional[int]

then your types are correctly informed as well as your defaults and you can use both.
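For reference, a hedged sketch of how this could fit the ProcessingKwargs pattern from #31911 (the nesting of _defaults under "text_kwargs" is an assumption based on other uniformized processors; a later review comment suggests not adding these keys to OneFormerTextKwargs at all):

from typing import Optional

from transformers.processing_utils import ProcessingKwargs, TextKwargs


class OneFormerTextKwargs(TextKwargs, total=False):
    max_seq_length: Optional[int]
    task_seq_length: Optional[int]


class OneFormerProcessorKwargs(ProcessingKwargs, total=False):
    text_kwargs: OneFormerTextKwargs
    _defaults = {
        "text_kwargs": {
            "max_seq_length": 77,
            "task_seq_length": 77,
        },
    }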

Contributor Author

This was a bigger refactor. Let me know what you think...

@ArthurZucker
Collaborator

Hey! @molbap is off as he needs to rest, cc @yonigozlan can you review this? 🤗

Member

@yonigozlan yonigozlan left a comment

Thanks for working on this!
I mainly have comments about the handling of positional kwargs (like in the SAM PR), and about avoiding the wrapping of text processing inside _preprocess_text



Comment on lines 203 to 224
    def encode_inputs(
        self,
        images=None,
        task_inputs=None,
        segmentation_maps=None,
        max_seq_length: int = 77,
        task_seq_length: int = 77,
        **kwargs,
    ):
Member

With this refactoring, it looks to me like encode_inputs could be made the same as __call__ while preserving backward compatibility (if the args are handled correctly), so maybe we could just use __call__ here?

Contributor Author

@tibor-reiss tibor-reiss Nov 29, 2024

Good catch! Done.

I tried this in commit 0270034; it took some time until I realized that there is a big difference between the two calls:

  • encode_inputs calls OneFormerImageProcessor.encode_inputs
  • __call__ calls OneFormerImageProcessor.__call__, which will eventually also call OneFormerImageProcessor.encode_inputs, but only after quite a few checks and massaging.

So at the moment I don't think there is a simple and elegant way to replace it, and I would just leave it as is. WDYT?

Member

Oh I see, never mind then

    def __call__(
        self,
        images: Optional[ImageInput] = None,
        *args,  # to be deprecated
Member

Yes, see SAM PR comment #34578 (comment)

Comment on lines 119 to 126
    @staticmethod
    def _add_args_for_backward_compatibility(args):
        """
        Remove this function once support for args is dropped in __call__
        """
        if len(args) > 2:
            raise ValueError("Too many positional arguments")
        return dict(zip(("task_inputs", "segmentation_maps"), args))
Member

not needed #34578 (comment)

@@ -76,13 +100,46 @@ def _preprocess_text(self, text_list=None, max_length=77):
        token_inputs = torch.cat(token_inputs, dim=0)
        return token_inputs

    def __call__(self, images=None, task_inputs=None, segmentation_maps=None, **kwargs):
Member

These should be added to the optional_call_args attribute (see the UDOP processor).
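Roughly what the optional_call_args pattern looks like, modeled on the UDOP processor (treat the exact ProcessorMixin helper names below as assumptions about the current base class API):

from transformers.processing_utils import ProcessorMixin


class OneFormerProcessor(ProcessorMixin):
    attributes = ["image_processor", "tokenizer"]
    # Positional arguments historically accepted after `images`; declaring them lets
    # the base class map stray positional args back onto their keyword names.
    optional_call_args = ["task_inputs", "segmentation_maps"]

    def __call__(self, images=None, *args, **kwargs):
        output_kwargs = self._merge_kwargs(
            OneFormerProcessorKwargs,  # the kwargs class added in this PR
            tokenizer_init_kwargs=self.tokenizer.init_kwargs,
            **kwargs,
            **self.prepare_and_validate_optional_call_args(*args),
        )
        ...  # processing continues with output_kwargs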

@tibor-reiss tibor-reiss force-pushed the fix-31811-oneformer branch 2 times, most recently from 40cf888 to cbddaf4 Compare November 29, 2024 09:58
Member

@yonigozlan yonigozlan left a comment

There's a breaking change in the init that needs to be fixed. Otherwise it is looking good!
Just like for SAM, you'll also have to add the ProcessorTesterMixin to the processor test class, make sure all the tests pass and override them if needed, and rebase on main.
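For context, wiring in the shared tests looks roughly like this (the import path and test class name are assumptions based on how other model test files are laid out):

import unittest

from transformers import OneFormerProcessor
from transformers.testing_utils import require_torch, require_vision

# ProcessorTesterMixin lives in tests/test_processing_common.py; the relative import
# assumes the usual tests/models/oneformer/ location.
from ...test_processing_common import ProcessorTesterMixin


@require_torch
@require_vision
class OneFormerProcessingTest(ProcessorTesterMixin, unittest.TestCase):
    processor_class = OneFormerProcessor
    # Mixin tests such as test_structured_kwargs_nested can be overridden here when
    # OneFormer's mandatory task_inputs argument makes the generic version fail.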


Comment on lines 84 to 83
     def __init__(
-        self, image_processor=None, tokenizer=None, max_seq_length: int = 77, task_seq_length: int = 77, **kwargs
+        self,
+        image_processor=None,
+        tokenizer=None,
     ):
Member

Sorry I hadn't caught that before, but removing max_seq_length and task_seq_length from the init looks like a breaking change. We should add them back, and use self.max_seq_length and self.task_seq_length in the processing when the "max_length" kwarg is not defined. No need to add max_seq_length and task_seq_length to OneFormerTextKwargs as they weren't accepted before.
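A minimal sketch of that arrangement, keeping the init signature intact and falling back to the stored values only when no max_length kwarg was passed (the __call__ excerpt is an assumption about the final wiring):

def __init__(
    self,
    image_processor=None,
    tokenizer=None,
    max_seq_length: int = 77,
    task_seq_length: int = 77,
    **kwargs,
):
    # Keep accepting and storing the old kwargs so existing callers are unaffected;
    # the rest of __init__ stays as before.
    self.max_seq_length = max_seq_length
    self.task_seq_length = task_seq_length
    ...

# Later, e.g. in __call__:
#     max_length = output_kwargs["text_kwargs"].get("max_length")
#     task_length = max_length if max_length is not None else self.task_seq_length
#     text_length = max_length if max_length is not None else self.max_seq_length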

        text_kwargs = {}
        tokens = self.tokenizer(
            text_list,
            max_length=max_length if max_length is not None else text_kwargs.get("max_length", 77),
Member

I would say the opposite:

Suggested change:
-            max_length=max_length if max_length is not None else text_kwargs.get("max_length", 77),
+            max_length=text_kwargs.get("max_length") if text_kwargs.get("max_length") is not None else max_length,

So that the max_length kwarg overrides the per-call default when it is specified.

Comment on lines 32 to 34
class OneFormerTextKwargs(TextKwargs):
    max_seq_length: int
    task_seq_length: int
Member

No need for that (see comments after)

Comment on lines +107 to +106
padding=text_kwargs.get("padding", "max_length"),
truncation=text_kwargs.get("truncation", True),
Member

No need to redefine the default kwargs here in the get: if they are not specified, they will necessarily be "max_length" and True respectively, so you can just have:

Suggested change:
-            padding=text_kwargs.get("padding", "max_length"),
-            truncation=text_kwargs.get("truncation", True),
+            padding=text_kwargs.get("padding"),
+            truncation=text_kwargs.get("truncation"),

Contributor Author

I believe this is needed because encode_inputs does not pass in any dict, so text_kwargs will become {}, without the defaults.

Member

Oh yes I see

            task_token_inputs.append(task_input)
        encoded_inputs["task_inputs"] = self._preprocess_text(
            task_token_inputs,
            max_length=output_kwargs["text_kwargs"]["task_seq_length"],
Member

Suggested change:
-            max_length=output_kwargs["text_kwargs"]["task_seq_length"],
+            max_length=self.task_seq_length,

        text_inputs = [
            self._preprocess_text(
                texts,
                max_length=output_kwargs["text_kwargs"]["max_seq_length"],
Member

Suggested change:
-                max_length=output_kwargs["text_kwargs"]["max_seq_length"],
+                max_length=self.max_seq_length,

@tibor-reiss
Contributor Author

@yonigozlan I had to repeat some of the tests, e.g. test_structured_kwargs_nested, due to the mandatory task_inputs. It was either this or making a default for task_inputs, e.g. 'semantic'. I went with the former because it might be surprising for users that 'semantic' is picked as the default - currently it raises a ValueError if not specified.

Member

@yonigozlan yonigozlan left a comment

LGTM after putting the kwargs back in the init! Thanks for iterating on this :)

        image_processor=None,
        tokenizer=None,
        max_seq_length: int = 77,
        task_seq_length: int = 77,
Member

Nit: removing the **kwargs here would also be a breaking change, even if it's not used anywhere.

@yonigozlan
Member

> @yonigozlan I had to repeat some of the tests, e.g. test_structured_kwargs_nested, due to the mandatory task_inputs. It was either this or making a default for task_inputs, e.g. 'semantic'. I went with the former because it might be surprising for users that 'semantic' is picked as the default - currently it raises a ValueError if not specified.

Yes I think that's the best way to do it :)
