
Added segmentation maps support for DPT image processor #34345

Open

wants to merge 424 commits into base: main

Conversation

@simonreise commented Oct 23, 2024


Most image processors for vision models that support the semantic segmentation task accept both images and segmentation_maps as inputs, but the DPT image processor only processes images, not segmentation maps. This PR makes code used for training or evaluating semantic segmentation models more reusable, since the DPT image processor can now process segmentation maps just like most other image processors.

I also added the do_reduce_labels argument because other image processors that support segmentation maps expose it.

I added two new tests: one that checks segmentation_maps support and one that checks that do_reduce_labels works as expected.

Most of the code is adapted from the BEiT image processor.
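For readers unfamiliar with do_reduce_labels: it maps the background class (0) to the ignore index 255 and shifts all other class ids down by one. A minimal standalone sketch of the BEiT-style behavior (an illustration, not the actual transformers source):

```python
import numpy as np

def reduce_label(label: np.ndarray) -> np.ndarray:
    """BEiT-style label reduction: background (0) becomes the ignore
    index 255, and every other class id is shifted down by one."""
    label = label.copy().astype(np.int64)
    label[label == 0] = 255    # background -> ignore index
    label = label - 1          # shift remaining class ids down by one
    label[label == 254] = 255  # keep the ignore index pinned at 255
    return label

seg_map = np.array([[0, 1, 2], [3, 0, 150]])
print(reduce_label(seg_map))  # background pixels map to 255; other ids shift down
```

This is useful for datasets such as ADE20K, where class 0 means "background" and should be ignored by the loss.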

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@amyeroberts, @qubvel

@LysandreJik
Member

cc @molbap as well in case bandwidth permits

Contributor

@molbap left a comment

LGTM - just a small refactor of the method to be more aligned with existing models!


def test_call_segmentation_maps(self):
    # Initialize image_processing
    image_processing = self.image_processing_class(**self.image_processor_dict)
Contributor

nit, image_processor would be better

Author

Renamed image_processing to image_processor. Should I also rename it in the other tests?

Comment on lines 462 to 504
if segmentation_maps is not None:
    segmentation_maps = [to_numpy_array(segmentation_map) for segmentation_map in segmentation_maps]

    # Add channel dimension if missing - needed for certain transformations
    if segmentation_maps[0].ndim == 2:
        added_channel_dim = True
        segmentation_maps = [segmentation_map[None, ...] for segmentation_map in segmentation_maps]
        input_data_format = ChannelDimension.FIRST
    else:
        added_channel_dim = False
        if input_data_format is None:
            input_data_format = infer_channel_dimension_format(segmentation_maps[0], num_channels=1)

    if do_reduce_labels:
        segmentation_maps = [self.reduce_label(segmentation_map) for segmentation_map in segmentation_maps]

    if do_resize:
        segmentation_maps = [
            self.resize(
                image=segmentation_map,
                size=size,
                resample=resample,
                keep_aspect_ratio=keep_aspect_ratio,
                ensure_multiple_of=ensure_multiple_of,
                input_data_format=input_data_format,
            )
            for segmentation_map in segmentation_maps
        ]

    if do_pad:
        segmentation_maps = [
            self.pad_image(
                image=segmentation_map, size_divisor=size_divisor, input_data_format=input_data_format
            )
            for segmentation_map in segmentation_maps
        ]

    # Remove extra channel dimension if added for processing
    if added_channel_dim:
        segmentation_maps = [segmentation_map.squeeze(0) for segmentation_map in segmentation_maps]
    segmentation_maps = [segmentation_map.astype(np.int64) for segmentation_map in segmentation_maps]

    data["labels"] = segmentation_maps
Contributor

Perfect. If there isn't any difference from BEiT, can this be wrapped in a _preprocess_segmentation_map() method called in a loop, and flagged as # Copied from ... the BEiT image processor?

Author

Wrapped the segmentation map preprocessing code in _preprocess_segmentation_map(), moved image preprocessing to a separate _preprocess_image() function, and moved the shared preprocessing functionality to a _preprocess() function.
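The resulting split can be sketched roughly like this. This is a toy illustration of the structure, not the actual transformers code: the method names follow the comment above, but the bodies are stand-ins (nearest-neighbor sampling here stands in for the real resize step):

```python
import numpy as np

class ToyImageProcessor:
    """Toy illustration of the refactor: shared steps live in _preprocess(),
    with thin wrappers for images and segmentation maps."""

    def _preprocess(self, arr: np.ndarray, do_resize: bool, size: tuple) -> np.ndarray:
        # Shared path; nearest-neighbor index sampling keeps label values intact.
        if do_resize:
            h, w = size
            rows = np.arange(h) * arr.shape[0] // h
            cols = np.arange(w) * arr.shape[1] // w
            arr = arr[rows][:, cols]
        return arr

    def _preprocess_image(self, image, do_resize=True, size=(2, 2)):
        # Images go through the shared path and stay float for later rescaling.
        return self._preprocess(image, do_resize, size).astype(np.float32)

    def _preprocess_segmentation_map(self, seg_map, do_resize=True, size=(2, 2)):
        # Segmentation maps reuse the same shared path but come out as int64 labels.
        return self._preprocess(seg_map, do_resize, size).astype(np.int64)
```

The benefit of the split is that per-item logic lives in one place, and the public preprocess() method only has to loop over inputs and dispatch to the right wrapper.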

@simonreise simonreise requested a review from molbap November 11, 2024 15:16
@simonreise
Author

Could you please re-review the pull request? In the last commit I made all the changes you asked for: wrapped the segmentation map preprocessing code in separate functions, added comments, and renamed a variable in the tests. Do I need to make any other changes to the code?

@molbap
Contributor

molbap commented Nov 18, 2024

hey @simonreise , will review in a moment, we were all at a team gathering last week hence the inactivity. On my radar!

@ArthurZucker ArthurZucker requested review from yonigozlan and removed request for molbap November 19, 2024 11:42
@ArthurZucker
Collaborator

@molbap you are forbidden to work this week 🤣 go and rest, @yonigozlan will have a look! 🤗

ydshieh and others added 20 commits November 19, 2024 17:48
* Added image-text-to-text pipeline to task guide

* Update docs/source/en/tasks/image_text_to_text.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/tasks/image_text_to_text.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/tasks/image_text_to_text.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/tasks/image_text_to_text.md

Co-authored-by: Steven Liu <[email protected]>

* Merge codeblocks

---------

Co-authored-by: Steven Liu <[email protected]>
* try

* tryagain

* tryagggain

* translated

* translated2

* Update docs/source/zh/attention.md

Co-authored-by: Huazhong Ji <[email protected]>

---------

Co-authored-by: Huazhong Ji <[email protected]>
* fix

* higher max positions in tests
* Fix torch.export issue in dpt based models

Signed-off-by: Phillip Kuznetsov <[email protected]>

* Simplify the if statements

Signed-off-by: Phillip Kuznetsov <[email protected]>

* Move activation definitions of zoe_depth to init()

Signed-off-by: Phillip Kuznetsov <[email protected]>

* Add test_export for dpt and zoedepth

Signed-off-by: Phillip Kuznetsov <[email protected]>

* add depth anything

Signed-off-by: Phillip Kuznetsov <[email protected]>

* Remove zoedepth non-automated zoedepth changes and zoedepth test

Signed-off-by: Phillip Kuznetsov <[email protected]>

* [run_slow] dpt, depth_anything, zoedepth

Signed-off-by: Phillip Kuznetsov <[email protected]>

---------

Signed-off-by: Phillip Kuznetsov <[email protected]>
* Do not load for meta device

* Make some minor improvements

* Add test

* Update tests/utils/test_modeling_utils.py

Update test parameters

Co-authored-by: Marc Sun <[email protected]>

* Make the test simpler

---------

Co-authored-by: Marc Sun <[email protected]>
* weights only compability

* better tests from code review

* ping torch version

* add weights_only check
* Fix hyperparameter search when optuna+deepseed

* Adding free_memory to the search setup

---------

Co-authored-by: Corentin-Royer <[email protected]>
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>
* add tests for 3 more vlms

* fix fuyu back

* skip test
* Add Nemotron GGUF Loading Support

* fix the Nemotron architecture assignation

---------

Co-authored-by: Marc Sun <[email protected]>
* add tensor processing system to separate logic for models

* format refactoring

* small fix

* make some methods private

* move custom methods to processors

* refactor tensor processing

* format fix
* skip nested deepspeed.zero.Init call

* make fixup

* solve conflict

* solve conflict

* put back local

* use context mangers instead of local thread

* Skip recursive calls to deepspeed.zero.Init

* Skip recursive calls to deepspeed.zero.Init

* back to old notebooks

* make style
* fix heuristic schedule

* fix style

* fix format
* Create modular_starcoder2.py

* Update modular_starcoder2.py

* update

* finalize modular

* revert # no-unravel

* Add support

* style

* Update modular_model_converter.py

* update docstring
ArthurZucker and others added 2 commits December 19, 2024 17:05
* remove fa2 test

* remove other failing tests

* style
* fix ForSequenceClassification

* unmodularize rope layer

* fix linting warning

* Avoid complex PoolingHead, only one prediction head needed

---------

Co-authored-by: Tom Aarsen <[email protected]>
@yonigozlan
Member

Thanks, but the # Copied from statement must be placed above the function definition. You can refer to other parts of the library to see how it's done.
This is not just for information purposes; it enables the make fix-copies CLI command to propagate any modifications in the original function to its copied versions.

After making the required changes, you can ensure everything is in order by running the make fixup command.

tomaarsen and others added 19 commits December 19, 2024 14:45
…cript (huggingface#35347)

Add link to ModernBERT Text Classification GLUE finetuning script
* added expanded attention/padding masks prior to indexing the hidden_states

* consistency fix in WavLMForSequenceClassification

---------

Co-authored-by: Nikos Antoniou <[email protected]>
* fixup mamba2 - caching and several other small fixes

* fixup cached forward

* correct fix this time

* fixup cache - we do not need to extend the attn mask it's handled by generate (gives total ids + mask at each step)

* remove unnecessary (un)squeeze

* fixup cache position

* simplify a few things

* [run-slow] mamba2

* multi gpu attempt two

* [run-slow] mamba2

* [run-slow] mamba2

* [run-slow] mamba2

* [run-slow] mamba2

* add newer slow path fix

* [run-slow] mamba2
)

* init vptq

* add integration

* add vptq support

fix readme

* add tests && format

* format

* address comments

* format

* format

* address comments

* format

* address comments

* remove debug code

* Revert "remove debug code"

This reverts commit ed3b3ea.

* fix test

---------

Co-authored-by: Yang Wang <[email protected]>
* reduce 1

* reduce 1

---------

Co-authored-by: ydshieh <[email protected]>
…ngface#34931)

* Add AsyncTextIteratorStreamer class

* export AsyncTextIteratorStreamer

* export AsyncTextIteratorStreamer

* improve docs

* missing import

* missing import

* doc example fix

* doc example output fix

* add pytest-asyncio

* first attempt at tests

* missing import

* add pytest-asyncio

* fallback to wait_for and raise TimeoutError on timeout

* check for TimeoutError

* autodoc

* reorder imports

* fix style

---------

Co-authored-by: Arthur Zucker <[email protected]>
Co-authored-by: Arthur <[email protected]>
* cleaner attention interfaces

* correctly set the _attn_implementation when adding other functions to it

* update

* Update modeling_utils.py

* CIs
feat: add parallel support for qwen2vl
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>
…ce#35291)

* bugfix: torch.export failure caused by `_make_causal_mask`

Recent changes in torch dynamo prevent mutations on tensors converted with aten::_to_copy. To address this, we can clone such tensor before performing in-place operation `masked_fill_` only when the code is being compiled by torch dynamo.
(relevant issue: pytorch/pytorch#127571)

* chore: use `is_torchdynamo_compiling` instead of `torch._dynamo.is_compiling`
* update codecarbon

* replace directly-specified-test-dirs with tmp_dir

* Revert "replace directly-specified-test-dirs with tmp_dir"

This reverts commit 310a6d9.

* revert the change of .gitignore

* Update .gitignore

---------

Co-authored-by: Yih-Dar <[email protected]>
* [test-all]

* style

* [test-all]

* [test_all]

* [test_all]

* style
…4995)

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>
* Improve modular transformers documentation

- Adds hints to general contribution guides
- Lists which utils scripts are available to generate single-files from modular files and check their content

* Show commands in copyable code cells

---------

Co-authored-by: Joel Koch <[email protected]>
* Improved Documentation Of Audio Classification

* Updated documentation as per review

* Updated audio_classification.md

* Update audio_classification.md
@yonigozlan
Member

Thanks for iterating! You just have to rebase on main and check that the tests are still passing, then LGTM!

bastrob and others added 7 commits December 21, 2024 08:51
* owlvit/2 dynamic input resolution.

* adapt box grid to patch_dim_h patch_dim_w

* fix ci

* clarify variable naming

* clarify variable naming..

* compute box_bias dynamically inside box_predictor

* change style part of code

* [run-slow] owlvit, owlv2