
Added segmentation maps support for DPT image processor #34345

Open

wants to merge 424 commits into base: main

Conversation

@simonreise commented Oct 23, 2024


Most image processors for vision models that support the semantic segmentation task accept both images and segmentation_maps as inputs, but the DPT image processor only processes images, not segmentation maps. This PR makes code used for training or evaluating semantic segmentation models more reusable, since the DPT image processor can now process segmentation maps just like most other image processors.

I also added the do_reduce_labels argument because other image processors that support segmentation maps expose it.

I added two new tests: one that checks segmentation_maps support and one that checks that do_reduce_labels works as expected.

Most of the code is adapted from the BEiT image processor.
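For readers unfamiliar with do_reduce_labels: it maps the background class (0) to the ignore index 255 and shifts all other class ids down by one. A minimal standalone sketch of the BEiT-style behavior (an illustration, not the actual transformers source):

```python
import numpy as np

def reduce_label(label: np.ndarray) -> np.ndarray:
    """BEiT-style label reduction: background (0) becomes the ignore
    index 255, and every other class id is shifted down by one."""
    label = label.copy().astype(np.int64)
    label[label == 0] = 255    # background -> ignore index
    label = label - 1          # shift remaining class ids down by one
    label[label == 254] = 255  # keep the ignore index pinned at 255
    return label

seg_map = np.array([[0, 1, 2], [3, 0, 150]])
print(reduce_label(seg_map))  # background pixels map to 255; other ids shift down
```

This is useful for datasets such as ADE20K, where class 0 means "background" and should be ignored by the loss.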

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@amyeroberts, @qubvel

@LysandreJik
Member

cc @molbap as well in case bandwidth permits

Contributor

@molbap left a comment

LGTM - just a small refactor of the method to be more aligned with existing models!


def test_call_segmentation_maps(self):
    # Initialize image_processing
    image_processing = self.image_processing_class(**self.image_processor_dict)
Contributor

nit, image_processor would be better

Author

Renamed image_processing to image_processor. Should I also rename it in the other tests?

Comment on lines 462 to 504
if segmentation_maps is not None:
    segmentation_maps = [to_numpy_array(segmentation_map) for segmentation_map in segmentation_maps]

    # Add channel dimension if missing - needed for certain transformations
    if segmentation_maps[0].ndim == 2:
        added_channel_dim = True
        segmentation_maps = [segmentation_map[None, ...] for segmentation_map in segmentation_maps]
        input_data_format = ChannelDimension.FIRST
    else:
        added_channel_dim = False
        if input_data_format is None:
            input_data_format = infer_channel_dimension_format(segmentation_maps[0], num_channels=1)

    if do_reduce_labels:
        segmentation_maps = [self.reduce_label(segmentation_map) for segmentation_map in segmentation_maps]

    if do_resize:
        segmentation_maps = [
            self.resize(
                image=segmentation_map,
                size=size,
                resample=resample,
                keep_aspect_ratio=keep_aspect_ratio,
                ensure_multiple_of=ensure_multiple_of,
                input_data_format=input_data_format,
            )
            for segmentation_map in segmentation_maps
        ]

    if do_pad:
        segmentation_maps = [
            self.pad_image(
                image=segmentation_map, size_divisor=size_divisor, input_data_format=input_data_format
            )
            for segmentation_map in segmentation_maps
        ]

    # Remove extra channel dimension if added for processing
    if added_channel_dim:
        segmentation_maps = [segmentation_map.squeeze(0) for segmentation_map in segmentation_maps]
    segmentation_maps = [segmentation_map.astype(np.int64) for segmentation_map in segmentation_maps]

    data["labels"] = segmentation_maps
Contributor

Perfect. If there isn't any difference from BEiT, can this be wrapped in a _preprocess_segmentation_map() method called in a loop, and flagged as # Copied from ... the BEiT image processor?

Author

Wrapped the segmentation map preprocessing code in _preprocess_segmentation_map(), moved image preprocessing to a separate _preprocess_image() function, and moved the shared preprocessing functionality to a _preprocess() function.
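The resulting split can be sketched roughly like this. This is a toy illustration of the structure, not the actual transformers code: the method names follow the comment above, but the bodies are stand-ins (nearest-neighbor sampling here stands in for the real resize step):

```python
import numpy as np

class ToyImageProcessor:
    """Toy illustration of the refactor: shared steps live in _preprocess(),
    with thin wrappers for images and segmentation maps."""

    def _preprocess(self, arr: np.ndarray, do_resize: bool, size: tuple) -> np.ndarray:
        # Shared path; nearest-neighbor index sampling keeps label values intact.
        if do_resize:
            h, w = size
            rows = np.arange(h) * arr.shape[0] // h
            cols = np.arange(w) * arr.shape[1] // w
            arr = arr[rows][:, cols]
        return arr

    def _preprocess_image(self, image, do_resize=True, size=(2, 2)):
        # Images go through the shared path and stay float for later rescaling.
        return self._preprocess(image, do_resize, size).astype(np.float32)

    def _preprocess_segmentation_map(self, seg_map, do_resize=True, size=(2, 2)):
        # Segmentation maps reuse the same shared path but come out as int64 labels.
        return self._preprocess(seg_map, do_resize, size).astype(np.int64)
```

The benefit of the split is that per-item logic lives in one place, and the public preprocess() method only has to loop over inputs and dispatch to the right wrapper.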

@simonreise simonreise requested a review from molbap November 11, 2024 15:16
@simonreise
Author

Could you please re-review the pull request? In the last commit I made all the changes you asked for: wrapped the segmentation map preprocessing code in separate functions, added comments, and renamed a variable in the tests. Do I need to make any other changes to the code?

@molbap
Contributor

molbap commented Nov 18, 2024

hey @simonreise , will review in a moment, we were all at a team gathering last week hence the inactivity. On my radar!

@ArthurZucker ArthurZucker requested review from yonigozlan and removed request for molbap November 19, 2024 11:42
@ArthurZucker
Collaborator

@molbap you are forbidden to work this week 🤣 go and rest, @yonigozlan will have a look! 🤗

ydshieh and others added 20 commits November 19, 2024 17:48
* Added image-text-to-text pipeline to task guide

* Update docs/source/en/tasks/image_text_to_text.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/tasks/image_text_to_text.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/tasks/image_text_to_text.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/tasks/image_text_to_text.md

Co-authored-by: Steven Liu <[email protected]>

* Merge codeblocks

---------

Co-authored-by: Steven Liu <[email protected]>
* try

* tryagain

* tryagggain

* translated

* translated2

* Update docs/source/zh/attention.md

Co-authored-by: Huazhong Ji <[email protected]>

---------

Co-authored-by: Huazhong Ji <[email protected]>
* fix

* higher max positions in tests
* Fix torch.export issue in dpt based models

Signed-off-by: Phillip Kuznetsov <[email protected]>

* Simplify the if statements

Signed-off-by: Phillip Kuznetsov <[email protected]>

* Move activation definitions of zoe_depth to init()

Signed-off-by: Phillip Kuznetsov <[email protected]>

* Add test_export for dpt and zoedepth

Signed-off-by: Phillip Kuznetsov <[email protected]>

* add depth anything

Signed-off-by: Phillip Kuznetsov <[email protected]>

* Remove zoedepth non-automated zoedepth changes and zoedepth test

Signed-off-by: Phillip Kuznetsov <[email protected]>

* [run_slow] dpt, depth_anything, zoedepth

Signed-off-by: Phillip Kuznetsov <[email protected]>

---------

Signed-off-by: Phillip Kuznetsov <[email protected]>
* Do not load for meta device

* Make some minor improvements

* Add test

* Update tests/utils/test_modeling_utils.py

Update test parameters

Co-authored-by: Marc Sun <[email protected]>

* Make the test simpler

---------

Co-authored-by: Marc Sun <[email protected]>
* weights only compability

* better tests from code review

* ping torch version

* add weights_only check
* Fix hyperparameter search when optuna+deepseed

* Adding free_memory to the search setup

---------

Co-authored-by: Corentin-Royer <[email protected]>
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>
* add tests for 3 more vlms

* fix fuyu back

* skip test
* Add Nemotron GGUF Loading Support

* fix the Nemotron architecture assignation

---------

Co-authored-by: Marc Sun <[email protected]>
* add tensor processing system to separate logic for models

* format refactoring

* small fix

* make some methods private

* move custom methods to processors

* refactor tensor processing

* format fix
* skip nested deepspeed.zero.Init call

* make fixup

* solve conflict

* solve conflict

* put back local

* use context mangers instead of local thread

* Skip recursive calls to deepspeed.zero.Init

* Skip recursive calls to deepspeed.zero.Init

* back to old notebooks

* make style
* fix heuristic schedule

* fix style

* fix format
* Create modular_starcoder2.py

* Update modular_starcoder2.py

* update

* finalize modular

* revert # no-unravel

* Add support

* style

* Update modular_model_converter.py

* update docstring
ArthurZucker and others added 2 commits December 19, 2024 17:05
* remove fa2 test

* remove other failing tests

* style
* fix ForSequenceClassification

* unmodularize rope layer

* fix linting warning

* Avoid complex PoolingHead, only one prediction head needed

---------

Co-authored-by: Tom Aarsen <[email protected]>
@yonigozlan
Member

Thanks, but the # Copied from statement must be placed above the function definition. You can refer to other parts of the library to see how it's done.
This is not just for information purposes; it enables the make fix-copies CLI command to propagate any modifications in the original function to its copied versions.

After making the required changes, you can ensure everything is in order by running the make fixup command.

tomaarsen and others added 19 commits December 19, 2024 14:45
…cript (huggingface#35347)

Add link to ModernBERT Text Classification GLUE finetuning script
* added expanded attention/padding masks prior to indexing the hidden_states

* consistency fix in WavLMForSequenceClassification

---------

Co-authored-by: Nikos Antoniou <[email protected]>
* fixup mamba2 - caching and several other small fixes

* fixup cached forward

* correct fix this time

* fixup cache - we do not need to extend the attn mask it's handled by generate (gives total ids + mask at each step)

* remove unnecessary (un)squeeze

* fixup cache position

* simplify a few things

* [run-slow] mamba2

* multi gpu attempt two

* [run-slow] mamba2

* [run-slow] mamba2

* [run-slow] mamba2

* [run-slow] mamba2

* add newer slow path fix

* [run-slow] mamba2
)

* init vptq

* add integration

* add vptq support

fix readme

* add tests && format

* format

* address comments

* format

* format

* address comments

* format

* address comments

* remove debug code

* Revert "remove debug code"

This reverts commit ed3b3ea.

* fix test

---------

Co-authored-by: Yang Wang <[email protected]>
* reduce 1

* reduce 1

---------

Co-authored-by: ydshieh <[email protected]>
…ngface#34931)

* Add AsyncTextIteratorStreamer class

* export AsyncTextIteratorStreamer

* export AsyncTextIteratorStreamer

* improve docs

* missing import

* missing import

* doc example fix

* doc example output fix

* add pytest-asyncio

* first attempt at tests

* missing import

* add pytest-asyncio

* fallback to wait_for and raise TimeoutError on timeout

* check for TimeoutError

* autodoc

* reorder imports

* fix style

---------

Co-authored-by: Arthur Zucker <[email protected]>
Co-authored-by: Arthur <[email protected]>
* cleaner attention interfaces

* correctly set the _attn_implementation when adding other functions to it

* update

* Update modeling_utils.py

* CIs
feat: add parallel support for qwen2vl
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>
…ce#35291)

* bugfix: torch.export failure caused by `_make_causal_mask`

Recent changes in torch dynamo prevent mutations on tensors converted with aten::_to_copy. To address this, we can clone such tensor before performing in-place operation `masked_fill_` only when the code is being compiled by torch dynamo.
(relevant issue: pytorch/pytorch#127571)

* chore: use `is_torchdynamo_compiling` instead of `torch._dynamo.is_compiling`
* update codecarbon

* replace directly-specified-test-dirs with tmp_dir

* Revert "replace directly-specified-test-dirs with tmp_dir"

This reverts commit 310a6d9.

* revert the change of .gitignore

* Update .gitignore

---------

Co-authored-by: Yih-Dar <[email protected]>
* [test-all]

* style

* [test-all]

* [test_all]

* [test_all]

* style
…4995)

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>
* Improve modular transformers documentation

- Adds hints to general contribution guides
- Lists which utils scripts are available to generate single-files from modular files and check their content

* Show commands in copyable code cells

---------

Co-authored-by: Joel Koch <[email protected]>
* Improved Documentation Of Audio Classification

* Updated documentation as per review

* Updated audio_classification.md

* Update audio_classification.md
@yonigozlan
Member

Thanks for iterating! You just have to rebase on main and check that the tests are still passing, then LGTM!

bastrob and others added 7 commits December 21, 2024 08:51
* owlvit/2 dynamic input resolution.

* adapt box grid to patch_dim_h patch_dim_w

* fix ci

* clarify variable naming

* clarify variable naming..

* compute box_bias dynamically inside box_predictor

* change style part of code

* [run-slow] owlvit, owlv2