
Fall back to slow image processor in ImageProcessingAuto when no fast processor available #34785

Merged · 7 commits · Dec 15, 2024

Conversation

@yonigozlan (Member) commented on Nov 18, 2024

What does this PR do?

Refactor parts of `image_processing_auto` to fall back to the slow processor when `use_fast` is set to `True` and no fast processor is available.

Before, this would throw an error:

```python
processor = AutoImageProcessor.from_pretrained("Salesforce/blip-image-captioning-large", use_fast=True)
```

Now, the following warning is displayed instead:

```
`use_fast` is set to `True` but the image processor class does not have a fast version. Falling back to the slow version.
```
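For illustration, a minimal sketch of the new behavior (the expected class name is an assumption about this checkpoint):

```python
from transformers import AutoImageProcessor

# Previously this call raised an error; with this PR it warns and falls back.
processor = AutoImageProcessor.from_pretrained(
    "Salesforce/blip-image-captioning-large", use_fast=True
)
print(type(processor).__name__)  # expected: BlipImageProcessor (the slow class)
```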

This PR also adds warnings to start rolling out fast image processors by default (the goal is v4.48). If `use_fast` is not set and the checkpoint was saved with a slow processor, the following is displayed:

```
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
```
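A rough sketch of when this notice could fire, as a fragment in the context of `image_processing_auto`; the condition is illustrative, not the exact committed code:

```python
# Warn only when the caller did not pass use_fast and the checkpoint's saved
# processor class is a slow one (i.e. its name has no "Fast" suffix).
if use_fast is None and not image_processor_type.endswith("Fast"):
    logger.warning_once(
        "Using a slow image processor as `use_fast` is unset and a slow "
        "processor was saved with this model. `use_fast=True` will be the "
        "default behavior in v4.48, even if the model was saved with a slow "
        "processor. This will result in minor differences in outputs. You'll "
        "still be able to use a slow processor with `use_fast=False`."
    )
```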

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

```diff
@@ -246,6 +246,7 @@ def preprocess(
         image_mean = image_mean if image_mean is not None else self.image_mean
         image_std = image_std if image_std is not None else self.image_std
         size = size if size is not None else self.size
+        return_tensors = "pt" if return_tensors is None else return_tensors
```
yonigozlan (Member, Author):

Otherwise, the fast ViT image processor will crash in the default behavior (when `return_tensors` is not specified). This is now a bigger problem with fast image processors used by default.
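A sketch of what this one-line default enables; the checkpoint id is illustrative, and fast processors build torch tensors internally:

```python
from PIL import Image
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained(
    "google/vit-base-patch16-224", use_fast=True
)
image = Image.new("RGB", (300, 200))

# Before this default, omitting return_tensors could crash the fast path;
# with it, the call behaves as if return_tensors="pt" had been passed.
batch = processor(image)
print(batch["pixel_values"].shape)  # torch.Size([1, 3, 224, 224])
```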

Collaborator:

Is this kind of a breaking change?

(I am also curious: what if a user is using a TF/Flax model while their environment has torch/torchvision installed? Will the fast image processor be used by default and return torch tensors by default?)

yonigozlan (Member, Author):

Oh I see, yes, this might be a problem. In general, I'm not too sure why it was decided to constrain fast image processors to output only torch tensors. I'd be glad to know if there was a reason for that; otherwise, it might be something we'd want to reconsider.

Collaborator:

TBH I am not even sure TF is used for image models! Fine by me!

@qubvel (Member) commented on Nov 18, 2024

Hi @yonigozlan! Thanks for opening a PR! Please check this comment regarding enabling `use_fast=True` by default. The main concern was that fast and slow image processors might not be equivalent, so instead of forcing fast image processors, we notify users that a fast image processor is available. WDYT?

@yonigozlan (Member, Author) commented:

Oh, I hadn't seen this, thanks for pointing it out @qubvel!
Indeed, there are tiny differences between fast and slow image processor outputs, which could be confusing for users as they result in minor differences in model outputs (not necessarily worse, though). My main concern is that most users will ignore or won't notice the warning, since nothing goes clearly wrong during inference, but they will miss out on using fast processors with our current vision models. Since slow processors are a major speed bottleneck for many of them, this seems like a missed opportunity. Also, it looks like there is currently no way to use fast processors in pipelines, whereas this PR would enable their use there too (I still have to check this).

I guess it comes down to prioritizing backward compatibility or inference speed. Maybe we could add a `warning_once` in this PR stating that, while a slow processor was saved, Transformers v4.x automatically loads fast processors, so users might see minor output differences, and insist on the gains in inference speed. Something like:

```
A slow processor was saved with this model. However, since Transformers v4.x, we automatically load fast processors for improved inference speed. This may result in minor differences in outputs. To force the use of a slow processor, set `use_fast=False`.
```

This way, users are aware of the potential difference but also understand the benefit, and they can still choose to use the slow processor if needed.

Happy to discuss all of this @qubvel @ydshieh @molbap.

@ydshieh (Collaborator) left a review:

Thanks!

Have some comments as a first review.

Review comment on src/transformers/models/auto/image_processing_auto.py (outdated, resolved).

```diff
@@ -863,6 +863,7 @@ def preprocess(
         input_data_format = infer_channel_dimension_format(images[0])
         if input_data_format == ChannelDimension.LAST:
             images = [image.permute(2, 0, 1).contiguous() for image in images]
+            input_data_format = ChannelDimension.FIRST
```
Collaborator:

I guess this is a bug and you fix it here, right?

yonigozlan (Member, Author):

Yes it was, sorry for the lack of explanation. It's actually now fixed by another PR.

@qubvel (Member) left a review:

Fast image processing is indeed a compelling feature that could benefit many users. It looks similar to making SDPA the default over eager attention. While I haven't noticed complaints about making SDPA the default, there were issues with CI breaking in different projects after SDPA was added to popular models like CLIP and SigLIP.

For users experiencing slow processing speeds, exploring the option to set use_fast=True might be worthwhile. However, for those not facing performance challenges, it may not be necessary.

I don’t have strong objections to this feature, but I suggest the following approach to implement it effectively:

  1. Add tests for a thorough comparison between Slow and Fast image processors to ensure nothing breaks, especially for custom processors with non-default parameters. Tests should cover all Slow and Fast image processors (maybe a common test?); see the sketch after this comment.

  2. Inform users well ahead of time about the upcoming behavior change. For instance, announce that Fast image processors will become the default in 3–5 releases. Users who wish to retain the current behavior should explicitly set `use_fast=False` in their code.

This gradual rollout can minimize potential issues while allowing users time to adapt. What do you think?
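A minimal sketch of the slow/fast equivalence test suggested in point 1; the checkpoint and tolerances are illustrative:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor

checkpoint = "google/vit-base-patch16-224"  # illustrative checkpoint
image = Image.new("RGB", (640, 480), color=(128, 64, 32))

slow = AutoImageProcessor.from_pretrained(checkpoint, use_fast=False)
fast = AutoImageProcessor.from_pretrained(checkpoint, use_fast=True)

out_slow = slow(image, return_tensors="pt")["pixel_values"]
out_fast = fast(image, return_tensors="pt")["pixel_values"]

# Small numerical drift between PIL- and torch-based resizing is expected,
# so assert closeness within a tolerance rather than exact equality.
torch.testing.assert_close(out_slow, out_fast, atol=1e-1, rtol=1e-2)
```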

@yonigozlan (Member, Author) commented:

@qubvel Agreed, it might be better to do a progressive rollout and add more base tests in the meantime. I can change this PR to do that and also keep some of the refactoring of `image_processing_auto` in this PR, as I feel there were some bits of confusing code (especially regarding not separating `image_processor_type` and `image_processor_class`).
I can also add support for fast image processors to output formats other than torch tensors, unless there is any objection to that.

@ydshieh (Collaborator) commented on Nov 19, 2024

> I can also add support for fast image processors to output formats other than torch tensors, unless there is any objection to that.

Considering the usage of TF/Flax, I am not sure it is worth the effort (if it requires more effort than expected). You can talk to the core maintainers about this part.

@qubvel (Member) commented on Nov 19, 2024

> @qubvel Agreed, it might be better to do a progressive rollout and add more base tests in the meantime. I can change this PR to do that and also keep some of the refactoring of `image_processing_auto` in this PR, as I feel there were some bits of confusing code (especially regarding not separating `image_processor_type` and `image_processor_class`).

@yonigozlan sounds great!

@ArthurZucker (Collaborator) commented:

> Inform users well ahead of time about the upcoming behavior change. For instance, announce that Fast image processors will become the default in 3–5 releases. Users who wish to retain the current behavior should explicitly set `use_fast=False` in their code.

I think we should start rolling this out (the warnings) and reduce it to 2 releases, for example (2 months), giving us time to test!

Absolutely no need to work on TF/JAX versions for now either! ➕

@yonigozlan changed the title from "Use fast image processors by default with ImageProcessingAuto" to "Fall back to slow image processor in ImageProcessingAuto when no fast processor available" on Nov 29, 2024
@yonigozlan (Member, Author) commented:

@ArthurZucker Changed the scope of this PR to fix the error when `use_fast` is set to `True` and no fast image processor is available. Also added warnings to start rolling out fast image processors by default.

@ArthurZucker (Collaborator) left a review:

Cool, yep, let's make sure 4.48 has fast by default!

Comment on lines 501 to 519:
```python
if use_fast:
    if not image_processor_type.endswith("Fast"):
        image_processor_type += "Fast"
    image_processor_class = get_image_processor_class_from_name(image_processor_type)
    if image_processor_class is None:
        logger.warning_once(
            "`use_fast` is set to `True` but the image processor class does not have a fast version. "
            " Falling back to the slow version."
        )
        image_processor_class = get_image_processor_class_from_name(image_processor_type[:-4])
else:
    image_processor_type = (
        image_processor_type[:-4] if image_processor_type.endswith("Fast") else image_processor_type
    )
    image_processor_class = get_image_processor_class_from_name(image_processor_type)
```
Collaborator:

This can be simplified a bit: first check whether `image_processor_type + "Fast"` is in the mapping; if yes, we take it, if not, we don't. We only need to call `get_image_processor_class_from_name` once.
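A sketch of one reading of that suggestion (not the committed code); `KNOWN_IMAGE_PROCESSORS` is a placeholder for the real name-to-class mapping:

```python
# Strip any "Fast" suffix so both branches work from the slow name.
base_type = image_processor_type[:-4] if image_processor_type.endswith("Fast") else image_processor_type

chosen_type = base_type
if use_fast:
    # KNOWN_IMAGE_PROCESSORS stands in for the actual mapping of names.
    if base_type + "Fast" in KNOWN_IMAGE_PROCESSORS:
        chosen_type = base_type + "Fast"
    else:
        logger.warning_once(
            "`use_fast` is set to `True` but the image processor class does not have a fast version. "
            "Falling back to the slow version."
        )

# Resolve the class with a single lookup call.
image_processor_class = get_image_processor_class_from_name(chosen_type)
```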


@yonigozlan merged commit 5615a39 into huggingface:main on Dec 15, 2024 · 26 checks passed