Add optimized PixtralImageProcessorFast
#34836
Conversation
Hi @mgoin! Sounds great! Thanks for working on this 🤗 cc @yonigozlan maybe if you have bandwidth
Thanks a lot @mgoin for working on this! Looks great to me, just mentioned some minor things to fix.
To be transparent, the current plan for fast image processors is to make a strong BaseImageProcessorFast and several image processing Mixins, in order to make adding new fast image processors much simpler. All that to say that this processor might change again in the future. Meanwhile, I think it would be great to have it as is, because as you can see it makes a huge performance difference, and this fast image processor in particular doesn't require a huge diff (compared to the DETR ones for example).
Also, this would be the first fast image processor used in a processor, and I don't think there is a mechanism to use it with AutoProcessor yet (I might be wrong), so this is also something that will need to be added soon.
But I'm curious to know how you are using it with a PixtralProcessor then. I'm guessing you manually instantiate it with:
from transformers import PixtralProcessor, AutoImageProcessor, AutoTokenizer
fast_image_processor = AutoImageProcessor.from_pretrained("mistral-community/pixtral-12b", use_fast=True)
tokenizer = AutoTokenizer.from_pretrained("mistral-community/pixtral-12b")
processor = PixtralProcessor(fast_image_processor, tokenizer)
Is that correct?
Thanks again!
Thanks for the review and context @yonigozlan! I will look into it later today. Yes, you are correct about using it within a Processor; however, I have tested that this works within vLLM simply by adding … One bug I noticed is that if I specify …
Oh, great news that it already works with AutoProcessor. As I said, this is the first fast image processor used in a processor, so it was not guaranteed :).
Yes, this is the same right now when using ImageProcessingAuto. I don't think it should be that way though, especially as more and more people will want to use fast image processors by default. I'll open a PR to fix this. Current plan is:
The deprecation cycle is needed as there are slight differences in outputs when using torchvision vs PIL, see this PR #34785 for more info.
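To make the default switch being discussed concrete, here is a hypothetical sketch of the kind of selection logic the auto mapping could apply; the helper name and signature are illustrative only and are not the implementation in the PR referenced above.

def pick_image_processor_class(slow_cls, fast_cls, use_fast=None):
    # Hypothetical helper, not transformers code: illustrates a use_fast flag
    # that defaults to the slow (PIL-based) processor today and could default
    # to the fast (torchvision-based) processor after a deprecation cycle.
    if use_fast is None:
        use_fast = False  # assumption: current default keeps the slow processor
    if use_fast and fast_cls is not None:
        return fast_cls
    return slow_cls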
Feel free to ping us for another round of review! 🚀
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks @yonigozlan and @ArthurZucker - this PR is ready for more review! I believe the failing test is unrelated.
Thanks for iterating!
LGTM after adding a get_resize_output_image_size with torch instead of numpy.
from .image_processing_pixtral import (
    BatchMixFeature,
    convert_to_rgb,
    get_resize_output_image_size,
I still think it would be better to rewrite this function with torch.ceil instead of np.ceil.
Sorry, I missed your comment response. Since height, width, and ratio are Python scalars, it doesn't make sense to use torch here. I can replace the np.ceil with just math.ceil though.
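For reference, a rough sketch of that scalar-only computation with math.ceil; the simplified signature (taking plain (height, width) tuples) is an assumption for illustration and not the exact function in image_processing_pixtral.py.

import math

def get_resize_output_image_size(image_size, max_size, patch_size):
    # Simplified sketch: the real function also handles image inputs and
    # channel-dimension formats; here everything is a plain Python scalar.
    height, width = image_size
    max_height, max_width = max_size
    patch_height, patch_width = patch_size
    ratio = max(height / max_height, width / max_width)
    if ratio > 1:
        # height, width, and ratio are Python scalars, so math.ceil is enough.
        height = int(math.ceil(height / ratio))
        width = int(math.ceil(width / ratio))
    # Snap each dimension up to a whole number of patches.
    num_height_tokens = (height - 1) // patch_height + 1
    num_width_tokens = (width - 1) // patch_width + 1
    return num_height_tokens * patch_height, num_width_tokens * patch_width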
Just saw this, indeed my mistake :)
Hi @yonigozlan, what do you think about the use of …
Sorry to be annoying with this, but I'd prefer …
Sorry if I am misunderstanding something @yonigozlan, but are you suggesting that I wrap the values in a Tensor just to extract them back out as primitives? That is inefficient and unneeded:

height = int(np.ceil(height / ratio))
# versus
height = int(torch.ceil(torch.tensor(height / ratio)).item())

Considering …
Hi @mgoin, …
Nice, thanks for the clarification! Hi @ArthurZucker, would you mind signing off?
Yep, having a look!
LGTM, sorry for the delay! My main question is for @yonigozlan: do we have compile tests for the fast image processors?
Thanks a lot @mgoin for this contribution! 🔥
I don't think so, I'll open a PR for that!
Cool, thanks!
* Add optimized PixtralImageProcessorFast
* make style
* Add dummy_vision_object
* Review comments
* Format
* Fix dummy
* Format
* np.ceil for math.ceil
What does this PR do?
This PR implements a fast image processor for Pixtral. Follows issue #33810.
The key acceleration comes from replacing Pillow/NumPy arrays and functions (resize, rescale, normalize) with torch tensors and torchvision v2 functions. It also comes with support for torch.compile and for passing device="cuda" during inference to process the input on GPU. One limitation is that only return_tensors="pt" will be supported.

Usage
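A minimal usage sketch, assuming the checkpoint referenced earlier in this thread; the original snippet from this section is not preserved, so the example image URL and the placement of the device keyword are illustrative rather than the verified final API.

import requests
from PIL import Image
from transformers import AutoImageProcessor

# use_fast=True selects PixtralImageProcessorFast (requires torchvision)
image_processor = AutoImageProcessor.from_pretrained(
    "mistral-community/pixtral-12b", use_fast=True
)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Only return_tensors="pt" is supported; device="cuda" runs preprocessing on GPU
inputs = image_processor(images=image, return_tensors="pt", device="cuda")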
From simple benchmarking with a single image of size [3, 876, 1300], I see a 6x to 10x speedup.

Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.