Fast image processor #28847
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Force-pushed from b9fbf93 to 3d9ec2d
Force-pushed from 9ceab5d to ffdadd2
Force-pushed from e759b61 to 167ce1b
```
@@ -0,0 +1,542 @@
# coding=utf-8
```
This is just moving objects from image_processing_utils.py to this new file, which will share the common objects between the base fast and slow image processors in image_processing_utils.py and image_processing_utils_fast.py.
This matches the structure of the tokenizers files.
```
@@ -313,6 +322,10 @@ def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
        The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a
        git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any
        identifier allowed by git.
    use_fast (`bool`, *optional*, defaults to `False`):
```
Defaulting to `False` for now, so users have to actively opt in, to avoid any surprises.
We could add a warning for supported models that the user is using the slow version (and default to `None`; if `None`, set it to `False` and warn when the model is in the list of supported models).
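A minimal sketch of that suggestion (the names `FAST_IMAGE_PROCESSOR_MAPPING_NAMES` and `image_processor_type` are hypothetical, for illustration only):

```python
use_fast = kwargs.pop("use_fast", None)
if use_fast is None:
    use_fast = False  # keep slow as the default so users actively opt in
    # Hypothetical registry of models that have a fast processor available
    if image_processor_type in FAST_IMAGE_PROCESSOR_MAPPING_NAMES:
        logger.warning(
            "A fast image processor is available for this model. "
            "Pass `use_fast=True` to `from_pretrained` to use it."
        )
```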
Good idea! Added in f6a0847
Looks very thorough. I just left minor feedback.
Would be interesting to benchmark how the speedup holds across higher resolutions and batch sizes (i.e., when we increase both batch size and resolution).
PIL = "pillow" | ||
TORCH = "torch" | ||
NUMPY = "numpy" |
We used to support TensorFlow, too, no?
For the slow image processors we support TensorFlow and JAX as well, but the usage of these frameworks compared to PyTorch is tiny. As this uses a torchvision backend, I'm not sure how much sense it makes to convert to torch tensors, run in PyTorch and then convert back, as this would require TF users to install torchvision.
I've added an exception if JAX or TF arrays are passed in.
We can add support in the future if we end up getting lots of requests for it.
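A rough sketch of such a guard (the `get_image_type` helper and the exact message are assumptions, not the PR's code):

```python
image_type = get_image_type(image)  # hypothetical helper returning an ImageType
if image_type not in (ImageType.PIL, ImageType.TORCH, ImageType.NUMPY):
    raise ValueError(
        f"Unsupported input image type {image_type}. Fast image processors use a "
        "torchvision backend and only accept PIL, torch and numpy images."
    )
```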
Great, with the same interface but faster, and the code looks much easier!
I added some comments
```python
# Regardless of whether we rescale, all PIL and numpy values need to be converted to a torch tensor
# to keep cross compatibility with slow image processors
convert_to_tensor = image_type in (ImageType.PIL, ImageType.NUMPY)
if convert_to_tensor:
    transforms.append(ToTensor())
```
Is it worth writing a custom ToTensor without the rescaling logic, to avoid the complex rules below? We can just copy-modify the original one.
I found PILToTensor, which thankfully does the conversion without scaling.
I added three custom transforms regarding this comment and your other comment about fusing the operations:
- NumpyToTensor: numpy equivalent of PILToTensor
- Rescale: class which will rescale pixel values by rescale_factor
- FusedRescaleNormalize: combines the rescale and normalize operations, as you suggested. The only tricky thing here is that we need to make sure to convert to a floating tensor first.
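Roughly, the three transforms could look like this (a sketch, not the exact PR implementation):

```python
import numpy as np
import torch


class NumpyToTensor:
    """Numpy equivalent of PILToTensor: convert to a torch tensor without rescaling values."""

    def __call__(self, image: np.ndarray) -> torch.Tensor:
        return torch.from_numpy(image)


class Rescale:
    """Rescale pixel values by `rescale_factor`, e.g. 1/255."""

    def __init__(self, rescale_factor: float):
        self.rescale_factor = rescale_factor

    def __call__(self, image: torch.Tensor) -> torch.Tensor:
        return image * self.rescale_factor


class FusedRescaleNormalize:
    """Fuse rescale and normalize by folding the rescale factor into mean/std:
    (x * r - mean) / std == (x - mean / r) / (std / r)."""

    def __init__(self, image_mean, image_std, rescale_factor: float):
        self.mean = torch.tensor(image_mean) / rescale_factor
        self.std = torch.tensor(image_std) / rescale_factor

    def __call__(self, image: torch.Tensor) -> torch.Tensor:
        # Normalization needs a floating tensor, so convert first
        image = image.to(torch.float32)
        return (image - self.mean[:, None, None]) / self.std[:, None, None]
```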
Comparing with and without the fused normalize and rescale, the speedups are between 2-13%.
With separate normalize and rescale:
List of PIL image input
Fast mean time: 0.006879540920257568 std: 0.00035099185559956325
Slow mean time: 0.010017627239227295 std: 0.00016269613600058844
1.4601 times faster
Torch tensor batch input
Fast mean time: 0.0017743961811065674 std: 6.134420692941824e-05
Slow mean time: 0.05170198488235474 std: 0.0006948744101280261
29.1670 times faster
List of PIL image input
Fast mean time: 0.002653493165969849 std: 5.281411681712124e-05
Slow mean time: 0.004650842666625976 std: 2.603507052302019e-05
1.7534 times faster
Torch tensor batch input
Fast mean time: 0.001086679458618164 std: 2.1535627834848004e-05
Slow mean time: 0.006056704044342041 std: 4.2741895905449867e-05
5.5755 times faster
With fused normalize and rescale:
List of PIL image input
Fast mean time: 0.006737879753112793 std: 0.0004794828832439867
Slow mean time: 0.009744790077209473 std: 0.0001803401154248809
1.4535 times faster
Torch tensor batch input
Fast mean time: 0.0017259061336517335 std: 0.0001853211607801645
Slow mean time: 0.050941661834716795 std: 0.0006355194041148941
29.7736 times faster
List of PIL image input
Fast mean time: 0.0025912017822265626 std: 4.1097075065458606e-05
Slow mean time: 0.005103119373321533 std: 2.3521387703663177e-05
1.9698 times faster
Torch tensor batch input
Fast mean time: 0.0009581971168518066 std: 2.3126728878460678e-05
Slow mean time: 0.006041062116622925 std: 3.5341232719903447e-05
6.3075 times faster
Speedups (fused vs separate):
List of PIL image input: 1.0210245911674707 times faster
Torch tensor batch input: 1.0280954140606922 times faster
List of PIL image input: 1.024039572745956 times faster
Torch tensor batch input: 1.134087589606292 times faster
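For context, mean/std figures like the above could be collected with a simple timing loop along these lines (a sketch; the actual benchmark script is linked at the end of the PR description):

```python
import time

import numpy as np


def benchmark(processor, images, n_runs: int = 100):
    # Warm up so one-off costs (e.g. composing the transforms) aren't measured
    processor(images, return_tensors="pt")
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        processor(images, return_tensors="pt")
        times.append(time.perf_counter() - start)
    return np.mean(times), np.std(times)
```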
```python
self._maybe_update_transforms(
    do_resize=do_resize,
    do_rescale=do_rescale,
    do_normalize=do_normalize,
    size=size,
    resample=resample,
    rescale_factor=rescale_factor,
    image_mean=image_mean,
    image_std=image_std,
    image_type=image_type,
)
transformed_images = [self._transforms(image) for image in images]
```
Changing _transforms looks a bit tricky, because the stored _transforms and the class parameters can become inconsistent. I don't see any side effects here, but it would probably be safer to return new transforms instead of updating in place:
```python
transforms = self._maybe_update_transforms(
    do_resize=do_resize,
    do_rescale=do_rescale,
    do_normalize=do_normalize,
    size=size,
    resample=resample,
    rescale_factor=rescale_factor,
    image_mean=image_mean,
    image_std=image_std,
    image_type=image_type,
)
transformed_images = [transforms(image) for image in images]
```
> because stored _transforms and class parameters became inconsistent

Which class params do you mean here - _transform_params?
I've changed it so we return the transforms which we use. I've kept the storing though - this is part of the trick that helps make this class fast when it's called multiple times: we don't have to recompose the transforms (see the sketch below).
I might not be understanding the concern about class parameters and _transforms becoming inconsistent. Let me know if there's a particularly risky code path I should try and protect against here.
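For reference, the storing trick described above could look roughly like this (assumed attribute names, not the verbatim PR code):

```python
def _maybe_update_transforms(self, **kwargs) -> "Compose":
    # Recompose the transforms only if the requested settings changed,
    # so repeated calls with the same settings stay cheap
    if kwargs != self._transform_settings:
        self._transforms = self._build_transforms(**kwargs)
        self._transform_settings = kwargs
    return self._transforms
```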
Probably I missed some design nuances; now I see that _transform_settings are updated accordingly 🙂
The general idea was to use the method without saving state on the instance. Can we use lru_cache for the _build_transforms function without storing _transforms and _transform_settings? Or is there any reason to save them?
Here is how I see it: provided lru_cache works fine, the transform is not going to be recomposed for the same parameters. Does that make sense?
```python
import functools


class BaseImageProcessorFast(BaseImageProcessor):
    _transform_params = None

    def _build_transforms(self, **kwargs) -> "Compose":
        """
        Given the input settings e.g. do_resize, build the image transforms.
        """
        raise NotImplementedError

    def _validate_params(self, **kwargs) -> None:
        for k, v in kwargs.items():
            if k not in self._transform_params:
                raise ValueError(f"Invalid transform parameter {k}={v}.")

    @functools.lru_cache(maxsize=1)
    def get_transforms(self, **kwargs) -> "Compose":
        self._validate_params(**kwargs)
        return self._build_transforms(**kwargs)
```
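For illustration, under this suggestion repeated calls with identical parameters would be served from the cache (hypothetical parameter values; note lru_cache requires hashable arguments, so e.g. a `size` dict would need converting to a tuple):

```python
processor = ViTImageProcessorFast()
t1 = processor.get_transforms(do_resize=True, size=(224, 224))
t2 = processor.get_transforms(do_resize=True, size=(224, 224))
assert t1 is t2  # second call hits the lru_cache, no recomposition
```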
Ah, I see. Yes, absolutely we can do that!
My only thought is that having no transforms stored makes it harder to inspect when debugging, and to obtain them you have to pass in all the do_resize, size, etc. arguments, which might be annoying. As I'll probably be the person maintaining this for the foreseeable future, I'll push this change and add back some explicit setting if it ends up being a pain :)
This is a very compelling argument! I just checked and there is no direct way to inspect the functools cache, so that might be a pain for debugging :)
Force-pushed from 0acf1ef to f524411
Some comments about numpy images
```python
# Do we want to permute the channels here?
transforms.append(NumpyToTensor())
```
I would expect NumpyToTensor() to return the same CHW format as PILToTensor(), and some torchvision transforms expect the CHW format too, for example normalize.
Yep, in terms of what the transforms accept, I think we're best working in and assuming channels first. The reason I'm not sure about permuting is because of ToTensor's behaviour: when given a numpy array of 3 dimensions, it assumes it's in (H, W, C) format. Should we make the same assumption?
Got it! I expect that numpy images come in HWC format because, as far as I know, any library that reads an image and returns numpy does so in HWC format. The same applies to third-party libraries for image processing/augmentation: if they work with numpy images, the HWC format is assumed.
Updated NumpyToTensor to transpose to CHW, assuming an HWC input: 6fb7df2
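A sketch of the updated transform (illustrative, not the exact commit):

```python
import numpy as np
import torch


class NumpyToTensor:
    """Convert an HWC numpy image to a CHW torch tensor, without rescaling values."""

    def __call__(self, image: np.ndarray) -> torch.Tensor:
        # HWC -> CHW to match what PILToTensor and e.g. Normalize expect
        return torch.from_numpy(image.transpose(2, 0, 1)).contiguous()
```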
Force-pushed from ed38c20 to 101535d
🔥 Amazing work and results, can't wait for this to be democratized to other models
Force-pushed from b426fff to 4b6cad3
Thanks everyone for your extensive reviews! I think I've addressed everything and have updated the graph in the description to reflect the current benchmarks. @qubvel Would be great to have a final review before merging.
🎉 awesome!
I have just two questions/suggestions regarding testing slow vs fast:
- Do we claim that the fast and slow image processors' results are somehow equivalent? They probably won't match exactly due to differences in the resizing algorithm, but we can test the mean absolute error.
- Is it worth adding a slow test to benchmark the slow vs fast image processors, just to be sure that fast is really faster and that we won't break anything in the future?
```
@@ -29,3 +29,4 @@ timm
albumentations >= 1.4.5
torchmetrics
pycocotools
Pillow>=10.0.1,<=15.0
```
Just curious, why 15.0 here? :)
It's a security thing :) It matches the pin we have in setup.py which was set in #27409
@qubvel Thanks for your review! Regarding your questions:
I don't think we can claim they'll be exactly the same, but they should be similar. I can add an equivalence test with a certain tolerance.
Sure - good idea. I can add a test to make sure a single pass (or maybe a series of passes, as the fast class benefits from the cache) is faster for the fast class.
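An equivalence test along those lines might look like this (the checkpoint, test image and tolerance are assumptions for illustration, not the values used in the PR):

```python
import requests
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTImageProcessorFast


def test_slow_fast_equivalence():
    url = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image = Image.open(requests.get(url, stream=True).raw)
    slow = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
    fast = ViTImageProcessorFast.from_pretrained("google/vit-base-patch16-224")
    slow_out = slow(image, return_tensors="pt").pixel_values
    fast_out = fast(image, return_tensors="pt").pixel_values
    # The resizing backends differ slightly, so compare with a tolerance
    assert torch.mean(torch.abs(slow_out - fast_out)) < 1e-1
```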
Force-pushed from 71bf3b9 to d598b5a
("mobilevit", ("MobileViTImageProcessor",)), | ||
("mobilevit", ("MobileViTImageProcessor",)), |
@amyeroberts Is this a typo, or is there a specific reason for the duplication?
I found this when trying to resolve a conflict from my local branch.
It's just a typo - we can remove one of the lines. Thanks for catching!
I think it must have been added when mobilevit was first added, as it was present in the previous automap list. I've opened a PR here to remove it: #31383
* Draft fast image processors
* Draft working fast version
* py3.8 compatible cache
* Enable loading fast image processors through auto
* Tidy up; rescale behaviour based on input type
* Enable tests for fast image processors
* Smarter rescaling
* Don't default to Fast
* Safer imports
* Add necessary Pillow requirement
* Woops
* Add AutoImageProcessor test
* Fix up
* Fix test for imagegpt
* Fix test
* Review comments
* Add warning for TF and JAX input types
* Rearrange
* Return transforms
* NumpyToTensor transformation
* Rebase - include changes from upstream in ImageProcessingMixin
* Safe typing
* Fix up
* convert mean/std to tesnor to rescale
* Don't store transforms in state
* Fix up
* Update src/transformers/image_processing_utils_fast.py (Co-authored-by: Arthur <[email protected]>)
* Update src/transformers/models/auto/image_processing_auto.py (Co-authored-by: Arthur <[email protected]>)
* Update src/transformers/models/auto/image_processing_auto.py (Co-authored-by: Arthur <[email protected]>)
* Update src/transformers/models/auto/image_processing_auto.py (Co-authored-by: Arthur <[email protected]>)
* Warn if fast image processor available
* Update src/transformers/models/vit/image_processing_vit_fast.py
* Transpose incoming numpy images to be in CHW format
* Update mapping names based on packages, auto set fast to None
* Fix up
* Fix
* Add AutoImageProcessor.from_pretrained(checkpoint, use_fast=True) test
* Update src/transformers/models/vit/image_processing_vit_fast.py (Co-authored-by: Pavel Iakubovskii <[email protected]>)
* Add equivalence and speed tests
* Fix up

---------

Co-authored-by: Arthur <[email protected]>
Co-authored-by: Pavel Iakubovskii <[email protected]>
What does this PR do?
Adds a ViTImageProcessorFast class, which uses a torchvision backend to speed up image processing.
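Opting in looks like this (loading via `AutoImageProcessor.from_pretrained(..., use_fast=True)` is part of this PR; the checkpoint and test image are examples):

```python
import requests
from PIL import Image
from transformers import AutoImageProcessor

# use_fast=True selects the torchvision-backed fast processor;
# the default remains the slow processor for now
processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224", use_fast=True)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(image, return_tensors="pt")
print(inputs.pixel_values.shape)  # torch.Size([1, 3, 224, 224])
```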
Benchmark comparing the two image processors:
Script for replicating benchmark