
Refactoring of ImageProcessorFast #35069

Open · wants to merge 36 commits into base: main

Commits (36):
46fd0dd
add init and base image processing functions
yonigozlan Dec 3, 2024
e0ac6ad
add add_fast_image_processor to transformers-cli
yonigozlan Dec 3, 2024
0edb3f4
add working fast image processor clip
yonigozlan Dec 3, 2024
5f0884b
add fast image processor to doc, working tests
yonigozlan Dec 4, 2024
2f07d0e
remove "to be implemented" SigLip
yonigozlan Dec 4, 2024
451b66c
fix unprotected import
yonigozlan Dec 4, 2024
7c5b9ec
fix unprotected vision import
yonigozlan Dec 4, 2024
5b1d116
update ViTImageProcessorFast
yonigozlan Dec 4, 2024
d289428
increase threshold slow fast equivalence
yonigozlan Dec 4, 2024
90d2831
add fast img blip
yonigozlan Dec 4, 2024
4b52b1b
add fast class in tests with cli
yonigozlan Dec 4, 2024
14a6491
improve cli
yonigozlan Dec 5, 2024
3ad51b6
add fast image processor convnext
yonigozlan Dec 6, 2024
19cbc3c
add LlavaPatchingMixin and fast image processor for llava_next and ll…
yonigozlan Dec 7, 2024
5c7d389
add device kwarg to ImagesKwargs for fast processing on cuda
yonigozlan Dec 9, 2024
2cdb61c
cleanup
yonigozlan Dec 9, 2024
5b45eab
fix unprotected import
yonigozlan Dec 9, 2024
1214f8f
group images by sizes and add batch processing
yonigozlan Dec 11, 2024
87afdab
Add batch equivalence tests, skip when center_crop is used
yonigozlan Dec 11, 2024
5abe04e
cleanup
yonigozlan Dec 11, 2024
2e9fc31
update init and cli
yonigozlan Dec 11, 2024
baa9ba5
fix-copies
yonigozlan Dec 11, 2024
ccbd31b
refactor convnext, cleanup base
yonigozlan Dec 16, 2024
da6de2e
fix
yonigozlan Dec 16, 2024
3a2b903
remove patching mixins, add piped torchvision transforms for ViT
yonigozlan Dec 17, 2024
dd01a79
fix unbatched processing
yonigozlan Dec 17, 2024
63c139c
fix f strings
yonigozlan Dec 17, 2024
b923625
protect imports
yonigozlan Dec 17, 2024
ff9f04c
change llava onevision to class transforms (test)
yonigozlan Dec 18, 2024
6196c46
fix convnext
yonigozlan Dec 18, 2024
9fd8f14
Merge branch 'main' into improve-fast-image-processor-base
yonigozlan Dec 18, 2024
6aae318
Merge remote-tracking branch 'upstream/main' into improve-fast-image-…
yonigozlan Jan 6, 2025
209de9c
improve formatting (following Pavel review)
yonigozlan Jan 6, 2025
3b4ba0a
fix handling device arg
yonigozlan Jan 6, 2025
c506d5b
improve cli
yonigozlan Jan 6, 2025
d9e7e77
fix
yonigozlan Jan 6, 2025
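The commit "group images by sizes and add batch processing" (1214f8f) captures the core speed idea of the fast processors: images of the same size can be stacked and transformed in one batched call instead of one call per image. A minimal sketch of that grouping logic in plain Python follows; the function names and list-of-lists "images" are hypothetical stand-ins for the PR's actual tensor-based implementation.

```python
# Illustrative sketch, not the PR's actual code: group images by shape so each
# same-sized group can be processed in a single batched transform call, then
# restore the original input order. Nested lists stand in for image tensors.
from collections import defaultdict


def group_images_by_shape(images):
    """Map (height, width) -> list of (original_index, image) pairs."""
    grouped = defaultdict(list)
    for idx, image in enumerate(images):
        shape = (len(image), len(image[0]))
        grouped[shape].append((idx, image))
    return grouped


def process_batched(images, batch_transform):
    """Apply `batch_transform` once per shape group; keep input ordering."""
    results = [None] * len(images)
    for _shape, group in group_images_by_shape(images).items():
        indices = [idx for idx, _ in group]
        batch = [img for _, img in group]
        outputs = batch_transform(batch)  # one call per size group
        for idx, out in zip(indices, outputs):
            results[idx] = out
    return results
```

With real tensors, `batch_transform` would be a stacked torchvision operation on GPU; the point of the sketch is only the group-then-reorder bookkeeping.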
5 changes: 5 additions & 0 deletions docs/source/en/model_doc/blip.md
@@ -61,6 +61,11 @@ The original code can be found [here](https://github.com/salesforce/BLIP).
[[autodoc]] BlipImageProcessor
- preprocess

## BlipImageProcessorFast

[[autodoc]] BlipImageProcessorFast
- preprocess

<frameworkcontent>
<pt>

5 changes: 5 additions & 0 deletions docs/source/en/model_doc/clip.md
@@ -251,6 +251,11 @@ The resource should ideally demonstrate something new instead of duplicating an
[[autodoc]] CLIPImageProcessor
- preprocess

## CLIPImageProcessorFast

[[autodoc]] CLIPImageProcessorFast
- preprocess

## CLIPFeatureExtractor

[[autodoc]] CLIPFeatureExtractor
5 changes: 5 additions & 0 deletions docs/source/en/model_doc/convnext.md
@@ -64,6 +64,11 @@ If you're interested in submitting a resource to be included here, please feel f
[[autodoc]] ConvNextImageProcessor
- preprocess

## ConvNextImageProcessorFast

[[autodoc]] ConvNextImageProcessorFast
- preprocess

<frameworkcontent>
<pt>

5 changes: 5 additions & 0 deletions docs/source/en/model_doc/deit.md
@@ -125,6 +125,11 @@ If you're interested in submitting a resource to be included here, please feel f
[[autodoc]] DeiTImageProcessor
- preprocess

## DeiTImageProcessorFast

[[autodoc]] DeiTImageProcessorFast
- preprocess

<frameworkcontent>
<pt>

5 changes: 5 additions & 0 deletions docs/source/en/model_doc/llava_next.md
@@ -288,6 +288,11 @@ model = AutoModelForImageTextToText.from_pretrained(
[[autodoc]] LlavaNextImageProcessor
- preprocess

## LlavaNextImageProcessorFast

[[autodoc]] LlavaNextImageProcessorFast
- preprocess

## LlavaNextProcessor

[[autodoc]] LlavaNextProcessor
13 changes: 9 additions & 4 deletions docs/source/en/model_doc/llava_onevision.md
@@ -100,8 +100,8 @@ import torch
from PIL import Image
import requests

processor = AutoProcessor.from_pretrained("llava-hf/llava-onevision-qwen2-7b-ov-hf")
model = LlavaOnevisionForConditionalGeneration.from_pretrained("llava-hf/llava-onevision-qwen2-7b-ov-hf", torch_dtype=torch.float16, low_cpu_mem_usage=True)
model.to("cuda:0")

# prepare image and text prompt, using the appropriate prompt template
@@ -298,8 +298,8 @@ First make sure to install flash-attn. Refer to the [original repository of Flas
from transformers import LlavaOnevisionForConditionalGeneration

model = LlavaOnevisionForConditionalGeneration.from_pretrained(
model_id,
torch_dtype=torch.float16,
low_cpu_mem_usage=True,
use_flash_attention_2=True
).to(0)
@@ -318,6 +318,11 @@ model = LlavaOnevisionForConditionalGeneration.from_pretrained(

[[autodoc]] LlavaOnevisionImageProcessor

## LlavaOnevisionImageProcessorFast

[[autodoc]] LlavaOnevisionImageProcessorFast
- preprocess

## LlavaOnevisionVideoProcessor

[[autodoc]] LlavaOnevisionVideoProcessor
5 changes: 5 additions & 0 deletions docs/source/en/model_doc/siglip.md
@@ -215,6 +215,11 @@ Below is an expected speedup diagram that compares inference time between the na
[[autodoc]] SiglipImageProcessor
- preprocess

## SiglipImageProcessorFast

[[autodoc]] SiglipImageProcessorFast
- preprocess

## SiglipProcessor

[[autodoc]] SiglipProcessor
5 changes: 5 additions & 0 deletions docs/source/ja/model_doc/blip.md
@@ -61,6 +61,11 @@ BLIP can perform a variety of multimodal tasks such as
[[autodoc]] BlipImageProcessor
- preprocess

## BlipImageProcessorFast

[[autodoc]] BlipImageProcessorFast
- preprocess

<frameworkcontent>
<pt>

5 changes: 5 additions & 0 deletions docs/source/ja/model_doc/clip.md
@@ -133,6 +133,11 @@ Official Hugging Face and community resources to help you get started with CLIP
[[autodoc]] CLIPImageProcessor
- preprocess

## CLIPImageProcessorFast

[[autodoc]] CLIPImageProcessorFast
- preprocess

## CLIPFeatureExtractor

[[autodoc]] CLIPFeatureExtractor
5 changes: 5 additions & 0 deletions docs/source/ja/model_doc/convnext.md
@@ -64,6 +64,11 @@ Official Hugging Face and community resources to help you get started with ConvNeXT
[[autodoc]] ConvNextImageProcessor
- preprocess

## ConvNextImageProcessorFast

[[autodoc]] ConvNextImageProcessorFast
- preprocess

<frameworkcontent>
<pt>

5 changes: 5 additions & 0 deletions docs/source/ja/model_doc/deit.md
@@ -98,6 +98,11 @@ Official Hugging Face and community resources to help you get started with DeiT
[[autodoc]] DeiTImageProcessor
- preprocess

## DeiTImageProcessorFast

[[autodoc]] DeiTImageProcessorFast
- preprocess

<frameworkcontent>
<pt>

14 changes: 14 additions & 0 deletions src/transformers/__init__.py
@@ -1278,10 +1278,17 @@
]
else:
_import_structure["image_processing_utils_fast"] = ["BaseImageProcessorFast"]
_import_structure["models.blip"].append("BlipImageProcessorFast")
_import_structure["models.clip"].append("CLIPImageProcessorFast")
_import_structure["models.convnext"].append("ConvNextImageProcessorFast")
_import_structure["models.deformable_detr"].append("DeformableDetrImageProcessorFast")
_import_structure["models.deit"].append("DeiTImageProcessorFast")
_import_structure["models.detr"].append("DetrImageProcessorFast")
_import_structure["models.llava_next"].append("LlavaNextImageProcessorFast")
_import_structure["models.llava_onevision"].append("LlavaOnevisionImageProcessorFast")
_import_structure["models.pixtral"].append("PixtralImageProcessorFast")
_import_structure["models.rt_detr"].append("RTDetrImageProcessorFast")
_import_structure["models.siglip"].append("SiglipImageProcessorFast")
_import_structure["models.vit"].append("ViTImageProcessorFast")

try:
@@ -6298,10 +6305,17 @@
from .utils.dummy_torchvision_objects import *
else:
from .image_processing_utils_fast import BaseImageProcessorFast
from .models.blip import BlipImageProcessorFast
from .models.clip import CLIPImageProcessorFast
from .models.convnext import ConvNextImageProcessorFast
from .models.deformable_detr import DeformableDetrImageProcessorFast
from .models.deit import DeiTImageProcessorFast
from .models.detr import DetrImageProcessorFast
from .models.llava_next import LlavaNextImageProcessorFast
from .models.llava_onevision import LlavaOnevisionImageProcessorFast
from .models.pixtral import PixtralImageProcessorFast
from .models.rt_detr import RTDetrImageProcessorFast
from .models.siglip import SiglipImageProcessorFast
from .models.vit import ViTImageProcessorFast

try:
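The `src/transformers/__init__.py` hunk above registers every new fast processor behind the torchvision availability check, which is also what the "fix unprotected import" and "protect imports" commits address. A simplified sketch of that pattern follows; it is an illustration of the guard-and-dummy idea, not transformers' actual `__init__` machinery.

```python
# Simplified sketch of the protected-import pattern: the fast image processors
# depend on torchvision, so real classes are only exposed when the backend can
# be imported; otherwise a dummy placeholder raises a helpful error on use.
# Not transformers' actual implementation.

def is_torchvision_available() -> bool:
    """Return True if torchvision can be imported in this environment."""
    try:
        import torchvision  # noqa: F401
        return True
    except ImportError:
        return False


if is_torchvision_available():
    # Tensor-native transforms used by the fast processors.
    from torchvision.transforms import functional as F  # noqa: F401
else:
    class BaseImageProcessorFast:
        """Dummy placeholder that fails loudly when torchvision is missing."""

        def __init__(self, *args, **kwargs):
            raise ImportError(
                "BaseImageProcessorFast requires the `torchvision` backend, "
                "which was not found in your environment."
            )
```

Guarding at registration time keeps `import transformers` working on torch-only installs while still surfacing a clear error if a fast processor is instantiated without torchvision.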