
Added segmentation maps support for DPT image processor #34345

Open
wants to merge 424 commits into base: main
Changes from 3 commits
424 commits
469eddb
Fix `check_training_gradient_checkpointing` (#34806)
ydshieh Nov 19, 2024
befbbf2
Added image-text-to-text pipeline to task guide (#34783)
merveenoyan Nov 19, 2024
3033509
Translate attention.md into Chinese (#34716)
wwwbai Nov 19, 2024
145fbd4
LLaVA OV: fix unpadding precision (#34779)
zucchini-nlp Nov 20, 2024
9470d65
Fix low memory beam search (#34746)
zucchini-nlp Nov 20, 2024
9d16441
Fix the memory usage issue of logits in generate() (#34813)
kjohew Nov 20, 2024
8cadf76
fix(DPT,Depth-Anything) `torch.export` (#34103)
philkuz Nov 20, 2024
f297af5
Fix: take into account meta device (#34134)
tibor-reiss Nov 20, 2024
67890de
Torchao weights only + prequantized compability (#34355)
SunMarc Nov 20, 2024
bf42c3b
Fix hyperparameter search when optuna+deepseed (#34642)
corentin-ryr Nov 20, 2024
3cb8676
Fix CI by tweaking torchao tests (#34832)
SunMarc Nov 20, 2024
40821a2
Fix CI slack reporting issue (#34833)
ydshieh Nov 20, 2024
28fb02f
VLMs: enable generation tests - last batch (#34484)
zucchini-nlp Nov 21, 2024
d4e1acb
Change logging level from warning to info for `max_steps` overriding …
qgallouedec Nov 21, 2024
c57eafd
Add Nemotron GGUF Loading Support (#34725)
farrosalferro Nov 21, 2024
ae5cbf8
Improve gguf tensor processing (#34515)
VladOS95-cyber Nov 21, 2024
d6a5c23
Fix ds nvme (#34444)
eljandoubi Nov 21, 2024
1887159
Fix heuristic scheduling for UAG (#34805)
jmamou Nov 21, 2024
4e90b99
Refactor StarCoder2 using modular (#34015)
Cyrilvallez Nov 21, 2024
6a912ff
Watermarking: fix order (#34849)
zucchini-nlp Nov 22, 2024
1867be6
Update checks for torch.distributed.tensor to require torch >= 2.5 (#…
loadams Nov 22, 2024
d9e6f30
Remove quantization related config from dequantized model (#34856)
konradkalita Nov 22, 2024
597efd2
Auto compile when static cache (#34247)
ArthurZucker Nov 22, 2024
42b36d7
Speculative decoding: Test the target distribution (to prevent issues…
keyboardAnt Nov 22, 2024
861758e
smol improvements to support more flexible usage (#34857)
andimarafioti Nov 22, 2024
286ffaa
[CI] Skip EETQ tests while package is broken with latest transformers…
BenjaminBossan Nov 22, 2024
54be2d7
Bitnet test fix to avoid using gated model (#34863)
MekkCyber Nov 22, 2024
3a8eb74
Fix support for image processors modifications in modular (#34866)
yonigozlan Nov 22, 2024
318fe25
Fix: Enable prefill phase key value caching of nemotron/minitron mode…
jeongin601 Nov 25, 2024
1339a14
Add safe_globals to resume training on PyTorch 2.6 (#34632)
dvrogozh Nov 25, 2024
c1a8520
Cache: init empty cache when `use_cache` (#34274)
zucchini-nlp Nov 25, 2024
098962d
BLIP: fix generation after hub update (#34876)
zucchini-nlp Nov 25, 2024
857d46c
[`Deberta/Deberta-v2`] Refactor code base to support compile, export,…
ArthurZucker Nov 25, 2024
1e492af
🔴 Mllama: fix base prefix (#34874)
zucchini-nlp Nov 25, 2024
4dc1a69
Sum gathered input tokens (#34554)
techkang Nov 25, 2024
a0f4f31
allow unused input parameters passthrough when chunking in asr pipeli…
VictorAtIfInsurance Nov 25, 2024
c50b567
prepare_fa2_from_position_ids function bugfix (#33269)
meliksahturker Nov 25, 2024
62ab94d
Bump tornado from 6.4.1 to 6.4.2 in /examples/research_projects/visua…
dependabot[bot] Nov 25, 2024
97514a8
chore: fix some typos (#34891)
wanxiangchwng Nov 25, 2024
74db22f
Fix convert_tokens_to_string when decoder is None (#34569)
dszeto Nov 25, 2024
11cc229
[`peft`] Given that `self.active_adapter` is deprecated, avoid using …
tomaarsen Nov 25, 2024
f4c04ba
Fix Qwen2 failing tests (#34819)
jla524 Nov 25, 2024
1de3598
Bump tornado from 6.4.1 to 6.4.2 in /examples/research_projects/lxmer…
dependabot[bot] Nov 25, 2024
9121ab8
Rename OLMo November to OLMo2 (#34864)
2015aroras Nov 25, 2024
4e6b19c
Fix : BitNet tests (#34895)
MekkCyber Nov 25, 2024
b13916c
[AWQ, CI] Bump AWQ version used in docker image (#34922)
BenjaminBossan Nov 25, 2024
a464afb
fix static cache data type miss-match (#34799)
jiqing-feng Nov 25, 2024
a830df2
Fix `test_auto_backbone_timm_model_from_pretrained` (#34877)
ydshieh Nov 25, 2024
b76a292
Upgrade torch version to 2.5 in dockerfile for quantization CI (#34924)
MekkCyber Nov 25, 2024
890ea7d
Fix failling GGML test (#34871)
MekkCyber Nov 25, 2024
95c10fe
Updated documentation and added conversion utility (#34319)
ViktorooReps Nov 25, 2024
bfc3556
making gpt2 fx traceable (#34633)
xuzifei-dmatrix Nov 25, 2024
bdb29ff
Fix import structure for Fast Image processors (#34859)
yonigozlan Nov 25, 2024
73b4ab1
VideoLLaVA: add default values (#34916)
zucchini-nlp Nov 26, 2024
0e805e6
Skipping aqlm non working inference tests till fix merged (#34865)
MekkCyber Nov 26, 2024
4d1d0f2
[Whisper] Fix whisper integration tests (#34111)
eustlb Nov 26, 2024
1141eff
Add Pytorch Tensor Parallel support for Mistral (#34927)
VladOS95-cyber Nov 26, 2024
5a45617
change apply_rotary_pos_emb of Glmmodel for GLM-Edge Series model (#3…
zRzRzRzRzRzRzR Nov 26, 2024
d5cf91b
Separate chat templates into a single file (#33957)
Rocketknight1 Nov 26, 2024
1f6b423
Fix torch.onnx.export of Qwen2-VL vision encoder (#34852)
xenova Nov 26, 2024
a0ba631
Update the Python version in the Chinese README to match the English …
vansin Nov 26, 2024
64b73e6
[i18n-ar] Translated file : `docs/source/ar/benchmarks.md` into Arabi…
AhmedAlmaghz Nov 26, 2024
6bc0c21
[docs] use device-agnostic API instead of cuda (#34913)
faaany Nov 26, 2024
784d220
[doc] use full path for run_qa.py (#34914)
faaany Nov 26, 2024
5bfb40b
docs: HUGGINGFACE_HUB_CACHE -> HF_HUB_CACHE (#34904)
imba-tjd Nov 26, 2024
6c3f168
[i18n-zh]Translated tiktoken.md into chinese (#34936)
blueingman Nov 26, 2024
4c1388f
[`FlexAttention`] Update gemma2 (#34942)
ArthurZucker Nov 27, 2024
8f48ccf
Fix : Add PEFT from source to CI docker (#34969)
MekkCyber Nov 27, 2024
0d99a93
Avoid calling `get_max_length` (#34971)
ydshieh Nov 27, 2024
5f8b24e
Fix flaky test execution caused by `Thread` (#34966)
ydshieh Nov 27, 2024
0600f46
🌐 [i18n-KO] Translated encoder-decoder.md to Korean (#34880)
maximizemaxwell Nov 27, 2024
6372255
[docs] add explanation to `release_memory()` (#34911)
faaany Nov 27, 2024
2910015
[i18n-zh]Translated perf_train_special.md into Chinese (#34948)
blueingman Nov 27, 2024
4120cb2
Fix typo in code block in vipllava.md (#34957)
yuanx749 Nov 27, 2024
5523e38
Fixed typo in `VisitWebpageTool` (#34978)
sergiopaniego Nov 27, 2024
f4b674f
[PEFT] Set eval mode when loading PEFT adapter (#34509)
BenjaminBossan Nov 28, 2024
4f0bf98
Fix `save_pretrained` for partially offloaded models (#34890)
kylesayrs Nov 28, 2024
2b053fd
🚨🚨🚨 Changed DINOv2Config default patch size to 14 (#34568)
OFSkean Nov 28, 2024
44af935
Refine the code of Universal Assisted Generation (#34823)
xinpengzz Nov 28, 2024
57ca9e6
Allow compressed-tensors quantized model to be trained (#34520)
horheynm Nov 28, 2024
5e8c1d7
Offloaded cache: fix generate (#34921)
zucchini-nlp Nov 28, 2024
6300212
Fix `utils/check_bad_commit.py` (for auto ping in CI) (#34943)
ydshieh Nov 28, 2024
9d6f0dd
Add optimized `PixtralImageProcessorFast` (#34836)
mgoin Nov 28, 2024
01ad80f
Improve `.from_pretrained` type annotations (#34973)
qubvel Nov 28, 2024
f491096
Fix docker CI : install autogptq from source (#35000)
MekkCyber Nov 28, 2024
0b5b5e6
Let server decide default repo visibility (#34999)
Wauplin Nov 28, 2024
89d7bf5
🚨🚨🚨 Uniformize kwargs for TrOCR Processor (#34587)
tibor-reiss Nov 29, 2024
737f4dc
Update timm version (#35005)
qubvel Nov 29, 2024
f7427f5
fix: double verbs (#35008)
SamuelLarkin Nov 29, 2024
19dabe9
Update `FillMaskPipeline.__call__` signature and docstring (#35006)
alvarobartt Nov 29, 2024
3480cbb
Only cast `cu_seqlens` when tracing (#35016)
xenova Dec 2, 2024
9ab8c5b
fix variable undefined bug when return_tensors is not specified in ll…
chenweize1998 Dec 2, 2024
c24c79e
Optimize memory usage of mllama encoder (#34930)
milesial Dec 2, 2024
7b5f76e
Typo in warning switching to optimum-quanto (#35028)
Bojun-Feng Dec 2, 2024
f41d5d8
Add type hints for forward functions in Gemma2 (#35034)
jla524 Dec 2, 2024
3183047
Fix `test_eager_matches_sdpa_inference` for `XPU` backend (#34889)
dvrogozh Dec 2, 2024
3129967
Multiple typo fixes in Tutorials docs (#35035)
henryhmko Dec 2, 2024
f0dec87
add docstring example for compute_loss_func (#35020)
secrettoad Dec 2, 2024
4955e4e
[i18n-ar] Translated file : `docs/source/ar/notebooks.md` into Arabic…
AhmedAlmaghz Dec 2, 2024
527dc04
[docs] add the missing import for Image and bug fix (#34776)
faaany Dec 2, 2024
f9c7e60
Translate bertlogy.md into Chinese (#34908)
wwwbai Dec 2, 2024
ee37bf0
Automatic compilation in generate: do not rely on inner function (#34…
Cyrilvallez Dec 3, 2024
901f504
Add token cost + runtime monitoring to Agent and HfEngine children (#…
aymeric-roucher Dec 3, 2024
7a7f276
Fix `BertGeneration` (#35043)
ydshieh Dec 3, 2024
125de41
fix speecht5 failure issue in test_peft_gradient_checkpointing_enable…
sywangyi Dec 3, 2024
3deaa81
[docs] fix example code bug (#35054)
faaany Dec 3, 2024
346597b
Translate community.md into Chinese (#35013)
wwwbai Dec 3, 2024
b8cdc26
[docs] use device-agnostic instead of `cuda` (#35047)
faaany Dec 3, 2024
329f5db
[docs] use device-agnostic API instead of hard-coded cuda (#35048)
faaany Dec 3, 2024
c7a109e
Fix `pad_token_tensor` is None in warning (#34005)
tshu-w Dec 4, 2024
accb720
Add Pytorch Tensor Parallel support for Qwen2, Qwen2Moe, Starcoder2 (…
VladOS95-cyber Dec 4, 2024
46df859
[`GPTNeoX`] Flex Attention + Refactor (#34896)
vasqu Dec 4, 2024
1da1e0d
Support for easier multimodal use of modular (#35056)
Cyrilvallez Dec 4, 2024
baa3b22
[docs] add a comment that offloading requires CUDA GPU (#35055)
faaany Dec 4, 2024
1ed1de2
[docs] Increase visibility of torch_dtype="auto" (#35067)
stevhliu Dec 4, 2024
beb2c66
Informative (#35059)
ydshieh Dec 5, 2024
54aae12
[Whisper] Fix whisper tokenizer (#34537)
eustlb Dec 5, 2024
93f87d3
[`tokenizers`] bump to 0.21 (#34972)
ArthurZucker Dec 5, 2024
3544705
Update Mistral conversion script (#34829)
Cyrilvallez Dec 5, 2024
482cb28
Fix `tie_word_embeddings` handling for GGUF models (#35085)
Isotr0py Dec 5, 2024
95a855e
Deprecate quanto and switch to optimum-quanto (#35001)
MekkCyber Dec 5, 2024
50189e3
Add I-JEPA (#33125)
jmtzt Dec 5, 2024
e682c17
BLIP: this is correct now (#35081)
zucchini-nlp Dec 5, 2024
a928d9c
[`trainer`] fix the GA `model_accepts_loss_kwargs` (#34915)
ArthurZucker Dec 5, 2024
b0a51e5
Fix flaky Hub CI (`test_trainer.py`) (#35062)
ydshieh Dec 5, 2024
e27465c
Adaptive dynamic number of speculative tokens (#34156)
jmamou Dec 5, 2024
a5bb528
Fix signatures for processing kwargs (#35105)
molbap Dec 5, 2024
66ab300
Dev version
LysandreJik Dec 5, 2024
44f88d8
[docs] Update Python version in translations (#35096)
jla524 Dec 5, 2024
98e8062
[docs] top_p, top_k, temperature docstrings (#35065)
stevhliu Dec 5, 2024
15ab310
Fix private forked repo. CI (#35114)
ydshieh Dec 6, 2024
9ad4c93
Add Aria (#34157)
aymeric-roucher Dec 6, 2024
7f95372
Add feature dim attributes to BitLinear for easier PEFT integration (…
agostinv Dec 6, 2024
c8c8dff
Update I-JEPA checkpoints path (#35120)
qubvel Dec 6, 2024
1ccca8f
Fix GA loss bugs and add unit test (#35121)
techkang Dec 9, 2024
9e420e0
[I-JEPA] Update docs (#35148)
NielsRogge Dec 9, 2024
1452dc2
Corrected typo in agent system prompts (#35143)
Uvi-12 Dec 9, 2024
de8a0b7
Option to set 'non_blocking' for to(device) in BatchEncoding and Batc…
daniel-bogdoll Dec 9, 2024
7238387
Fix typo in EETQ Tests (#35160)
MekkCyber Dec 9, 2024
8e806a3
Cleanup: continue the init refactor (#35167)
LysandreJik Dec 9, 2024
4bc39de
Super tiny fix logging message (#35132)
fzyzcjy Dec 9, 2024
fa8763c
Fixed typo of 'avilable' in prompts.py (#35145)
Uvi-12 Dec 9, 2024
34f4080
[CI] Fix bnb quantization tests with accelerate>=1.2.0 (#35172)
matthewdouglas Dec 9, 2024
dada0fd
Fix `num_items_in_batch` not being an integer (#35115)
xspirus Dec 10, 2024
0938b57
Assisted decoding multi-gpu (#35116)
zucchini-nlp Dec 10, 2024
80f2b16
Fix file path for shard_num 1 with mllama converter (#35053)
strangiato Dec 10, 2024
6acb4e4
Support BatchNorm in Hubert pos_conv_emb as in fairseq (#34389)
gallilmaimon Dec 10, 2024
5fba3f9
Remove unnecessary masked_fill in deberta models (#35182)
xadupre Dec 10, 2024
3e2769a
Fix DBRX LayerNorm init method (#35177)
hgt312 Dec 10, 2024
e5c45a6
Fixing GGUF support for StableLm (#35060)
MekkCyber Dec 10, 2024
425af6c
[i18n-ar] Translated file : `docs/source/ar/community.md` into Arabic…
AhmedAlmaghz Dec 10, 2024
52d1354
Multiple typo fixes in NLP, Audio docs (#35181)
henryhmko Dec 10, 2024
217c47e
Only import torch.distributed if it is available (#35133)
GaetanLepage Dec 10, 2024
91b8ab1
[i18n-<languageCode>] Translating Benchmarks.md to Chinese (#35137)
asdkfjsd Dec 10, 2024
5290f6a
[docs] Fix FlashAttention link (#35171)
stevhliu Dec 10, 2024
e850892
Update data collator docstrings to accurately reference Nvidia tensor…
johngrahamreynolds Dec 10, 2024
10feacd
[i18n-<languageCode>] Translating agents.md to Chinese (#35139)
HMJ0628 Dec 10, 2024
9094b87
BLIP: enable device map (#34850)
zucchini-nlp Dec 11, 2024
d363e71
🧹 Remove deprecated RotaryEmbedding parts in the Attention layers (#3…
Cyrilvallez Dec 11, 2024
bcc50cc
[PEFT] Better Trainer error when prompt learning with loading best mo…
BenjaminBossan Dec 11, 2024
5fcf628
Add TimmWrapper (#34564)
qubvel Dec 11, 2024
7d303ef
Cleanup: continue the init refactor (#35170)
LysandreJik Dec 11, 2024
33c12e4
Fix CI (#35208)
Cyrilvallez Dec 11, 2024
6181c6b
Fix seamless TTS generate (#34968)
ylacombe Dec 11, 2024
a9ccdfd
docs: clarify initializer_range parameter description in Idefics3Visi…
h3110Fr13nd Dec 11, 2024
3db8e27
Fixed typo of 'indentifier' in audio_utils.py (#35226)
Uvi-12 Dec 12, 2024
5cf11e5
Fix type hints for apply_chat_template (#35216)
Rocketknight1 Dec 12, 2024
63766ab
Support Python 3.10+ Union style in chat template type hints parsing …
RezaRahemtola Dec 12, 2024
e3ee49f
Refactoring `AssistedCandidateGenerator` for Improved Modularity and …
keyboardAnt Dec 12, 2024
a691ccb
Change back to `Thread` for SF conversion (#35236)
ydshieh Dec 12, 2024
11ba1d4
[Init refactor] Modular changes (#35240)
LysandreJik Dec 12, 2024
31f9a28
Fix typo in chat template example (#35250)
EricWinsorDSIT Dec 13, 2024
e4e404f
Run model as compressed/uncompressed mode (#34719)
horheynm Dec 13, 2024
64478c7
Add Cohere2 model (#35224)
alexrs-cohere Dec 13, 2024
3d213b5
skip Fuyu from test_generate (#35246)
nhamanasu Dec 13, 2024
bdd4201
[tests] fix "Tester object has no attribute '_testMethodName'" (#34910)
faaany Dec 13, 2024
8096161
Use `rsfE` with `pytest` (#35119)
ydshieh Dec 13, 2024
bc6ae0d
Update AMD docker image (rocm 6.1) (#35259)
ivarflakstad Dec 13, 2024
e94083b
Fixed typos in Audio Classification Documentation (#35263)
Uvi-12 Dec 13, 2024
6009642
Translating agents_advanced.md to Chinese (#35231)
HMJ0628 Dec 13, 2024
7237b3e
Fix FSDP no longer working (#35212)
muellerzr Dec 13, 2024
add53e2
don't use no_sync when deepspeed doesn't support it for certain zero …
winglian Dec 13, 2024
ca03842
[i18n-Chinese] Translating perf_train_cpu.md to Chinese (#35242)
asdkfjsd Dec 13, 2024
5615a39
Fall back to slow image processor in ImageProcessingAuto when no fast…
yonigozlan Dec 15, 2024
66531a1
Aggeregate test summary files in CircleCI workflow runs (#34989)
ydshieh Dec 16, 2024
1491028
Blip: fix offloading and MP tests (#35239)
zucchini-nlp Dec 16, 2024
85eb339
Fix : model used to test ggml conversion of Falcon-7b is incorrect (#…
MekkCyber Dec 16, 2024
d0f3221
Temporarily disable amd push ci (#35293)
ivarflakstad Dec 16, 2024
d5b81e1
Delete redundancy for loop checks. (#35288)
zhanluxianshen Dec 16, 2024
9feae5f
[Whisper] patch float type on mps (#35295)
eustlb Dec 16, 2024
22834ee
Fix typos in Translated Audio Classification Docs (#35287)
jla524 Dec 16, 2024
886f690
Translating "translate perf_infer_gpu_multi.md" to Chinese (#35271)
HMJ0628 Dec 16, 2024
eb92bc4
Fix wrongs in quicktour[zh] (#35272)
zhanluxianshen Dec 16, 2024
f5620a7
Improved documentation of Automatic speech recognition (#35268)
Uvi-12 Dec 16, 2024
a7f5479
fix modular order (#35297)
ArthurZucker Dec 17, 2024
f33a0ce
Add ColPali to 🤗 transformers (#33736)
tonywu71 Dec 17, 2024
6c08b3b
Add Falcon3 documentation (#35307)
mokeddembillel Dec 17, 2024
747f361
Add sdpa for Beit (#34941)
OmarManzoor Dec 17, 2024
6eb00dd
Support for SDPA for SAM models (#34110)
MagnusS0 Dec 17, 2024
e0ae9b5
🚨🚨🚨 Delete conversion scripts when making release wheels (#35296)
Rocketknight1 Dec 17, 2024
d29a06e
remove `benchmark` job in `push-important-models.yml` (#35292)
ydshieh Dec 17, 2024
deac971
🚨🚨🚨 Limit backtracking in Nougat regexp (#35264)
qubvel Dec 17, 2024
4302b27
Fix typos in translated quicktour docs (#35302)
jla524 Dec 17, 2024
927c3e3
Fix image preview in multi-GPU inference docs (#35303)
jla524 Dec 17, 2024
a7feae1
Fix remove unused parameter in docs (#35306)
zzzzzsa Dec 17, 2024
8bfd7ee
Add Cohere2 docs details (#35294)
alexrs-cohere Dec 17, 2024
77080f0
Fixed typo in audio_classification.md (#35305)
Uvi-12 Dec 17, 2024
0531d75
[docs] Improve register_pipeline (#35300)
stevhliu Dec 17, 2024
1eee1ce
Fix loading with only state dict and low_cpu_mem_usage = True (#35217)
SunMarc Dec 18, 2024
c7e4805
[tests] make cuda-only tests device-agnostic (#35222)
faaany Dec 18, 2024
f1b7634
Trigger GitHub CI with a comment on PR (#35211)
ydshieh Dec 18, 2024
da334bc
[Whisper] 🚨 Fix whisper decoding 🚨 (#34135)
eustlb Dec 18, 2024
69e31eb
change bnb tests (#34713)
jiqing-feng Dec 18, 2024
75be5a0
[Whisper] fix docstrings typo (#35319)
eustlb Dec 18, 2024
2c47618
🚨All attention refactor🚨 (#35235)
ArthurZucker Dec 18, 2024
9a94dfe
feat: add `benchmarks_entrypoint.py` (#34495)
McPatate Dec 18, 2024
9613933
Add the Bamba Model (#34982)
fabianlim Dec 18, 2024
d19b11f
Fix documentation for ColPali (#35321)
tonywu71 Dec 19, 2024
4592cc9
Update comment CI bot (#35323)
ydshieh Dec 19, 2024
56ff1e9
PaliGemma: Make sure to add <eos> to suffix if <image> is present in …
probicheaux Dec 19, 2024
667ed56
Add ModernBERT to Transformers (#35158)
warner-benjamin Dec 19, 2024
2a134f6
Added # Copied from statements
simonreise Dec 19, 2024
1fa807f
Fix some fa2 tests (#35340)
ArthurZucker Dec 19, 2024
0ade1ca
Modernbert Release Fixes (#35344)
warner-benjamin Dec 19, 2024
f42084e
[`docs`] Add link to ModernBERT Text Classification GLUE finetuning s…
tomaarsen Dec 19, 2024
ff9141b
fix onnx export of speech foundation models (#34224)
nikosanto13 Dec 20, 2024
d94832b
Fixed # Copied from statements
simonreise Dec 20, 2024
5a2aedc
[`Mamba2`] Fix caching, slow path, and multi-gpu (#35154)
vasqu Dec 20, 2024
4e27a40
FEAT : Adding VPTQ quantization method to HFQuantizer (#34770)
wejoncy Dec 20, 2024
b5a557e
Reduce CircleCI usage (#35355)
ydshieh Dec 20, 2024
eafbb0e
Implement AsyncTextIteratorStreamer for asynchronous streaming (#34931)
CISC Dec 20, 2024
0d51d65
Cleaner attention interfaces (#35342)
Cyrilvallez Dec 20, 2024
c3a4359
Add Tensor Parallel support for Qwen2VL (#35050)
jla524 Dec 20, 2024
4567ee8
fix zoedepth initialization error under deepspeed zero3 (#35011)
Tavish9 Dec 20, 2024
05de764
Aurevoir PyTorch 1 (#35358)
ydshieh Dec 20, 2024
40292aa
bugfix: torch.export failure caused by `_make_causal_mask` (#35291)
jiwoong-choi Dec 20, 2024
34ad1bd
update codecarbon (#35243)
nhamanasu Dec 20, 2024
6fae2a8
Update test fetcher when we want to test all (#35364)
ArthurZucker Dec 20, 2024
0fc2970
Use `weights_only=True` with `torch.load` for `transfo_xl` (#35241)
ydshieh Dec 20, 2024
504c4d3
Make `test_generate_with_static_cache` even less flaky (#34995)
ydshieh Dec 20, 2024
c96cc03
Improve modular transformers documentation (#35322)
joelpaulkoch Dec 20, 2024
94fe0b9
Improved Documentation Of Audio Classification (#35368)
Uvi-12 Dec 20, 2024
608e163
[docs] Follow up register_pipeline (#35310)
stevhliu Dec 20, 2024
8f38f58
owlvit/2 dynamic input resolution (#34764)
bastrob Dec 21, 2024
0bff533
Added `segmentation_maps` support for DPT image processor
simonreise Oct 23, 2024
dd1a4e7
Added tests for dpt image processor
simonreise Oct 23, 2024
d4c2857
Moved preprocessing into separate functions
simonreise Oct 29, 2024
6f4e61e
Added # Copied from statements
simonreise Dec 19, 2024
b65774e
Fixed # Copied from statements
simonreise Dec 20, 2024
a30323c
Merge branch 'segmentation-maps-for-dpt-image-processor' of https://g…
simonreise Dec 21, 2024
234 changes: 198 additions & 36 deletions src/transformers/models/dpt/image_processing_dpt.py
@@ -139,6 +139,11 @@ class DPTImageProcessor(BaseImageProcessor):
size_divisor (`int`, *optional*):
If `do_pad` is `True`, pads the image dimensions to be divisible by this value. This was introduced in the
DINOv2 paper, which uses the model in combination with DPT.
do_reduce_labels (`bool`, *optional*, defaults to `False`):
Whether or not to reduce all label values of segmentation maps by 1. Usually used for datasets where 0 is
used for background, and background itself is not included in all classes of a dataset (e.g. ADE20k). The
background label will be replaced by 255. Can be overridden by the `do_reduce_labels` parameter in the
`preprocess` method.
"""

model_input_names = ["pixel_values"]
@@ -157,6 +162,7 @@ def __init__(
image_std: Optional[Union[float, List[float]]] = None,
do_pad: bool = False,
size_divisor: int = None,
do_reduce_labels: bool = False,
**kwargs,
) -> None:
super().__init__(**kwargs)
@@ -174,6 +180,7 @@ def __init__(
self.image_std = image_std if image_std is not None else IMAGENET_STANDARD_STD
self.do_pad = do_pad
self.size_divisor = size_divisor
self.do_reduce_labels = do_reduce_labels

def resize(
self,
@@ -275,10 +282,162 @@ def _get_pad(size, size_divisor):

return pad(image, ((pad_size_left, pad_size_right), (pad_size_top, pad_size_bottom)), data_format=data_format)

def reduce_label(self, label: ImageInput) -> np.ndarray:
label = to_numpy_array(label)
# Avoid using underflow conversion
label[label == 0] = 255
label = label - 1
label[label == 254] = 255
return label
Comment on lines +286 to +292 (Member):
This seems to be fully copied from beit image processor, you should add a # Copied from statement above if that's the case :)

Reply (Author):
Done

def _preprocess(
self,
image: ImageInput,
do_reduce_labels: bool = None,
do_resize: bool = None,
size: Dict[str, int] = None,
resample: PILImageResampling = None,
keep_aspect_ratio: bool = None,
ensure_multiple_of: int = None,
do_rescale: bool = None,
rescale_factor: float = None,
do_normalize: bool = None,
image_mean: Optional[Union[float, List[float]]] = None,
image_std: Optional[Union[float, List[float]]] = None,
do_pad: bool = None,
size_divisor: int = None,
input_data_format: Optional[Union[str, ChannelDimension]] = None,
):
# Adapted from transformers.models.beit.image_processing_beit

if do_reduce_labels:
image = self.reduce_label(image)

if do_resize:
image = self.resize(
image=image,
size=size,
resample=resample,
keep_aspect_ratio=keep_aspect_ratio,
ensure_multiple_of=ensure_multiple_of,
input_data_format=input_data_format,
)

if do_rescale:
image = self.rescale(image=image, scale=rescale_factor, input_data_format=input_data_format)

if do_normalize:
image = self.normalize(image=image, mean=image_mean, std=image_std, input_data_format=input_data_format)

if do_pad:
image = self.pad_image(image=image, size_divisor=size_divisor, input_data_format=input_data_format)

return image

def _preprocess_image(
self,
image: ImageInput,
do_resize: bool = None,
size: Dict[str, int] = None,
resample: PILImageResampling = None,
keep_aspect_ratio: bool = None,
ensure_multiple_of: int = None,
do_rescale: bool = None,
rescale_factor: float = None,
do_normalize: bool = None,
image_mean: Optional[Union[float, List[float]]] = None,
image_std: Optional[Union[float, List[float]]] = None,
do_pad: bool = None,
size_divisor: int = None,
data_format: Optional[Union[str, ChannelDimension]] = None,
input_data_format: Optional[Union[str, ChannelDimension]] = None,
) -> np.ndarray:
"""Preprocesses a single image."""
# Adapted from transformers.models.beit.image_processing_beit
# All transformations expect numpy arrays.
image = to_numpy_array(image)
if is_scaled_image(image) and do_rescale:
logger.warning_once(
"It looks like you are trying to rescale already rescaled images. If the input"
" images have pixel values between 0 and 1, set `do_rescale=False` to avoid rescaling them again."
)
if input_data_format is None:
# We assume that all images have the same channel dimension format.
input_data_format = infer_channel_dimension_format(image)

image = self._preprocess(
image,
do_reduce_labels=False,
do_resize=do_resize,
size=size,
resample=resample,
keep_aspect_ratio=keep_aspect_ratio,
ensure_multiple_of=ensure_multiple_of,
do_rescale=do_rescale,
rescale_factor=rescale_factor,
do_normalize=do_normalize,
image_mean=image_mean,
image_std=image_std,
do_pad=do_pad,
size_divisor=size_divisor,
input_data_format=input_data_format,
)
if data_format is not None:
image = to_channel_dimension_format(image, data_format, input_channel_dim=input_data_format)
return image

def _preprocess_segmentation_map(
self,
segmentation_map: ImageInput,
do_resize: bool = None,
size: Dict[str, int] = None,
resample: PILImageResampling = None,
keep_aspect_ratio: bool = None,
ensure_multiple_of: int = None,
do_reduce_labels: bool = None,
input_data_format: Optional[Union[str, ChannelDimension]] = None,
):
"""Preprocesses a single segmentation map."""
# Adapted from transformers.models.beit.image_processing_beit
# All transformations expect numpy arrays.
segmentation_map = to_numpy_array(segmentation_map)
# Add an axis to the segmentation maps for transformations.
if segmentation_map.ndim == 2:
segmentation_map = segmentation_map[None, ...]
added_dimension = True
input_data_format = ChannelDimension.FIRST
else:
added_dimension = False
if input_data_format is None:
input_data_format = infer_channel_dimension_format(segmentation_map, num_channels=1)
segmentation_map = self._preprocess(
image=segmentation_map,
do_reduce_labels=do_reduce_labels,
do_resize=do_resize,
size=size,
resample=resample,
keep_aspect_ratio=keep_aspect_ratio,
ensure_multiple_of=ensure_multiple_of,
do_normalize=False,
do_rescale=False,
input_data_format=input_data_format,
)
# Remove extra axis if added
if added_dimension:
segmentation_map = np.squeeze(segmentation_map, axis=0)
segmentation_map = segmentation_map.astype(np.int64)
return segmentation_map
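The axis bookkeeping above can be sketched in isolation, with `transform` standing in for the shared `_preprocess` call (a hypothetical stub, not the real method):

```python
import numpy as np

# Minimal sketch of the dimension handling in _preprocess_segmentation_map:
# a 2D (H, W) map temporarily gets a leading channel axis so it can pass
# through the same transform pipeline as images, then the axis is removed
# and the result is cast to int64 for use as labels.
def preprocess_segmentation_map(segmentation_map, transform=lambda x: x):
    segmentation_map = np.asarray(segmentation_map)
    added_dimension = segmentation_map.ndim == 2
    if added_dimension:
        segmentation_map = segmentation_map[None, ...]  # (1, H, W), channels-first
    segmentation_map = transform(segmentation_map)
    if added_dimension:
        segmentation_map = np.squeeze(segmentation_map, axis=0)
    return segmentation_map.astype(np.int64)

out = preprocess_segmentation_map(np.zeros((4, 6), dtype=np.uint8))
print(out.shape, out.dtype)  # (4, 6) int64
```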

def __call__(self, images, segmentation_maps=None, **kwargs):
# Overrides the `__call__` method of the `Preprocessor` class such that the images and segmentation maps can both
# be passed in as positional arguments.
return super().__call__(images, segmentation_maps=segmentation_maps, **kwargs)
Comment on lines +429 to +432 (Member):
Same here for adding a # Copied from, and same for all the other methods copied from beit as well.


@filter_out_non_signature_kwargs()
def preprocess(
self,
images: ImageInput,
segmentation_maps: Optional[ImageInput] = None,
Comment (Member):
This is a bit tricky as it could be a breaking change, if some users use do_resize etc. as args and not kwargs. However this would not be good practice, and I don't see any way of adding segmentation_maps processing without breaking BC. I'll let a core maintainer give the green light on this or not.
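To make the reviewer's concern concrete: inserting `segmentation_maps` as the second parameter changes what a positional second argument binds to. The toy functions below are hypothetical and do not reproduce the real `preprocess()` signature:

```python
# Hypothetical sketch of the backward-compatibility concern above: a caller
# who passed do_resize positionally would silently have it bound to the new
# segmentation_maps parameter instead.
def preprocess_before(images, do_resize=None):
    return {"do_resize": do_resize}

def preprocess_after(images, segmentation_maps=None, do_resize=None):
    return {"segmentation_maps": segmentation_maps, "do_resize": do_resize}

print(preprocess_before("img", True))  # {'do_resize': True}
print(preprocess_after("img", True))   # {'segmentation_maps': True, 'do_resize': None}
```

This is why passing such options as keyword arguments is the recommended practice: keyword callers are unaffected by the inserted parameter.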

do_resize: bool = None,
size: int = None,
keep_aspect_ratio: bool = None,
@@ -291,6 +450,7 @@ def preprocess(
image_std: Optional[Union[float, List[float]]] = None,
do_pad: bool = None,
size_divisor: int = None,
do_reduce_labels: Optional[bool] = None,
return_tensors: Optional[Union[str, TensorType]] = None,
data_format: ChannelDimension = ChannelDimension.FIRST,
input_data_format: Optional[Union[str, ChannelDimension]] = None,
@@ -302,6 +462,8 @@
images (`ImageInput`):
Image to preprocess. Expects a single or batch of images with pixel values ranging from 0 to 255. If
passing in images with pixel values between 0 and 1, set `do_rescale=False`.
segmentation_maps (`ImageInput`, *optional*):
Segmentation map to preprocess.
do_resize (`bool`, *optional*, defaults to `self.do_resize`):
Whether to resize the image.
size (`Dict[str, int]`, *optional*, defaults to `self.size`):
@@ -326,6 +488,10 @@
Image mean.
image_std (`float` or `List[float]`, *optional*, defaults to `self.image_std`):
Image standard deviation.
do_reduce_labels (`bool`, *optional*, defaults to `self.do_reduce_labels`):
Whether or not to reduce all label values of segmentation maps by 1. Usually used for datasets where 0
is used for background, and background itself is not included in all classes of a dataset (e.g.
ADE20k). The background label will be replaced by 255.
return_tensors (`str` or `TensorType`, *optional*):
The type of tensors to return. Can be one of:
- Unset: Return a list of `np.ndarray`.
@@ -357,9 +523,13 @@ def preprocess(
image_std = image_std if image_std is not None else self.image_std
do_pad = do_pad if do_pad is not None else self.do_pad
size_divisor = size_divisor if size_divisor is not None else self.size_divisor
do_reduce_labels = do_reduce_labels if do_reduce_labels is not None else self.do_reduce_labels

images = make_list_of_images(images)

if segmentation_maps is not None:
segmentation_maps = make_list_of_images(segmentation_maps, expected_ndims=2)

if not valid_images(images):
raise ValueError(
"Invalid image type. Must be of type PIL.Image.Image, numpy.ndarray, "
@@ -377,55 +547,47 @@ def preprocess(
size=size,
resample=resample,
)
# All transformations expect numpy arrays.
images = [to_numpy_array(image) for image in images]

if is_scaled_image(images[0]) and do_rescale:
logger.warning_once(
"It looks like you are trying to rescale already rescaled images. If the input"
" images have pixel values between 0 and 1, set `do_rescale=False` to avoid rescaling them again."
images = [
self._preprocess_image(
image=img,
do_resize=do_resize,
do_rescale=do_rescale,
do_normalize=do_normalize,
do_pad=do_pad,
size=size,
resample=resample,
keep_aspect_ratio=keep_aspect_ratio,
ensure_multiple_of=ensure_multiple_of,
rescale_factor=rescale_factor,
image_mean=image_mean,
image_std=image_std,
size_divisor=size_divisor,
data_format=data_format,
input_data_format=input_data_format,
)
for img in images
]

if input_data_format is None:
# We assume that all images have the same channel dimension format.
input_data_format = infer_channel_dimension_format(images[0])
data = {"pixel_values": images}

if do_resize:
images = [
self.resize(
image=image,
if segmentation_maps is not None:
segmentation_maps = [
self._preprocess_segmentation_map(
segmentation_map=segmentation_map,
do_reduce_labels=do_reduce_labels,
do_resize=do_resize,
size=size,
resample=resample,
keep_aspect_ratio=keep_aspect_ratio,
ensure_multiple_of=ensure_multiple_of,
input_data_format=input_data_format,
)
for image in images
for segmentation_map in segmentation_maps
]

if do_rescale:
images = [
self.rescale(image=image, scale=rescale_factor, input_data_format=input_data_format)
for image in images
]
data["labels"] = segmentation_maps

if do_normalize:
images = [
self.normalize(image=image, mean=image_mean, std=image_std, input_data_format=input_data_format)
for image in images
]

if do_pad:
images = [
self.pad_image(image=image, size_divisor=size_divisor, input_data_format=input_data_format)
for image in images
]

images = [
to_channel_dimension_format(image, data_format, input_channel_dim=input_data_format) for image in images
]

data = {"pixel_values": images}
return BatchFeature(data=data, tensor_type=return_tensors)
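The restructured flow above can be summarized as a sketch, with the per-item helpers stubbed out (these stubs are placeholders for `_preprocess_image` and `_preprocess_segmentation_map`, not the real implementations):

```python
import numpy as np

# Sketch of the rewritten preprocess() flow: each image goes through the
# shared per-image path into "pixel_values"; segmentation maps, when given,
# go through their own path into "labels"; both land in one output dict.
def preprocess(images, segmentation_maps=None):
    images = [np.asarray(img, dtype=np.float32) for img in images]
    data = {"pixel_values": images}
    if segmentation_maps is not None:
        data["labels"] = [np.asarray(m).astype(np.int64) for m in segmentation_maps]
    return data

batch = preprocess([np.zeros((3, 8, 8))], segmentation_maps=[np.zeros((8, 8))])
print(sorted(batch))  # ['labels', 'pixel_values']
```

The design keeps the per-image loop in one place, so the old repeated `if do_resize / if do_rescale / if do_normalize / if do_pad` list comprehensions collapse into a single helper call per input.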

# Copied from transformers.models.beit.image_processing_beit.BeitImageProcessor.post_process_semantic_segmentation with Beit->DPT