Add GOT-OCR 2.0 to Transformers #34721

yonigozlan · 2024-11-13T20:03:43Z

What does this PR do?

Add GOT-OCR 2.0 to Transformers.

Remaining TODOs:

Tests
Docs
Post-processing

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

HuggingFaceDocBuilderDev · 2024-11-25T20:22:13Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

molbap

Seems very clean, congrats! The modular file is still a bit verbose, so I left some comments with possible leads to reduce it. Let's make sure the slow tests run; otherwise LGTM, and I left a couple of comments :)

molbap · 2024-11-28T16:44:34Z

docs/source/en/model_doc/qwen2_vl.md

Interesting, this formatting was missing?

Yes I guess my format-on-save did that 😅

molbap · 2024-11-28T16:47:57Z

src/transformers/models/got_ocr2/convert_got_ocr2_weights_to_hf.py

+    write_tokenizer(
+        tokenizer_path="qwen.tiktoken",
+        save_dir=args.output_dir,
+        instruct=args.instruct,


Does this model need an instruct version?

Oh, I forgot to remove that, thanks. I'd say the model is indeed instruct, but users are not really expected to have a conversation with it, and the possible prompts are explicitly defined in the processor, so I'm not sure which category this model falls into or whether we should add support for a chat template here.
I'd say it might make sense to add a chat template if and when we end up adding support for fine-tuning, but to me it doesn't make much sense in this state. I might be wrong though.
What do you think @molbap @Ucas-HaoranWei ?

From afar (maybe @Ucas-HaoranWei has a different opinion), I'd just remove the instruct argument: as you say, users are not expected to converse with it. If it's instruct-tuned in a later version, it might make sense, but it's an OCR model as it stands, and labeling it as instruct sounds confusing.

Yes, I agree that there should not be a chat template, as the prompts are fixed in this state. @molbap @yonigozlan. If later versions can be fine-tuned, then the template can be maintained to ensure that users use their new prompts.

Sounds good thank you!

molbap · 2024-11-28T17:57:37Z

src/transformers/models/got_ocr2/modular_got_ocr2.py

+        resized_image = image.resize((target_width, target_height))
+
+        # split the image into patches
+        processed_images = []


these are processed_patches, rather, yes?

Yes indeed, but I did not rename it because we also add a "thumbnail" image to it, which is the whole image resized.
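The naming point above can be sketched in plain Python. This is only an illustration under assumptions, not the code from the PR: `patch_boxes` and `build_processed_images` are hypothetical helpers, the grid and thumbnail sizes are made up, and placing the thumbnail last is an assumption. It shows why the list holds more than patches: every grid cell contributes a crop box, and one extra entry stands for the whole image resized.

```python
def patch_boxes(width, height, cols, rows):
    """Return (left, upper, right, lower) crop boxes for a cols x rows grid."""
    patch_w, patch_h = width // cols, height // rows
    return [
        (c * patch_w, r * patch_h, (c + 1) * patch_w, (r + 1) * patch_h)
        for r in range(rows)
        for c in range(cols)
    ]


def build_processed_images(width, height, cols, rows, thumbnail_size):
    """Describe the patches first, then one thumbnail of the whole image."""
    processed_images = [
        {"kind": "patch", "box": box}
        for box in patch_boxes(width, height, cols, rows)
    ]
    # The extra entry is why the list is not purely "patches":
    # it stands for the full image resized to a square thumbnail.
    processed_images.append(
        {"kind": "thumbnail", "size": (thumbnail_size, thumbnail_size)}
    )
    return processed_images


# Hypothetical sizes: a 2048x1024 page split into a 2x1 grid.
images = build_processed_images(
    width=2048, height=1024, cols=2, rows=1, thumbnail_size=1024
)
```

With an actual image library, each `box` would be passed to a crop call and the thumbnail would come from a resize of the full image.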

molbap · 2024-11-28T19:54:25Z

src/transformers/models/got_ocr2/modular_got_ocr2.py

+        format = output_kwargs["text_kwargs"].pop("format", False)
+        num_image_tokens = output_kwargs["images_kwargs"].pop("num_image_tokens", 256)
+        box = output_kwargs["images_kwargs"].pop("box", [None])
+        color = output_kwargs["images_kwargs"].pop("color", None)
+        multi_page = output_kwargs["images_kwargs"].pop("multi_page", False)
+        crop_to_patches = output_kwargs["images_kwargs"].pop("crop_to_patches", False)


here we could use the default values that are preset above

Do you mean something like GotOcr2ProcessorKwargs._defaults["images_kwargs"].get("crop_to_patches")?

yes, for instance! what I mean is the default values in the pop method should not be specified in two places, because if they change, it's more likely to cause errors/mismatches

That makes sense. I might be missing something, but if we have default kwargs (for multi_page, crop_to_patches, min_patches, max_patches, etc.), it seems to me that there shouldn't be any issue with using "pop" without a default? If the kwarg is not specified by the user, pop will return the default, and otherwise it will return the user-specified value?

@molbap just pinging you on this last question

Ah, you're right - I guess it's just weird to me to see the default values written in two distinct places, does not seem necessary 🤔
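The conclusion of this thread can be sketched in plain Python. This is a minimal illustration, not the Transformers implementation: `DEFAULTS` and `merge_kwargs` are hypothetical stand-ins for the defaults declared on the processor kwargs class and for the merging step. Assuming the defaults are merged into the kwargs before they are read, `pop` without a fallback returns the user's value when one was given and the default otherwise, so each default is written in one place only.

```python
# Hypothetical stand-in for defaults declared once on the processor kwargs class.
DEFAULTS = {
    "images_kwargs": {"crop_to_patches": False, "multi_page": False},
    "text_kwargs": {"format": False},
}


def merge_kwargs(defaults, user_kwargs):
    """Return a copy of the defaults with user-supplied values layered on top."""
    merged = {group: dict(values) for group, values in defaults.items()}
    for group, values in user_kwargs.items():
        merged.setdefault(group, {}).update(values)
    return merged


# The user overrides one option; the others fall back to DEFAULTS.
output_kwargs = merge_kwargs(
    DEFAULTS, {"images_kwargs": {"crop_to_patches": True}}
)

# No second default is needed: every key exists after merging.
crop_to_patches = output_kwargs["images_kwargs"].pop("crop_to_patches")
multi_page = output_kwargs["images_kwargs"].pop("multi_page")
```

A side effect of dropping the fallback is that a key missing from the defaults raises `KeyError` instead of silently using a stale value.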

molbap · 2024-11-28T20:10:14Z

src/transformers/models/got_ocr2/modular_got_ocr2.py

+class GotOcr2VisionAdapter(nn.Module):
+    def __init__(self, language_hidden_size: int, vision_output_channels: int):
+        super().__init__()
+        self.conv_up1 = nn.Conv2d(


I assume up stands for upsampler like in swin2sr, but a more explicit name would be better

yonigozlan · 2024-11-29T16:05:40Z

Hey @ArthurZucker
This should be ready for you to review :).
There's one question left:

I'd say the model is indeed instruct, but users are not really expected to have a conversation with it, and the possible prompts are explicitly defined in the processor, so I'm not sure which category this model falls into or whether we should add support for a chat template here.
I'd say it might make sense to add a chat template if and when we end up adding support for fine-tuning, but to me it doesn't make much sense in this state. I might be wrong though.

piercelamb · 2024-12-18T15:31:37Z

Hi all -- eager to try this model in transformers in the new year

yonigozlan mentioned this pull request Nov 13, 2024

Add support for GOT-OCR2.0 #34173

Open

2 tasks

yonigozlan force-pushed the add-got-ocr2 branch from 93b1d19 to af8035d Compare November 14, 2024 17:30

yonigozlan mentioned this pull request Nov 14, 2024

Integrating GOT-OCR2.0 in Transformers 🤗 Ucas-HaoranWei/GOT-OCR2.0#137

Open

yonigozlan added New model Multimodal labels Nov 14, 2024

Ucas-HaoranWei approved these changes Nov 15, 2024

Ucas-HaoranWei approved these changes Nov 18, 2024

yonigozlan mentioned this pull request Nov 21, 2024

Fix support for image processors modifications in modular #34866

Merged

5 tasks

yonigozlan force-pushed the add-got-ocr2 branch from f8e1ac9 to 4007fb2 Compare November 25, 2024 19:29

yonigozlan requested review from qubvel and molbap November 25, 2024 21:24

molbap approved these changes Nov 28, 2024

yonigozlan added the run-slow label Nov 29, 2024

yonigozlan requested review from ArthurZucker and molbap November 29, 2024 16:06

yonigozlan added 11 commits December 5, 2024 17:54

init modular got_ocr2

16c3388

Get correct got_ocr architecture

9f93654

add processing

c0f4bfe

run modular with processing

a3c8f67

add working inference

c55bfbc

apply modular

e2f9cf5

Refactor and fix style

5b628d1

Refactor, cleanup, fix style

84c76a6

fix init order

9828b29

Fix docs

adc6b9a

add base modeling tests

c7fa74b

yonigozlan added 21 commits December 5, 2024 17:54

fix style and consistency

4bcfc04

rename doc file

ec4a8f9

fix repo consistency

b57b336

fix inference with box

9151aea

add image processing and support for crop_to_multi_page

bf0ea21

Fix batch inference

d2caee3

add tests

70d6f30

fixup

2000570

fix slow test

7817fe7

fix docstrings

330478b

Add model doc

4eb7b94

update to new init

85a00c5

fix input autocast pixel_values dtype

bf21173

update doc

568be30

move doc to multimodal

9e49b2d

Reformat crop_image_to_patches and add docstrings

d67af61

Fix example in forward docstring

8add3a1

Address Pablo review

10e5644

[run slow] got_ocr2

5c3a4eb

remove defaults defined twice

0d22212

apply modular

df94db3

yonigozlan force-pushed the add-got-ocr2 branch from 7e88dbe to df94db3 Compare December 5, 2024 18:02

yonigozlan and others added 2 commits December 5, 2024 18:15

add torch_device to integration tests

879fe3e

Merge branch 'main' into add-got-ocr2

79e5734

update modular

8fb0ed7
