Add support for GOT-OCR2.0 #34173

VladOS95-cyber · 2024-10-15T10:44:22Z

Model description

As an OCR-2.0 model, GOT can handle all artificial optical signals (e.g., plain texts, math/molecular formulas, tables, charts, sheet music, and even geometric shapes) under various OCR tasks. On the input side, the model supports commonly used scene- and document-style images in slice and whole-page styles. On the output side, GOT can generate plain or formatted results (markdown/tikz/smiles/kern) via an easy prompt. Besides, the model enjoys interactive OCR features, i.e., region-level recognition guided by coordinates or colors.

Open source status

The model implementation is available
The model weights are available

Provide useful links for the implementation

Implementation: https://github.com/Ucas-HaoranWei/GOT-OCR2.0/
Paper: https://arxiv.org/abs/2409.01704

VladOS95-cyber · 2024-10-15T10:45:16Z

Hello, if no one from the core team is already working on this and there is interest in it, I would really love to contribute this model to transformers with some help!

GargDivanshu · 2024-10-16T07:16:19Z

Hi @VladOS95-cyber, if you are working on this issue and don't mind, can I help you with it?

VladOS95-cyber · 2024-10-16T07:42:43Z

Hi @GargDivanshu, I don't mind at all. Let's wait for a decision from @qubvel @LysandreJik

yonigozlan · 2024-10-17T14:43:12Z

Hey @VladOS95-cyber @GargDivanshu!
I'm planning to start working on it very soon. I'll tag this issue once I've opened a PR for it, if you want to have a look then!

GargDivanshu · 2024-10-17T18:23:24Z

cool 🙌

Youho99 · 2024-11-04T11:09:10Z

+1

jshcrm · 2024-11-06T06:22:59Z

Any movement on this? Looking forward to trying it out

plamb-viso · 2024-11-11T16:05:23Z

Been testing stepfun's demo code along with the model -- would love to see this in transformers!

yonigozlan · 2024-11-11T16:17:40Z

Hey all!
Implementation is well underway, and I'll open a PR in a couple of days for it (the entire Hugging Face team is currently at an off-site). Most likely, only inference will be available initially, and support for fine-tuning will be added if there is strong demand for it.

plamb-viso · 2024-11-11T16:40:35Z

@yonigozlan if you remember, please paste the PR link in this thread, would love to subscribe

yonigozlan · 2024-11-13T20:36:57Z

Hi again!
The GOT-OCR PR is live here if you want to follow the progress :)

VladOS95-cyber added the New model label Oct 15, 2024
