Add support for GOT-OCR2.0 #34173

Open
2 tasks done
VladOS95-cyber opened this issue Oct 15, 2024 · 11 comments

Comments

@VladOS95-cyber
Contributor

Model description

As an OCR-2.0 model, GOT can handle all artificial optical signals (e.g., plain texts, math/molecular formulas, tables, charts, sheet music, and even geometric shapes) under various OCR tasks. On the input side, the model supports commonly used scene- and document-style images in slice and whole-page styles. On the output side, GOT can generate plain or formatted results (markdown/tikz/smiles/kern) via an easy prompt. Besides, the model enjoys interactive OCR features, i.e., region-level recognition guided by coordinates or colors.

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

Implementation: https://github.com/Ucas-HaoranWei/GOT-OCR2.0/
Paper: https://arxiv.org/abs/2409.01704

@VladOS95-cyber
Contributor Author

Hello! If no one from the core team is already working on this and there is interest in it, I would really love to contribute this model to transformers with some help!

@GargDivanshu

Hi @VladOS95-cyber, if you don't mind, can I help you with this issue if you are working on it?

@VladOS95-cyber
Contributor Author

Hi @GargDivanshu, I don't mind at all. Let's wait for a decision from @qubvel @LysandreJik.

@yonigozlan
Member

Hey @VladOS95-cyber @GargDivanshu!
I'm planning to start working on it very soon. I'll tag this issue once I've opened a PR for it, in case you want to have a look then!

@GargDivanshu

cool 🙌

@Youho99

Youho99 commented Nov 4, 2024

+1

@jshcrm

jshcrm commented Nov 6, 2024

Any movement on this? Looking forward to trying it out

@plamb-viso

Been testing stepfun's demo code along with the model -- would love to see this in transformers!

@yonigozlan
Member

Hey all!
Implementation is well underway, and I'll open a PR in a couple of days for it (the entire Hugging Face team is currently at an off-site). Most likely, only inference will be available initially, and support for fine-tuning will be added if there is strong demand for it.

@plamb-viso

@yonigozlan if you remember, please paste the PR link in this thread, would love to subscribe

@yonigozlan
Member

Hi again!
The GOT-OCR PR is live here if you want to follow the progress :)


6 participants