Currently, if an OCR-frame is grabbed from existing data and stored back, only ALTO is permitted as both input and output format.
While ALTO is common and therefore a very reasonable input format, it would ease usage with post-correction tools other than the Transkribus expert client if the OCR-frame could also be stored as PAGE XML.
We have to deal with two scenarios when saving as PAGE:
fit the custom PAGE 2013 format of Transkribus
fit the PAGE 2019 format used by the OCR-D tools as well as LAREX
To do so, please add an output format flag --output-format that internally distinguishes between page2013 and page2019.
If not set, it defaults to the recent ALTO flavor of Tesseract-OCR, which technically means the data is just preserved as provided.
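The requested flag could be sketched roughly as follows. This is only an illustration, not the project's actual code: the function names `build_parser` and `resolve_page_namespace` and the value name `alto` for the default are assumptions. Only `--output-format`, `page2013` and `page2019` come from the request above. The two namespace URIs are the PAGE content schema versions 2013-07-15 and 2019-07-15, which is an assumed mapping for the two flavors.

```python
import argparse
from typing import Optional

# "alto" as the name of the default value is an assumption;
# "page2013" and "page2019" are the values named in the request.
OUTPUT_FORMATS = ("alto", "page2013", "page2019")

# Assumed mapping of the two PAGE flavors to PAGE content schema namespaces.
PAGE_NAMESPACES = {
    "page2013": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15",
    "page2019": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15",
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a hypothetical --output-format option."""
    parser = argparse.ArgumentParser(description="Store an OCR-frame")
    parser.add_argument(
        "--output-format",
        choices=OUTPUT_FORMATS,
        default="alto",  # if not set, the ALTO data is preserved as provided
        help="format used when storing the OCR-frame",
    )
    return parser


def resolve_page_namespace(output_format: str) -> Optional[str]:
    """Return the PAGE namespace for a PAGE flavor, or None for ALTO."""
    return PAGE_NAMESPACES.get(output_format)


if __name__ == "__main__":
    args = build_parser().parse_args()
    namespace = resolve_page_namespace(args.output_format)
    if namespace is None:
        print("keeping ALTO as provided")
    else:
        print(f"writing PAGE XML with namespace {namespace}")
```

Restricting the option with `choices` lets argparse reject unknown formats before any conversion starts, while the default keeps the current ALTO-only behavior unchanged for existing callers.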