Extension: Rename Images to fit GT-File by name #3
Comments
Proposal: each image file is linked to a physical page.
For a test sample, see https://github.com/M3ssman/gt-test/releases/tag/v2.1.2 where, for 128 GT pages (Latin), only 108 images were included because 20 images collide.
The problem is solved with the update of OCR-D (BagIt). Please also use the new action workflow. Regards, tboenig
Re-open, some more review required.
IIUC this behavior should be fixed since v2.59.0, the relevant PR is OCR-D/core#1137. Is there a regression, are you still experiencing files being overwritten when bagging?
@kba I'll take a closer look at the relevant modifications and try this out with our custom setup ASAP
Description
Currently, when image files are referenced in mets.xml in a file group, they get downloaded and pushed to a directory. This way, the naming similarity between image and GT data is lost. But this similarity is a key requirement for tools like Transkribus or LAREX to match an image with its GT data for further corrections or extensions.
Even worse, because our data is an overall sample of 40,000+ prints, it includes, for example, several images named "00000008.jpg", which could overwrite each other.
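To illustrate the requested behavior, here is a minimal sketch in Python. It is not OCR-D code: the function `plan_image_names`, the pairing of GT files with images, and the example file names are all hypothetical. It derives each image's target name from the stem of its GT file and refuses to continue if two images would end up with the same name.

```python
from pathlib import PurePosixPath


def plan_image_names(pairs):
    """Map target image names to their source paths.

    `pairs` is an iterable of (gt_path, image_path) tuples. How GT files
    and images are paired is an assumption made for this illustration.
    """
    planned = {}
    for gt_path, image_path in pairs:
        gt = PurePosixPath(gt_path)
        image = PurePosixPath(image_path)
        # Reuse the GT file's stem and keep the image's own extension.
        target = gt.stem + image.suffix
        if target in planned:
            # Fail loudly instead of silently overwriting an image.
            raise ValueError(f"name collision for {target}")
        planned[target] = image_path
    return planned


# Hypothetical example: two prints that both contain "00000008.jpg".
pairs = [
    ("print_a_00000008.xml", "print_a/00000008.jpg"),
    ("print_b_00000008.xml", "print_b/00000008.jpg"),
]
planned = plan_image_names(pairs)
```

With names derived this way, both images keep distinct file names that match their GT files, so a tool that pairs files by name can still find them.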