Support non-flat directory structures (especially regarding ocr-d interoperability) #240
Comments
The ability described above to read images / XML files directly from ocr-d METS files (as well as an overhaul of the data structure required for direct requests) was added in d675aab.
I have just started testing the new However, there seem to be some issues with the METS support. I am still trying to fully grasp why my OCR-D workspaces won't open, but the following block in the library view seems odd:
LAREX/src/main/java/de/uniwue/web/controller/ViewerController.java, lines 249 to 268 in 8ccd341
I wonder:
LAREX used to have an inconsistency around the path resolution:
I have not even looked at the writing side yet. But I can imagine it is difficult to stay consistent when you want to support both the old flat and the new METS bookpaths.
Looking more closely, the current implementation is neither one nor the other (PAGE directory or METS directory), but a different beast: the PAGE directory's parent. Of course, the latter two often coincide, but as soon as the PAGE lives at the root level, or in a directory deeper than one level below the METS, it won't work.
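To illustrate the mismatch, here is a minimal sketch using only `java.nio.file`. The class `BasePathSketch`, its method names and the example paths are hypothetical and not taken from LAREX; the sketch merely assumes that the workspace base should be the directory containing the METS file.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper, not LAREX code: compares two ways of deriving the workspace base. */
final class BasePathSketch {

    /** Base directory taken from the METS file itself. */
    static Path fromMets(Path metsFile) {
        return metsFile.normalize().getParent();
    }

    /** Base directory guessed as the parent of the directory that contains a PAGE file. */
    static Path fromPageParent(Path pageFile) {
        return pageFile.normalize().getParent().getParent();
    }

    public static void main(String[] args) {
        Path expected = fromMets(Paths.get("/data/book/mets.xml"));
        String[] pages = {
            "/data/book/OCR-D-SEG/p1.xml",     // PAGE one level below the METS
            "/data/book/p1.xml",               // PAGE on the same level as the METS
            "/data/book/out/OCR-D-SEG/p1.xml"  // PAGE two levels below the METS
        };
        for (String page : pages) {
            Path guessed = fromPageParent(Paths.get(page));
            System.out.println(page + " -> " + guessed
                    + " (matches METS dir: " + guessed.equals(expected) + ")");
        }
    }
}
```

Only the first of the three example layouts yields the METS directory; the other two resolve to a directory above or below it.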
Another observation is that the order of the bookpaths in the library view has changed. It used to be sorted alphabetically (which is best for many purposes, I guess), but now looks random. Ideally, we could click on the column titles to have it sorted by name or date...
Something seems to have gone wrong during one of the last pushes into dev; this worked before, and I can reproduce the problem in the current dev. @chaddy314 or I will try to fix this ASAP.
Internally
This seems to be an overlooked relic from the early stages of the implementation and will be fixed ASAP.
Alphabetical sorting will be pushed soon to the current pull request. Sorting by column is a really nice suggestion I'm considering implementing after this upcoming release. Thanks for the detailed report!
Using something like DataTables should make this pretty easy.
@bertsky Could you send us the OCR-D workspace which failed to open for you so that we could test it locally? 😁
@maxnth unfortunately, I don't have it anymore. I'll recreate it and test out the new RC myself (but not before Aug 16, sry)
BTW the current implementation is still incompatible with fileGrps that use the OCR-D convention for storing derived images (i.e. inside the same grp, under the same page id, but merely with an image mime type). On these, I cannot get past the Open Book dialog. It just stays there, regardless of how often I press NEXT. And stderr shows:
(As explained above, the proper algorithm must iterate over the fileGrp by structMap page ids and pick the maximum mimetype for each page, i.e. PAGE or single image.)
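A rough sketch of that selection step, under stated assumptions: the METS has already been parsed into simple entries linking each file to a page id, PAGE files carry the mimetype `application/vnd.prima.page+xml`, and "maximum" means PAGE outranks any `image/*` type. The class `PageFileSelector`, the `FileEntry` type and the example hrefs are hypothetical and not part of LAREX.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not LAREX code: picks one file per page from a single fileGrp. */
final class PageFileSelector {

    /** Assumed PAGE-XML mimetype as used in OCR-D workspaces. */
    static final String PAGE_MIME = "application/vnd.prima.page+xml";

    /** Simplified stand-in for a mets:file that is already linked to a page id. */
    static final class FileEntry {
        final String pageId;
        final String mimetype;
        final String href;

        FileEntry(String pageId, String mimetype, String href) {
            this.pageId = pageId;
            this.mimetype = mimetype;
            this.href = href;
        }
    }

    /** Assumed ranking: PAGE beats any image, anything else is ignored. */
    static int rank(String mimetype) {
        if (PAGE_MIME.equals(mimetype)) return 2;
        if (mimetype != null && mimetype.startsWith("image/")) return 1;
        return 0;
    }

    /** Walks the page ids in structMap order and keeps the highest-ranked file per page. */
    static Map<String, FileEntry> select(List<String> pageIds, List<FileEntry> fileGrp) {
        Map<String, FileEntry> best = new LinkedHashMap<>();
        for (String pageId : pageIds) {
            FileEntry chosen = null;
            for (FileEntry f : fileGrp) {
                if (!pageId.equals(f.pageId) || rank(f.mimetype) == 0) continue;
                if (chosen == null || rank(f.mimetype) > rank(chosen.mimetype)) chosen = f;
            }
            // Pages without a PAGE or image file in this fileGrp are skipped.
            if (chosen != null) best.put(pageId, chosen);
        }
        return best;
    }

    public static void main(String[] args) {
        List<FileEntry> grp = Arrays.asList(
                new FileEntry("P0001", "image/png", "OCR-D-SEG/P0001.IMG-BIN.png"),
                new FileEntry("P0001", PAGE_MIME, "OCR-D-SEG/P0001.xml"),
                new FileEntry("P0002", "image/png", "OCR-D-SEG/P0002.png"));
        for (Map.Entry<String, FileEntry> e : select(Arrays.asList("P0001", "P0002"), grp).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().href);
        }
    }
}
```

With this ranking, a derived image stored next to a PAGE file under the same page id no longer produces two competing entries for that page: the PAGE file wins, and a page that only has an image falls back to that image.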
Note: spaces and dots in the directory of the book will also break the METS reader.
As already proposed in #160, it should be possible to work with non-flat directory structures, especially regarding interoperability with ocr-d.
The first and probably essential step would be to rewrite the way direct requests work. Instead of supplying a single bookpath, direct requests should expect e.g. a map of images with image variants (+ possibly additional information).
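One possible shape for such a request, purely as a sketch: the class `DirectRequestSketch`, its methods and the example paths are hypothetical and do not describe the existing LAREX API. It assumes each page is identified by a name and may carry several image variants plus an optional PAGE XML path as the "additional information".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical request payload, not the LAREX API: pages mapped to their image variants. */
final class DirectRequestSketch {

    /** Page name -> image variant paths, kept in insertion order. */
    private final Map<String, List<String>> imageMap = new LinkedHashMap<>();
    /** Page name -> optional PAGE XML path. */
    private final Map<String, String> xmlMap = new LinkedHashMap<>();

    void addImage(String page, String imagePath) {
        imageMap.computeIfAbsent(page, k -> new ArrayList<>()).add(imagePath);
    }

    void setXml(String page, String xmlPath) {
        xmlMap.put(page, xmlPath);
    }

    /** Read-only view of the variants of a page; empty if the page is unknown. */
    List<String> variants(String page) {
        return Collections.unmodifiableList(
                imageMap.getOrDefault(page, Collections.<String>emptyList()));
    }

    /** PAGE XML path of a page, or null if none was supplied. */
    String xml(String page) {
        return xmlMap.get(page);
    }

    List<String> pages() {
        return new ArrayList<>(imageMap.keySet());
    }

    public static void main(String[] args) {
        DirectRequestSketch request = new DirectRequestSketch();
        // Variants of one page may live in different directories.
        request.addImage("0001", "OCR-D-IMG/0001.png");
        request.addImage("0001", "OCR-D-IMG-BIN/0001.bin.png");
        request.setXml("0001", "OCR-D-SEG/0001.xml");
        request.addImage("0002", "OCR-D-IMG/0002.png");
        for (String page : request.pages()) {
            System.out.println(page + " images=" + request.variants(page) + " xml=" + request.xml(page));
        }
    }
}
```

Because every file is addressed by its own path, nothing in this structure requires the images and XML files to sit in one flat directory.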
Once this is implemented, writing a module which reads e.g. image variants and segmentation from an ocr-d METS file and passes them on to LAREX should become a lot easier than with our current implementation.
This would also allow the enhancement for OCR4all described here.