I wonder what ALTO version OtherTag/@VALUE conforms to. Is that a Transkribus or ULB extension @M3ssman ?
Generally, IMO we do need to support this kind of information in the annotation files themselves (PAGE/ALTO), but should also consider the case where it enters as metadata (METS/MODS). For the latter, we have the https://github.com/ocr-d/gt-labelling schema, but that does not contain any definitions on subject/genre/content class yet. There is a classification schema for content items in ENMAP (§10 Annex 2), a set of newspaper article types in DTABf for example. Somewhat related, one could also consider relevant the non-structural (i.e. metadata) types of DFG Strukturdatenset, or the general set of text sorts in DTA and DWDS...
Anyway, back to the annotation schema in ALTO: Why OtherTag in the first place – shouldn't this kind of information be placed in LayoutTag by convention? On the PAGE side, it's always MetadataItem I suppose.
Here I made a proposal to mirror the gt-labelling info from MODS into the MetadataItem in PAGE BTW.
Don't worry, this originates from my very first and superficial interpretation of how ALTO can express additional content information.
It has nothing to do with Transkribus, which drops this element anyway due to its limited transformation capabilities.
With version 2.1 (2014), according to the ALTO schema, they introduced annotations like LayoutTag, StructureTag, RoleTag, NamedEntityTag and OtherTag. Looking at it now, I guess they were intended to express neat relations even from a single String element, via its TAGREFS, to e.g. a NamedEntityTag.
If I were to do it again (... which is not planned) I'd go for the ComposedBlockType@TYPE attribute, which is meant to be a string expressing what sort of content the included sub-regions are made of: table, advertisement, ... (example values from the ALTO schema definition).
The type-stuff for Blocks (and Illustrations!) seems to be part of the spec since the very beginning.
The schema's prelude dates it back to 2004, even before version 1.3 of ALTO was tagged.
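To make the two options concrete, here is a minimal sketch (not the digital-eval implementation) that reads a content class either from ComposedBlock/@TYPE or by resolving a block's TAGREFS against the Tags section. The sample document, the IDs, the use of @LABEL as the carrier of the class name and the helper name `content_classes` are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# ALTO v2 namespace; adjust for other schema versions.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v2#"}

# Hypothetical sample, invented for this sketch (not taken from real data).
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Tags>
    <OtherTag ID="TAG_1" LABEL="advertisement"/>
  </Tags>
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <ComposedBlock ID="CB_1" TYPE="advertisement">
          <TextBlock ID="TB_1"/>
        </ComposedBlock>
        <TextBlock ID="TB_2" TAGREFS="TAG_1"/>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def content_classes(alto_xml):
    """Map element IDs to content classes found via @TYPE or @TAGREFS."""
    root = ET.fromstring(alto_xml)
    # Index every tag in the Tags section by its ID (assumes @LABEL holds the class).
    tags = {t.get("ID"): t.get("LABEL") for t in root.findall("alto:Tags/*", ALTO_NS)}
    result = {}
    for el in root.iter():
        el_id = el.get("ID")
        if el_id is None:
            continue
        if el.tag.endswith("}ComposedBlock") and el.get("TYPE"):
            # Option 1: the class is stated directly on the composed block.
            result[el_id] = [el.get("TYPE")]
        elif el.get("TAGREFS"):
            # Option 2: the class is referenced indirectly via TAGREFS.
            result[el_id] = [tags[r] for r in el.get("TAGREFS").split() if r in tags]
    return result


if __name__ == "__main__":
    print(content_classes(SAMPLE))
```

With the ComposedBlock variant the class travels with the region itself, while the TAGREFS variant needs the extra lookup into the Tags section.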
digital-eval/digital_eval/model.py
Lines 234 to 237 in 98a9c24
@kba RFC