forked from kba/gt-guidelines
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 5 changed files with 133 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -13,3 +13,4 @@ broomer.sh
*_archiv.xml
new/
out/
parser/
@@ -0,0 +1,62 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="lySeitentypen2">
<title>Title Pages, Tables of Contents, Registers, Indexes</title>
<body><section>
<title>Formal and content-related aspects</title>
<p>These page types can be regarded as special pages. They contain specific
metadata or serve specific functions:<ul id="ul_rp3_nvn_cyb">
<li>Title page → bibliographic metadata of the publication</li>
<li>Tables of contents, lists → structural metadata of the publication, listing all
of its sections and chapters</li>
<li>Registers, indexes → content metadata of the publication, in which the content
is arranged according to specific aspects</li>
</ul>
</p>
</section>
<section>
<title>Transcription of these page types</title><p>The segmentation of these pages in particular should be guided by the purpose of the GT (ground truth).</p>
<p>The following segmentation is recommended:</p>
<p>
<simpletable frame="all" relcolwidth="1* 1* 1*" id="simpletable_x22_qxn_cyb">
<sthead>
<stentry>Page type</stentry>
<stentry>Segmentation</stentry>
<stentry>Page @type</stentry>
</sthead>
<strow>
<stentry>Title page</stentry>
<stentry>
<ul id="ul_rf5_fgl_pzb">
<li><tt>&lt;TextRegion type="paragraph"&gt;</tt></li>
<li><tt>&lt;TextRegion type="paragraph"&gt;</tt></li>
</ul>
</stentry>
<stentry>title</stentry>
</strow>
<strow>
<stentry>Tables of contents, lists</stentry>
<stentry>
<ul id="ul_rf6_fgl_pzb">
<li><tt>&lt;TextRegion type="header"&gt;</tt></li>
<li><tt>&lt;TextRegion type="paragraph"&gt;</tt></li>
</ul></stentry>
<stentry>table-of-contents</stentry>
</strow>
<strow>
<stentry>Registers, indexes</stentry>
<stentry>
<ul id="ul_rf7_fgl_pzb">
<li><tt>&lt;TextRegion type="header"&gt;</tt></li>
<li><tt>&lt;TextRegion type="paragraph"&gt;</tt></li>
</ul>
</stentry>
<stentry>index</stentry>
</strow>
</simpletable>
</p>
</section>
</body>
</topic>
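The segmentation recommendations in the page-type table can be illustrated with a small Python sketch that writes a PAGE-XML skeleton for one of these page types. The namespace URI (PAGE schema version 2019-07-15), the helper names `q` and `page_skeleton`, the English keys of `RECOMMENDED` and the image name are assumptions for illustration, not part of the guidelines. The output omits the mandatory `Metadata` element and gives every region a whole-page placeholder polygon, so it is not schema-valid as is.

```python
import xml.etree.ElementTree as ET

# Assumption: PAGE-XML schema version 2019-07-15; other versions use another namespace URI.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"
ET.register_namespace("", PAGE_NS)  # serialize PAGE elements without a prefix

# Recommendations from the table: page type -> (Page @type, TextRegion @type values)
RECOMMENDED = {
    "title page": ("title", ["paragraph", "paragraph"]),
    "table of contents": ("table-of-contents", ["header", "paragraph"]),
    "index": ("index", ["header", "paragraph"]),
}


def q(tag: str) -> str:
    """Qualify a tag name with the PAGE namespace."""
    return f"{{{PAGE_NS}}}{tag}"


def page_skeleton(kind: str, image: str, width: int, height: int) -> ET.Element:
    """Build a hypothetical PAGE-XML skeleton for one of the special page types.

    Not schema-valid as is: <Metadata> is omitted and each region gets a
    placeholder polygon covering the whole image.
    """
    page_type, region_types = RECOMMENDED[kind]
    root = ET.Element(q("PcGts"))
    page = ET.SubElement(root, q("Page"), {
        "imageFilename": image,
        "imageWidth": str(width),
        "imageHeight": str(height),
        "type": page_type,
    })
    full_page = f"0,0 {width},0 {width},{height} 0,{height}"
    for i, region_type in enumerate(region_types, start=1):
        region = ET.SubElement(page, q("TextRegion"), {"id": f"r{i}", "type": region_type})
        ET.SubElement(region, q("Coords"), {"points": full_page})
    return root


# Usage with a hypothetical image name and size:
print(ET.tostring(page_skeleton("table of contents", "toc_0005.png", 1200, 1800), encoding="unicode"))
```

In real ground truth the region polygons come from the actual layout; the sketch only shows where the recommended `Page/@type` and `TextRegion/@type` values go.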
@@ -0,0 +1,41 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="ruleset">
<title>OCR-D-GT-Ruleset</title>
<body>
<p>The OCR-D-GT-Ruleset is a set of rules that documents the different spellings of letters
at the different levels of transcription. Three levels are provided, each represented by one
column: <ul id="ul_cbf_2xx_rzb">
<li>The first column contains the spelling that disregards specific printing-related
aspects and typographic peculiarities (<xref
href="level_1_4.dita"/>).</li>
<li>The second column contains the spelling that reproduces the <b>printing
conditions</b>; the interpretation of characters follows their <b>usage in the
language and writing system</b> (<xref
href="level_2_2.dita"/>).</li>
<li>The third column contains the spelling that <b>completely refrains</b> from any
<b>interpretation</b> of graphs. Each graph is represented as a code point, using
standardized encodings (Unicode), community-standardized encodings (MUFI) and
encodings defined by the coordination committee (<xref href="level_3_1.dita"/>).</li>
</ul></p>
<p>
<codeblock>
{"ruleset":[
  {"rule": ["a","a","a"], "type": "level"},
  {"rule": ["aa","ã","ã"], "type": "level"},
  {"rule": ["e","e","e"], "type": "level"}
]
}</codeblock>
</p>
<p>The OCR-D-GT-Ruleset is stored in JSON format.</p>
<p>The OCR-D-GT-Ruleset JSON Schema conforms to JSON Schema version 2020-12.</p>
<ul>
<li><xref href="https://github.com/tboenig/gt-guidelines/schema/OCR-D-GT-levelSchema.json"
format="html" scope="external">OCR-D-GT-Ruleset-JSON-Schema</xref></li>
</ul>
</body>
</topic>
@@ -0,0 +1,25 @@
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "OCR-D-GT-levelSchema",
  "description": "The OCR-D-GT-LevelRuleset is a set of rules that documents different ways of writing letters at different levels of transcription. Three levels are provided.",
  "type": "object",
  "properties": {
    "ruleset": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "rule": {
            "type": "array",
            "items": {
              "type": "string"
            }
          },
          "type": {
            "type": "string"
          }
        }
      }
    }
  }
}
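To check a ruleset file against this schema without extra dependencies, the constraints it states can be mirrored by hand. The following is a minimal Python sketch, not a JSON Schema implementation; `check_ruleset` is a hypothetical name. With the third-party `jsonschema` package, `jsonschema.Draft202012Validator(schema).validate(instance)` would apply the schema file itself instead.

```python
import json


def check_ruleset(doc) -> list:
    """Hand-rolled check of the constraints stated in OCR-D-GT-levelSchema.

    Returns a list of problems; an empty list means the document passes.
    Like the schema, it requires no keys and does not limit how many
    spellings a "rule" array holds.
    """
    if not isinstance(doc, dict):
        return ["document must be an object"]
    if "ruleset" not in doc:
        return []
    if not isinstance(doc["ruleset"], list):
        return ["ruleset must be an array"]
    problems = []
    for i, item in enumerate(doc["ruleset"]):
        if not isinstance(item, dict):
            problems.append(f"ruleset[{i}] must be an object")
            continue
        rule = item.get("rule", [])
        if not isinstance(rule, list) or not all(isinstance(s, str) for s in rule):
            problems.append(f"ruleset[{i}].rule must be an array of strings")
        if "type" in item and not isinstance(item["type"], str):
            problems.append(f"ruleset[{i}].type must be a string")
    return problems


example = json.loads('{"ruleset":[{"rule": ["a","a","a"], "type": "level"}]}')
print(check_ruleset(example))  # → []
print(check_ruleset({"ruleset": [{"rule": ["a", 1]}]}))  # → ['ruleset[0].rule must be an array of strings']
```

As written, the schema accepts `rule` arrays of any length. If exactly three spellings (one per level) should be enforced, adding `"minItems": 3` and `"maxItems": 3` to `rule` would express that.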