From 48460d41f2ba09504eccf38a6caaed0bb76e2046 Mon Sep 17 00:00:00 2001
From: Emerson Rocha
Date: Mon, 29 Nov 2021 17:48:42 -0300
Subject: [PATCH] #10, #11: docs/eng-Latn/hxltm.adoc: use cases

---
 docs/eng-Latn/hxltm.adoc | 97 +++++++++++++++++++++++++++++++++-------
 1 file changed, 82 insertions(+), 15 deletions(-)

diff --git a/docs/eng-Latn/hxltm.adoc b/docs/eng-Latn/hxltm.adoc
index 520ec0f..6e9d28e 100644
--- a/docs/eng-Latn/hxltm.adoc
+++ b/docs/eng-Latn/hxltm.adoc
@@ -30,7 +30,11 @@ General experience with terminology, even as an user of https://iate.europa.eu/f
 https://unterm.un.org/[UNTERM]
 or end user interface with similar propose,
 is helpful to undestand how HXLTM use these levels.
-The `4. _Fourth-level_` (not used with this nomenclature on other standards) means arbitrary data related to entire dataset _knows_ about itself:
+The **`4. _Fourth-level_`**
+(not used with this nomenclature on other standards) when used on HXLTM documentation means arbitrary data related to the entire dataset _knows_ about itself and does not fit as _Abstract_ of any of the 3 levels (not even the `1. **Concept-level**`)
+This can be used, for example, as a base to store **in every data row** title or description of an TBX.
+
+////
 for example the relationship between linguistic datasets,
 information about how it is processed, etc.
 // It can also be used to save on HXLTM tabular format what would be on metadata from XML containers with one issue:
@@ -39,16 +43,35 @@ information about how it is processed, etc.
 TIP: If you are _only_ a end user, you can ignore referentes to the `4. _Fourth-level_`.
 But the idea of _Concrete vs Abstract_ is relevant as it can affect how you label data.
+////
 [#item-meta]
-==== Concrete vs Abstract
-The way `1. Concept-level`, `2. Language-level` and `3. Term-level` expressions used on HXLTM also have two options of base hashtag which could be explained as making the data either concrete (like the main objective) or abstract (like metadata).
+=== Concrete vs Abstract
+The way `1. Concept-level`, `2. Language-level` and `3. Term-level` expressions used on HXLTM also have two options of base hashtag which could be explained as making the data either concrete
+(like the main objective intended to be always used)
+or abstract (like generic metadata or data, like very new `2. Language-level` / `3. Term-level` columns, that are not ready yet).
+
+////
+Note that most terminology formats are designed to only export final data.
+By default HXLTM tools when importing from then the terms will save with HXL hashtags that are "concrete".
+////
+
+////
+The optimized use case of HXLTM is focused on emergency response **and** multilingual content:
+There is special care for languages which could not be worked on places like Europe IATE or other humanitarian online terminology frontend because are not prioritized.
+////
+////
+While most examples of HXLTM made by HXP-CPLP are already publish CSVs, XLSXs and Google Sheets,
+the HXLTM tooling can be used to
+////
+////
 This distinction is made both to allow ad-hoc differentiation when parsing HXL directly, without HXLTM-aware tools, by simply changing the base tag.
 TIP: For example you may be doing a collaborative translation but tools that fetch you data and publish may be marked to not export entire coluns (like new translations) that are marked as abstract.
+////
 ////
 NOTE: tools parsing HXLTM tables directly should undestand
@@ -60,11 +83,8 @@ if a data source needs to be processed both by old and new tools,
 this feature can be explored
 ////
-=== Base tags used when HXLTM on tabular container
+=== HXLTM on tabular container
-Compared to the HXLStandard,
-while the HXLTM reference tools will allow mix with other HXL tags,
-most optimized operations for formats that are not tabular HXLTM will work with only `#item` and `#meta` *and* require an extra base HXL attribute.
 // Such extra attribute also match the `1. Concept-level`, `2. Language-level` and `3. Term-level` idea.
 The baseline HXL hashtags _(when using Latin script)_ are the following:
@@ -80,7 +100,42 @@ The baseline HXL hashtags _(when using Latin script)_ are the following:
 4. _Fourth-level_
 ** `#x_meta`
-== HXL base hashtags for HXLTM
+Trivia: Compared to the HXLStandard,
+while the HXLTM reference tools will allow mix with other HXL generic tags (for example, `#date`),
+the most optimized operations for formats that are not tabular HXLTM will work with only `#item` and `#meta` *and* require an extra base HXL attribute.
+Without this extra attribute HXLTM tools will assume you are mixing generic HXL.
+
+=== Use with not typical linguistic content
+
+* https://tools.ietf.org/search/bcp47
+** https://en.wikipedia.org/wiki/ISO_15924
+** https://en.wikipedia.org/wiki/ISO_639-3
+
+==== One non typical language
+
+In addition to allow mix linguistic content
+(for example, extra metadata, codes, etc)
+is also possible to reuse HXLTM tools for no linguistic content at all:
+you just need _create_ your own private language code.
+Since HXLTM operates using BCP47,
+the most generic base to use is ISO 15924 `Zyyy`` and ISO 639-3 `zxx``:
+`zxx-Zyyy` (or `+i_zxx+is_Zyyy`)
+
+==== Several non typical languages
+Both use of BCP47 one or more private tags,
+`zxx-Zyyy-x-privatum` (or `+i_zxx+is_Zyyy+ix_privatum`),
+or language codes and language scripts,
+like `qaa-Zyyy` (or `+i_qaa+is_Zyyy`),
+can be used.
+
+==== Text descriptions for non typical languages
+
+When using HXLTM to encode either one non or several typical languages,
+for example quick examples of programming hello worlds,
+you can writte the human descriptions as definitions of a real natural language.
+
+== HXL base hashtag for HXLTM
+When working with HXLTM on a tabular container, it is necessary specify a base HXL hashtag.
 === `+#item+`
@@ -99,14 +154,14 @@ Datasets with valid HXL base hashtags
 (but not explicitly known as part of HXLTM, like your user-configurable Ontologia)
 can be used when creating more generic exporters from tabular formats.
-NOTE: operations related to transpose data (see <<#__linguam__>>),
+NOTE: operations related to transpose data (see <<#linguam>>),
 which already are very advanced to simplify for the end user,
 did not explicitly have promises that will keep it working.
 If you have generic HXL tags that want to transpose,
 the more reliable way would be attach explicitly to one of the <<#conceptum-linguam-terminum>>.
-=== Behavior for columns without HXL hashtags (but tabular dataset already is HXLated)
+==== Behavior for columns without HXL hashtags (but tabular dataset already is HXLated)
 HXLTM tools will not create **new** columns on HXLTM tabular datasets without HXL hashtags.
 But it _MAY_ re-export columns without HXL headings when no advanced transposition is done and MAY allow exporters specifying exact column order of original dataset.
@@ -129,19 +184,29 @@ to add the tags used by HXLTM.
 HXL attribute for **Concept-level** representation
 (See <<#conceptum-linguam-terminum>>).
+==== `+conceptum+codicem`
+
 === `+linguam`
 HXL attribute required for **Language-level** representation
 (See <<#conceptum-linguam-terminum>>).
-Required: <<#__linguam__>>
+Required additional atttribute: <<#linguam>>
+
+=== `+linguam+definitionem`
+
+While each language can have several terms, the textual definition should be defined at language level.
+
+NOTE: HXLTM intentionally **NOT** allows set textual definition on Concept-level.
+
+Required additional atttribute: <<#linguam>>
 === `+terminum`
 HXL attribute required for **Term-level** representation
 (See <<#conceptum-linguam-terminum>>).
-Required: <<#__linguam__>>
+Required additional atttribute: <<#linguam>>
-[#__linguam__]
+[#linguam]
 === `+__linguam__+`
 Both user documentation and ontologia file uses `+__linguam__+` to represent an unlimited (but predictable) number of HXL attributes related to express the idea of language (often a language code).
@@ -211,8 +276,7 @@ hxltmdexml --agendum-linguam lat-Latn,arb-Arab testum/hxltm-salve-mundi.hxltm.xm
 include::../testum/resultatum/hxltm-salve-mundi.tm.hxl.csv[]
 ----
-> TODO: make it work with new format
-> `hxltmcli hxltm-exemplum-glossarium-minimum.tm.hxl.csv --objectivum-TMX`
+_TODO: make it work with new format `hxltmcli hxltm-exemplum-glossarium-minimum.tm.hxl.csv --objectivum-TMX`_
 ////
 == Drafts
@@ -274,4 +338,7 @@ Did you know that UTX is public domain? That's fantastic!
 [#TBX]
 === TermBase eXchange (TBX) (the creative commons licensed)
+* https://www.tbxinfo.net/wp-content/uploads/2016/10/tbx_oscar.pdf
+* http://www.terminorgs.net/downloads/TBX_Basic_Version_3.1.pdf
+
 _TODO: add more information here_