Change the way the HXLTM ontologia files are described/referenced to allow for verbs in other writing systems #7

Open
fititnt opened this issue Nov 14, 2021 · 0 comments
fititnt commented Nov 14, 2021


Currently, the HXLTM ontology file uses Latin, written in Latin script, for the reference implementation, while part of the documentation is in English. This issue proposes at least renaming the ontology file so that versions with the verbs themselves in any other writing system could eventually exist. I understand that, as of 2021, most people tolerate writing in some Latin script when developing programs, but intentionally making the writing system part of the ontology file name would at least avoid locking in that way of thinking.

As a short illustration of this issue, here are some of the tags:

# @ARCHIVUM       cor.hxltm.yml
# (...)
__ontologia_cor_versionem__:  v0.8.6+EticaAI+voluntārium-commūne

fontem_archivum_extensionem:
  .tm.hxl.csv: HXLTM
  .xliff.hxl.csv: CSV-HXL-XLIFF
  # (...)

normam:
  Ad-Hoc:
    __meta:
 # (...)
  CSV-3:
    __meta:
      # archivum_extensionem: .csv
      archivum:
        extensionem: .csv
      descriptionem: |
        ...
 # (...)


ontologia_aliud:
  accuratum:
    "?":
      # The '?' expresses what to do when the entire column does not exist,
      # not when a particular value is missing
      _IATE_valorem_codicem: "★★"
      _IATE_valorem_descriptionem: |
        Automatically assigned to terms entered or updated by native speakers.
      _IATE_valorem_nomen: "Minimum reliability"
      _IATE_valorem_numerum: 6

# (...)

  genus_grammaticum:
    lat_commune:
      _aliud: 'TBX_other'
      # _codicem: lat_commune
      _codicem_TBX: TBX_other
      _descriptionem: 
      codicem_lat: commune
# (...)

  partem_orationis:
    lat_adverbium:
      _aliud: 'TBX_adverb|UTX_adverb'
      _codicem: lat_adverbium
      _codicem_TBX: TBX_adverb
      _codicem_UTX: UTX_adverb
      _codicem_wikidata: Q380057 # https://www.wikidata.org/wiki/Q380057
      _normam: https://la.wikipedia.org/wiki/Adverbium
      codicem_lat: adverbium
# (...)

Note that this is different from "documentation translation". Documentation, file paths, and new data standards added by users to the current ontology file already allow full Unicode. The main point here is to at least make the writing system of the verbs part of the ontology file name.

How to make even the ontology file tolerate verbs in different writing systems

One requirement would be documenting what each ontology verb means across writing systems. Since the verbs are limited in number, someone could replace them from/to new languages even without hardcoded support in the reference implementations. If at some point there are interested people who use a non-Latin script, such a mapping (possibly applied by an external tool) could be used when converting from one region to another.
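A from/to conversion only works if the verb table can be inverted without ambiguity. The sketch below is one way an external tool could check that. The table `VERBA_LATN_CYRL` and the helper names are hypothetical, and the Cyrillic strings are placeholder transliterations for illustration, not vetted translations.

```python
# Hypothetical verb table: Latin-script verbs -> Cyrillic-script placeholders.
# The target strings are illustrative transliterations only.
VERBA_LATN_CYRL = {
    "normam": "нормам",
    "ontologia_aliud": "онтологиа_алиуд",
    "partem_orationis": "партем_оратионис",
}


def invert_mapping(mapping):
    """Return the reverse table, refusing tables that are not one-to-one."""
    inverted = {}
    for source, target in mapping.items():
        if target in inverted:
            raise ValueError(
                f"{target!r} is used for both {inverted[target]!r} and {source!r}"
            )
        inverted[target] = source
    return inverted


def convert_verb(verb, mapping):
    """Convert a single verb; anything not in the table is returned unchanged."""
    return mapping.get(verb, verb)


if __name__ == "__main__":
    inverse = invert_mapping(VERBA_LATN_CYRL)
    for verb in VERBA_LATN_CYRL:
        converted = convert_verb(verb, VERBA_LATN_CYRL)
        print(verb, "->", converted, "->", convert_verb(converted, inverse))
```

Rejecting duplicate targets up front keeps the round trip lossless, which matters if files are expected to move between regions in both directions.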

Note that a good part of these verbs are also command line arguments. So if such mappings are well documented, our reference implementation could at least be used by other regions. The opposite could also be true.

Anyway, one potential advantage of allowing this: if for some reason a baseline community exists in other regions (for example, speakers of Arabic dialects, or Hindi, etc.), they would be free to have some differences without waiting for Etica.AI to merge them.
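For the command line case, a documented alias table could be applied to the arguments before they reach the existing parser, so the reference implementation itself stays unchanged. This is a minimal sketch under assumptions: the option name `--fontem-linguam`, the localized alias, and the program name `exemplum` are used purely for illustration and are not taken from this issue.

```python
import argparse

# Hypothetical alias table: localized option name -> canonical Latin option.
ARGUMENT_ALIASES = {
    "--исходный-язык": "--fontem-linguam",
}


def normalize_argv(argv, aliases):
    """Rewrite aliased option names, keeping any '=value' suffix intact."""
    normalized = []
    for token in argv:
        name, separator, value = token.partition("=")
        normalized.append(aliases.get(name, name) + separator + value)
    return normalized


def build_parser():
    """A stand-in parser that only knows the canonical (assumed) option."""
    parser = argparse.ArgumentParser(prog="exemplum")
    parser.add_argument("--fontem-linguam", dest="fontem_linguam")
    return parser


if __name__ == "__main__":
    argv = normalize_argv(["--исходный-язык", "por-Latn"], ARGUMENT_ALIASES)
    print(build_parser().parse_args(argv).fontem_linguam)
```

Because the rewrite happens before parsing, the same table could be maintained outside the project by any regional community.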

Example

In practice this means that terms like normam (https://en.wiktionary.org/wiki/norma#Latin), ontologia_aliud (https://la.wikipedia.org/wiki/Ontologia, https://en.wiktionary.org/wiki/alius#Latin), partem_orationis (https://en.wiktionary.org/wiki/pars_orationis#Latin) would need their relations to other scripts explained (aka "translated"), but terms like Ad-Hoc and CSV-3 would actually be the same in any ontologia, since they are content.

If some mappings are important enough (for example, the specifications related to Ad-Hoc or HXLTM-ASA, which in other languages could be something different), then since there are far fewer writing systems than languages, such aliases could be part of each ontologia.

But anyway, the point here is to show that even though the verbs of the ontologia itself are in Latin, and this topic may only be revisited far in the future, they are not hardcoded in Latin script.
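The distinction between verbs and content can be sketched as a recursive key rename over an already-loaded ontology: keys found in the verb table are replaced, everything else passes through. The function name, the table `VERBA`, and its Cyrillic values are hypothetical placeholders, and the fragment is a trimmed version of the excerpt above.

```python
# Hypothetical verb table; the target strings are placeholder transliterations.
VERBA = {
    "normam": "нормам",
    "archivum": "архивум",
    "extensionem": "екстенсионем",
}


def translate_verbs(node, mapping):
    """Rename dictionary keys found in `mapping`; leave all other data as-is."""
    if isinstance(node, dict):
        return {
            mapping.get(key, key): translate_verbs(value, mapping)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [translate_verbs(item, mapping) for item in node]
    return node


if __name__ == "__main__":
    ontologia = {
        "normam": {
            "Ad-Hoc": {"__meta": {}},
            "CSV-3": {"__meta": {"archivum": {"extensionem": ".csv"}}},
        }
    }
    print(translate_verbs(ontologia, VERBA))
```

One limitation of this simple approach: a content key that happens to be spelled like a verb would also be renamed, so a real tool would likely need to know at which depths verbs are expected.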

fititnt added a commit that referenced this issue Nov 15, 2021
@fititnt fititnt added this to the v1.0.0 milestone Nov 15, 2021
fititnt added a commit that referenced this issue Nov 15, 2021
…ed instead of cor.hxltm.yml (not yet default); improved tests to search files not installed with package
fititnt added a commit that referenced this issue Nov 15, 2021
…ontologia/0.77.995.yml added (but not implemented)
fititnt added a commit that referenced this issue Nov 17, 2021
fititnt added a commit that referenced this issue Nov 17, 2021