
Releases: EticaAI/HXL-Data-Science-file-formats

v0.8.6

11 Apr 23:24

A LOT has been done since the last release. But this will be another point-in-time version that, again, for the moment is just a list of the commits. Previously documented features still work, and it may be that the older, more conservative approach is what ends up implemented instead of what is commented on here.

I will comment on something drafted called HDPLisp.


1. About HDPLisp (temporary name)

I think this feature alone may take at least 150 hours just to get a general idea of whether such an approach could work. What really pushed this to the limit (to aim for usability even in the auxiliary scripting language for HDP) was the following:

something safe enough to (as an extreme, but realistic, example at this point from what I'm seeing in the field) be usable by people working on the very front line to replace (Excel/Google Spreadsheets) formulas, even in contexts like the ones used to decide which people would not have access to Intensive Care Units.

I avoid talking about this type of thing, but I acknowledge that the theoretical decisions from last year based on ethicists and medical associations in the CPLP (including how other countries would deal with the lack of ICUs for everyone) are challenging. Software broke, e.g. overflowing beyond 100% because the underlying data is bad (and the data is bad because it reflects reality). Even if HDP had been ready years ago, with a level of automation that we would abstract here (and already in local languages), either medical personnel would be able to fix things following these ethical codes, or people with programming skills would need to fix the IF-ELSEs even when those making the requests make mistakes.

1.1 Hypothesis: something that (if necessary) can be as strict as Ada, yet is as feasible to implement compilers for as Lisp

Of all the programming languages I researched for inspiration, Ada is the one that comes closest to solving the issues HDP would need extensions for in extreme cases where mistakes cost lives. Ada's use of English and (especially considering the alternatives at the time it was created) its way of reducing ambiguity (it has only + - / * as mathematical operators) is an approach that tries to be more explicit (so, even if we make something compatible with Lisp/Scheme/Clojure, the real working version could be very verbose, to the point that people could write a draft in the middle of an in-person meeting). Also, "programming by contract" / Design by Contract (DbC), a feature that ships with the core of Ada 2012, could be very, very pertinent to the use cases of HDPLisp, to the point that this alone could take more time over the next weeks.
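
To make the DbC idea concrete, here is a minimal sketch in plain Python (the contract decorator and icu_occupancy function are hypothetical illustrations, not HDPLisp's actual mechanism): a postcondition turns the "overflow beyond 100%" failure mentioned earlier into a loud error instead of a silently wrong number. In Ada 2012 the same idea is written declaratively with the Pre and Post aspects.

```python
# Minimal Design-by-Contract sketch (hypothetical helper, not HDPLisp).
def contract(pre=None, post=None):
    def wrapper(func):
        def checked(*args, **kwargs):
            if pre is not None and not pre(*args, **kwargs):
                raise ValueError('precondition violated')
            result = func(*args, **kwargs)
            if post is not None and not post(result):
                raise ValueError('postcondition violated')
            return result
        return checked
    return wrapper

@contract(pre=lambda beds, occupied: beds > 0 and occupied >= 0,
          post=lambda ratio: 0.0 <= ratio <= 1.0)
def icu_occupancy(beds: int, occupied: int) -> float:
    return occupied / beds

print(icu_occupancy(10, 8))  # 0.8
try:
    icu_occupancy(10, 12)    # bad data: more occupied beds than beds
except ValueError as error:
    print(error)             # the "overflow beyond 100%" is caught loudly
```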

The decision to try to be closer to a Lisp is because it is the closest Turing-complete alternative that is feasible to implement as soon as possible while aiming at being production-ready from scratch. This does not mean that in 150-300 hours the result may not end up being more restricted in how extensions to HDP work (like putting the conditions in the HDP files themselves, just with a different file extension). But with all other typical programming-language architectural approaches, even if we managed to make something work in a single natural language, how could the verbs be translated? And has anyone ever planned a fully translatable Turing-complete language whose parser can deal with both left-to-right (like Latin script) and right-to-left (like Arabic script) text?

What I mean is that choosing a more S-expression-like general approach for the auxiliary language of HDP takes less time to implement as text that can be "converted to commands" (an Abstract Syntax Tree), even without full support for several natural languages. This time becomes very relevant if what would become HDPLisp has to be ported to more than one hosting language (think not only Python, but full support in JavaScript, so both the browser and some desktop application could be targeted). Just to give an idea: the bare minimum of Lisp-like syntaxes is known to be able to run on very restricted hardware, which means that any subset of what would become HDPLisp (like the parts that would not be used to abstract complex things needing more frequent updates, such as access to UN P-codes) could be ported to specialized, cheap-to-build (yet auditable) hardware that would not even need an operating system.
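
As a rough idea of why the S-expression route is cheap to implement, the whole "text to AST" reader fits in a few lines. The sketch below follows the classic Peter Norvig lis.py approach referenced in the commit log (it is not the actual hdpb-lisp.mjs code) and handles left-to-right input only; an RTL variant would normalize token order before this step.

```python
# Sketch of the "text -> Abstract Syntax Tree" step for S-expressions,
# in the style of Peter Norvig's lis.py (not hdpb-lisp.mjs itself).
def tokenize(source: str) -> list:
    # Parentheses delimit every form, so tokenizing is just splitting.
    return source.replace('(', ' ( ').replace(')', ' ) ').split()

def parse(tokens: list):
    token = tokens.pop(0)
    if token != '(':
        return token  # an atom (symbol or number)
    ast = []
    while tokens[0] != ')':
        ast.append(parse(tokens))
    tokens.pop(0)  # discard the closing ')'
    return ast

print(parse(tokenize('(+ 1 2 (* 3 4))')))
# -> ['+', '1', '2', ['*', '3', '4']]
```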


2. git log --oneline -172

bd28da2 v0.8.6 started; (we may use hxlm-js (#16) + hdplisp (#18) even on python port to do heavy processing of HDPLisp until API be stable
2f60059 hdplisp (#18): added several examples from other lisps (and non-lisps, like Ada and Haskell)
1cb8de3 hxlm-js (#16), hdplisp (#18): I think lisp/norvig-lispy.mjs (based on Peter v1) will not be engouth; I think we could refactor all the things and already plan ahead 'programming by contract' already on the prototypes of functions when user create them
e312774 hxlm-js (#16), hdplisp (#18): LISP-1 or LISP-2? That's a good question
e610e0d hxlm-js (#16), hdplisp (#18): HDLbLispMachinamSimulatum draft
75f426f hxlm-js (#16), hdplisp (#18): hdplisp-editor-ltr.html testing xtermjs
2747d33 hxlm-js (#16), hdplisp (#18): hdplisp-editor-ltr.html draft
4007e91 hxlm-js (#16), hdplisp (#18): NodeJS REPL ; TODO: we need safer ways to do it
49d2e2f hxlm-js (#16), hdplisp (#18): HDPbMiniman.machinam_simulatum() draft
cd4c930 hxlm-js (#18), hdplisp (#18): testing extension .hdpl instead of .hdpl.lisp; Uses github/linguist
306b076 hxlm-js (#18), hdplisp (#18): learned identicum? non-identicum?
cbd6c03 hxlm-js (#18), hdplisp (#18): lat->multiplicationem, lat->divisionem
5f3c999 hxlm-js (#18), hdplisp (#18): Improved naming references of the boostrapper
52fc4d4 hxlm-js (#18), hdplisp (#18): GREAT! Now + - work as proof of conceptgit gui!
9721bfb hxlm-js (#18): HDPbLisp.evaluate() draft started
a452486 hxlm-js (lisp #18): hdpl-conventions; more more context; some formating
1b03847 hxlm-js (lisp #18): hdpl-conventions; more details about Internationalized auditability core feature
b9900d9 hxlm-js (lisp #18): hdpl-conventions draft
253d631 hxlm-js (lisp #18): moved code based on Peter Norvig work to dedicated file; also some refactoring
24ad442 hxlm-js (lisp #18): directory reorganization
c15b51c hxlm-js (lisp #18): tests with Right-To-Left Mark #15
d0e2063 hxlm-js (lisp #18): HDPbLisp.{ast,ast_ltr,ast_rtl}() started; Right-to-left? (like Imperial Aramaic, not Lingua Latina) will need some serious work on abstract syntax tree
1270c51 hxlm-js (lisp #18): hdpb-lisp.mjs, ok, NOW at least the reader create an LTR Abstract Syntax Tree
ce20548 hxlm-js (lisp #18): hdpb-lisp.mjs, working on parse_recursive_ltr()...
0c5eaca hxlm-js (lisp #18): hdpb-lisp.mjs less loops, more recursion...
4955974 hxlm-js (lisp #18): قلب.apc.hdpl.lisp added
1ecbfde hxlm-js (lisp #18): hdpb-lisp.mjs loops (not yet a lot, but loops; still need to convert to recursion or something)
b50ec4d hxlm-js (lisp #18): ontologia/core.hdplisp.yml draft
0634d31 hxlm-js (hdp #18, lisp #18): prefix HDPb instead of HDP; see https://github.com/EticaAI/HXL-Data-Science-file-formats/issues/18#issuecomment-813734342
ccd131c hxlm-js (hdp #18, lisp #18): HDPL10n & HDPi18n
73ac547 hxlm-js (hdp #18, lisp #18): starting going JavaScript modules all the way (but no node_modules/, not now, let's use native things)
df4b7ee hdp-spec (#16): draft of  hdp-conventions/security-considerations.md
a64c37c hdp-integrity (#17): improved prepare-hxlm-relsease.sh (#12) to generate Subresource Integrity (still not fully automated)
f47b97f v0.8.5 started; hxlm-js (hdp #18, lisp #18), the javascript version _starts_ to undestand HDP; part of the code inspired on https://tags.etica.ai
9e20848 hxlm-js (hdp #18, lisp #18) interface start to be usable for demonstrations
8368fe0 hxlm-js (hdp #18, lisp #18) fixed JS promise bug
7bbef57 hxlm-js (hdp #18, lisp #18) tests with gpg-sign (#12); fixed usage of page-signer.js (needs '%%%SIGNED_PAGES_PGP_SIGNATURE%%%' on the source page); also changed order of prepare-hxlm-relsease.sh commands
01af024 hxlm-js (hdp #18, lisp #18) tests with gpg-sign (#12); tests using @tasn webext-signed-pages page-signer.js utility
e8886a4 hxlm-js (hdp #18, lisp #18): index.html also part of the hashes
e05f84a hxlm-js (hdp #18, lisp #18): HDPAux tests
68de75b hxlm-js (hdp #18, lisp #18): Javascript async, await, etc, etc, etc
a644c08 hxlm-js (hdp #18, lisp #18): HDPAux started
4fcbeab #12 prepare-hxlm-relsease.sh created
2d9f678 hxlm-js (hdp #18, lisp #18): HDPMiniman.bootstrapping()
6d4f5a3 hxlm-js (hdp #18, lisp #18): hxlm-js; Great! Definitely modern javascript is much beautiful (and not as complicated to convert from python)
04777d8 hxlm-js (hdp #18, lisp #18): started hxlm-js/; transpile Python to JS is way too ugly to not do by hand
8a208e5 hdp.etica.ai as domain for testing JavaScript early prototypes; also {{ escape with \ as quick fix to allow generate GitHub pages
f1c13e6 hxlm.ontologia.json files generated from hxlm.ontologia yml files; draft of hxlm.core.localization.hgettext
28e5abb hdp-spec (#16): copy draft of hxlm_factum_to_sexpr back to hdp.util.debug
27d30bb hxlm_minimam.lang: hxlm_factum_to_sexpr(), _s(), and draft of _sv() & hxlm_sespr_to_object()
ceddfa6 WOW! Ok. Nice to know. hxlm_minimam.lang draft
db34830 Code refactoring: hxlm.ontologia.python.{commune,systema} created from hxlm.core
7ce4f0d hdp-spec (#16): AbstAttr, AbstAux & AbstRadix created
37f0348 tests/transpile-python-to-javascript.sh
a804249 Code refactoring: hxlm.core.hdp.data -> hxlm.ontologia.python; the idea here is allow even non-python programmers have an idea of how internals of HXLm submodules works
21d72ee Code refactoring: hxlm.ontology -> hxlm.ontolo...

v0.7.5

16 Mar 18:55

v0.7.5 is a huge update from the last 3 weeks!

There are so many changes (even for my personal way of doing things) that I will just mention the log messages. Note that, in addition to the new changes themselves, I am not (or at least was not) as proficient with the Python language as I am with PHP, JavaScript and infrastructure-as-code tooling (e.g. Ansible, YAML). Anyway, part of the old code (like the files in the top-level /bin folder) is still there, so the old documented commands should still work.

The release v0.7.5 exists mostly to have a point-in-time way to reference changes. I'm not sure if we will reach a v1.0.0, or if after v0.9 we will go to v0.10.0, v0.11.0, v0.12.0, etc. Also, as much as possible, we're already trying to make the extension points of the tools self-explanatory (i.e. even your code editor explains how to extend them, not just the documentation!) and to implement integration tests that allow creating/updating features with more confidence.

git log --oneline -153

430ffcb hxlm (#11): initial draft of hdpcli --objectivum-linguam NNN
40f9d79 hxlm (#11): re-enabled hdpcli --non-urn & hdpcli --verum-urn
6859bd8 hxlm (#11): re-enabled hdpcli --non-grupum & hdpcli --verum-grupum
7fa8f27 hxlm (#11): re-enabled hdpcli to use remote files as HDP, e.g. hdpcli http://example.org/path/my.hdp.yml
0efd3ac indigenous-language-brazil-sample.hxl.csv (test data) added; Source: all languages with Iso 639-3 Code from https://pt.wikipedia.org/wiki/L%C3%ADnguas_ind%C3%ADgenas_do_Brasil
abe2fd2 hxlm (#11): drafted cli commands to hdpcli; --non-adm0, --non-grupum, --non-nomen, --non-tag, --non-urn, --verum-adm0, --verum-grupum, --verum-tag, --verum-urn
1a41d14 i18n+l10n (#15): core_vocab, added attr.verum & attr.falsum; this is not as intuitive. In special attr.falsum.zho (because may have many, many alternatives) and attr.falsum.ara (because I'm not sure about the attr.falsum.ara.id) is welcome to get revision; the initial values at least are from Wikidata/Wikitionary.
78160bd i18n+l10n (#15): core_vocab, added attr.grupum; One way to group colletions (hsilos). While the tags allow use at any level, this attribute should be explicity named at top level.
f4f6a04 hxlm (#11): implicitly create an urn-like hash key for hsilos without explicitly use; something like 'urn:oo:hsilo:domain_base:container_base-container_item_index' were container_base ofte would be 'local' (localhost) and container_base often the filename itself
e228e24 hxlm (#11): force output Unicode YAML; if a single 'Olá Mundo\!' already was ugly, it would be much worse with anything beyond ASCII; so lets force it by default for everyone
27893a0 hxlm (#11): added  HDP._prepare_from_local_directory(); draft of HDP._update()
ed6dcc8 hxl-processing-specs (#14): added tests
2550222 hxl-processing-specs (#14): now is possible to specify inline data ('input_data', 'output_data', as part of hrecipe.exemplum; the underlining inplementation still not ready, but the idea is be able to specify self-contained example when creating recipes with YAML; the hrecipe.exemplum[N]objectivum.datum can be used for self-contained testing!
57d2cca i18n+l10n (#15), hxl-processing-specs (#14): now is possible to specify more than one source; also content without translation will be prefixed with a single _
4819547 i18n+l10n (#15): to core_vocab added objectivum; renamed attr.lingua -> attr.linguam
bfcd583 core_vocab Added attr.locum & attr.tempus
abce05b core_vocab Added attr.datum; renamed: desc -> descriptionem, lang -> lingua, name -> nomen, source -> fontem
7c367b3 core_vocab attr.exemplum added (see https://github.com/EticaAI/HXL-Data-Science-file-formats/issues/14#issuecomment-798939113)
beba373 hxlm.core.internal.formatter.beautify() created; already deal if output is piped to another command, like jq or hxlspec, without breaking
271385f Coloring all the things works. But how to know if is not stdout, but being piped? If piped, things broke hard with jq or hxlspec
07a30fa Testing this thing called pygments. Colors are nice. Let's try color all the things and see if don't break what already was working
47f8c87 v0.7.5 started
de9467a hxl-processing-specs (#14): HDP.export_json_processing_specs() works
57de6da hxlm(#11), hxl-processing-specs (#14): HDP._prepare_from_remote_iri() works
0dc88ee hxlm(#11), hxl-processing-specs (#14): HDP._prepare_from_local_file() now works
cc5462f hxlm(#11), hxl-processing-specs (#14): Added HDP.HDP_JSON_EXTENSIONS & HDP.HDP_YML_EXTENSIONS
10c7df8 hxlm(#11), hxl-processing-specs (#14): Added HDP._online_unrestricted_init, HDP._safer_zone_hosts, HDP._safer_zone_list, HDP.export_schema_json(), HDP.export_yml()
e516c56 hxlm(#11), hxl-processing-specs (#14): created hxlm.core.model.hdp HDP class
819196d hxl-processing-specs (#14): hdpcli --export-to-hxl-json-processing-spec added (draft)
314c0e3 hdp.json-schema.json: <3 it's working
e344f79 hdp.json-schema.json: ok, now we starting to have something; still broken, but at least I know where is broken
2559c6a schema/hdp.json-schema.json I think is better to restrict to json schema draft-07 (very recent one may not be supported
181274e hdp.json-schema.json: ok, re-read the current draft-bhutton-json-schema-00 spec; I think we actually can have subschemas
68e9fd3 schema/hdp.json-schema.json started
544e12e recipe-test-01.yml started
81db9aa HSToken draft (from internal.keystore)
a57f77d hdpcli, hxlm (#11): HKeystore created; proof of concept of how to parse encrypted URNs (urnresolver #13) may require some way generic adapter
6317513 hxlm (#11): _get_keypar() started
fe6cb62 hxlm (#11): HDataDispatch; flight dispatcher, but for data
da44daf hxlm, hdpcli (#11): Started draft of HWorkspace; lots of reading about which crypto high level library to use (and also about care about developers usability
b74dc68 hdpcli (#11): we will definely will need keyring, in special for who would not have hardware smartcards
d8992f6 hdpcli (#11): MVP of _get_salt() & _get_fernet(); created _entropy_pseudotest()
1840c6e hdpcli (#11): drafted _get_salt() & _get_fernet()
8816aad hdpcli (#11): _exec_hdp_init_data() works
62e4305 hdpcli (#11): created prompt_confirmation() and a bunch of if elses
3025360 hdpcli (#11): drafted cli --hdp-init, --hdp-init-home & --hdp-init-data
44e0191 hdpcli (#11): started, based on hxl2example
0edcecf hxlm (#11), urnresolver (#13): python cryptography lib is now an requeriment (need when dealing with encryted shared list of URN)
1dcb6d4 v0.7.4 started
a8fbfc2 urnresolver (#13): urn-data-specification exclusive folder started
881a96d hxlm (#11), urnresolver (#13): --no-urn-user-defaults & --no-urn-vendor-defaults CLI options added
36c80ef hxlm (#11), urnresolver (#13): Why use URN to identify resources is more than naming convention
3379f0a hxlm (#11), urnresolver (#13): documentation of urnresolver current state
2e6d30c hxlm (#11), urnresolver (#13): initialize with some defaults (if user did not customized yet), file urnresolver-default.urn.yml
e9a6d67 hxlm (#11), urnresolver (#13): 22 commits/hours later, now we start to have something testable
33eb104 hxlm (#11), urnresolver (#13): drafted concept of 'urnref' (something like when have several sources or URNs, allow urnresolver filter sources (at first just use file names)
041420e hxlm (#11), urnresolver (#13): added loader for YAML & JSON; for TravisCI, removed hardcoded path; renamed urn:data:xz:hxl:std:core:hashtag (std inspired on ISO RFC) to urn:data:xz:hxl:standard:core:hashtag (maybe core is not need?)
9aa8010 hxlm (#11), urnresolver (#13): now also with TSV files!
9d468ea hxlm (#11), urnresolver (#13): drafted HXLM_DATA_URN_EXTENSIONS_ENCRYPTED
926019b hxlm (#11), urnresolver (#13): working on get_urn_resolver_local(), file order is important
77895bc hxlm (#11), urnresolver (#13): created get_urn_resolver_local() and draft of get_urn_resolver_remote() & get_urn_resolver_remote_authenticated()
045fd2c urnresolver (#13): added .well-known proof of concept test files; renaming files to be just urn.csv,urn.json,urn.yml (this is more an suffix, since users could in fact search for entire paths)
aed3208 urnresolver (#13): Added URNResolver test files
ad3a6a0 hxlm (#11), urnresolver (#13): clean up
ba9f034 hxlm (#11), urnresolver (#13): fixed issue with non-ASCII internationalized organization names
a5e76a6 hxlm (#11), urnresolver (#13): for DataUrnHtype() testing using even domain names as quick namespace; full unicode support need more testing (fallbacking to GenericUrnHtype instead of DataUrnHtype)
b21beac hxlm (#11), urnresolver (#13): for DataUrnHtype() some baseline parser before need to resort to full grammar checking; this at least could help locallized namespaces get what matter for then, while keeping the start of the 'urn:data' predictable
99fceae hxlm (#11), urnresolver (#13): for DataUrnHtype(); so many options to buld full parser to avoid regex hell; I think maybe the early versions could be a bunch of effective if-elses
7102e28 hxlm (#11), urnresolver (#13): for DataUrnHtype(), reading about create a formal ABNF (like the ISO URN, RFC5141); but ANTLR seems more friendly
a8306ad hxlm (#11), urnresolver (#13): drafted HXLM_CONFIG_BASE, ..POLICY.. & ..VALT..
d7cc2b3 hxlm (#11), urnresolver (#13): started hxlm.core.schema.urn.util
0660705 hxlm (#11), urnresolver (#13): started DataUrnHtype() [urn:data:, urn:data-i, urn:data-p]
8468d30 hxlm (#11), urnresolver (#13): code clean up; draft of documentation
17900d4 hxlm (#11), urnresolver (#13): './.tox/py38/bin/urnresolver --help' at least dont return error;
7b6502f hxlm (#11), urnresolver (#13): moving urnresolver as entry_points instead of script (more strict control, force to be python script, not generic system scrypt)
56bfc7b hxlm (#11), urnresolver (#13): started urnresolver based on hxl2example
5f81dd8 hxlm (#11): improved IetfUrnHtype() works [for 'urn:ietf:rfc' prefixes']
b21d6c8 hxlm (#11): improved IetfUrnHtype(...

v0.7.0

23 Feb 07:18

The release v0.7.0 does not add practical functionality, but it is an initial refactoring of HXLMeta (#9).

The main eventual objective is to be able not only to break up the internal functionality (which, in practice, would mean allowing external developers to "import HXL datatypes"; think of importing C .h headers), but also, after having functional MVPs with the new approaches, to make it possible to add more specialized Python data classes that would take priority over the internal ones. (Some features, like data classes, require relatively new Python versions, so even if this project manages to hit v1.0 in the next months, this may affect others with stricter support for older Python versions.)
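
As an illustrative sketch only (the names HTypeRegistry, BaseNumberHType and StrictNumberHType are hypothetical, not the actual HXLMeta internals), the "higher priority" idea could look like a registry where classes registered later win:

```python
# Hypothetical registry: specialized datatype classes registered later
# take priority over the built-in (internal) ones.
class BaseNumberHType:
    hashtag = '#indicator'

class HTypeRegistry:
    def __init__(self):
        self._htypes = []  # most recent registrations are checked first

    def register(self, htype_class):
        self._htypes.insert(0, htype_class)

    def resolve(self, hashtag: str):
        for htype in self._htypes:
            if htype.hashtag == hashtag:
                return htype
        return None

registry = HTypeRegistry()
registry.register(BaseNumberHType)           # internal default

class StrictNumberHType(BaseNumberHType):    # external, specialized
    pass

registry.register(StrictNumberHType)
assert registry.resolve('#indicator') is StrictNumberHType
```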

Comparison with popular libraries like numpy/pandas and their so-called "dtypes"

At the moment I think we can use something like an "HType" suffix. While part of the problem is definitely converting data to the most efficient storage types, the bigger reason to break this into several classes and try to be reusable is to draft abstract concepts related to "data points" (from an individual cell value in a spreadsheet, to a row/"observation"/"record" or a column/"variable", and finally to the dataset itself).

While the drafted Sensitive/Encryption concepts in Python are not documented in the spreadsheet like all the others, starting to have an abstract representation of the dataset could allow us (when actually doing export/import) to use it as a reference to decide things like encrypting individual columns, or some specific rows, with even more than one decryption key. This type of very peculiar scenario may explain the differences between "HTypes" and "dtypes".
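
A hedged sketch of that abstraction (the ColumnHType and DatasetHType names are hypothetical, not the drafted classes): encryption decisions attach to columns of the abstract dataset rather than to the raw storage type.

```python
from dataclasses import dataclass, field

@dataclass
class ColumnHType:
    hashtag: str                 # e.g. '#contact+phone'
    storage_type: str            # e.g. 'str', 'float64'
    sensitive: bool = False
    # More than one key can be authorized to decrypt the same column.
    decryption_key_ids: list = field(default_factory=list)

@dataclass
class DatasetHType:
    columns: list

dataset = DatasetHType(columns=[
    ColumnHType('#adm1+code', 'str'),
    ColumnHType('#contact+phone', 'str', sensitive=True,
                decryption_key_ids=['field-team-key', 'audit-key']),
])

# Only the sensitive column would be encrypted on export.
print([c.hashtag for c in dataset.columns if c.sensitive])
# -> ['#contact+phone']
```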

Public reference tables

Note: the public reference tables do not yet have drafted concepts like the ones on the Sensitive and Encryption topic, so it is possible that some parts of the implementation are not as well documented as others. Also, if implementations like client-side encryption eventually become used by more than early adopters, or outside the @HXL-CPLP community, the encryption-related part should get a proper review before there are "implementations in several programming languages".

v0.6.7: say hello to `hxlquickmeta` v1.0.0!

20 Feb 04:52

The main change in v0.6.7 is hxlquickmeta: the tool is now able to give a quick overview of both local and remote datasets (even ones not yet HXLated). Unlike hxl2example/hxl2tab, the ad-hoc API usage via HTTP is not usable yet (it needs to handle better the case where a dataset is still not parseable by libhxl).

Note 1: hxlquickmeta, in addition to being able to debug datasets, is used to debug the drafted taxonomies that could, just from HXL hashtags, make inferences about the type of data, how it could be imported into databases, converted for data mining tools, etc. So hxlquickmeta, as it is now, is used to help construct these translations. That's why it is meant to be used even on remote datasets, and could eventually brute-force the data types.

Note 2: the actual implementation in hxlquickmeta of the HXLMeta class (and the new HXLMetaType) (see `hxlquickmeta` (cli tool) + HXLMeta (usable class), #9) still does not implement the features documented in EticaAI-Data_HXL-Data-Science-file-formats.
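
As a usage sketch only (I'm assuming hxlquickmeta takes a dataset path or URL as a positional argument, like the other tools in this repository; check hxlquickmeta --help for the real interface):

```bash
# Quick overview of a local file (even if not HXLated yet)
hxlquickmeta my-local-dataset.csv

# Quick overview of a remote dataset
hxlquickmeta https://example.org/remote-dataset.csv
```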

Minimum usable tools

  • hxl2example v2021-02-17: create your own exporter/importer
  • hxl2tab v1.6: tab format, focused for compatibility with Orange Data Mining
  • hxlquickimport v1.1: (like hxltag), internal usage
  • hxlquickimporttab v1.0: undocumented
  • hxlquickmeta v1.0.0: output information about local/remote datasets (even if not yet HXLated)

Unreleased tools

  • hxl2arff v1.0-draft
  • hxl2pandas v1.0-draft

Public reference tables

v0.6.5 - First versioned release

17 Feb 04:55

This is the first versioned release of HXL-Data-Science-file-formats.

While users are recommended to use the latest version, this release can serve as a reference point for changes that may affect external documentation. This may be the case with hxl2tab v1.6, since the current version uses one simple conversion table to understand the input format, while a dedicated class, HXLMeta (v0.6.5), is still a draft but may eventually be used.
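
For illustration only (hypothetical values, not the actual hxl2tab v1.6 table), a "simple conversion table" here means a flat mapping from HXL hashtag patterns to the output format's types:

```python
# Hypothetical example of a flat HXL-hashtag -> tab-type lookup table
# (the real hxl2tab table targets Orange Data Mining's .tab types).
HXL_TO_TAB_TYPES = {
    '#date': 'time',
    '#indicator+num': 'continuous',
    '#adm1+code': 'discrete',
    '#meta+id': 'string',
}

def guess_tab_type(hxl_hashtag: str) -> str:
    # Fall back to a plain string when the hashtag is unknown.
    return HXL_TO_TAB_TYPES.get(hxl_hashtag, 'string')

print(guess_tab_type('#indicator+num'))  # continuous
```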

Minimum usable tools

  • hxl2example v2021-02-17: create your own exporter/importer
  • hxl2tab v1.6: tab format, focused for compatibility with Orange Data Mining
  • hxlquickimport v1.1: (like hxltag), internal usage
  • hxlquickimporttab v1.0: undocumented

Unreleased tools

  • hxl2arff v1.0-draft
  • hxl2pandas v1.0-draft
  • hxlquickimport v1.0-draft
  • hxlquickmeta v0.6.5

Public reference tables