Use a build tool #959

fkleedorfer · 2024-08-20T15:11:32Z

Use a build tool?

Problem: All issues brought up so far require or aim at some kind of build automation. There currently is none.

Why is that a problem: Anything that needs to be done manually will cause errors, bottlenecks and dependency on individuals.

Cause: Most programming languages/frameworks come with a variety of build tools, and most projects use one. However, this is an ontology project, inherently independent from programming languages, and therefore, it is not obvious what should be used. That is probably the reason why none is in use.

Fix: Choose one build tool that the community can live with and refactor the project so it uses that tool. Bonus: GitHub Actions become easier to write and maintain because they might only need to run some build targets.

So, question: What would be your criteria for choosing a build tool, and which one, if any, should it be?

Originally posted by @fkleedorfer in #942 (comment)

Edit: collecting requirements/ideas/aspects from the comments here (and my own)

This issue is not about adding new functionality, just about automating what is currently done manually or semi-automatically.

format source files consistently
run shacl checks
generate quantitykind/unit associations
make release zip
build in github action
trigger action for pr and new commits in main (pr merge/rebase)
github release action

Incomplete list of future functionality to be implemented in the build

inconsistent derivedCoherentUnitOfSystem, hasBaseUnit and conversionMultiplier #952 to check that all and only "fundamental units" have conversionMultiplier=1
generate factor units (contrib from qudtlib)
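
The "all and only" wording in the #952 item is an if-and-only-if check, which can be sketched in a few lines of Python. This is only an illustration: the `Unit` record and its `is_fundamental` flag are hypothetical stand-ins, and what counts as a "fundamental unit" is defined in #952, not here.

```python
from decimal import Decimal
from typing import NamedTuple


class Unit(NamedTuple):
    # Hypothetical record; a real check would read these values from the RDF data.
    iri: str
    is_fundamental: bool
    conversion_multiplier: Decimal


def multiplier_violations(units: list[Unit]) -> list[str]:
    """Return IRIs of units that break 'fundamental <=> conversionMultiplier = 1'."""
    return [
        u.iri
        for u in units
        # A violation is any unit where exactly one side of the equivalence holds.
        if u.is_fundamental != (u.conversion_multiplier == Decimal(1))
    ]
```

Both directions are covered: a fundamental unit with a multiplier other than 1 is reported, and so is a non-fundamental unit whose multiplier is 1.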

fkleedorfer · 2024-08-20T15:13:02Z

I think this is the first thing we need if we are to get some automation going. I'll make a draft PR soonish.

VladimirAlexiev · 2024-08-28T08:00:02Z

Hi @fkleedorfer! Good idea, but could you elaborate a bit on what you want to automate?
Let's gather a list of requirements here (cc @steveraysteveray @ralphtq).
Florian, can you undertake to collect requirements and put them in the issue description, or if you prefer in a separate file (guess that's what the PR you mentioned will be about?)

inconsistent derivedCoherentUnitOfSystem, hasBaseUnit and conversionMultiplier #952 to check that all and only "fundamental units" have conversionMultiplier=1
Steve, what was that complex relation between QuantityKinds and Units that you had a diagram for?
You must mean https://github.com/qudt/qudt-public-repo/wiki/Advanced-User-Guide#4-computing-applicable-units-for-a-quantitykind. The code to execute the algorithm is available at schema/extensions/FUNCTIONS_QUDT-v2.1.spin.ttl, but it uses the TopQuadrant SPIN library for MagicProperty.

fkleedorfer · 2024-08-28T08:57:01Z

I would like the work to be done in reasonably small chunks (because I don't have enormous amounts of time for it), so I'd like to first not add new functionality, just automate what exists.

We are looking at a lot of things that can be added once the build automation is in place.

The first problem is choosing the build system itself. I did not get a lot of input on the question in the discussion; however, the current favorite is Maven. That's what my PR will be about. At the moment I am looking at how to do TTL formatting in that setting. (Probably Jena prettyprint, but we'll see; there is also https://github.com/atextor/turtle-formatter.) Weirdly, there is no Maven integration for either. (Side glance: Spotless.)

VladimirAlexiev · 2024-08-28T10:51:31Z

@fkleedorfer But is there a problem with the turtle formatting of QUDT? I think it comes from TQ, and I think it's just fine?

fkleedorfer · 2024-08-28T11:30:58Z

> @fkleedorfer But is there a problem with the turtle formatting of QUDT?

(Accidentally deleted my post so I rewrite it here)
Yes: contributors cannot reproduce it. When you contribute triples, you'll add them wherever, and at some point Steve pulls the code, reformats it and pushes it. That's not a great workflow.

If formatting was part of the build, our life would be easier.

That is not to say that TQ formatting is bad. If we can use it in a build then maybe we should.

steveraysteveray · 2024-08-28T16:03:40Z

I think the serialization we use in TopBraid is fairly common - alphabetical by grouped subject - isn't it? I assume that same serialization is available via the TQ API if we use that for inferencing and validation in the build, although I haven't checked. I'm not sure what the PySHACL library does, but my understanding is that it is slower and not complete.

dr-shorthair · 2024-08-28T19:09:59Z

OWL-API is also common.

@ashleysommer @nicholascar can you comment on completeness of pySHACL?

dr-shorthair · 2024-08-28T19:14:55Z

Else go for RDF Canonicalization https://www.w3.org/TR/rdf-canon/
JS Implementation here: https://github.com/digitalbazaar/rdf-canonize
RDFlib here?: https://github.com/eyusupov/rdflib-canon

(is this in the TQ Suite?)

fkleedorfer · 2024-08-28T20:45:13Z

Canonicalization is relevant for consistent ordering of blank nodes across multiple serializations. That's the one thing most formatters will fail to do.
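
To illustrate the point: two serializations of the same graph may label their blank nodes differently, so simply sorting the triples gives different output. The Python sketch below relabels each blank node by hashing the triples it appears in, with blank node labels masked out. It is a deliberately simplified one-pass scheme over plain tuples, not the RDF Canonicalization algorithm linked above, and it does not resolve ties between blank nodes whose immediate neighbourhoods are identical.

```python
import hashlib

Triple = tuple[str, str, str]


def is_blank(term: str) -> bool:
    return term.startswith("_:")


def relabel_blank_nodes(triples: list[Triple]) -> list[Triple]:
    """Return sorted triples with blank nodes renamed independently of input labels."""
    signatures: dict[str, list[str]] = {}
    for triple in triples:
        s, _, o = triple
        for node in {s, o}:
            if not is_blank(node):
                continue
            # Mask labels: this node becomes "_:self", any other blank node "_:other".
            masked = " ".join(
                t if not is_blank(t) else ("_:self" if t == node else "_:other")
                for t in triple
            )
            signatures.setdefault(node, []).append(masked)

    digests = {
        node: hashlib.sha256("\n".join(sorted(lines)).encode()).hexdigest()
        for node, lines in signatures.items()
    }
    # Assign _:b0, _:b1, ... in digest order (ties are NOT resolved in this sketch).
    new_label = {
        node: f"_:b{i}"
        for i, node in enumerate(sorted(digests, key=lambda n: digests[n]))
    }
    return sorted(tuple(new_label.get(t, t) for t in triple) for triple in triples)
```

With this in place, two inputs that differ only in blank node labels and triple order produce the same sorted output, which is the property a diff-friendly formatter needs.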

VladimirAlexiev · 2024-09-04T05:21:41Z

Don't most contributors submit relatively small PRs, typically new units, where they can follow the existing formatting even by hand?

In addition to the question of formatting, let's collect other needs for a build workflow, like checking data consistency using SPARQL; see my two bullets above.

fkleedorfer · 2024-09-06T07:35:44Z

> Like checking data consistency using SPARQL

Would you be ok wrapping the SPARQL queries in a SHACL shape or would you prefer another way, such as a folder with files containing sparql queries, and some convention for how their results should be interpreted?
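
For comparison, the "folder of query files" option could look roughly like the Python sketch below. The convention assumed here is hypothetical, not something agreed in this thread: each `.rq` file holds a SELECT query, and any returned row counts as a violation. The `run_query` callable stands in for a real SPARQL engine.

```python
from pathlib import Path
from typing import Callable

Row = dict[str, str]


def run_checks(
    query_dir: Path, run_query: Callable[[str], list[Row]]
) -> dict[str, list[Row]]:
    """Run every *.rq file in query_dir; return the rows of each failing check."""
    failures: dict[str, list[Row]] = {}
    # Sort file names so the checks run in a reproducible order.
    for query_file in sorted(query_dir.glob("*.rq")):
        rows = run_query(query_file.read_text(encoding="utf-8"))
        if rows:  # assumed convention: any result row is a violation
            failures[query_file.name] = rows
    return failures
```

A build would fail when the returned dictionary is non-empty. The SHACL alternative moves this convention into the shapes themselves, so the interpretation of results is standardized rather than project-specific.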

steveraysteveray · 2024-09-06T12:43:44Z

I vote for a SHACL shape, since we already do other validations that way (not yet part of the build).

VladimirAlexiev · 2024-09-09T15:16:17Z

@steveraysteveray and @fkleedorfer

SHACL vs SPARQL:

If you look at inconsistent derivedCoherentUnitOfSystem, hasBaseUnit and conversionMultiplier #952, that's a global query
we can wrap it in SHACL as SPARQLTarget; then SPARQLConstraint should just format message, value etc
See establish ontology "hygiene" checks Interoperable-data/ERA-Ontology-3.1.0#82 for a similar but longer list of SPARQL checks

> the serialization we use in TopBraid is fairly common - alphabetical by grouped subject

I like it. If classes and props follow naming conventions, then that sorts them in the proper order.
I'd just move individuals last: but most ontologies have terms or individuals, not both, so that's ok.
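
As a rough illustration of the "alphabetical by grouped subject" ordering, with the optional "individuals last" tweak, here is a minimal Python sketch. It works on plain tuples rather than a real RDF library, and the `is_individual` predicate is a hypothetical stand-in for whatever rule a formatter would actually apply.

```python
from itertools import groupby
from typing import Callable, Iterable

Triple = tuple[str, str, str]


def order_triples(
    triples: Iterable[Triple],
    is_individual: Callable[[str], bool] = lambda subject: False,
) -> list[tuple[str, list[Triple]]]:
    """Group triples by subject; subjects sorted alphabetically, individuals last."""
    # Key (False, ...) sorts before (True, ...), so individuals end up at the end.
    # Note: plain string comparison is case-sensitive (uppercase before lowercase).
    ordered = sorted(set(triples), key=lambda t: (is_individual(t[0]), t))
    return [(s, list(group)) for s, group in groupby(ordered, key=lambda t: t[0])]
```

For example, calling `order_triples(triples, is_individual=lambda s: s.startswith("unit:"))` would list schema terms first and every `unit:` subject after them.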

But I see Florian contributing to https://github.com/atextor/turtle-formatter:
Can you share impressions and should we use it instead of TQ TB?

fkleedorfer · 2024-09-09T18:55:02Z

> But I see Florian contributing to https://github.com/atextor/turtle-formatter: Can you share impressions and should we use it instead of TQ TB?

My point would be that formatting should be accessible to any developer who wants to contribute. I don't think that will be the case with TopBraid. I was hoping to be able to do it with jena, but it's not so simple. turtle-formatter is a decent solution for us (if it works, which is what I'm working on).

As there is more to formatting your codebase than just formatting one file, I've prepared a contribution to spotless - a spotless RDF plugin, if you like, that will use whatever we manage on the file-formatting side (turtle-formatter for TTL, jena for everything else, or just not support anything else), to format the whole codebase. The spotless RDF plugin is more or less done, except for tests, and we'll need a published turtle-formatter jar with our changes.

EDIT: My impression of turtle-formatter is that its default output is ok, it is highly configurable, and the codebase is small and I'm confident we can contribute any formatting options that we need, for example, individuals last.

VladimirAlexiev · 2024-09-10T09:44:15Z

Also, the developer of turtle-formatter @atextor is actively engaged and responsive: a big plus
I think I'll use turtle-formatter for some large-scale electrical ontologies (CIM/CGMES)

dr-shorthair · 2024-09-10T23:49:32Z

@nicholascar is this the formatter you use?
(I think you had a standard Turtle formatter to help with diffs)

VladimirAlexiev · 2024-09-17T06:28:07Z

add more sorting options in longturtle serializer RDFLib/rdflib#2880 is a request to add pretty-printing features to Python's rdflib
A relevant thread "Diff'ing RDF files" appeared on the [email protected] and [email protected] mailing lists in Sep 2024.
Elisa Kendall (one of the main FIBO ontologists):
There is an open-source tool available from the EDM Council for converting between RDF/XML, Turtle, and JSON-LD and for consistent serialization of any of these representations of RDF and OWL. The GitHub site for it is https://github.com/edmcouncil/rdf-toolkit. It is actively maintained, freely available, and addresses a number of issues mentioned on the thread, among other things. It also allows users to turn any of its features on/off as desired. It runs on the command line, or can be invoked automatically through GitHub commit hooks, for example.
For collaborative work across development teams for large ontology projects, consistent serialization for comparison purposes was one of our first and relatively important issues. It enables visual comparison in GitHub (and likely other source code management systems), so that anyone reviewing the changes can see exactly what changed, down to the single character level.
We also have a pipeline that looks for a myriad of issues in ontologies, performs regression testing using examples and reference data, and includes an html-based publication process that itself has a comparison feature, enabling comparison of any pull request or prior release with another version or with the latest version. The code for this is also open source, available from the EDM Council GitHub repository, though support is required for hosting and customization.

fkleedorfer · 2024-09-17T07:27:49Z

RDF Toolkit seems like a good tool, but it does not have the stable inline blank nodes feature I just put into turtle-formatter: edmcouncil/rdf-toolkit#49. The good thing is that now I know how to do it ;-) - but I don't know if I want to put in the time again.

However, I like their approach to formatting (a git hook with the binary; all you need to do is install Java and set JAVA_HOME), and I do like the end result of their pipeline: https://spec.edmcouncil.org/fibo/ontology/ @ralphtq @steveraysteveray @jhodgesatmb might want to see this as another possible direction to take the whole build/publication process.

Not convinced at this point that all of this warrants a switch, but it's certainly worth thinking about.

nicholascar · 2024-09-24T06:01:29Z

Hi all, pySHACL is under active development, as we are using it every day in large projects led by Ashley, the main developer of it.

Also, please note that there is a SHACL WG proposed (some of you have contributed to this proposal already!), and one of the deliverables for the WG, which I will likely lead, is an "Inferencing Rules" spec that works to standardise how SHACL build rules may be ordered, bundled, selected etc.: https://w3c.github.io/shacl/charter-1.2/shacl-wg.html#deliverables

Sure, there are lots of build rule systems out there, but many are not widely used (e.g. RIF-CS), whereas SHACL is widely used for validation, some UI generation and some data building, as per TQ products. So let's extend and standardise that so all our data building rules can be just a bunch of RDF Shapes files, not weird other-language things or over-the-top reasoning envelopes.

Nick

fkleedorfer · 2024-09-24T08:56:37Z

Well, FWIW, the shacl-maven-plugin was just released, which supports validation and inferencing.

@nicholascar thanks for the pointer to those standardization efforts. Very much looking forward to the results. Hopefully, not too far off SHACL-AF Rules.

fkleedorfer · 2024-09-24T15:42:00Z

PR #975 addresses this issue

VladimirAlexiev mentioned this issue Sep 9, 2024

Serialization order with blank nodes is non-deterministic atextor/turtle-formatter#8

Closed

fkleedorfer mentioned this issue Sep 24, 2024

Draft: Introduce a build process based on maven #975

Closed

VladimirAlexiev mentioned this issue Oct 3, 2024

add initial structure of README w3c-cg/awesome-semantic-shapes#1

Closed

fkleedorfer mentioned this issue Oct 28, 2024

Introduce an automated build process #989

Merged
