add rdf convert #63

Merged: 4 commits, Mar 2, 2024

Conversation

florian23
Contributor

  • the new CLI command "convert" converts a given data contract into an RDF representation
  • adds the rdflib 7.0.0 dependency

I chose to implement a new CLI function, convert, rather than extend the export function, because the goal was an RDF representation of the contract itself rather than a schema representation of its data model. Further work on RDF might add export functionality using the RDB2RDF mapping (see https://www.w3.org/2001/sw/wiki/RDB2RDF).

The convert function maps a data contract and its properties onto the concepts DataContract, Server, Model, Field, Contact, Info, Terms, and Example.

For example, the following data contract

dataContractSpecification: 0.9.2
id: orders-unit-test
info:
  title: Orders Unit Test
  version: 1.0.0

is mapped to the following RDF triples

@prefix dc1: <https://datacontract.com/DataContractSpecification/0.9.2/> .

<orders-unit-test> a dc1:DataContract ;
    dc1:dataContractSpecification "0.9.2" ;
    dc1:id "orders-unit-test" ;
    dc1:info [ a dc1:Info ;
            dc1:title "Orders Unit Test" ;
            dc1:version "1.0.0" ] .
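To make the mapping concrete, here is a minimal, dependency-free sketch of how the top-level properties could be turned into triples. It is not the code from this PR, which uses rdflib; the helper names `literal` and `contract_to_triples` are hypothetical, and the namespace is copied from the `@prefix` line above.

```python
import json
from itertools import count

# Namespace copied from the @prefix line in the example above.
DC = "https://datacontract.com/DataContractSpecification/0.9.2/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value) -> str:
    """Render a value as a double-quoted plain literal (hypothetical helper)."""
    return json.dumps(str(value))


def contract_to_triples(contract: dict) -> list:
    """Hypothetical helper: map top-level contract properties to triples.

    IRIs are plain strings, literals are double-quoted strings, and blank
    nodes use the "_:bN" form.
    """
    blank_ids = count()
    subject = contract["id"]  # relative IRI, as in the example above
    triples = [
        (subject, RDF_TYPE, DC + "DataContract"),
        (subject, DC + "dataContractSpecification",
         literal(contract["dataContractSpecification"])),
        (subject, DC + "id", literal(contract["id"])),
    ]
    info = contract.get("info")
    if info:
        # The info section has no identifier of its own, so it becomes a blank node.
        node = f"_:b{next(blank_ids)}"
        triples.append((subject, DC + "info", node))
        triples.append((node, RDF_TYPE, DC + "Info"))
        for key, value in info.items():
            triples.append((node, DC + key, literal(value)))
    return triples


contract = {
    "dataContractSpecification": "0.9.2",
    "id": "orders-unit-test",
    "info": {"title": "Orders Unit Test", "version": "1.0.0"},
}
for triple in contract_to_triples(contract):
    print(*triple)
```

The nested info section becomes a blank node because it has no identifier of its own, which matches the `[ a dc1:Info ; ... ]` form in the Turtle output.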

Each model of a data contract is mapped to a single Model instance.

models:
  orders:
    description: The orders model
  line_items:
    description: The line items model

is mapped into

@prefix dc1: <https://datacontract.com/DataContractSpecification/0.9.2/> .

<orders> a dc1:Model ;
    dc1:description "The orders model" .

<line_items> a dc1:Model ;
    dc1:description "The line items model" .
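The one-resource-per-model rule can be sketched in a few lines of plain Python. This is an illustration only, not the rdflib-based implementation in this PR; `models_to_turtle` is a hypothetical name, and using `json.dumps` for string escaping is an assumption that holds for simple descriptions like the ones shown.

```python
import json

# Namespace copied from the @prefix line in the example above.
DC = "https://datacontract.com/DataContractSpecification/0.9.2/"


def models_to_turtle(models: dict) -> str:
    """Hypothetical helper: render each model as one dc1:Model resource."""
    blocks = [f"@prefix dc1: <{DC}> ."]
    for name, model in models.items():
        statements = ["a dc1:Model"]
        if "description" in model:
            # Assumption: JSON string escaping is close enough to Turtle here.
            statements.append("dc1:description " + json.dumps(model["description"]))
        blocks.append(f"<{name}> " + " ;\n    ".join(statements) + " .")
    return "\n\n".join(blocks) + "\n"


models = {
    "orders": {"description": "The orders model"},
    "line_items": {"description": "The line items model"},
}
print(models_to_turtle(models))
```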

@florian23 mentioned this pull request on Feb 26, 2024
- error handling was using the wrong parameter
@jochenchrist
Contributor

This is really a great contribution. Thanks, @florian23!

The only thing I need to think about is whether there is really a need for a convert method.
I think it also fits into export, and it would make sense to have an import later as well. How would you define the import direction with convert?

So, my vote would be to stick with export --format rdf for now.

@simonharrer
Contributor

Awesome!

My comments:

  • We do have export --format odcs which basically acts as the convert function. This would be in line with this.
  • I'd be curious to know how this will be used further on. Perhaps we should document potential uses of this export as well. Could you add something to the export description in the README?

- remove convert function
- add rdf export function
- add optional parameter rdf_base to export function
- fix pipeline error missing 1 required positional argument base
- add description for RDF export in README.md
- add potential use case descriptions for RDF export in README.md
- rename test_convert_rdf.py to test_export_rdf.py
@florian23
Contributor Author

Hey,

That's fine with me. I removed the convert function and put the to_rdf function under export. I don't really like how I handle the optional property rdf_base. I think it should be added because we cannot guarantee that the entities we create from the contract can be resolved to an absolute IRI, which RDF requires as far as I know. I think this needs a rework in the future. We could drop the property and let users decide whether to add the prefix to the entities. Adding the base to the output in a post-processing step after the export is also an option for users.
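As a rough illustration of the rdf_base concern, the sketch below resolves an entity identifier against an optional base using only the standard library. It is not the implementation in this PR; `resolve_entity_iri` is a hypothetical name, and the example.com base is made up.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit


def resolve_entity_iri(identifier: str, rdf_base: Optional[str] = None) -> str:
    """Hypothetical helper: resolve an identifier against an optional base IRI."""
    if urlsplit(identifier).scheme:
        return identifier  # already absolute, leave untouched
    if rdf_base is None:
        return identifier  # stays relative; the consumer must supply a base
    return urljoin(rdf_base, identifier)


# Made-up base IRI, for illustration only.
print(resolve_entity_iri("orders-unit-test", "https://example.com/contracts/"))
print(resolve_entity_iri("orders-unit-test"))
```

Note that the trailing slash on the base matters: standard relative resolution replaces the last path segment of a base that does not end in "/", so "https://example.com/contracts" plus "orders" resolves to "https://example.com/orders".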

I have added some potential use cases to the README.md. At the moment I have:

  • Interoperate with other data contract specification formats
  • Store data contracts inside a knowledge graph
  • Enhance semantic search to find and retrieve data contracts
  • Link model elements to already established ontologies and knowledge
  • Use the full power of OWL to reason about the graph structure of data contracts
  • Apply graph algorithms to multiple data contracts (find similar data contracts, find "gatekeeper"
    data products, find the true domain owner of a field attribute)
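To give a feel for the "find similar data contracts" use case, here is a small stand-in that compares two contracts by the overlap of their field names (Jaccard similarity). It works on plain dictionaries rather than on the exported RDF graph, and it assumes each model lists its fields under a `fields` key; the function names and sample contracts are hypothetical.

```python
def field_names(contract: dict) -> set:
    """Collect field names across all models (assumes a 'fields' mapping per model)."""
    return {
        field
        for model in contract.get("models", {}).values()
        for field in model.get("fields", {})
    }


def contract_similarity(a: dict, b: dict) -> float:
    """Jaccard similarity of the two contracts' field-name sets."""
    fields_a, fields_b = field_names(a), field_names(b)
    union = fields_a | fields_b
    if not union:
        return 0.0  # no fields on either side, nothing to compare
    return len(fields_a & fields_b) / len(union)


# Hypothetical sample contracts.
orders = {"models": {"orders": {"fields": {"order_id": {}, "customer_id": {}, "total": {}}}}}
invoices = {"models": {"invoices": {"fields": {"customer_id": {}, "total": {}, "due_date": {}}}}}
print(contract_similarity(orders, invoices))
```

A graph-based version would compare the neighbourhoods of the contract nodes instead, which is where storing many contracts in one knowledge graph would pay off.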

# Conflicts:
#	README.md
#	datacontract/cli.py
#	datacontract/data_contract.py
@jochenchrist
Contributor

Thanks for your contribution!
Happy to merge :)

@jochenchrist jochenchrist merged commit 2ea617b into datacontract:main Mar 2, 2024
0 of 3 checks passed