Maintain version and dependency info in RDF ontologies.
To install the most recent released version of the toolkit, use `pip install onto-tool`.
On newer versions of Linux, this may fail with a message like "error: externally-managed-environment". It is now recommended to use the `pipx` command to install Python packages local to the user:

    sudo apt install pipx
    pipx install onto_tool

Then update your `PATH` environment variable to include `~/.local/bin`. For the bash shell, add this to the end of your `~/.bashrc` file:

    export PATH="$PATH:$HOME/.local/bin"

For additional information see https://peps.python.org/pep-0668/ and https://stackoverflow.com/a/75722775.
To experiment with unreleased features currently in development, clone this repo and navigate to the cloned directory. Run `python -m setup install`, which will install the `onto_tool` command and all its dependencies into your environment.

    $ onto_tool -h
    usage: onto_tool [-h] [-k] [-v] {update,export,bundle,graphic} ...

    Ontology toolkit.

    positional arguments:
      {update,export,bundle,graphic}
                            sub-command help
        update              Update versions and dependencies
        export              Export ontology
        bundle              Bundle ontology for release
        graphic             Create PNG graphic and dot file from OWL files or SPARQL Endpoint

    optional arguments:
      -h, --help            show this help message and exit
      -k, --insecure        Allow insecure server connections when using SSL
      -v, --version         Report onto-tool version and exit

The `update` sub-command modifies ontology version and dependency information.

    $ onto_tool update -h
    usage: onto_tool update [-h] [-f {xml,turtle,nt} | -i] [--debug] [-o OUTPUT]
                            [-b [{all,strict}]] [--retain-definedBy]
                            [--versioned-definedBy] [-v SET_VERSION]
                            [--version-info [VERSION_INFO]]
                            [-d DEPENDENCY VERSION]
                            [ontology [ontology ...]]

    positional arguments:
      ontology              Ontology file or directory containing OWL files

    optional arguments:
      -h, --help            show this help message and exit
      -f {xml,turtle,nt}, --format {xml,turtle,nt}
                            Output format
      -i, --in-place        Overwrite each input file with update, preserving
                            format
      --debug               Emit verbose debug output
      -o OUTPUT, --output OUTPUT
                            Path to output file. Will be ignored if --in-place is
                            specified.
      -b [{all,strict}], --defined-by [{all,strict}]
                            Add rdfs:isDefinedBy to every resource defined. If the
                            (default) "strict" argument is provided, only
                            owl:Class, owl:ObjectProperty, owl:DatatypeProperty,
                            owl:AnnotationProperty and owl:Thing entities will be
                            annotated. If "all" is provided, every entity that has
                            any properties other than rdf:type will be annotated.
                            Will override any existing rdfs:isDefinedBy
                            annotations on the affected entities unless --retain-
                            definedBy is specified.
      -v SET_VERSION, --set-version SET_VERSION
                            Set the version of the defined ontology
      --version-info [VERSION_INFO]
                            Adjust versionInfo, defaults to "Version X.x.x"
      -d DEPENDENCY VERSION, --dependency-version DEPENDENCY VERSION
                            Update the import of DEPENDENCY to VERSION

The `export` sub-command transforms the ontology into the desired format, and can remove version information, as required by tools such as TopBraid Composer.

    usage: onto_tool export [-h] [-f {xml,turtle,nt} | -c CONTEXT] [--debug]
                            [-o OUTPUT] [-s] [-m IRI VERSION] [-b [{all,strict}]]
                            [--retain-definedBy] [--versioned-definedBy]
                            [ontology [ontology ...]]

    positional arguments:
      ontology              Ontology file or directory containing OWL files

    optional arguments:
      -h, --help            show this help message and exit
      -f {xml,turtle,nt}, --format {xml,turtle,nt}
                            Output format
      -c CONTEXT, --context CONTEXT
                            Export as N-Quads in CONTEXT.
      --debug               Emit verbose debug output
      -o OUTPUT, --output OUTPUT
                            Path to output file.
      -s, --strip-versions  Remove versions from imports.
      -m IRI VERSION, --merge IRI VERSION
                            Merge all inputs into a single ontology with the given
                            IRI and version
      -b [{all,strict}], --defined-by [{all,strict}]
                            Add rdfs:isDefinedBy to every resource defined. If the
                            (default) "strict" argument is provided, only
                            owl:Class, owl:ObjectProperty, owl:DatatypeProperty,
                            owl:AnnotationProperty and owl:Thing entities will be
                            annotated. If "all" is provided, every entity that has
                            any properties other than rdf:type will be annotated.
      --retain-definedBy    When merging ontologies, retain existing values of
                            rdfs:isDefinedBy

The `graphic` sub-command will create either

- a comprehensive diagram showing ontology modules with their classes, object properties and individuals, together with the path of imports, or (if the `--wee` option is selected) a simple diagram of the ontology import hierarchy, or
- a diagram of the use of classes and object and data properties in a triple store or local ontology files.

Graphics are exported both as `png` files and as a `dot` file. The `dot` file can be used with Graphviz or with web tools such as Dot Viewer.

    usage: onto_tool graphic [-h] [-e ENDPOINT] [--schema | --data]
                             [--single-ontology-graphs] [--debug] [-o OUTPUT]
                             [--show-shacl]
                             [--link-concentrator-threshold LINK_CONCENTRATOR_THRESHOLD]
                             [--instance-limit INSTANCE_LIMIT]
                             [--predicate-threshold PREDICATE_THRESHOLD]
                             [--include [INCLUDE [INCLUDE ...]] |
                             --include-pattern [INCLUDE_REGEX [INCLUDE_REGEX ...]]
                             | --exclude [EXCLUDE [EXCLUDE ...]] |
                             --exclude-pattern
                             [EXCLUDE_REGEX [EXCLUDE_REGEX ...]]] [-v VERSION]
                             [-w [WEE [WEE ...]]]
                             [--label-language LABEL_LANGUAGE]
                             [--hide [HIDE [HIDE ...]]] [--no-image] [-t TITLE]
                             [ontology [ontology ...]]

    positional arguments:
      ontology              Ontology file, directory or name pattern

    optional arguments:
      -h, --help            show this help message and exit
      -e ENDPOINT, --endpoint ENDPOINT
                            URI of SPARQL endpoint to use to gather data
      --schema              Generate ontology import graph (default)
      --data                Analyze instances for types and links
      --single-ontology-graphs
                            If specified in combination with --endpoint when
                            generating a schema graph, assume that every ontology
                            is in its own named graph in the triple store.
                            Otherwise rdfs:isDefinedBy will be used to locate
                            entities defined by each ontology.
      --debug               Emit verbose debug output
      -o OUTPUT, --output OUTPUT
                            Output directory for generated graphics
      --show-shacl          Attempts to discover which classes and properties have
                            corresponding SHACL shapes and colors them green on
                            the graph. This detection relies on the presence of
                            sh:targetClass targeting, and can be confused by
                            complex logical shapes or Advanced SHACL features such
                            as SPARQL queries.
      --link-concentrator-threshold LINK_CONCENTRATOR_THRESHOLD
                            When the number of links originating from the same
                            class that share a single predicate exceeds this
                            threshold (default 10), use more compact display.
                            Setting the value to 0 disables this behavior.
      -v VERSION, --version VERSION
                            Version to place in graphic
      -w [WEE [WEE ...]], --wee [WEE [WEE ...]]
                            For ontologies matching the patterns specified, only
                            render the name and import information. If no patterns
                            are specified, applies to all ontologies.
      --label-language LABEL_LANGUAGE
                            In case entities have labels in multiple languages,
                            select either the specified language (default: en) or
                            a non-language label.
      --hide [HIDE [HIDE ...]]
                            When visualizing data, hide classes and properties
                            matching the regex patterns specified with this option.
      --no-image            Do not generate PNG image, only .dot output.
      -t TITLE, --title TITLE
                            Title to use for graph. If not supplied, the repo URI
                            will be used if graphing an endpoint, or 'Gist' if
                            graphing local files.
      --show-bnode-subjects
                            Use triples with blank nodes in the subject to generate
                            the graphic.

    Sampling Limits:
      --instance-limit INSTANCE_LIMIT
                            Specify a limit on how many triples to consider that
                            use any one predicate to find (default 500000). This
                            option may result in an incomplete version of the
                            diagram, missing certain links.
      --predicate-threshold PREDICATE_THRESHOLD
                            Ignore predicates which occur fewer than
                            PREDICATE_THRESHOLD times (default 10)

    Filters (only one can be used):
      --include [INCLUDE [INCLUDE ...]]
                            If specified for --schema, only ontologies matching
                            the specified URIs will be shown in full detail. If
                            specified with --data, only triples in the named
                            graphs mentioned will be considered (this also
                            excludes any triples in the default graph).
      --include-pattern [INCLUDE_REGEX [INCLUDE_REGEX ...]]
                            If specified for --schema, only ontologies matching
                            the specified URI pattern will be shown in full
                            detail. If specified with --data, only triples in the
                            named graphs matching the pattern will be considered
                            (this also excludes any triples in the default graph).
                            For large graphs this option is significantly slower
                            than using --include.
      --exclude [EXCLUDE [EXCLUDE ...]]
                            If specified for --schema, ontologies matching the
                            specified URIs will be omitted from the graph. If
                            specified with --data, triples in the named graphs
                            mentioned will be excluded (this also excludes any
                            triples in the default graph).
      --exclude-pattern [EXCLUDE_REGEX [EXCLUDE_REGEX ...]]
                            If specified for --schema, ontologies matching the
                            specified URI pattern will be omitted from the graph.
                            If specified with --data, triples in the named graphs
                            matching the pattern will be ignored (this also
                            excludes any triples in the default graph). For large
                            graphs this option is significantly slower than using
                            --exclude.

The `bundle` sub-command supports creating an ontology deployment containing both RDF and non-RDF artifacts for delivery or web hosting.

    $ onto_tool bundle -h
    usage: onto_tool bundle [-h] [--debug] [-v VARIABLE VALUE] bundle

    positional arguments:
      bundle                JSON or YAML bundle definition

    optional arguments:
      -h, --help            show this help message and exit
      --debug               Emit verbose debug output
      -v VARIABLE VALUE, --variable VARIABLE VALUE
                            Set value of VARIABLE to VALUE

The bundle definition is either YAML or JSON, and contains the following sections:

    variables:
      name: "gist"
      version: "X.x.x"
      input: "."
      rdf-toolkit: "{input}/tools/rdf-toolkit.jar"
      output: "{name}{version}_webDownload"

Variables are initialized with the default values provided, but can be overridden via the `--variable` command line option. Values can reference other values using the `{name}` template syntax.
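
As a rough illustration of how `{name}` references between variables could be expanded, here is a minimal Python sketch. It assumes the substitution behaves like Python's `str.format`, which matches the syntax shown above but is not confirmed as the tool's implementation; `resolve_variables` is a hypothetical helper, not part of `onto_tool`.

```python
def resolve_variables(defaults, overrides=None):
    """Hypothetical helper: expand {name} references between bundle variables."""
    values = {**defaults, **(overrides or {})}
    # One pass per variable is enough to expand any non-circular chain of references.
    for _ in range(len(values)):
        values = {key: text.format(**values) for key, text in values.items()}
    return values


defaults = {
    "name": "gist",
    "version": "X.x.x",
    "input": ".",
    "output": "{name}{version}_webDownload",
}

# Roughly what passing `--variable version 12.0.0` on the command line would do.
resolved = resolve_variables(defaults, {"version": "12.0.0"})
print(resolved["output"])  # gist12.0.0_webDownload
```

In this sketch the override is applied before expansion, so `output` picks up the new `version`. Whether the tool orders these steps the same way is an assumption.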
All tools require a `name` by which they are referenced in `transform` actions. Three different tool types are supported:

- Java tools (`type: "Java"`) require a path to the executable Jar file specified via the `jar` option, and a list of `arguments` that will be applied to each file processed. The `inputFile` and `outputFile` variables will be bound during execution, but other variables can be used to construct the arguments.

      tools:
        - name: "serializer"
          type: "Java"
          jar: "{rdf-toolkit}"
          arguments:
            - "-tfmt"
            - "rdf-xml"
            - "-sdt"
            - "explicit"
            - "-dtd"
            - "-ibn"
            - "-s"
            - "{inputFile}"
            - "-t"
            - "{outputFile}"

- Shell tools (`type: "shell"`) execute a command specified via a list of `arguments` that will be applied to each file processed. The `inputFile` and `outputFile` variables will be bound during execution, but other variables can be used to construct the arguments.

      tools:
        - name: "java_version"
          type: "shell"
          arguments:
            - "java"
            - "-version"

- SPARQL tools apply a SPARQL Update query to each input file and serialize the resulting graph into the output file. RDF format is preserved unless overridden with the `format` option. If the query is specified inline, template substitution will be applied to it, so bundle variables can be used, but double braces (`{{` instead of `{`, `}}` instead of `}`) have to be used to escape actual braces.

      - name: "add-language-en"
        type: "sparql"
        query: >
          prefix skos: <http://www.w3.org/2004/02/skos/core#>
          DELETE {{
            ?subject skos:prefLabel ?nolang .
          }}
          INSERT {{
            ?subject skos:prefLabel ?withlang
          }}
          where {{
            ?subject skos:prefLabel ?nolang .
            FILTER(lang(?nolang) = '')
            BIND(STRLANG(?nolang, '{lang}') as ?withlang)
          }}

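
The brace-escaping rule for inline queries can be tried out in isolation. The sketch below assumes the template substitution works like Python's `str.format` (an assumption based on the syntax, not a statement about the tool's internals); `render_query` is a hypothetical helper, and `lang` is the variable used in the example above.

```python
# Inline query as it would appear in the bundle file: {{ and }} stand for
# literal braces, while {lang} refers to a bundle variable.
QUERY_TEMPLATE = """\
prefix skos: <http://www.w3.org/2004/02/skos/core#>
DELETE {{ ?subject skos:prefLabel ?nolang . }}
INSERT {{ ?subject skos:prefLabel ?withlang }}
where {{
  ?subject skos:prefLabel ?nolang .
  FILTER(lang(?nolang) = '')
  BIND(STRLANG(?nolang, '{lang}') as ?withlang)
}}
"""


def render_query(template, variables):
    """Hypothetical helper: substitute variables and collapse doubled braces."""
    return template.format(**variables)


rendered = render_query(QUERY_TEMPLATE, {"lang": "en"})
print(rendered)
```

Under this assumption, a single `{` in the query would be read as the start of a variable reference, which is why every literal brace has to be doubled.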
Actions are executed in the order they are listed. Each action must have an `action` attribute, and any action can contain a `message` attribute, the contents of which will be emitted as an `INFO`-level log message prior to the execution of the action.

- `mkdir`, which requires a `directory` attribute to specify the path of the directory to be created (only if it doesn't already exist).
- `copy`, which copies files into the bundle, and supports the following arguments:
  - `source`, `target`, `includes` and `excludes` - if neither `includes` nor `excludes` is present, `source` and `target` are both assumed to be file paths to a single file. If either `includes` or `excludes` is provided, `source` and `target` are assumed to be directories, and each member of the `includes`/`excludes` lists is treated as a glob pattern inside the `source` directory. If `includes` is not present, it's presumed to be `*`, and `excludes` is applied after `includes`.
  - `rename` - If provided, must contain `from` and `to` attributes. When specified, each file is renamed as it is copied, where `from` is treated as a Python regular expression applied to the base name of the source file, and `to` is the substitution string which replaces it in the name of the target file. Backreferences are available for capturing groups, e.g.

        rename:
          from: "(.*)\\.owl"
          to: "\\g<1>{version}.owl"

    will add a version number to the base name of each `.owl` file. Further documentation on Python regular expression replace functionality can be found in the Python `re` module documentation.
  - `replace` - If provided, must contain `from` and `to` attributes. When specified, each file is processed after being copied, and each instance of the `from` pattern is replaced with the `to` string in the file contents. Python regular expression syntax and backreferences are supported as shown in the `rename` documentation.
- `move`, which moves files according to the provided options, which are identical to the ones supported by `copy`.
- `definedBy`, which inspects each input file to identify a single defined ontology, and then adds an `rdfs:isDefinedBy` property referencing the identified ontology to resources defined in the file.
  - `mode` - has two possible values, `strict` (the default) and `all`. If `mode` is `strict`, any resource with type `owl:Class`, `owl:ObjectProperty`, `owl:DatatypeProperty` or `owl:AnnotationProperty` is annotated. Otherwise, `mode` is `all` and any resource with a type and at least one other property is annotated. Existing `rdfs:isDefinedBy` values are removed prior to the addition. Input and output file specification options are identical to those used by the `copy` action.
  - `versionedDefinedBy` - use `owl:versionIRI` for `rdfs:isDefinedBy`, when available.
  - `retainDefinedBy` - by default, `definedBy` will override any existing `rdfs:isDefinedBy` annotations, but if this option is provided, existing annotations will be left in place.
- `export`, which functions similarly to the command-line export functionality, gathering one or more input ontologies and exporting them as a single file, with some optional transformations, depending on the following specified options:
  - `source`, `target`, `includes` and `excludes` - treated identically to the `copy` operation described above, except `target` is always treated as a single file path.
  - `merge` - if provided, it must have two mandatory fields, `iri` and `version`. In this case, all ontologies declared in the input files are removed, and a single new ontology, specified by the `iri`, is created, using `version` to build `owl:versionInfo` and `owl:versionIRI`. Any imports on the removed ontologies which are not satisfied internally are transferred to the new ontology.
  - `definedBy` - has two possible values, `strict` and `all`. If provided, an `rdfs:isDefinedBy` is added to all non-blank node subjects in the exported RDF linking them to the ontology defined in the combined graph. If more than one ontology is defined, the export will fail. If `strict` is specified, only classes and properties will be annotated, whereas `all` does not filter by type.
  - `retainDefinedBy` - by default, `definedBy` will override any existing `rdfs:isDefinedBy` annotations, but if this option is provided, existing annotations will be left in place.
  - `format` - One of `turtle`, `xml`, or `nt` (N-Triples), specifies the output format for the export. The default output format is `turtle`.
  - `context` - If provided, generates an N-Quads export with the `context` argument as the name of the graph. When this option is present, the value of `format` is ignored.
  - `compress` - when this is `true`, the output is `gzip`-ed.
- `transform`, which applies the specified tool to a set of input files, and supports the following arguments:
  - `tool`, which references the `name` of a tool which must be defined in the `tools` section.
  - `source`, `target`, `includes` and `excludes`, which function just like they do for the `copy` and `move` actions, with each input and output path bound into the `inputFile` and `outputFile` variables before the tool arguments are interpreted.
  - `replace` and `rename`, which are applied after the tool invocation, and work as described above.
- `sparql` reads RDF files provided via the `source` and `includes`/`excludes` options and executes a SPARQL query on the resulting combined graph.
  - If the `query` option is a valid file path, the query is read from that file, otherwise the contents of the `query` option are interpreted as the query.
  - `SELECT` query results are stored in the file specified via `target` as a CSV.
  - RDF results from a `CONSTRUCT` query are stored as either Turtle, RDF/XML or N-Triples, depending on the `format` option (`turtle`, `xml`, or `nt`). Update queries will alter the input data in place, and the resulting graph will be output in the specified format.
  - `UPDATE` queries executed on local files will modify the in-memory graph and then serialize the resulting graph to the `target`.
  - The default functionality is to combine all RDF sources specified via `includes` and execute queries on the resulting graph. However, if `eachFile: true` is added, all queries will be applied to each source file separately, and will produce a separate output file. In this case, `target` will be treated as a directory, and the `rename` option should be used when needed to construct the output file names. For example, the following action extracts the labels out of each RDF file into a separate CSV with matching names:

        - action: 'sparql'
          message: "Multi-file processing with SELECT"
          eachFile: true
          source: '{input}'
          includes:
            - '*_ontology.ttl'
          target: "{output}/each/select"
          rename:
            from: "(.*)\\.ttl"
            to: "\\g<1>.csv"
          query: >
            prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            prefix skos: <http://www.w3.org/2004/02/skos/core#>
            select ?label WHERE {{
              ?s rdfs:label ?label .
            }} order by ?label

  - As an alternative to operating on local RDF specified via `source`, a query can be executed on a triple store by specifying an `endpoint`, which must contain a `query_uri`, and can optionally specify `user`/`password` which will authenticate via HTTP basic authentication. Update queries will modify the triple store directly, and a separate `update_uri` can be specified for databases which require it.
- `markdown` transforms a `.md` file referenced in `source` into an HTML output specified in `target`.
- `graph` reads RDF files provided via the `source` and `includes`/`excludes` options and generates a graphical representation of the ontology, as in the `graphic` sub-command described above. Both `.dot` and `.png` outputs are written to the directory specified in the `target` option, and `title` and `version` attributes configure the title on the generated graph. If `compact` is specified as `True`, a concise graph including only ontology names and imports is generated.
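
The `includes`/`excludes` selection and the `rename` option shared by several of the actions above can be approximated in a few lines of Python. This is only a sketch of the documented behavior: `select_files` and `rename_file` are hypothetical helpers, the file names and version number are made up, and it assumes glob matching similar to `fnmatch` and renaming via `re.sub`, which may differ from what the tool does internally.

```python
import re
from fnmatch import fnmatchcase


def select_files(names, includes=None, excludes=None):
    """Hypothetical helper: apply include globs first, then exclude globs."""
    includes = includes or ["*"]  # a missing `includes` is presumed to be "*"
    excludes = excludes or []
    chosen = [n for n in names if any(fnmatchcase(n, p) for p in includes)]
    return [n for n in chosen if not any(fnmatchcase(n, p) for p in excludes)]


def rename_file(base_name, pattern, replacement, variables):
    """Hypothetical helper: fill in bundle variables, then apply the regex."""
    expanded = replacement.format(**variables)
    return re.sub(pattern, expanded, base_name)


names = ["gistCore.owl", "gistDeprecated.owl", "README.md"]
selected = select_files(names, includes=["*.owl"], excludes=["*Deprecated*"])

# Once YAML has parsed "(.*)\\.owl" and "\\g<1>{version}.owl", each doubled
# backslash is a single one, which is what the raw strings below contain.
renamed = [
    rename_file(n, r"(.*)\.owl", r"\g<1>{version}.owl", {"version": "12.0.0"})
    for n in selected
]
print(renamed)  # ['gistCore12.0.0.owl']
```

The `\g<1>` form is useful here because the group reference is immediately followed by the digits of the version; a plain `\1` followed by `12.0.0` would be ambiguous.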
The `verify` action reads RDF files provided via the `source` and `includes`/`excludes` options and performs validation on the resulting combined graph. If the validation fails, the bundle process exits with a non-zero status and does not execute subsequent actions. The type of verification performed depends on the value of the `type` option:

- If `type` is `select`, one or more SPARQL `SELECT` queries are executed against the graph, and the first query to return a non-empty result will terminate the bundle. The results of the query will be output to the log, and also written as CSV to a file path specified by the `target` option, if provided. Queries can be specified in one of two ways (only one can be present):
  - If the `query` option is a valid file path, the query is read from that file, otherwise the contents of the `query` option are interpreted as the query, e.g.

        query: >
          prefix skos: <http://www.w3.org/2004/02/skos/core#>
          select ?unlabeled where {{
            ?unlabeled a ?type .
            filter not exists {{ ?unlabeled skos:prefLabel ?label }}
          }}

  - If `queries` is provided, a list of queries will be built from the `source` and `includes`/`excludes` sub-options. The queries will be executed in the order specified. If `stopOnFail` is omitted or is `true`, the first query that produces a failing result will cause `verify` to abort. If `stopOnFail` is `false`, all queries will be executed regardless of failures, and the value of `target` is treated as a directory where the results of each failing query will be written.

        - action: 'verify'
          type: 'select'
          source: '{input}'
          includes:
            - 'verify_data.ttl'
          target: '{output}/verify_select_results'
          stopOnFail: false
          queries:
            source: '{input}'
            includes:
              - 'verify_*_select_query.rq'

- If `type` is `ask`, one or more SPARQL `ASK` queries will be executed. Queries are specified similarly to the `select` validation. Unless `stopOnFail` is set to `false`, the bundle will terminate at the first query producing a result that does not match the required `expected` option. For example:

      actions:
        - action: 'verify'
          type: 'ask'
          source: '{input}'
          includes:
            - 'verify_data.ttl'
          queries:
            source: '{input}'
            includes:
              - '*_ask_query.rq'
          expected: false

- If `type` is `shacl`, a SHACL shape graph will be constructed from the file specified via the `shapes` option (which must have a `source`, and optionally `includes`/`excludes`), with the bundle terminating only if any `sh:Violation` results are present, unless the `failOn` option specifies otherwise. The report is emitted to the log, and saved as Turtle to the path specified in the `target` option if it's provided. For example:

      - action: 'verify'
        type: 'shacl'
        inference: 'rdfs'
        source: '{input}'
        includes:
          - 'verify_data.ttl'
        target: '{output}/verify_shacl_errors.ttl'
        failOn: "warning"
        shapes:
          source: '{input}/verify_shacl_shapes.ttl'

  If the `inference` option is provided, the reasoner will be run on the graph prior to applying the SHACL rules. The valid values are: `rdfs`, `owlrl`, `both`, or `none` (default).
- If `type` is `construct`, the queries are expected to `CONSTRUCT` a SHACL ValidationReport. The validation will be considered a failure if the resulting graph is non-empty. `target`, `stopOnFail` and `query`/`queries` are handled the same as in `select` validation, and `failOn` is used to determine which violations will terminate execution.
- Validation can be performed against a SPARQL endpoint instead of local RDF data by specifying `endpoint` instead of `source`/`includes`. `endpoint` must contain a `query_uri`, and can optionally specify `user`/`password` which will authenticate via HTTP basic authentication. For example:

      - action: 'verify'
        type: 'construct'
        endpoint:
          query_uri: 'https://my.endpoint.com/sparql'
          user: 'test-user'
          password: 'test-user'
        target: '{output}/verify_construct_results'
        stopOnFail: false
        query: '{input}/verify_via_construct.rq'