The generated labelled property graph (LPG) is a view of the Project Data Graph. Its purpose is to provide a means to perform graph traversal queries on instance-level data. The LPG view strictly covers the datatype properties of, and relationships between, model elements. It does not cover the UML and MMS ontologies, so queries that depend on the type hierarchy from the UML metamodel, such as those involving subclass relations, are not supported. In other words, while RDF allows ABox and TBox statements to coexist within the same model, LPGs are mostly suited to ABox statements.
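For example, a query that walks the type hierarchy (TBox) can only be answered against the RDF graphs, not the LPG view. The sketch below is a hypothetical SPARQL request sent with curl: the metaclass IRI is a placeholder rather than an identifier from the actual generated vocabulary, and MMS_SPARQL_ENDPOINT is the endpoint variable described in Setup below.
# hypothetical TBox-dependent query: find elements typed by any subclass of a
# placeholder UML metaclass. This relies on the vocabulary graph (TBox), so it
# has no equivalent in the LPG view, which omits the UML and MMS ontologies.
$ curl -s "$MMS_SPARQL_ENDPOINT" \
    -H 'Accept: application/sparql-results+json' \
    --data-urlencode 'query=
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      SELECT ?element WHERE {
        ?element a ?type .
        ?type rdfs:subClassOf* <urn:example:uml#Classifier> .
      } LIMIT 10'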
- Node.js version 10.*, 11.*, or 12.*
- make (for building native add-ons for Node.js; node-gyp)
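A quick way to confirm both prerequisites are available from your shell:
$ node --version   # should report v10.x, v11.x, or v12.x
$ make --version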
The following environment variables also need to be set:
- MMS_SPARQL_ENDPOINT - URL of the SPARQL endpoint (local or remote) which will be updated and queried during triplification, e.g., http://localhost:13030
- MMS_PROJECT_NAME - name of the project (determines name of generated data graph)
- MMS_MAPPING_FILE - path to input JSON mapping file(s) (either absolute path or relative to this project root dir)
For remote usage with AWS Neptune, the following environment variables also need to be set:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- NEPTUNE_ENDPOINT - deprecated and removed; use MMS_SPARQL_ENDPOINT instead
- NEPTUNE_S3_BUCKET - S3 Bucket URI (see example)
- NEPTUNE_REGION - S3 Bucket region string
- NEPTUNE_PROXY - (optional) a proxy through which to tunnel requests to the endpoint (see example)
From the project root dir:
- Install the npm package:
$ npm i
- Create an environment variables file. Here is an example file for local usage, .local.env:
#!/bin/bash
export MMS_SPARQL_ENDPOINT=http://localhost:13030/ds
export MMS_PROJECT_NAME=tmt
export MMS_MAPPING_FILE=input/tmt/mapping/*.json
export MMS_MAX_REQUESTS=32
export MMS_MAX_SOCKETS=32
- Load the environment variables into your shell session:
$ source .local.env
- Ensure that you have a mapping file located at the MMS_MAPPING_FILE path (a default mapping for the TMT project is provided with this repository).
- Ensure that you have a project data file ready to use. If you cloned this repository, you can use the TMT project provided in ./input/tmt/data, but you should extract the JSON file from its .tar.gz archive in this step:
$ cd ./input/tmt/data
$ tar -xzvf tmt_data.json.tar.gz
- If you are connecting to AWS Neptune, make sure to open a tunnel to an EC2 instance that is within the same VPC as the Neptune cluster:
$ ssh -i aws.pem -D 3031 ubuntu@EC2_IP
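To confirm the tunnel is usable before running any remote tasks, you can send a request through the SOCKS proxy opened above. The check below is a sketch that assumes the proxy port 3031 from the command above and Neptune's instance status API at the /status path:
# route a request through the SOCKS tunnel to the Neptune endpoint; a JSON
# status document in the response means the tunnel and endpoint are reachable
$ curl -s --proxy socks5h://127.0.0.1:3031 "$MMS_SPARQL_ENDPOINT/status"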
- Follow the steps in Setup to initialize the project.
- Prepare a local triplestore. A helper bash script is provided at ./util/local-endpoint.sh, which you can run from the command line. The script reads the port from the MMS_SPARQL_ENDPOINT URI string and attempts to launch a named Docker container of the Apache Jena Fuseki triplestore with an in-memory database that binds to the host at 0.0.0.0 on the port in the endpoint string (e.g., :13030). Be aware that re-running this script will overwrite older containers, i.e., the triplestore may have to be reloaded. A quick check that the endpoint is up is shown after this list.
- Build the vocabulary graph:
$ npx emk local.update.vocabulary.*
It is OK if you see warnings about UML properties having multiple keys; this means the system is handling URI minting conflicts. You can now inspect the source of the vocabulary Turtle files under ./build/vocabulary/.
- Build the instance data graph:
$ ./util/build-local.sh input/tmt/data/tmt_data.json
This is a multithreaded build tool that may take a while depending on the size of the input dataset. A progress bar will be printed to the console. It is OK if you see warnings about unmapped object keys that begin with an underscore (unmapped metadata properties).
- All done! The script from the previous step will stitch together all the output files into a single master output Turtle file located at ./build/.
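As a quick sanity check (referenced in the triplestore step above), you can confirm the Fuseki container is running and ask the endpoint how many triples each named graph holds. This is a sketch assuming the endpoint from the example .local.env and that the build loaded the graphs into the running triplestore:
$ docker ps    # the container launched by ./util/local-endpoint.sh should be listed
$ curl -s "$MMS_SPARQL_ENDPOINT" \
    -H 'Accept: application/sparql-results+json' \
    --data-urlencode 'query=SELECT ?g (COUNT(*) AS ?triples) WHERE { GRAPH ?g { ?s ?p ?o } } GROUP BY ?g'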
All together, the commands in order might look like this:
$ source .local.env
$ ./util/local-endpoint.sh
$ npx emk local.update.vocabulary.*
$ ./util/build-local.sh input/tmt/data/tmt_data.json
Build tasks are handled by emk.js. The following tasks are available:
In the following sections, TYPE is a placeholder for either vocabulary or data.
Building graphs (i.e., triplification):
- local.TYPE - build the given TYPE graph locally (e.g., local.vocabulary or local.data), which outputs graphs as both RDF (in Turtle format) and LPG (in CSV format)
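For example, to build only the instance data graph locally (producing the Turtle and CSV outputs described above), using the task name as listed here:
$ npx emk local.data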
Modifying contents of the remote triplestore:
- remote.clear.TYPE - clear the given TYPE graph from the remote triplestore
- remote.upload.TYPE - upload the local TYPE graph file to the S3 bucket
- remote.update.TYPE - load the TYPE graph from the S3 bucket into the triplestore
Example:
# clear all graphs from remote triplestore
$ npx emk remote.clear.*
# upload vocabulary graph to S3 and then update Neptune
$ npx emk remote.upload.vocabulary remote.update.vocabulary
The build targets automatically depend on the necessary tasks, so you can simply run:
$ npx emk remote.update.*
which will build the vocabulary and data graphs, upload them to S3, and then update the triplestore.
#!/bin/bash
export MMS_SPARQL_ENDPOINT=http://open-cae-mms.c0fermrnxxyy.us-east-2.neptune.amazonaws.com:8182
export NEPTUNE_S3_BUCKET_URL=s3://open-cae-mms-rdf
export NEPTUNE_S3_IAM_ROLE_ARN=arn:aws:iam::230084004409:role/NeptuneLoadFromS3
export NEPTUNE_REGION=us-east-2
export NEPTUNE_PROXY=socks://127.0.0.1:3031
export MMS_PROJECT_NAME=tmt
export MMS_MAPPING_FILE=input/tmt/mapping/*.json
export AWS_ACCESS_KEY_ID=YOUR_AWS_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=YOUR_AWS_SECRET_ACCESS_KEY
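As with the local file, load these variables into your shell before running any remote tasks; the .remote.env filename below is arbitrary:
$ source .remote.env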