Skip to content

Latest commit

 

History

History
329 lines (253 loc) · 9.58 KB

Readme.md

File metadata and controls

329 lines (253 loc) · 9.58 KB

Screenshot

RocketRML

For the legacy version with the different behavior of the iterator please see this version.

This is a javascript RML-mapper implementation for the RDF mapping language (RML).

Try it out

If you want to try the mapper without installing you can also see a working demo with a graphical interface here

Install

To install it via npm:

npm install rocketrml

Quick-start

After installation, you can to copy index.js into your current working directory. Also, the mapfile.ttl and the input is needed.

Then you can execute your first example via:

node index.js

which starts the execution and the output is then written to ./out.n3.

Also, an example Dockerfile can be seen here.

What does it support

RML Vocabulary

the following list contains the current supported classes.

rr:TriplesMap is the class of triples maps as defined by R2RML.
rml:LogicalSource is the class of logical sources.
rr:SubjectMap is the class of subject maps.
rr:PredicateMap is the class of predicate maps.
rr:ObjectMap is the class of object maps.
rr:PredicateObjectMap is the class of predicate-object maps.
rr:RefObjectMap is the class of referencing object maps.
rml:referenceFormulation is the class of supported reference formulations.
rr:Join is the class of join conditions.

Missing:

rr:R2RMLView is the class of R2RML views.
rr:BaseTableOrView is the class of SQL base tables or views.
rr:GraphMap is the class of graph maps.

Querying languages

The mapper supports XML, JSON and CSV as input format. For querying the data, JSONPath (json), XPath (xml) and csv-parse (csv) are used. Since JSON is supported natively by javascript, it has a huge speed benefit compared to XML.

Therefore, the mapper also contains a C++ version (which uses pugixml) of the XML-parser which is disabled by default, but can be enabled via the options parameter xpathLib: 'pugixml'.

XPath 3.1 is available through fontoxpath and must be enabled through the option: xpathLib: 'fontoxpath'

How does it work

The parseFile function in index.js is the entry point. It takes an input path (the mapping.ttl file) and an output path (where the json output is written). The function returns a promise, which resolves in the resulting output, but the output is also written to the file system.

The options parameter

{
    /*
    compact jsonld document with provided context
    { http://schema.org/name:"Tom" } 
    -> 
    { 
      @context:"http://schema.org/",
      name:"Tom"
    }
    */
    compress: { 
      '@vocab': "http://schema.org/"
    },
    /*
    If you want n-quads instead of json as output, 
    you need to define toRDF to true in the options parameter
    */
    toRDF: true,
    /*
    If you want to insert your all objects with their regarding @id's (to get a nesting in jsonld), "Un-flatten" jsonld
    */
    replace: true,
    /*
    You can delete namespaces to make the xpath simpler.
    */
    removeNameSpace: {xmlns:"https://xmlnamespace.xml"},
    /*
    Choose xpath evaluator library, available options: default | xpath (same as default) | pugixml (cpp xpath implementation, previously xmlPerformanceMode:true) | fontoxpath (xpath 3.1 engine)
    */
    xpathLib: "default",
    /*
    ignore input values that are empty string (or whitespace only) (only use a value from the input if value.trim() !== '') (default false)
    */
    ignoreEmptyStrings: true,
    /*
    values that are to be ignored from the input. E.g ignore all input values that are "-"
    */
    ignoreValues: ["-"],
    /*
    You can also use functions to manipulate the data while parsing. (E.g. Change a date to a ISO format, ..)
    */
    functions : {**See the Functions section**}
    /*
    Any options to parse the csv. available: delimiter - default ","
    */
    csv: {
      // any options from https://csv.js.org/parse/options/
      // defaults already set:
      columns: true,
      skip_empty_lines: true,
    }
    
}

How to call the function

const parser = require('rocketrml');

const doMapping = async () => {
  const options = {
    toRDF: true,
    verbose: true,
    xmlPerformanceMode: false,
    replace: false,
  };
  const result = await parser.parseFile('./mapping.ttl', './out.n3', options).catch((err) => { console.log(err); });
  console.log(result);
};


doMapping();

If you do not want to use the file system, you can use

  const result = await parser.parseFileLive(mapfile, inputFiles, options);

Where mapfile is the mapping.ttl as string, inputFiles is an object where the keys are the file names and the value is the file content as string. E.g:

let inputFiles={
  mydata: '{name:test}',
  mydata2: '<root>testdata</root>'
};

Example

Below there is shown a very simple example with no nesting and no array.

More can be seen in the tests folder

Input

{
  "name":"Tom A.",
  "age":15
}

Turtle mapfile

The mapfile must also specify the input source path.

  @prefix rr: <http://www.w3.org/ns/r2rml#> .
  @prefix rml: <http://semweb.mmlab.be/ns/rml#> .
  @prefix schema: <http://schema.org/> .
  @prefix ql: <http://semweb.mmlab.be/ns/ql#> .
  @base <http://sti2.at/> . #the base for the classes
  
  
  <#LOGICALSOURCE>
  rml:source "./tests/straightMapping/input.json";
  rml:referenceFormulation ql:JSONPath;
  rml:iterator "$".
  
  
  <#Mapping>
  rml:logicalSource <#LOGICALSOURCE>;
  
   rr:subjectMap [
      rr:termType rr:BlankNode;
      rr:class schema:Person;
   ];
  
  
  rr:predicateObjectMap [
      rr:predicate schema:name;
      rr:objectMap [ rml:reference "name" ];
  ];
  
  rr:predicateObjectMap [
      rr:predicate schema:age;
      rr:objectMap [ rml:reference "age" ];
  ].

Output

[{
  "@type": "http://schema.org/Person",
  "http://schema.org/name": "Tom A.",
  "http://schema.org/age": 15
}]

Functions:

To fit our needs, we also had to implement the functionality to programmatically evaluate data during the predicateObjectMap.
Therefore, we also allow the user to write use javascript functions, they defined beforehand and passes through the options parameter. An example how this works can be seen below:

Input

{
  "name":"Tom A.",
  "age":15
}

Turtle mapfile

The mapfile must also specify the input source path.

  @prefix rr: <http://www.w3.org/ns/r2rml#> .
  @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
  @prefix rml: <http://semweb.mmlab.be/ns/rml#> .
  @prefix schema: <http://schema.org/> .
  @prefix ql: <http://semweb.mmlab.be/ns/ql#> .
  @prefix fnml: <http://semweb.mmlab.be/ns/fnml#> .
  @prefix fno: <http://w3id.org/function/ontology#> .
  @prefix grel: <http://users.ugent.be/~bjdmeest/function/grel.ttl#> .
  @base <http://sti2.at/> . #the base for the classes
  
  
  <#LOGICALSOURCE>
  rml:source "./tests/straightMapping/input.json";
  rml:referenceFormulation ql:JSONPath;
  rml:iterator "$".
  
  
  <#Mapping>
  rml:logicalSource <#LOGICALSOURCE>;
  
   rr:subjectMap [
      rr:termType rr:BlankNode;
      rr:class schema:Person;
   ];
  
  
  rr:predicateObjectMap [
      rr:predicate schema:name;
      rr:objectMap [ rml:reference "name" ];
  ];
  
  rr:predicateObjectMap [
      rr:predicate schema:age;
      rr:objectMap [ rml:reference "age" ];
  ];
  
   rr:predicateObjectMap [
        rr:predicate schema:description;
        rr:objectMap  <#FunctionMap>;
    ].
    
    <#FunctionMap>
         fnml:functionValue [
             rml:logicalSource <#LOGICALSOURCE> ;
             rr:predicateObjectMap [
                 rr:predicate fno:executes ;
                 rr:objectMap [ rr:constant grel:createDescription ]
             ] ;
             rr:predicateObjectMap [
                 rr:predicate grel:inputString ;
                 rr:objectMap [ rml:reference "name" ]
             ];
              rr:predicateObjectMap [
                  rr:predicate grel:inputString ;
                  rr:objectMap [ rml:reference "age" ]
              ];
         ] .

where the option parameter looks like this:

  let options={
        functions: {
            'http://users.ugent.be/~bjdmeest/function/grel.ttl#createDescription': function (data) {
                let result=data[0]+' is '+data[1]+ ' years old.'; 
                return result;
                }
            }
        };

Output

[{
  "@type": "http://schema.org/Person",
  "http://schema.org/name": "Tom A.",
  "http://schema.org/age": 15,
  "http://schema.org/description": "Tom A. is 15 years old."
}]

Description

The <#FunctionMap> has an array of predicateObjectMaps. One of them defines the function with fno:executes and the name of the function in rr:constant. The others are for the function parameters. The first parameter (rml:reference:"name") is then stored in data[0], the second in data[1] and so on.