Split function not supported #34

Open
vemonet opened this issue Oct 19, 2021 · 0 comments

vemonet commented Oct 19, 2021

Hi, we tried the RMLStreamer with mappings that use the grel:string_split function on a CSV file, and it did not work (follow-up to issue #16).

We used the latest release, 2.1.1, of RMLStreamer.jar with the Flink image supported by this release (we reused the image from your docker-compose.yml file at tag 2.1.1).

The YARRRML file we use:

prefixes:
  rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  rdfs: "http://www.w3.org/2000/01/rdf-schema#"
  xsd: "http://www.w3.org/2001/XMLSchema#"
  grel: "http://users.ugent.be/~bjdmeest/function/grel.ttl#"
  idlab: "http://example.com/idlab/function/"
  idsf: "https://w3id.org/um/ids/rmlfunctions.ttl#"
  pubmed: "https://identifiers.org/pubmed:"
  drugbank: "https://identifiers.org/drugbank:"
  mesh: "https://identifiers.org/mesh:"
  uniprot: "https://identifiers.org/uniprot:"
  omim: "https://identifiers.org/mim:"
  schema: "https://schema.org/"
  sio: "http://semanticscience.org/resource/"
  bio2kg: "https://w3id.org/bio2kg/data/"
  ncbigene: "https://identifiers.org/ncbigene:"
  ncbitaxon: "http://purl.org/obo/owl/NCBITaxon#"

mappings:
  proteins:
    sources:
      - ['iproclass.csv~csv']
    s: uniprot:$(UniProtKB accession)
    po:
      - [a, sio:Protein]
      - [sio:hasProvider, bio2kg:graph/iproclass~iri]
      - [sio:affects, ncbitaxon:$(NCBI taxonomy)~iri]  # 9606
      - p: sio:isSupportedBy
        o:
            function: grel:string_split
            parameters:
                - [grel:p_string_sep, ";"]
                - [grel:valueParameter, $(PubMed)]

Here is a sample of the CSV file:

UniProtKB accession,UniProtKB ID,EntrezGene,RefSeq,NCBI GI number,PDB,Pfam,GO,PIRSF,IPI,UniRef100,UniRef90,UniRef50,UniParc,PIR-PSD accession,NCBI taxonomy,MIM,UniGene,Ensembl,PubMed ID,EMBL GenBank DDBJ,EMBL protein_id
"Q6GZX4","001R_FRG3G","2947773","YP_031579.1","81941549; 49237298","","PF04947","GO:0046782","","","UniRef100_Q6GZX4","UniRef90_Q6GZX4","UniRef50_Q6GZX4","UPI00003B0FD4","","654924; 654925","","","","15165820","AY548484","AAT09660.1"
"Q6GZX3","002L_FRG3G","2947774","YP_031580.1","49237299; 81941548","","PF03003","GO:0033644; GO:0016021","","","UniRef100_Q6GZX3","UniRef90_Q6GZX3","UniRef50_Q6GZX3","UPI00003B0FD5","","654924; 654925; 654926","","","","15165820","AY548484","AAT09661.1"

No errors are raised, and all the regular predicate-object pairs are generated except the one that uses the function.

The split function of those mappings works with the RMLmapper (you can try it directly here: https://rml.io/yarrrml/matey/#)
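
As a rough illustration of the result expected from the function object (one sio:isSupportedBy object per element of the split value), here is a minimal Python sketch. The helper name `expected_objects` is hypothetical and is not part of RMLStreamer, RMLmapper, or GREL; the multi-valued input is made up for illustration, and treating an empty cell as "no objects" is an assumption.

```python
from typing import List


def expected_objects(value: str, sep: str = ";") -> List[str]:
    """Hypothetical stand-in for the intended effect of grel:string_split."""
    # Assumption: an empty cell should produce no objects at all.
    if not value:
        return []
    # One object per element; no trimming is applied here.
    return value.split(sep)


# Value taken from the "PubMed ID" column of the sample rows.
print(expected_objects("15165820"))  # ['15165820']
# Made-up multi-valued cell, for illustration only.
print(expected_objects("111;222"))   # ['111', '222']
```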

We run the RMLStreamer using the Flink CLI:

/opt/flink/bin/flink run -p 8 -c io.rml.framework.Main /mnt/RMLStreamer.jar toFile -m /mnt/mapping.rml.ttl -o /mnt/output.nt --job-name "RMLStreamer job"
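
After a run like this, one way to confirm which predicates actually reached the output is to count them in the generated N-Triples. The sketch below is a minimal, naive check: `count_predicates` is a hypothetical helper, it splits each line on whitespace rather than using a real N-Triples parser, and the sample triples are illustrative rather than copied from an actual run. How the output file (or files) is read is left out, since that depends on how the job writes its results.

```python
from collections import Counter
from typing import Iterable


def count_predicates(lines: Iterable[str]) -> Counter:
    """Count triples per predicate IRI in N-Triples lines (naive parsing)."""
    counts: Counter = Counter()
    for line in lines:
        # Split into subject, predicate, and the rest of the line.
        parts = line.split(None, 2)
        # Skip blank lines, comments, and anything that is not a triple.
        if len(parts) == 3 and parts[1].startswith("<"):
            counts[parts[1].strip("<>")] += 1
    return counts


# Illustrative triples only, built from the prefixes in the mapping.
sample = [
    "<https://identifiers.org/uniprot:Q6GZX4> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://semanticscience.org/resource/Protein> .",
    "<https://identifiers.org/uniprot:Q6GZX4> "
    "<http://semanticscience.org/resource/hasProvider> "
    "<https://w3id.org/bio2kg/data/graph/iproclass> .",
]
counts = count_predicates(sample)
# A count of 0 here matches the symptom described in this issue.
print(counts["http://semanticscience.org/resource/isSupportedBy"])
```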

Is there anything we need to do to make the Split function work?

Note that we also tried to add custom functions following the documentation (either by adding the jar and ttl files to the right folders, or by recompiling the RMLStreamer.jar), but the RMLStreamer could not find the functions we added (the same custom functions work with the RMLmapper).

It is not clear how functions are implemented. Other issues, such as #33, suggest that some of them are supported. Were those manual, ad hoc implementations for specific functions such as lowercase, with no general framework for running arbitrary functions?
