Create a include or exclude list for properties #27

Open
matentzn opened this issue Nov 1, 2022 · 8 comments

Comments

@matentzn
Member

matentzn commented Nov 1, 2022

Phenio contains some relationships like http://purl.obolibrary.org/obo/emapa#is_a, which are really confusing. These should be removed prior to release.

@julesjacobsen

So, what is the correct URI for is_a? I couldn't find anything in OLS.

@matentzn
Member Author

matentzn commented Nov 1, 2022

rdfs:subClassOf

@julesjacobsen

julesjacobsen commented Nov 1, 2022

So, would rdfs:subClassOf be considered the CURIE for https://www.w3.org/TR/rdf-schema/#ch_subclassof once expanded and is_a being a synonym? Just wondering how this fits with the Node model in obographs.

@matentzn
Member Author

matentzn commented Nov 1, 2022

I think rdfs:subClassOf is considered a "built-in" and would probably not be represented in obographs at all as a node. It's a good question though; the distinction is sort of arbitrary. Why do you need to know a CURIE/IRI for is_a in obographs? OAK has an obographs-to-OWL mapping which handles all this expansion.

@julesjacobsen

julesjacobsen commented Nov 1, 2022

This is indeed how it is (once http://purl.obolibrary.org/obo/emapa#is_a is removed) - there is zero mention about the source of 'is_a', yet it is the most commonly referenced predicate. Practically all other predicates used in an Edge are URIs declared with their label as a Node (either as a CLASS or PROPERTY type), so if you're doing the naive thing of using URIs to look-up a Node it fails here because is_a is never declared anywhere!

e.g.

GraphDocument graphDocument = openGraphDocument("phenio.json");
Graph phenio = graphDocument.getGraphs().get(0);

// create a map of Id: Node, where Id is a URI String
Map<String, Node> nodes = phenio.getNodes().stream()
                .map(node -> Node.of(node.getId(), node.getLabel()))
                .collect(Collectors.toMap(Node::getId, Function.identity()));

// special case it seems
Node isA = nodes.values().stream()
        // constant-first equals, so nodes without a label do not throw a NullPointerException
        .filter(node -> "is_a".equals(node.getLabel()))
        .findFirst()
        // so this would be where to put rdfs:subClassOf - perhaps that ought to be the URI and keep `is_a` as the label?
        .orElse(Node.of("http://purl.obolibrary.org/obo/emapa#is_a", "is_a"));

phenio.getEdges().stream()
                .forEach(edge -> {
                    // look up subject and object in the `nodes` map built above
                    Node subject = nodes.get(edge.getSub());
                    Node object = nodes.get(edge.getObj());
                    // annoyingly we can't treat the predicate in a consistent fashion due to
                    // 'is_a' being an undeclared, implicit 'primitive' type
                    Node predicate;
                    if (edge.getPred().equals("is_a")) {
                        predicate = isA;
                    } else {
                        predicate = nodes.get(edge.getPred());
                    }
                 });

@matentzn
Member Author

matentzn commented Nov 3, 2022

cc @cmungall

@cmungall
Member

cmungall commented Nov 8, 2022

The focus of this issue is predicates like http://purl.obolibrary.org/obo/emapa#is_a which seems like a data bug. I don't think this has anything to do with obojson.

I checked the latest emapa.owl and it is present but only as a declaration

✗ grep is_a db/phenio.owl | grep emapa
    <!-- http://purl.obolibrary.org/obo/emapa#is_a -->
    <owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/emapa#is_a">

Unfortunately, there is no way to tell from the OWL where this comes from, but a good guess is emapa itself:

✗ curl -L -s $OBO/emapa.owl  | grep emapa#is_a
    <!-- http://purl.obolibrary.org/obo/emapa#is_a -->
    <owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/emapa#is_a">

I know the genesis of these things: twenty years ago someone declared is-a in oboedit even though they didn't need to, and it has stuck around ever since.

This one is harmless as it's just a declaration that is not used. Of course we should still report upstream and possibly fix, and we should do more QA/QC on ontologies we bring in.
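A declaration that is never used as an edge predicate can be found mechanically. The following is a minimal sketch, assuming the declared property IRIs and the edge predicates have already been extracted into plain collections; the class and method names (`UnusedProperties`, `unused`) are hypothetical and not part of obographs or OAK.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of any existing library.
final class UnusedProperties {

    // Returns the declared property IRIs that never occur as an edge predicate.
    static Set<String> unused(Set<String> declared, List<String> edgePredicates) {
        Set<String> result = new TreeSet<>(declared);
        edgePredicates.forEach(result::remove);
        return result;
    }

    public static void main(String[] args) {
        Set<String> declared = Set.of(
                "http://purl.obolibrary.org/obo/emapa#is_a",
                "http://purl.obolibrary.org/obo/emapa#part_of");
        // Predicates as they might appear on edges; subClassOf edges carry the literal "is_a".
        List<String> used = List.of("is_a", "http://purl.obolibrary.org/obo/emapa#part_of");
        System.out.println(unused(declared, used));
    }
}
```

The result is a candidate list for an exclude list or an upstream report, not an automatic deletion list.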

But there are worse issues. emapa isn't using the standard part-of predicate (BFO:0000050). It is using http://purl.obolibrary.org/obo/emapa#part_of

This means that partonomy queries on EMAPA will yield massively incomplete results. And EMAPA is essentially a partonomy, there is minimal info in subclassing.
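To illustrate why a query on BFO:0000050 misses these edges, and what a repair step could look like, here is a minimal sketch. It assumes edges are simple subject/predicate/object triples; the `Edge` record, the `REPAIRS` table and the placeholder node IDs are hypothetical stand-ins, not the obographs model or an existing Monarch pipeline step.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical repair sketch, not an existing pipeline component.
final class PartOfRepair {

    // Standard part-of relation (BFO:0000050) expanded to its OBO PURL.
    static final String PART_OF = "http://purl.obolibrary.org/obo/BFO_0000050";

    // Assumed repair table: non-standard predicate IRI -> standard predicate IRI.
    static final Map<String, String> REPAIRS = Map.of(
            "http://purl.obolibrary.org/obo/emapa#part_of", PART_OF);

    // Minimal stand-in for an edge (subject, predicate, object).
    record Edge(String sub, String pred, String obj) {}

    // Rewrites the predicate if a repair is known; otherwise the edge is unchanged.
    static Edge repair(Edge e) {
        return new Edge(e.sub(), REPAIRS.getOrDefault(e.pred(), e.pred()), e.obj());
    }

    static List<Edge> repairAll(List<Edge> edges) {
        return edges.stream().map(PartOfRepair::repair).collect(Collectors.toList());
    }

    // A partonomy query that only recognises the standard predicate.
    static List<Edge> partOfEdges(List<Edge> edges) {
        return edges.stream()
                .filter(e -> PART_OF.equals(e.pred()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Placeholder IDs, not real EMAPA terms.
        List<Edge> raw = List.of(
                new Edge("EX:child", "http://purl.obolibrary.org/obo/emapa#part_of", "EX:parent"));
        System.out.println("before repair: " + partOfEdges(raw).size());
        System.out.println("after repair:  " + partOfEdges(repairAll(raw)).size());
    }
}
```

Keeping the repairs in a table makes each rewrite explicit and easy to push upstream later.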

I suggest a strategy:

  • be selective in what we bring in
    • there should be a use case for it
    • what is the use case for emapa? and is it being satisfied if emapa is flat?
  • be aggressive in repairs
    • some of this can be pushed upstream
      - e.g. uberon necessarily has to repair relations in emapa to make them usable in composite-metazoan

We should have a monarch-wide simple profile that should be satisfied

  • every class should have a CURIE and a unique label
  • there should be an is-a path to a biolink class
  • for anatomy ontologies, part-of MUST be present using the correct CURIE
  • for anatomy ontologies, develops-from SHOULD be present (MUST for ZFA, Uberon) using correct URI
  • synonyms should be present (for search) using oio

Pretty much everything else can be ignored
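The first rule of the profile above (every class has a CURIE and a unique label) could be checked roughly as follows. This is a sketch under stated assumptions: `ClassNode` is a hypothetical stand-in for whatever class representation is used, and the CURIE pattern is a loose heuristic (prefix, colon, local part not starting with `//`), not a full CURIE grammar.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical profile check, covering only the CURIE and unique-label rule.
final class ProfileCheck {

    // Minimal stand-in for an ontology class.
    record ClassNode(String curie, String label) {}

    // Loose heuristic: "prefix:local", rejecting full IRIs such as "http://...".
    private static final String CURIE_PATTERN = "[A-Za-z][\\w.]*:(?!//)\\S+";

    static List<String> violations(List<ClassNode> classes) {
        List<String> problems = new ArrayList<>();
        Map<String, String> firstCurieByLabel = new HashMap<>();
        for (ClassNode c : classes) {
            if (c.curie() == null || !c.curie().matches(CURIE_PATTERN)) {
                problems.add("missing or malformed CURIE: " + c.curie());
            }
            if (c.label() == null || c.label().isBlank()) {
                problems.add("missing label: " + c.curie());
                continue;
            }
            String first = firstCurieByLabel.putIfAbsent(c.label(), c.curie());
            if (first != null) {
                problems.add("duplicate label '" + c.label() + "': " + first + " and " + c.curie());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Placeholder classes, not real ontology terms.
        List<ClassNode> classes = List.of(
                new ClassNode("EX:1", "limb"),
                new ClassNode("EX:2", "limb"));
        violations(classes).forEach(System.out::println);
    }
}
```

The remaining rules (an is-a path to a biolink class, part-of and develops-from coverage, synonyms) need graph traversal and are not covered by this sketch.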

@cmungall
Member

cmungall commented Nov 8, 2022

I don't think obojson is relevant to this issue at all, but regarding the question, yes is_a is the hardcoded value for rdfs:subClassOf between two named classes.
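For client code that wants to treat every predicate uniformly, one option is to expand the hardcoded `is_a` value to the rdfs:subClassOf IRI before any lookup. This is a minimal sketch, assuming node labels are held in a plain map keyed by IRI; `IsAExpansion`, `expandPredicate` and `withBuiltIns` are hypothetical names, not part of the obographs API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the obographs library.
final class IsAExpansion {

    // The IRI that the CURIE rdfs:subClassOf expands to.
    static final String RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf";

    // Maps the hardcoded "is_a" value to the rdfs:subClassOf IRI; other predicates pass through.
    static String expandPredicate(String pred) {
        return "is_a".equals(pred) ? RDFS_SUBCLASS_OF : pred;
    }

    // Returns a copy of the label map with an entry for the built-in predicate,
    // so a lookup by IRI succeeds even though no node declares it.
    static Map<String, String> withBuiltIns(Map<String, String> labelsById) {
        Map<String, String> copy = new HashMap<>(labelsById);
        copy.putIfAbsent(RDFS_SUBCLASS_OF, "is_a");
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> labels = withBuiltIns(Map.of());
        System.out.println(labels.get(expandPredicate("is_a")));
    }
}
```

With this in place, the predicate lookup no longer needs a special case for `is_a`.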
