Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Biolink Model validation depth to Edge attribute (constraint) validation #19

Open
RichardBruskiewich opened this issue Jul 19, 2022 · 11 comments
Assignees

Comments

@RichardBruskiewich
Copy link
Collaborator

RichardBruskiewich commented Jul 19, 2022

The current TRAPI Knowledge Graph edge validation falls short of detailed validation of edge attributes (see attribute validation code snippet (or lack thereof) in the edge validator.

Deeper validation of Biolink Model (and perhaps, some TRAPI-specific) validation is required at this level.

@RichardBruskiewich
Copy link
Collaborator Author

Commentary from @edeutsch specifically regarding validation of the attribute_type_id (preferably inheriting from biolink:association_slot)

The constraint is not really absolute, mostly because many of the concepts we want to convey are not in biolink. So I think we should enforce a curie, so if there's not colon, then that should be an error. If the prefix is not biolink, that should be a warning. If the prefix is not known to biolink that should be a double warning. Are we able to report warnings? or just errors?

  • ERROR if not a CURIE
  • WARNING if not an instance of biolink:association_slot (but is another Biolink class?)
  • (double) WARNING if prefix is not known to Biolink (i.e. not known to Biolink in what context? All registered prefixes?)
    _

@sierra-moxon
Copy link
Member

toolkit.get_element_by_prefix() might help
if that returned an empty set, you'd know that biolink doesn't register the prefix on any of its classes.

If the double warning is instead to validate that the curie prefix is a valid prefix in the world, I think you can use other packages for that (see bioregistry).

For infores validation, I'd like to add BMT methods that would query that controlled vocabulary. We could repurpose this ticket for that purpose, if you think toolkit.get_element_by_prefix() will work for this.

@RichardBruskiewich
Copy link
Collaborator Author

Yes, toolkit.get_element_by_prefix() will suffice. I've added it with unit tests.

@RichardBruskiewich
Copy link
Collaborator Author

Substantially resolved by PR #31.

The one missing aspect of this issue to be implemented is the ERROR / WARNING / (double) WARNING classification above. This intent is covered by PR #27.

@RichardBruskiewich
Copy link
Collaborator Author

@sierra-moxon , @edeutsch , rather than opening up a fresh issue, I note our discussion today (23 March 2023) as discovering that toolkit.get_element_by_prefix() is limited to searches on category class id_prefix values and rather that the use case requires that we cast a wider net, perhaps including association slots. In the meantime, it almost feels like we need to revisit the Biolink model for biolink:Attribute (which has some relevant but not all id_prefix mappings), and its relationship to biolink:has_attribute_type, etc. as well as internal 'attribute types' like biolink:knowledge_source++ provenance.

@RichardBruskiewich
Copy link
Collaborator Author

RichardBruskiewich commented Mar 24, 2023

    "warning.knowledge_graph.edge.attribute.type_id.unknown_prefix": {
      "CHEMBL.COMPOUND:CHEMBL112--biolink:occurs_together_in_literature_with->CHEMBL.COMPOUND:CHEMBL1201558": [
        {
          "attribute_type_id": "EDAM-DATA:2526"
        },
        {
          "attribute_type_id": "EDAM-OPERATION:0226"
        },

@RichardBruskiewich
Copy link
Collaborator Author

  attribute:
    is_a: named thing
    mixins:
      - ontology class
...
    slots:
      - name                   # 'attribute_name'
      - has attribute type     # 'attribute_type'
      # 'value', 'value_type', 'value_type_name'
      # extracted from either of the next two slots
      - has quantitative value
      - has qualitative value
...
    id_prefixes:
      - EDAM-DATA
      - EDAM-FORMAT
      - EDAM-OPERATION
      - EDAM-TOPIC

@RichardBruskiewich
Copy link
Collaborator Author

biolink:has_attribute_type is defined:

  has attribute type:
...
    domain: attribute
    range: ontology class
    multivalued: false
    required: true
...

@RichardBruskiewich
Copy link
Collaborator Author

If anything, all the TRAPI attribute_type_id mappings ought to be here, but aside from having no parent (what should it be? An (association or node) slot?, its range is biolink:OntologyClass which may feel sensible except for the fact that some of the acceptable attribute_type_field values are not really ontology terms (I mean, the biolink:association_slot child biolink:primary_knowledge_source is not a biolink:OntologyClass?) Maybe I'm being pedantic but then again, maybe we need to clean up the model here!

RichardBruskiewich added a commit that referenced this issue Mar 24, 2023
@sierra-moxon
Copy link
Member

sierra-moxon commented Mar 24, 2023

this test in bmt works:

    # get class by prefix
    elements = toolkit.get_element_by_prefix("EDAM-DATA:123345")
    assert "attribute" in elements

as does this:

    # get slot by prefix
    elements = toolkit.get_element_by_prefix("BSPO:123345")
    assert "original predicate" in elements

so I think the method is working on all elements at least

@sierra-moxon
Copy link
Member

Also, technically, biolink is in BioPortal as an ontology :) - but I agree the attribute in biolink does not map to a TRAPI attribute.

@RichardBruskiewich RichardBruskiewich changed the title Add Biolink Model validation depth to Edge attribute validation Add Biolink Model validation depth to Edge attribute (constraint) validation May 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants