Check for IDSPACE conflicts on new ontology submission #1704
Comments
Thanks for writing this up, James. As we've noted on biopragmatics/bioregistry#273, the Bioregistry does not and likely never will consume the full BioPortal, so we should also consider whether the OBO Foundry wants to fully respect the prefixes minted in BioPortal (e.g., there can be, and in some cases already are, nonsensical overlaps/conflicts with high-quality resources in the OBO Foundry). |
@cthoyt Do you have a list of conflicts between BioPortal and OBO by any chance? |
Cross post of biopragmatics/bioregistry#273 (comment):
|
@nataled for now we need to document clearly that a new request must not conflict with anything in bioregistry, BioPortal. I can
before we:
|
https://obofoundry.org/id-policy.html#allocating-idspaces currently indicates to check with identifiers.org. Please confirm that this should be replaced with bioregistry and bioportal. Alternatively, the latter two can be added. |
See previous discussion on #1519 |
Clear as mud! ;) So... I'm looking for a definitive list of resources (names and URLs), specifically for the lists themselves (not some upper-level landing page). In other words, the user should be able to go to the link we provide and see the list of prefixes. Failing that, some page that provides a search function. |
The Bioregistry imports Identifiers.org, OBO Foundry, and N2T as well as many other resources (see here for a full list), so it can be a one-stop shop for most resources. However, it does not import all of BioPortal, so users should check there too.

Web Access

Data Dumps

Bioregistry also has several full dumps for potential contributors who want to access this information programmatically. These are all updated on a nightly basis. BioPortal doesn't offer any first-party data dumps, but the Bioregistry generates one nightly at https://github.com/biopragmatics/bioregistry/blob/main/src/bioregistry/data/external/bioportal/raw.json

Programmatic Access

Programmatic way to check if something is in the Bioregistry:

    import bioregistry
    query = "EPSO"
    available_in_bioregistry = bioregistry.normalize_prefix(query) is None

Programmatic way to check if something is in BioPortal:

    from bioregistry.external.bioportal import get_bioportal
    query = "EPSO"
    bioportal_dict = get_bioportal()
    available_in_bioportal = query not in bioportal_dict
|
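The two checks above could be folded into a single helper. What follows is only a rough sketch, not part of the Bioregistry API: `find_conflicts` is a hypothetical name, the registries are passed in as plain collections of prefixes so the example runs offline, and the sample prefixes are made up for illustration. It also assumes that prefixes should be compared case-insensitively, which the thread has not settled.

```python
def find_conflicts(candidate, registries):
    """Return the names of registries that already contain `candidate`.

    `registries` maps a registry name to an iterable of its prefixes.
    Comparison is case-insensitive (an assumption, not stated policy).
    """
    needle = candidate.casefold()
    return sorted(
        name
        for name, prefixes in registries.items()
        if any(prefix.casefold() == needle for prefix in prefixes)
    )


# Illustrative data only; real sets would be loaded from each registry.
registries = {
    "bioregistry": {"go", "chebi"},
    "bioportal": {"EPSO", "GO"},
}

print(find_conflicts("epso", registries))     # ['bioportal']
print(find_conflicts("newonto", registries))  # []
```

In practice the `bioportal` entry could presumably be filled from the keys of `get_bioportal()` shown above, and an empty result would mean the candidate looks free in every registry that was supplied.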
Perfect, thanks! |
@nataled I updated my comment above with more information that might be more actionable. Feel free to reuse part or all of it |
I have updated the documentation (which is outside the scope of this ticket). Please see this page: https://obofoundry.org/id-policy.html and look for the section Allocating IDSPACEs and the subsection Guidelines for selecting an IDSPACE |
@matentzn note that my changes should satisfy your first checkbox regarding updating the instructions (the template itself already points to the document I just revised; I see no need to add text to the template itself since that will just duplicate the information). |
This all looks great. I made an issue at the OBO dashboard to implement @cthoyt checker: Thank you both for dealing with this! Are there any remaining action items here? |
Looks like all aspects of this have either been taken care of, or have a ticket to do so. |
Great! Thanks, everyone, for your input! |
The only concern I have is with the 'strength' of this requirement, and its scope. Strength referring to dashboard report error, warn, or info. I'm certain that a clash with another Foundry ontology would be an ERROR, for example, but not so sure about clashes with non-ontology resources that might be little-known projects. Scope refers to whether or not the ontology needs to be concerned with obsolete resources. I'm not sure these aspects have been discussed. |
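To make the strength and scope questions concrete, here is one way the mapping could be written down. This is purely a sketch for discussion: the `Level` enum, the `clash_level` function, the source names, and every choice of level below are assumptions, not agreed OBO policy.

```python
from enum import Enum


class Level(Enum):
    ERROR = "ERROR"
    WARN = "WARN"
    INFO = "INFO"


def clash_level(source, obsolete=False):
    """Map a clash to a dashboard level (hypothetical policy).

    source   -- where the clashing prefix was found
    obsolete -- whether the clashing resource is marked obsolete
    """
    if source == "obofoundry":
        # A clash with another Foundry ontology is always an error.
        return Level.ERROR
    if source in {"bioregistry", "bioportal"}:
        # Assumption: obsolete external resources are downgraded to a warning.
        return Level.WARN if obsolete else Level.ERROR
    # Anything else (e.g. little-known projects) is only reported.
    return Level.INFO


print(clash_level("bioportal").value)                 # ERROR
print(clash_level("bioportal", obsolete=True).value)  # WARN
```

Writing the rule as a small table like this would at least make the open choices (obsolete resources, lesser-known projects) explicit before they are fixed in the dashboard.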
I am happy to publicise this widely, but I think BioPortal and Bioregistry clashes at the very least MUST be avoided moving forward. We owe this to open science. I am happy to leave this ticket open, but I would say that if we don't get any seriously strong argument for permitting namespace clashes with existing resources, used or otherwise, this will be an ERROR. What about this: if we don't see counterarguments on this issue by Friday 24th December, the BioPortal/Bioregistry clash rule goes into OBO Law. |
It's basically written that way now, at least by interpretation. I'm not objecting or wavering, really, but I don't recall any discussion of nuances like those I mentioned. Perhaps an Ops call agenda item? |
Ok. Remaining action item:
|
@matentzn I'd also propose that this should require a technical check that fails on a PR that has problematic content; it's always possible that people miss what's in the dashboard. |
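A failing PR check could be as small as a script that returns a non-zero exit code when the proposed IDSPACE is already taken. The sketch below is an assumption about how that might look, not an existing OBO Foundry tool: the script name `check_idspace.py`, the `main` function, and the sample prefixes are all hypothetical, and the set of known prefixes would have to be loaded from the registries in a real workflow.

```python
import sys


def main(argv, known_prefixes):
    """Return 0 if the prefix is free, 1 on a clash, 2 on bad usage."""
    if len(argv) != 2:
        print("usage: check_idspace.py PREFIX", file=sys.stderr)
        return 2
    candidate = argv[1]
    # Case-insensitive comparison is an assumption, not stated policy.
    taken = {prefix.casefold() for prefix in known_prefixes}
    if candidate.casefold() in taken:
        print(f"ERROR: IDSPACE {candidate!r} is already in use", file=sys.stderr)
        return 1
    print(f"OK: IDSPACE {candidate!r} appears to be free")
    return 0


# Illustrative run with made-up data; a CI job would pass sys.argv
# and hand the return value to sys.exit().
exit_code = main(["check_idspace.py", "epso"], {"EPSO", "GO"})
```

A CI job would then fail the PR whenever the exit code is non-zero, independently of whether anyone looks at the dashboard.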
This is not just for this case here. I think I have a better idea for that which does not require a check. Basically, in order to pass the dashboard, the whole config must be present; since it's already there, we should just be able to use it instead of having ontology submitters use their own. An even better idea: we require the pull request with the metadata right from the start, even before permission. Then the dashboard can just pull that, which will totally automate the OBO NOR dashboard, with no need for me to intervene anymore. |
We should add this to the next OBO Ops call agenda, which will be chaired by @nicolevasilevsky. |
Given that we have put this in our ID Policy here https://obofoundry.org/id-policy.html, and our NTR issue here, I don't think that it needs to be put in front of OBO Ops again. It has been decided. @nataled should check that this is documented accurately enough. I think it is good enough, but some stronger wording could help. So the only remaining item here is
|
In addition to Bioregistry and BioPortal, we will need to include a check of (probably) the NOR Dashboard. See biopragmatics/bioregistry#1212 for discussion and reason why. That should catch 99.9% of potential clashes. |
#1703 indicates a larger problem with namespace conflicts.
New projects should be encouraged (required?) to check for conflicts. Checking https://bioregistry.io/ would be the easiest and most effective place to look currently. The easiest thing would be to update our instructions. Better would be an automated check, maybe in https://github.com/OBOFoundry/obo-nor.github.io.
Our current documentation https://obofoundry.org/id-policy.html#allocating-idspaces points to http://identifiers.org/, but it does not include "EPSO" and would not have helped in this case.