Support Research Organization Registry (ROR) IDs #6640
Comments
@mcuthill hi! Two weeks ago I heard all about ROR at this event in Lisbon the day before PIDapalooza 2020: https://www.eventbrite.com/e/the-ror-community-meeting-lisbon-registration-82814758171 It was a fun group! Here's a pic from https://twitter.com/ResearchOrgs/status/1222159655377473539 Here are my main takeaways from that event:
I don't think ROR IDs make sense in the list of author identifier schemes (ORCID, etc.) (or does it?!?) but yes, ROR IDs could be tied to Affiliation and other fields you mentioned. (Would it make sense to re-title this issue to something like "Support Research Organization Registry (ROR) IDs"?) Off the top of my head I'm not sure how much work this would take. |
@pdurbin Thanks for sharing all the materials from that workshop! ROR definitely seems to be gaining momentum. You're right that it wouldn't generally fit in the Author identifier category, except in edge cases like ours (Ocean Networks Canada) where the data mostly isn't directly associated with a single PI so the organization serves as the author. It might be good to have it as an option in the Identifier Scheme list for situations like that, but also added to other field/s where organizations are normally identified. |
@mcuthill sure. As has been discussed extensively in #5029, Dataverse doesn't currently have a way to express the difference between a person and an organization in the "Author" fields and subfields, but I see what you mean. If there were a checkbox or something for "organization", perhaps we could prompt for a ROR ID. Something like this (you have to imagine the checkbox) |
DataCite maybe two years ago added the optional property nameType that can be either "Personal" or "Organizational" for exactly this reason. We also separate out personal names into givenName and familyName fields. These details are important for properly formatting metadata into a citation in one of the many citation styles. We support ROR (or other organizational identifiers) for names that are for organizations. DataCite uses a set of rules to "guess" whether an author is a person or organization. The most effective seems to be the list of common givenNames that we check against every author name on DOI registration. |
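To make the "guess" idea concrete, here is a minimal sketch of a given-name check. It is not DataCite's actual rule set: the name list, the organization keywords, the fallback rule, and the function name `guess_name_type` are all illustrative assumptions.

```python
# Hypothetical sketch: these lists and rules are illustrative assumptions,
# not DataCite's real rule set.
COMMON_GIVEN_NAMES = {"martin", "philipp", "amanda", "maria", "john"}
ORG_KEYWORDS = {"university", "institute", "networks", "library", "association"}


def guess_name_type(name: str) -> str:
    """Return "Personal" or "Organizational" for an author name string."""
    tokens = [t.strip(",.").lower() for t in name.split()]
    # Rule 1: any token that is a common given name suggests a person.
    if any(t in COMMON_GIVEN_NAMES for t in tokens):
        return "Personal"
    # Rule 2: typical organization words suggest an organization.
    if any(t in ORG_KEYWORDS for t in tokens):
        return "Organizational"
    # Fallback (assumption): "Family, Given" style names are personal.
    return "Personal" if "," in name else "Organizational"


print(guess_name_type("Fenner, Martin"))
print(guess_name_type("Ocean Networks Canada"))
```

A real implementation would need a much larger given-name list, and a ROR ID would presumably only be prompted for when the result is "Organizational".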
@mfenner thanks for the reminder about nameType. I see we have tests for it here:
However, these are used for a specific "export" format (OpenAIRE) rather than what Dataverse sends over the wire to DataCite. As @jggautier has noted at #2917 (comment) and #6492 (comment), we already use your rules in that export format. Thanks! |
We are definitely in favor of implementing ROR in Dataverse. In a recent report (https://doi.org/10.29242/report.effectivedatapractices2020), the Association of Research Libraries (ARL) recommends wide adoption of these 5 core PIDs to power findability of research data, including ROR: @mcuthill already mentioned some fields in the Citation Metadata schema where ROR would fit in. Here is my list of relevant fields:
|
@philippconzett here is how we currently connect ROR IDs to DOIs at DataCite:
The relative numbers as of today are as follows: This is data on all DataCite DOIs and 8 million Crossref DOIs in DataCite Commons. Crossref doesn't yet support ROR IDs in their schema, but we can link ROR ID and DOI via the Crossref Funder ID in funding information. Affiliation is the classic use case for ROR; in addition, we have a small number of DOIs with organizations as creator or contributor. But by far the largest number is "hosted": DOIs in a repository run by a particular organization identified by its ROR ID. This is of course one big reason why institutional repositories exist. For domain repositories that linkage is also useful, but with a different kind of information. For a repository that hosts content contributed by researchers from many different organizations, linking by affiliation is crucial. For the 273,601 DataCite DOIs with at least one ROR ID as affiliation identifier, more than 220K are in the "institutional repository" category. Dryad is currently the implementation in the domain repository category with the biggest uptake. When you look at a particular organization identified by ROR ID in DataCite Commons, e.g. UiT, you see these different sources aggregated in one place, e.g. Dryad datasets and publications from Crossref with funding: https://commons.datacite.org/ror.org/00wge5k78 This does not yet include all DOIs from DataverseNO, as that needs the new DataCite consortium organization structure to be in place to uniquely associate the repository with UiT. An organization where this transition has already happened is, for example, the University of Cambridge: https://commons.datacite.org/ror.org/013meh722 |
Thanks, @mfenner! This was useful information. And I guess the last section answers the question that has been on my to-do list since August 27, namely "Why are there only 64 records [as of 2020-08-27] for UiT in the DataCite Commons overview?" So, once the DataCite consortium organization structure is in place, the numbers for UiT will be more correct. But will these numbers be based on the fact that UiT is running DataverseNO? In that case, will all the datasets published by other partner institutions of DataverseNO, e.g. NTNU (https://ror.org/05xg72x27), UiB (https://ror.org/03zga2b32) etc., also be associated with UiT? In terms of the Dataverse metadata schema, I think the correct association would be through the metadata field producer, ideally through ROR. |
@philippconzett Mapping ROR ID and DOI via the repository as a "shortcut" only works reliably if it is an "institutional repository". If multiple institutions are behind a repository, as I can see for DataverseNO for example at https://www.re3data.org/repository/r3d100012538, things get more complicated. The safest way is of course to add the ROR ID to every single DOI, but I would suggest thinking about how this can also be done at the repository level in Dataverse, for example by defining "collections" for each repository partner institution. The "contributed" group in my visualizations above includes contributors with a ROR ID as nameIdentifier, and if you use that, for example, with contributorType "Producer", it would work with DataCite Commons today without additional work needed on our end. You can see this for the California Digital Library in this query (where they use contributor type "Producer" for data management plans, some very recent work where DataCite helped): https://commons.datacite.org/ror.org/03yrm5c26?query=contributors.contributorType%3AProducer |
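For readers who want to see what a contributor with a ROR ID as nameIdentifier and contributorType "Producer" might look like, here is a rough sketch of the metadata fragment built as a Python dict. The key names follow the DataCite JSON representation as best understood here and should be checked against the current schema. The helper name `producer_contributor` is hypothetical, and the ROR ID is the California Digital Library one from the query above.

```python
import json


def producer_contributor(name: str, ror_id: str) -> dict:
    """Build a DataCite-style contributor entry for a producing organization.

    Assumption: key names mirror the DataCite JSON metadata format.
    """
    if not ror_id.startswith("https://ror.org/"):
        raise ValueError("expected a full ROR URL, e.g. https://ror.org/03yrm5c26")
    return {
        "name": name,
        "nameType": "Organizational",
        "contributorType": "Producer",
        "nameIdentifiers": [
            {
                "nameIdentifier": ror_id,
                "nameIdentifierScheme": "ROR",
                "schemeUri": "https://ror.org",
            }
        ],
    }


entry = producer_contributor("California Digital Library", "https://ror.org/03yrm5c26")
print(json.dumps({"contributors": [entry]}, indent=2))
```

In a multi-institution repository, one such entry per dataset (filled from the partner institution's template) would carry the association on every single DOI rather than relying on the repository-level shortcut.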
@mfenner Once ROR support is in place in Dataverse, we will add RORs to each dataset. We would simply add these RORs in the dataset/metadata templates for each partner institution. The ROR will then automatically be included in the Producer field (and if necessary other fields, e.g. Author Affiliation) of each published dataset. You suggest we should also consider defining "collections" for each repository partner institution. Each DataverseNO partner institution already has its own institutional collection (= sub-dataverse), e.g. UiB: https://dataverse.no/dataverse/uib. But currently, such collections do not get their own DOI in the Dataverse software. However, at the request of a research group, DataverseNO has recently minted a collection DOI (through DataCite Fabrica) for a sub-sub-collection; see https://doi.org/10.18710/AJ4S-X394. Would minting such collection DOIs be helpful to associate datasets with organizations in DataCite Commons? |
If ROR IDs can be automatically included in the producer field, then maybe using collections is not needed. For repositories with content from multiple organizations, using ROR IDs per DOI is probably the "safest" way to associate content with an organization. Something that would help then, and we have heard this in other contexts, is the ability to "bulk update", so that this information can also be added retroactively without too much trouble. |
This blog post may be of interest for the discussion in this issue thread: https://www.pidforum.org/t/organizational-identifier-adoption-in-datacite-metadata/1279. |
I just noticed that support for PIDs for institutions is set out as a desired characteristic in the COAR Community Framework for Good Practices in Repositories (https://doi.org/10.5281/zenodo.4110829); cf.:
|
Just to support the issue: we (University of Stuttgart) would also be very interested to have ROR IDs integrated with all the affiliation fields (Author, Contact, Producer, Distributor), ideally in the form of an external controlled vocabulary as the backend of an auto-fill field, with the label visible to humans and the ROR ID stored behind it and added to the DataCite metadata used to get a DOI. In our repository, we have several datasets with authors from different organizations, so it would be really good if the ROR ID could be attached not only at the dataset level but also at the author-affiliation level. And we still need to attach an ORCID to the author, so it should really identify the affiliation of an author/contact and not the author itself. |
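As an illustration of the author-affiliation-level case, the sketch below keeps the ORCID on the person and the ROR ID on the affiliation. The key names are assumed to follow the DataCite JSON format, and the helper `creator_with_affiliation`, the author name, and the ORCID value are placeholders. The ROR ID is the UiB one mentioned earlier in this thread.

```python
def creator_with_affiliation(given: str, family: str, orcid: str,
                             affiliation_name: str, affiliation_ror: str) -> dict:
    """Build a DataCite-style creator: ORCID identifies the person,
    ROR identifies the affiliation (key names are an assumption)."""
    return {
        "name": f"{family}, {given}",
        "nameType": "Personal",
        "givenName": given,
        "familyName": family,
        "nameIdentifiers": [
            {
                "nameIdentifier": orcid,
                "nameIdentifierScheme": "ORCID",
                "schemeUri": "https://orcid.org",
            }
        ],
        "affiliation": [
            {
                "name": affiliation_name,
                "affiliationIdentifier": affiliation_ror,
                "affiliationIdentifierScheme": "ROR",
                "schemeUri": "https://ror.org",
            }
        ],
    }


# Placeholder person and ORCID; the ROR ID is UiB's, as cited in this thread.
creator = creator_with_affiliation(
    "Jane", "Doe", "https://orcid.org/0000-0000-0000-0000",
    "UiB", "https://ror.org/03zga2b32",
)
print(creator["affiliation"][0]["affiliationIdentifier"])
```

Because `affiliation` is a list on each creator, two authors of the same dataset can point at different organizations.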
Heidelberg University would also appreciate this. |
And +1 for ADA as well - we are looking at RORs for our CADRE project (https://cadre5safes.org.au/) |
With help from @Kris-LIBIS and the code in gdcc/dataverse-external-vocab-support#9 I was just able to search for "ucla" under Author Affiliation and see a list of organizations in ROR to select from. Here's a screenshot: |
@landreev recently configured https://demo.dataverse.org with the same external controlled vocabulary example: Author Affiliation can be populated from ROR. He put some nice screenshots at #8571 (comment) |
Great! Just tested it. Works fine. Would it make sense to expand the search configuration to include non-initial positions, so that when searching, e.g., for "California" you would also get results where "California" is in the middle or at the end of the name, e.g., "University of California", "University of California, Berkeley"? |
@philippconzett That depends on the search API of ROR. But as far as I can tell from the docs and the screenshot above, that should already work. Please note that this has been a quick proof of concept implementation. The ROR search API only returns the first 20 results. In order to retrieve more, support for pagination should be added. Then again, you can narrow your search by entering multiple words like "berk* calif*". |
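A rough sketch of what pagination support could look like follows. This is not the code from the proof of concept. It assumes the ROR search endpoint accepts `query` and `page` parameters and returns JSON with `items` and `number_of_results`. The endpoint URL and the helper names `fetch_page` and `search_all` are assumptions to verify against the ROR API docs.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

ROR_API = "https://api.ror.org/organizations"  # assumed endpoint; check the ROR docs


def fetch_page(query: str, page: int) -> dict:
    """Fetch one page of search results (assumed to hold up to 20 items)."""
    params = urllib.parse.urlencode({"query": query, "page": page})
    with urllib.request.urlopen(f"{ROR_API}?{params}") as resp:
        return json.load(resp)


def search_all(query: str, max_results: int = 60,
               fetch: Callable[[str, int], dict] = fetch_page) -> Iterator[dict]:
    """Yield up to max_results organizations, requesting further pages as needed."""
    page, yielded = 1, 0
    while yielded < max_results:
        data = fetch(query, page)
        items = data.get("items", [])
        if not items:
            return  # no more pages
        for item in items:
            yield item
            yielded += 1
            if yielded >= max_results:
                return
        if yielded >= data.get("number_of_results", 0):
            return  # everything the API reported has been returned
        page += 1
```

The `fetch` parameter is injectable so the paging logic can be exercised without network access. A `max_results` cap keeps an auto-fill field from requesting an unbounded number of pages for a broad term such as "California".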
Thanks, @Kris-LIBIS! It seems that the pagination configuration was the reason why I didn't see relevant results when searching for, e.g., "California". I guess pagination would be a configurable feature? |
Priority:
|
Update:
|
I added this in the broader related issue at IQSS/dataverse-pm#19 and realized I should also mention here that in a Google Slide at https://docs.google.com/presentation/d/1PtqmEzAamuM2__V8psOIetgNODPQxjqSEOuxL3kAV-Y I've tried to summarize what support means and which types of metadata are and aren't supported in some way. I'm hoping this helps scope the work. |
Most recent update to this issue: NIH Task 2.5.3* | Task 2.5.3: Participate in GREI ROR Working Group and define and scope Dataverse ROR support (New for Year 2) | Proposed: Membership in ROR WG and document and related issues (e.g., #6640) describing how Dataverse will support ROR and technical work needed to provide this support |
Updated AIM labels to reflect relationship to Aim 2.5.3 rather than 1.5.1 and 1.5.2 |
Amanda French, Technical Community Manager for ROR, here. Just a note that I'm available to answer any questions you might have as you integrate ROR. And regarding the discussion from 2020 about individuals vs. institutions as authors, you might take a look at the slides at https://doi.org/10.5281/zenodo.8074996 where @zzacharo showed how InvenioRDM handles that in the interface. |
Thanks @amandafrench! Much appreciated! 🎉❤️ |
We'll be working on this as part of NIH-GREI funded work. @cmbz and I agreed to list this issue in IQSS/dataverse-pm#127, where related issues are listed, and close this GitHub issue. |
As a data steward for an organization producing and publishing data, we would like to see the Research Organization Registry ID option added to the Citation metadata block. Perhaps as an addition to the list of Identifier Schemes for authors, or attached to the Affiliation, Producer, Distributor, or similar. As can be seen here, a respectable list of supporters and signatories have already committed to the adoption and use of RORs going forward.