
Release tasks and task descriptions

Adam edited this page Oct 28, 2024 · 29 revisions

Release Task List

  • Triage All Issues

    • Manual review and correction of all issues
    • Automated triage of all issues
    • Review and validation of updates encoding for update records
    • Add labels to issues
    • Approve, deny, hold for processing, or request curator reviews
    • Assign issues to the correct project column
  • Release Candidates Identified

    • Review all issues for project status and label assignments
    • Extract release metadata for all issues in Ready for Sign-off/Metadata QA project column
    • Run pre-release tests on extracted metadata
      • Validate New Updates CSVs
      • Check for in-release duplicates
      • Address validation
      • Check for duplicate external IDs
      • Check for duplicate URLs
    • Conduct secondary review of all issues
    • Update issues based on second review and pre-release tests
      • Correct issue metadata
      • Add any necessary comments
      • Update column assignment in project
      • Close duplicate record issues
    • Perform bulk update in validation mode
  • Create Records

    • Create records with extracted metadata
    • Run post-creation tests on records
      • New records
        • Check JSON integrity
        • Check for duplicate values
        • Check for leading/trailing spaces
        • Check for unprintable characters
        • Check for production duplicates
        • Check for IDs on production
      • Update records
        • Check JSON integrity
        • Check for duplicate values
        • Check for leading/trailing spaces
        • Check for unprintable characters
    • Update issues based on post-creation tests
      • Correct issue metadata
      • Add any necessary comments
      • Update column assignment in project
      • Close duplicate record issues
    • Re-extract metadata once all issues have been updated/corrected
    • Create records with re-extracted/updated metadata
  • Pre-release prep

    • Create milestone and add all release issues to milestone
    • Commit records to ror-updates release branch
    • Create and validate relationships file
    • Commit relationships file to ror-updates release branch
  • Pre-Release Actions

    • Validate without relationships
    • Create relationships
    • Remove relationships to inactive records
    • Update labels in related records
    • Update addresses
    • Update last modified (set to release date)
    • Validate files with relationships
  • Release Notes

    • Create release notes
      • Add as draft pre-publication
  • Publish Release

    • Follow release instructions in ror-records
    • Merge ror-updates release branch to main
    • Publish release notes
  • Post-Release

    • Update release issues with ROR IDs and close
    • Close milestone
    • Archive release issues

Overview: Description of tasks

The following are the steps required to process issues and prepare them for release, including triage, metadata extraction, tests, record creation, and pre- and post-release activities.

Triage All Issues

Manual Review and Correction of All Issues

Metadata in each issue should be reviewed and corrected to conform with ROR’s metadata policies and issue formatting. All metadata in issues is user submitted and may be partial, incorrect, or otherwise in need of refinement.

Metadata from issues is extracted programmatically to create new and updated records. Although the metadata undergoes additional review at time of extraction, having it correctly represented in the issues in initial triage is important to reduce the time and overall work needed to prepare the release.

Triaging New Records

Formatting

Remove any extraneous text from the issue that is populated as a result of the form submission. Update any repeating fields to separate their values with semicolons rather than other forms of punctuation. For example, if a new record request has three aliases submitted, they should be represented in the aliases field as follows:

    alias_1; alias_2; alias_3

All name values should additionally be appended with an asterisk and the name of their language or the ISO 639-1 language code for each instance of a name. Refer to the LOC standard for identifying language codes. For example, if a record had both a Spanish and Japanese label, it would be represented in the labels field as follows:

    Spanish_label*es; Japanese_label*Japanese

Prefer use of the language code over the language name. Language names have to be mapped to their language codes programmatically on the basis of an exact name match, and so can create errors in release prep if tagged incorrectly (e.g. where a typo is made in the language name, such as Ukranian vs. the correct Ukrainian).
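The formatting rules above can be illustrated with a minimal sketch that splits a repeating name field on semicolons and resolves each language tag. This is not the extraction script itself: the function name, the small `LANGUAGE_CODES` table, and the rule that any two-letter tag is treated as a code are assumptions made for illustration.

```python
# Illustrative subset of language names mapped to ISO 639-1 codes (assumption).
LANGUAGE_CODES = {"english": "en", "spanish": "es", "japanese": "ja", "ukrainian": "uk"}


def parse_name_values(field_text):
    """Split a semicolon-separated field into (name, language_code) pairs."""
    parsed = []
    for entry in field_text.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # The language tag follows the last asterisk in the entry.
        name, separator, language = entry.rpartition("*")
        if not separator:
            raise ValueError(f"missing language tag: {entry!r}")
        language = language.strip()
        if len(language) == 2:
            code = language.lower()  # assumed to already be a language code
        else:
            # Names only resolve on an exact match, so a typo fails here.
            code = LANGUAGE_CODES.get(language.lower())
        if code is None:
            raise ValueError(f"unrecognized language name: {language!r}")
        parsed.append((name.strip(), code))
    return parsed


print(parse_name_values("Spanish_label*es; Japanese_label*Japanese"))
```

A misspelled language name such as Ukranian raises a `ValueError` in this sketch, which is the failure mode that tagging with the code avoids.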

Name

Verify that the organization name provided in the name field is the correct name for the organization and tag the name with its language. Refer to the ISO 639-1 standard for identifying language codes. The name can be determined by checking the organization's site for title information or the name included in the copyright statement. Except where otherwise indicated in the request, assign to the name value the name in the language used by the organization on their site. If the name used by the organization is rendered in a non-Latin character script, add the non-Latin character value as a label in the request. If any other names appear on the organization's site outside the title or copyright sections, add these to the aliases field.

Company names are additionally appended with their country name in parentheses to disambiguate national-level manifestations, e.g.:

Company Name (United States)

If the record is for the headquarters, additionally include the form without the country in the labels.

Status

All issues/records have the status active, unless otherwise indicated in the request. If a request indicates that it is for an inactive organization, or you are otherwise creating a record for an inactive organization, add an additional status field to the issue with the corresponding inactive status, e.g.:

Status: inactive

ROR does not create new withdrawn records. This status is reserved for records created in error and only used in update requests.

Website

Verify that the organization's site resolves and the value provided is for the organization. Do not include links where the provided value is for another organization, but in some way references the one submitted in the request.

Domain

Verify that any domain values provided occur on the organization's site. If no domain value was provided, determine if one can be inferred from the link or the organization's site. The domain should generally be referenced in the email address for the organization, but note that if the organization is hosted on a sub-domain page, the email address domain may be that for the parent-level page. Do not assign this value to the organization if its site is hosted on a sub-domain page.

Link to publications

Verify that the links provided resolve and demonstrate use of the organization's name in affiliation usage or funding acknowledgements.

Organization type

Verify that the type provided is correct for the organization. Refer to the types section of ROR's metadata policies and the Guidance for evaluating common organization types if any clarification is needed.

Wikipedia page

Verify that the Wikipedia page is that for the organization. Prefer the page in the language used by the organization, unless another page is more detailed or complete. Use the standard page rather than the mobile page. Remove the Wikipedia link if the provided page is a sub-heading or section of another page.

External Identifiers

Verify that all external identifiers provided belong to the organization by checking their corresponding site or API entries.

Remove any URL formatting from the identifiers. For ISNI identifiers, format as four groups of four digits/characters, each separated by a single space, e.g. 0000 0005 1090 3649
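As a sketch of the ISNI formatting rule, the following helper strips URL formatting and regroups the identifier. The function name is hypothetical, and the handling of hyphens and a trailing X check character is an assumption about inputs that may be encountered rather than documented behavior of any ROR script.

```python
import re


def format_isni(raw):
    """Strip URL formatting and return the ISNI as four space-separated groups of four."""
    # Keep only the final path segment if a URL was supplied.
    candidate = raw.strip().rstrip("/").rsplit("/", 1)[-1]
    # Drop spaces and hyphens; the last character may be the check character X.
    compact = re.sub(r"[\s-]", "", candidate).upper()
    if not re.fullmatch(r"\d{15}[\dX]", compact):
        raise ValueError(f"not a 16-character ISNI: {raw!r}")
    return " ".join(compact[i:i + 4] for i in range(0, 16, 4))


print(format_isni("https://isni.org/isni/0000000510903649"))
```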

Aliases, labels, and acronyms

Verify that the aliases, labels, and acronyms provided occur on the organization's site or are included in affiliation usage. Pay attention to the assignment of name values in the request and reassign to the correct fields as needed. Add languages for all. Refer to the LOC standard for identifying language codes.

Relationships

Relationships should be represented in the relationships field using the following pattern: ror_id (relationship_type). There is no need to separate repeating instances of the relationships with semicolons, but each must be followed by the relationship type value in order to be extracted. For example, a record for which three relationships need to be added would be coded in the relationships field as follows:

    https://ror.org/000000001 (parent) https://ror.org/000000002 (child) https://ror.org/000000003 (related)

For assigning the correct relationship type, refer to the relationship types section of ROR's metadata policies.

Where names of organizations are used instead of their ROR IDs, search and determine whether the organization exists in ROR. If the organization does not exist in ROR and appears to be otherwise in scope, create a new record request for the missing organization. Tag the relationship with the issue number in either the new record request or original issue referencing the relationships, e.g.:

#12345 (child)

Note the relationship between the new records in your personal notes document for the release.
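The relationships pattern described in this section can be checked with a short regular expression. This is a hedged sketch, not the extraction script: the pattern and function name are assumptions, and it accepts either a ROR ID or an issue number reference followed by a relationship type in parentheses.

```python
import re

# Matches "https://ror.org/<id> (type)" or "#<issue number> (type)".
RELATIONSHIP_PATTERN = re.compile(r"(https://ror\.org/\w+|#\d+)\s*\(([^)]+)\)")


def parse_relationships(field_text):
    """Return (target, relationship_type) pairs found in a relationships field."""
    return [
        (target, relationship_type.strip().lower())
        for target, relationship_type in RELATIONSHIP_PATTERN.findall(field_text)
    ]


field = "https://ror.org/000000001 (parent) https://ror.org/000000002 (child) #12345 (child)"
for target, relationship_type in parse_relationships(field):
    print(target, relationship_type)
```

An entry without a type in parentheses is simply not matched, which mirrors the requirement that each relationship be followed by its type in order to be extracted.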

City and Country

Verify that the city and country indicated in the request are correct for the organization. Check that the locations indicated are used on the organization's site or in other authoritative sources.

Geonames

The Geonames ID will typically not be provided in the request and needs to be returned using the automated triage process or by searching the Geonames site. If a value is provided, confirm that it corresponds to the value indicated in the request.

Year established

Confirm via the organization's site.

How will a ROR ID for this organization be used? and Other information about this request

Review these fields for any comments that may impact the metadata or curation of the request.

Triaging Update Records

Formatting

Remove any extraneous text from the issue that is populated as a result of the form submission.

Name and ROR ID

Verify that the ROR ID provided is for the organization indicated in the request. The name value may differ from the ror_display value, but so long as it is otherwise correct for the organization, this does not need to be changed.

Which part of the record needs to be changed?

Review for a general framing in triaging the update. These values do not need to be changed, unless so inaccurate or unclear that they impede understanding of the request.

Description of change

Begin by carefully reviewing the request and comparing it to the existing record to identify the specific changes requested. Next, consider whether any additional metadata might need updating as a consequence of the request. For example, changing an organization's name might also require updating its acronyms, aliases, or labels. Similarly, a request to update only the URL might reveal that the organization name is also out of date. Throughout this process, refer to ROR's metadata policies to ensure all changes align with them.

Identify whether any of the requested changes are inconsistent with ROR's curation policies and flag them in a comment. The most common example of this for updates is a request to remove "unofficial names" from a record. ROR facilitates the matching of variant names to their primary or official forms by inclusion of aliases on its records, so these should not be removed. If requested, explain this to the requestor in a comment and link to our blog post on name metadata.

Additionally assess whether the proposed changes could impact other records. If other records are affected, file new issues to reflect these changes.

Encoding updates

Updates to records are encoded with a special syntax that describes the changes. This encoding is generated by the automated triaging, but must be validated by the curation lead and can alternately be supplied by them alone. This encoding begins with an "Update:" field, followed by changes to specified fields separated by vertical bars (|), and terminated with a "$". Each change follows this structure:

field.operation==value

Where:

  • field is the name of the field to be updated
  • operation is one of: add, delete, replace, or delete_field
  • value is the new or modified data (omitted for delete_field)

Operations:

  1. "add" - Adds the specified value to a repeating field
  2. "delete" - Removes the specified value from a repeating field
  3. "replace" - Replaces all existing data in a field with the supplied value (used for non-repeating fields or to completely overwrite repeating fields)
  4. "delete_field" - Removes all data in the field, rendering it empty (no value is specified)

Field Categories:

  • Non-repeating fields (can only use replace operation): 'status', 'established', 'grid.preferred', 'isni.preferred', 'wikidata.preferred'
  • Repeating fields (can use add, delete, replace, or delete_field operations): 'acronym', 'alias', 'label', 'ror_display', 'types', 'domains', 'geonames', 'fundref.all', 'fundref.preferred', 'grid.all', 'isni.all', 'wikidata.all', 'website', 'wikipedia'

Updates to fields not included in the above lists will be ignored on extraction.

Example: To change an organization's name, delete the old label, add the old name as an alias, add the new name as a label, add a new preferred ISNI, and remove the Wikipedia URL:

Update: ror_display.replace==New Name*en | label.delete==Old Name*en | alias.add==Old Name*en | label.add==New Name*en | isni.preferred.replace==ISNI_ID | isni.all.add==ISNI_ID | wikipedia.delete_field$

Special Considerations:

  1. Name values: When adding or deleting name values, append an asterisk and the ISO 639-1 language code for each instance. Example: label.add==New Name*en. If the language associated with a name value is not included, the update to this value will not be processed correctly.
  2. ror_display: If changed, update both the ror_display and the record's labels. Add the original name as an alias, as appropriate.
  3. locations: Encode location changes using geonames. Example: geonames.replace==GeonamesID
  4. External IDs:
    • If not existing, add to both preferred and all fields
    • If existing, add to all field
      • To assert a new preferred value, replace preferred and add to all
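The encoding rules in this section can be sketched as a small parser that splits an update string into changes and reports anything that would not extract cleanly. This is an illustration only, not the extraction code: the function name and the `NAME_FIELDS` set (the fields assumed to require a language tag) are assumptions, while the field and operation lists are copied from the lists above.

```python
NON_REPEATING = {"status", "established", "grid.preferred", "isni.preferred",
                 "wikidata.preferred"}
REPEATING = {"acronym", "alias", "label", "ror_display", "types", "domains",
             "geonames", "fundref.all", "fundref.preferred", "grid.all",
             "isni.all", "wikidata.all", "website", "wikipedia"}
OPERATIONS = {"add", "delete", "replace", "delete_field"}
NAME_FIELDS = {"acronym", "alias", "label", "ror_display"}  # assumed name fields


def parse_update(encoded):
    """Return (changes, problems) for an encoded update string."""
    text = encoded.strip()
    if not text.startswith("Update:") or not text.endswith("$"):
        raise ValueError('encoding must start with "Update:" and end with "$"')
    changes, problems = [], []
    for chunk in text[len("Update:"):-1].split("|"):
        chunk = chunk.strip()
        target, _, value = chunk.partition("==")
        # Field names may contain dots, so the operation is the last component.
        field, _, operation = target.strip().rpartition(".")
        value = value.strip()
        if field not in NON_REPEATING | REPEATING:
            problems.append(f"{chunk}: unknown field, ignored on extraction")
            continue
        if operation not in OPERATIONS:
            problems.append(f"{chunk}: unknown operation")
            continue
        if field in NON_REPEATING and operation != "replace":
            problems.append(f"{chunk}: non-repeating fields only accept replace")
        if operation == "delete_field" and value:
            problems.append(f"{chunk}: delete_field takes no value")
        if operation != "delete_field" and not value:
            problems.append(f"{chunk}: missing value")
        if field in NAME_FIELDS and operation != "delete_field" and "*" not in value:
            problems.append(f"{chunk}: name value is missing a language tag")
        changes.append((field, operation, value or None))
    return changes, problems


changes, problems = parse_update(
    "Update: ror_display.replace==New Name*en | alias.add==Old Name*en | wikipedia.delete_field$"
)
print(changes)
print(problems)
```

Collecting problems rather than stopping at the first one lets a reviewer see every issue in an encoding in a single pass.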

Automated Triage of All Issues

Except for issues where updates are manually encoded or where only relationships are being updated, all requests should be triaged with the automated triage script.

New Organization Requests

For each new organization request, the script generates a comment on the issue with the following information:

  • Wikidata: Name, ID, and similarity score for the matched name (if found)
  • ISNI: Matched ID(s) and name(s) retrieved from the ISNI API
  • Funder ID: Matched Crossref Funder Registry ID returned from the Crossref API (if found)
  • Publication affiliation usage: DOIs where the affiliation string contains the organization names provided in the request. Retrieved from the OpenAlex API.
  • ORCID affiliation usage: ORCID IDs where the organization name is listed as the affiliation
  • Possible ROR matches: Existing ROR IDs and names that are potential matches. Used to identify records that already exist in ROR
  • Previous requests: Links to GitHub issues where the same organization is named
  • Geonames match: Name and Geonames ID of matched location returned from the Geonames API

The results of the automated triage should then be verified for correctness and reconciled back with the main issue body. Individual fields that fail to return anything from their corresponding API queries will be absent from the comment.

The publication and ORCID affiliation usage should be used to help assess whether the record is in scope for ROR. However, do not rely exclusively on what is returned from the script to determine evidence of affiliation usage. If no affiliation usage is returned by the script, check additional sources like Google Scholar to identify whether affiliation usage exists that is not otherwise indexed in the DOI metadata or in OpenAlex.

Update Requests

For update requests, the script generates an encoded update string using the record identified in the request and the description of change, created through an OpenAPI request using the updates encoding prompt, with some additional procedural validation. This results in an update string like the following:

Update: ror_display.replace==New Organization Name*en | label.add==New Organization Name*en | alias.add==Old Organization Name*en | isni.all.add==0000 0001 2345 6789$

This encoded update is added as a comment on the GitHub issue for review.

Review and Validation of Updates Encoding for Update Records

Although the automated triaging can generally handle update requests of simple to moderate complexity, it can make mistakes, skip over data, or introduce other forms of errors, and it often fails for complex or ambiguous requests. The updates encoding from the automated triage should thus not be used without additional review.

Review the encoding relative to the record and description of change to verify its correctness and completeness. Verify that the update will not result in any unnecessary data loss (e.g. where a field's values are being errantly replaced, vs. added or deleted). Check for any issues relative to the special considerations in the encoding updates section. Make sure that the correct languages are assigned for name values. Make sure any additional updates required for the record, but not identified in the request, are included in the encoding as well.

Add Labels to Issues

All requests should have the appropriate labels assigned to indicate their type, character, and complexity.

Label             Description
lion              High-complexity issue
jaguar            Medium-complexity issue
kitten            Low-complexity issue
level 1           Higher priority (primarily new record requests)
level 2           Medium priority (primarily metadata changes to principal fields for discovery and disambiguation)
level 3           Lower priority (all other metadata changes)
already in ror    ROR ID already exists
duplicate         This issue or pull request already exists
hold for later    To be processed at a later point
merge records     Two or more records need to be merged
split record      Split record into one or more records
new record        Add a new ROR record
update record     Update an existing ROR record
cleanup           Cleanup work to fix data issues involving a high volume of records
needs discussion  Issue requires a policy-related discussion or decision
non-request       A general question or comment as opposed to a specific request
out of scope      Not in scope for ROR
org-requested     This request came from the organization in question
project           A longer-term and larger-scale curation task, typically involving bulk updates to a set of records
training          Issue useful for training
triage needed     Request needs to be triaged by curation lead

Approve, Deny, Hold for Processing, or Request Curator Reviews

Use the curator evaluations workflows for new record requests and updates to existing records to assess all requests. Once reviewed, provide a comment approving, denying, requesting an additional review by a curator, or indicating that the request will be put on hold for additional review, contact with the requestor, or until further evidence of meeting ROR's criteria for inclusion are met.

Assign Issues to the Correct Project Column

Once each request has been triaged, reviewed, and labeled, assign to the appropriate project column.

Column                            Description
To do (ready for review)          Issues here are a holding pen for work in progress
In Progress                       Issues here are in progress. Primarily bulk submissions and other project issues.
Second Review                     Issues here require additional review by a curation team member
Needs discussion                  Issues here require further team discussion and potential consultation with requestors
Ready for sign-off / metadata QA  Requests here are approved and ready for metadata prep
Approved                          QA complete on metadata and approved, but not yet moved into a release
Ready for production release      Ready to be included in the next release
Done (Released on Production)     Issue has been released on production
Declined requests                 Requests declined because they are (1) out of scope, (2) duplicate an existing request, or (3) duplicate information already in ROR
Hold for later                    These requests cannot yet be processed due to insufficient information or incomplete functionality
Projects                          These are projects involving bulk analysis/bulk processing of sets of records
Cleanup                           Cleanup work that is needed to fix data issues involving a high volume of records across the registry

Release Candidates Identified

Review All Issues for Project Status and Label Assignments

Verify that all records that have been triaged are part of the project and are assigned to the correct project column. This can be accomplished with an issue search for issues that do not have a triage label and are not assigned to the project:

  • is:issue is:open -project:ror-community/19 -label:"triage needed"
    • is:issue: Filters for issues (not pull requests).
    • is:open: Filters for issues that are still open.
    • -project:ror-community/19: Excludes issues assigned to project 19 of the "ror-community" organization.
    • -label:"triage needed": Excludes issues with the "triage needed" label.

Similarly, check for any missing or mixed-up labels for new and update record requests. These are used to identify and extract the corresponding metadata, so they need to be correctly assigned. This can be accomplished by searching for misaligned issues and title text:

  • is:issue is:open label:"new record" "Modify the"

    • is:issue: Filters for issues.
    • is:open: Filters for open issues.
    • label:"new record": Filters for issues with the "new record" label.
    • "Modify the": Filters for issues where the title contains the text "Modify the."
  • is:issue is:open label:"update record" "Add a new"

    • is:issue: Filters for issues.
    • is:open: Filters for open issues.
    • label:"update record": Filters for issues with the "update record" label.
    • "Add a new": Filters for issues where the title contains the text "Add a new."
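If these label checks are run repeatedly, the two searches above could also be issued against GitHub's issue search endpoint. The sketch below only builds the request URL and makes no network call. The repository name `ror-community/ror-updates` and the function name are assumptions, and whether the API results match the web search exactly has not been verified here.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.github.com/search/issues"
REPOSITORY = "ror-community/ror-updates"  # assumed location of the request issues


def build_search_url(label, text):
    """Build a search URL for open issues carrying `label` and containing `text`."""
    query = f'repo:{REPOSITORY} is:issue is:open label:"{label}" "{text}"'
    return f"{SEARCH_ENDPOINT}?{urlencode({'q': query})}"


# Issues labeled as new records whose text looks like an update request, and vice versa.
print(build_search_url("new record", "Modify the"))
print(build_search_url("update record", "Add a new"))
```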

Records without new or update record tags can be seen in the ROR updates project by switching to the table view and applying the following filter:

  • status:"Ready for sign-off / metadata QA" -label:"new record" -label:"update record"
    • status:"Ready for sign-off / metadata QA": Filters issues with the status "Ready for sign-off / metadata QA."
    • -label:"new record": Excludes issues with the "new record" label.
    • -label:"update record": Excludes issues with the "update record" label.

Extract Release Metadata for All Issues in Ready for Sign-off/Metadata QA Project Column

Use the script for extracting record metadata from issues to create the new and update records files from the issues in the Ready for Sign-off/Metadata QA column.


Pre-Release Testing and Review

Run Pre-Release Tests on Extracted Metadata

Run the following tests on the extracted metadata, using the instructions in their corresponding READMEs:

New and Update records

New records

Correct any errors in the issue, repeating extraction and tests until all problems have been addressed.
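The release task list names checks for in-release duplicates, duplicate external IDs, and duplicate URLs. The actual tests are the scripts described in their READMEs; the following is only a minimal sketch of the idea, in which the function name, the `website` column name, and the normalization rules (lowercasing, trailing slash removal, semicolon splitting) are all assumptions.

```python
import csv
import io
from collections import defaultdict


def find_duplicates(rows, column):
    """Map each repeated value in `column` to the CSV row numbers where it occurs."""
    seen = defaultdict(list)
    for row_number, row in enumerate(rows, start=2):  # row 1 is the CSV header
        # Repeating fields are assumed to be semicolon-separated.
        for value in (row.get(column) or "").split(";"):
            value = value.strip().lower().rstrip("/")
            if value:
                seen[value].append(row_number)
    return {value: numbers for value, numbers in seen.items() if len(numbers) > 1}


sample = io.StringIO(
    "name,website\n"
    "Org A,https://example.org/\n"
    "Org B,https://example.com\n"
    "Org C,https://EXAMPLE.org\n"
)
print(find_duplicates(csv.DictReader(sample), "website"))
```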

Conduct Secondary Review of All Issues

Once all tests are passing, perform a manual review of all issues in the extracted metadata, scanning for errors. Refer to the new records and update records processing section for guidance.

Update Issues Based on Second Review and Pre-release Tests

After completing the pre-release tests and secondary review, update the issues to reflect any changes:

  • Update issues with corrections based on test results and secondary review
  • Document any changes or reasons for corrections as comments in the issues where additional explanation is needed
  • Reassign issues to appropriate columns based on their updated status
  • Close any issues identified as duplicates and remove them from the project

Create Records

It is generally better to create the new and update records in separate batches. Update records generally require fewer tests and are less complex, so prioritize creating those first, followed by new records, if possible.

Create records via the API

Using the extracted metadata files, create the release records via the API using the create_records script.

For new records, you will need to reconcile the ROR IDs back into the input CSV file. This can be done by copying the ROR ID values in the report.csv file that is returned as part of the API response zip into input.csv.

Rename the input files with the date and type, using the pattern {date}_{record_type}_records_metadata.csv, e.g. 20241017_new_records_metadata.csv.
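A small helper can keep the file names consistent with this pattern. This is a sketch only: the function name is hypothetical, and the record type value "update" is an assumption (only "new" appears in the example).

```python
from datetime import date


def metadata_filename(record_type, on=None):
    """Return the input file name for `record_type`, dated `on` (today by default)."""
    if record_type not in {"new", "update"}:  # "update" is an assumed value
        raise ValueError(f"unexpected record type: {record_type!r}")
    on = on or date.today()
    return f"{on:%Y%m%d}_{record_type}_records_metadata.csv"


print(metadata_filename("new", date(2024, 10, 17)))
```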

Create Release Branch and Commit Records

Create a release branch off the main branch in ror-updates using the pattern rc-v{release_version}{release_number}, e.g. rc-v1.54. In this branch create three directories: new, updates, and input_files. Add the records to the directory corresponding to their type and the CSV file used to generate them to the input_files directory. Commit to the branch with a basic commit message summarizing the action, e.g. "Adding all new records through 2024/10/17 for release v1.54."

Run Post-Creation Tests on Records

Run the following tests on records created using the instructions in their corresponding READMEs:

Tests to Run

If not first run on new record CSV input files

For each test, review output, make corrections either directly in files or through re-extraction/creation, delete test results, and commit changes.
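Several of the post-creation checks named in the release task list (JSON integrity, leading/trailing spaces, unprintable characters) can be sketched in a few lines. This is an illustration under assumptions, not the test scripts referenced in the READMEs: the function names and the report format are hypothetical.

```python
import json


def iter_strings(node, path="$"):
    """Yield (path, string) for every string value in a parsed JSON document."""
    if isinstance(node, str):
        yield path, node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_strings(value, f"{path}[{index}]")


def check_record(raw_json):
    """Return a list of problems found in one record's JSON text."""
    try:
        record = json.loads(raw_json)
    except json.JSONDecodeError as error:
        return [f"invalid JSON: {error}"]
    problems = []
    for path, text in iter_strings(record):
        if text != text.strip():
            problems.append(f"{path}: leading/trailing whitespace")
        if not text.isprintable():
            problems.append(f"{path}: unprintable character")
    return problems


print(check_record('{"name": " Org ", "links": ["https://example.org\\u200b"]}'))
```

Reporting the path to each offending value makes it easier to decide whether to correct the file directly or fix the issue and re-extract.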

Update Issues and Input CSV files After Tests

After completing the tests, update the issues and input CSV files to reflect any changes by:

  • Updating issues and CSV files with test-based corrections, such that they are consistent with any changes made in files.
  • Documenting any necessary context for any changes in issue comments
  • Reassigning issues to the appropriate columns
  • Closing any issues identified as duplicates and removing them from the project
  • Deleting rows from the input CSV files for any records that have been dropped from the release.

Move Issues to Ready for Production Release

Once all tests have been completed, use the move issues script to move the release issues to the Ready for Production Release column.

Pre-release prep

Create Milestone and Add All Release Issues to Milestone

Create a milestone in ror-updates, naming it with the release number (e.g. v1.55). No due date or description is required. Note or copy the milestone number and use it in the script to add all issues to the milestone.

Add ROR IDs to requests that reference new records

For requests that reference new records in their relationships field, replace the issue number references with the corresponding ROR IDs from the new records that were generated. Use the new records relationship label to identify these requests, or additionally search and filter with the scripts for finding text in issues.

Create and Validate Relationships File

Once all requests have been updated with the new records' ROR IDs, generate a CSV file of all names and ROR IDs in the release directory using the script for obtaining all of these values. Then, use this as input to the create relationships script to generate a relationships CSV from the issues in the Ready for Production Release column (which the script references by default). Review the resulting relationships CSV file for any errors (identified by an error in the relationship type column) and spot check several relationship entries against their corresponding issues. Once reviewed, commit the relationships CSV to the release directory in the release branch.

Pre-Release Actions

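Once all files are committed to the release branch, run the following GitHub actions in this order:

  1. Validate without relationships
  2. Create relationships
  3. Remove relationships to inactive records
  4. Update labels in related records
  5. Update addresses
  6. Update last modified (set to release date)
  7. Validate files with relationships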

If any errors are encountered when running each action, review and update files to correct for these issues, with a corresponding commit, and re-run until successful.

Create Release Notes

Once all actions have run successfully, pull down the updated and additional record files to your local branch and run the create release notes script on the release directory. Add the resulting release notes to ror-updates as a draft branch for publication after the release has been deployed.

During Release

Once all release tests have passed, but prior to deploying the data dump to Zenodo, merge the release branch to main and publish the draft release notes.

Post-Release

Update release issues with ROR IDs and close

Once the release is fully deployed and published to Zenodo, use the close issues script with the release input files as input to add comments to all release issues and close them.

Close milestone

Verify that all release issues (listed in the milestone description) have been closed, update and close any missed by the close issues script, then close the milestone.

Archive release issues

Once all issues have been updated with the release comments and closed, archive the release issues in the project.
