
The Workflow of Refining GRSciColl Information

Salza Palpurina edited this page Sep 26, 2024 · 12 revisions

The steps below follow the task structure of each GitHub issue.

  1. Find out if the information is complete and up to date:
  • Check the institution's homepage, if available, or search Google for the institution name together with the country. Check whether all of the institution's collections are represented in GRSciColl.

    1. GBIF data publisher and GRSciColl institution are the same

      ➡︎ Consider adding the GBIF data publisher page as a master source for the GRSciColl institution by adding its UUID. See an example here. (NOTE: the fields Name, Description, Active, Homepage, Phone, Email, Latitude, Longitude, Logo URL and Physical address will then be taken from the master record and can no longer be edited.)

    2. GBIF data publisher and GRSciColl institution appear as the same organization but at different levels

      ➡︎ No update is needed unless the institution asks for one.

    3. Fuzzy match tagged a different GBIF data publisher

      ➡︎ Comment on the issue that the institution doesn’t have any registered publisher, and do not add the “registered publisher” label.

    4. Fuzzy match tagged more than one GBIF data publisher

      ➡︎ There is no need to split the GRSciColl institution to match the GBIF data publishers unless the institution asks. Update the GRSciColl entry to include information from both data publishers, and make sure occurrence records are properly linked. See 1.vi and How to link.

    5. A new GRSciColl institution has been added, or you encounter an institution entry that has no GitHub issue

      ➡︎ Use the Registry link to create a GitHub issue (contact @ManonGros or use the following Python script: https://github.com/ManonGros/Small-scripts-using-GBIF-API/blob/master/create_github_issues_grscicoll_inst.ipynb).

    6. An institution entry appears to be a collection

      ➡︎ Use "⠇More → Convert to collection" and assign it to the hosting institution by editing the collection information.

  • Add missing collections/information to the institution directly in the GRSciColl registry. If some collections are digitised, put that information in the GRSciColl Notes field.
    HINT 1: Also search Google Scholar for the word "collection" combined with the institution name; digitised collections are often described in publications.
    HINT 2: Also check the corresponding IH (Index Herbariorum) entry. The Collections Summary tab sometimes has additional information on associated institutions, such as botanical gardens, and the Staff tab may list emails of additional contacts.

    1. Collection datasets from different GBIF data publishers under a GRSciColl institution

      ➡︎ Create corresponding collections under the (updated) GRSciColl institution.

    2. Collection information is not found online but exists in GRSciColl

      ➡︎ Add the “collection missing from GBIF” and/or “pending collection verification” labels.

    3. Collections seem to exist, but it is unclear what the Collection Code should be

      ➡︎ If there is only one collection, it's okay to use the Institution Code.

      ➡︎ If there is more than one, it's okay to invent a code for temporary use until further communication with the institution.

    4. In general, merging entries is better than deleting them, as merging preserves a record of the change in the relationship between entities.

    5. The link between datasets and collections can be made at the record level (based on the collection and institution codes, and identifiers). As a result, one (big) dataset can refer to many (smaller) collections, and several publishers can publish specimens from the same collection.
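
Checking whether all of an institution's collections are represented in GRSciColl (and spotting shared or missing Collection Codes) can also be done against the GBIF API. The following is a minimal sketch in Python, assuming the public `/v1/grscicoll/collection` endpoint accepts an `institution` UUID filter and returns paged results with `code`, `name` and `endOfRecords` fields; the helper names are hypothetical and the endpoint details should be verified against the GBIF API documentation.

```python
# Hypothetical helpers (illustrative names, not part of any GBIF tool).
# Assumes GET /v1/grscicoll/collection?institution=<UUID> returns paged JSON
# with "results", "limit" and "endOfRecords"; verify before relying on it.
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://api.gbif.org/v1"


def collections_url(institution_key: str, offset: int = 0, limit: int = 100) -> str:
    """Build the URL for one page of collections under a GRSciColl institution."""
    query = urlencode({"institution": institution_key, "offset": offset, "limit": limit})
    return f"{API}/grscicoll/collection?{query}"


def fetch_json(url: str) -> dict:
    """Download and decode a JSON response (network access required)."""
    with urlopen(url) as response:
        return json.load(response)


def summarise_collections(page: dict) -> list:
    """Return (code, name) pairs from one page of results; a missing code becomes ''."""
    return [(c.get("code") or "", c.get("name") or "") for c in page.get("results", [])]


def duplicate_codes(pairs: list) -> list:
    """Collection Codes used by more than one collection, worth raising on the issue."""
    counts = Counter(code for code, _ in pairs if code)
    return sorted(code for code, n in counts.items() if n > 1)


def all_collections(institution_key: str) -> list:
    """Page through every collection of an institution (network access required)."""
    pairs, offset = [], 0
    while True:
        page = fetch_json(collections_url(institution_key, offset))
        pairs.extend(summarise_collections(page))
        # Stop at the last page, or if the API returns an empty page.
        if page.get("endOfRecords", True) or not page.get("results"):
            return pairs
        offset += page.get("limit", 100)
```

The returned list can then be compared by hand with the collections found on the institution's homepage; entries with an empty code are candidates for the Institution Code or a temporary code, as described above.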

  2. Check whether the data is also in GBIF:
  • If there are GBIF occurrence records linked, check from which dataset/publisher they come (add comments to the issue).
    • Is the institution a registered publisher?
    • Or do the records come from a third party publisher?
    • Are all the collections in GRSciColl also in GBIF?
      • If occurrences come from a non-GRSciColl institution, such as NCBI or Plazi ➡︎ “some specimens in GBIF” AND “published by third party”.
      • Use “published by third party” even if a third party has published just a few records.
      • Principles for handling IH-synced entries:
        1. When we disconnect a GRSciColl institution from its IH master source, subsequent syncs will only update the “herbarium” collection.
        2. For collection-level information, it is advisable to let the institution update it through IH.
  • If no record is linked to GRSciColl, look for the institution name on the GBIF list of publishers (add comments to the issue).
    • Is there any corresponding publisher?
    • Have they published any data?
  • If data has been published on GBIF but isn’t linked to GRSciColl, notify Marie (tag @ManonGros), who can link the data.
  • Translate outcome of your checks into labels. See guidelines here: https://github.com/gbif/collection-mobilization#readme
    • Add the “registered publisher” label if the institution is registered on GBIF.
    • Add the “some specimens in GBIF” label if the institution (or a third party) has already published some of the institution's specimens on GBIF.
      • Add the “published by third party” label if the institution has datasets that are published by a third party.
    • Add the “outside of GBIF scope” label if the institution only has collections that wouldn't fit in GBIF (like geology or archeology collections).
    • Add the “pending collection verification” label if the initial search shows no indication of collection curation.
    • Add the “collection missing from GBIF” label if some of the collections aren't available on GBIF yet.
    • Add the “some digitised specimens-metadata missing from GBIF” label if the institution has something close to occurrence data (scientific name, location, date) available for its specimens, such as a database or online inventory.
      • Add the “media available” label if the institution has specimen-related media or is in the process of getting media (e.g. capturing images).
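
The GBIF-side checks in step 2 (which publishers and datasets the linked occurrences come from, and whether a publisher exists for the institution name) can be scripted as a first pass. The sketch below is a minimal Python example, assuming the GBIF occurrence search accepts an `institutionKey` filter with repeated `facet` parameters, and that `/v1/organization?q=` searches publishers by name. All function names are hypothetical, and `suggest_labels` only covers the three data-related labels; the others need human judgement.

```python
# Hypothetical helpers (illustrative names, not part of any GBIF tool).
# Assumes the GBIF occurrence search supports `institutionKey` and repeated
# `facet` parameters; verify against the GBIF API documentation before use.
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://api.gbif.org/v1"


def occurrence_facets_url(institution_key: str) -> str:
    """Occurrences linked to a GRSciColl institution, counted per publisher and dataset."""
    query = urlencode([
        ("institutionKey", institution_key),
        ("limit", 0),  # skip individual records; only totals and facet counts are needed
        ("facet", "publishingOrg"),
        ("facet", "datasetKey"),
    ])
    return f"{API}/occurrence/search?{query}"


def publisher_search_url(name: str) -> str:
    """Free-text search of GBIF publishers (organizations), e.g. by institution name."""
    return f"{API}/organization?{urlencode({'q': name})}"


def fetch_json(url: str) -> dict:
    """Download and decode a JSON response (network access required)."""
    with urlopen(url) as response:
        return json.load(response)


def facet_counts(response: dict) -> dict:
    """Map each facet to {key: count}; field names are normalised so that
    'PUBLISHING_ORG' and 'publishingOrg' both become 'publishingorg'."""
    return {
        facet["field"].replace("_", "").lower(): {
            c["name"]: c["count"] for c in facet.get("counts", [])
        }
        for facet in response.get("facets", [])
    }


def third_party_publishers(counts: dict, own_publisher_key: Optional[str]) -> list:
    """Publisher keys other than the institution's own registered publisher (if any)."""
    return sorted(k for k in counts.get("publishingorg", {}) if k != own_publisher_key)


def suggest_labels(total: int, third_parties: list, own_publisher_key: Optional[str]) -> list:
    """Rough first pass at the data-related labels only."""
    labels = []
    if own_publisher_key:  # the institution itself is a registered GBIF publisher
        labels.append("registered publisher")
    if total > 0:  # at least one occurrence is linked to the institution
        labels.append("some specimens in GBIF")
    if third_parties:  # even a few records from another publisher count
        labels.append("published by third party")
    return labels


def check_institution(institution_key: str, own_publisher_key: Optional[str] = None) -> list:
    """Network-backed convenience wrapper around the helpers above."""
    response = fetch_json(occurrence_facets_url(institution_key))
    third = third_party_publishers(facet_counts(response), own_publisher_key)
    return suggest_labels(response.get("count", 0), third, own_publisher_key)
```

Facet counts may be truncated to the most frequent values, and a publisher representing the same organization at a different level (case 2 in step 1) would appear here as a "third party", so the suggested labels should be reviewed before they are added to the issue.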