
How to link occurrence records or datasets to GRSciColl entries, or vice versa

Salza Palpurina edited this page Feb 19, 2024 · 4 revisions

There are two ways to link occurrences to GRSciColl entries:

  • through institution and collection codes and identifiers
  • through the occurrence mapping

The GRSciColl lookup service finds and links GRSciColl collections and institutions to occurrences. Its algorithm is explained here: https://github.com/gbif/registry/blob/master/registry-service/README.md#grscicoll-lookup-explained. Anyone can use the lookup service to check institution and collection codes and identifiers in GRSciColl. During occurrence interpretation, the system uses the publisher's country to help choose a match in GRSciColl when there is more than one candidate.
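To check codes and identifiers by hand, the lookup service can be queried over HTTP. The sketch below only builds the request URL and wraps the call in a helper. The endpoint path and the parameter names (institutionCode, collectionCode, institutionId, collectionId, country) reflect my understanding of the public GBIF API and should be checked against the API documentation; the function names are my own.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the GRSciColl lookup service (verify in the GBIF API docs).
LOOKUP_URL = "https://api.gbif.org/v1/grscicoll/lookup"


def build_lookup_url(institution_code=None, collection_code=None,
                     institution_id=None, collection_id=None, country=None):
    """Build a lookup URL from whichever codes and identifiers are known."""
    params = {
        "institutionCode": institution_code,
        "collectionCode": collection_code,
        "institutionId": institution_id,
        "collectionId": collection_id,
        # Two-letter country code, used to break ties between candidates.
        "country": country,
    }
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{LOOKUP_URL}?{query}"


def lookup(**kwargs):
    """Call the lookup service and return the parsed JSON response."""
    with urlopen(build_lookup_url(**kwargs), timeout=30) as response:
        return json.load(response)


# Example (requires network access):
# result = lookup(institution_code="K", country="GB")
```

Passing the country mirrors what occurrence interpretation does when several institutions share the same code.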

NB: Although the system attempts to link all the occurrences associated with institution and collection codes and identifiers, only specimen-related occurrences are flagged. Those are the occurrences with the basis of record: Preserved Specimen, Fossil Specimen, Living Specimen, Material Sample.
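To see which records in a dataset can receive GRSciColl flags, one can filter on the basis of record. This is a minimal sketch: the upper-case enum spellings are what I expect GBIF to use for the four values listed above, and the helper name is hypothetical.

```python
import re

# The four specimen-related values listed above, in the upper-case
# spelling assumed for GBIF-interpreted data.
SPECIMEN_BASES = {
    "PRESERVED_SPECIMEN",
    "FOSSIL_SPECIMEN",
    "LIVING_SPECIMEN",
    "MATERIAL_SAMPLE",
}


def is_specimen_related(basis_of_record):
    """Return True if the basis of record is one of the flagged kinds.

    Accepts "Preserved Specimen", "PreservedSpecimen" or
    "PRESERVED_SPECIMEN" style spellings.
    """
    value = basis_of_record.strip()
    # Split camelCase words: "PreservedSpecimen" -> "Preserved_Specimen".
    value = re.sub(r"(?<=[a-z])(?=[A-Z])", "_", value)
    value = value.replace(" ", "_").upper()
    return value in SPECIMEN_BASES
```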

I. Linking occurrences to GRSciColl from the publisher side: institutionCode, collectionCode, institutionID, collectionID

This blog post covers the issue and gives some background on GRSciColl: https://data-blog.gbif.org/post/grscicoll-flags/. Don't hesitate to forward it to data publishers.

Here is a part adapted from the blogpost mentioned above:

The way to get an exact match to GRSciColl is to use identifiers in the institutionID and collectionID fields. Here is how to do it:

  1. Check what kind of identifiers and codes are on the institution and collection pages.
  2. In the dataset:
    • Make sure that the values in the collectionCode and institutionCode fields correspond to codes or alternative codes on your institution and collection GRSciColl pages.
    • Make sure that the values in the collectionID and institutionID fields correspond to the identifiers on your institution and collection GRSciColl pages.
  3. Don't forget to push the changes to GBIF by publishing the changes you made in your dataset.
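The checks in step 2 can be sketched in a few lines. The example assumes that a GRSciColl entry is available as a dictionary with code, alternativeCodes, identifiers and key fields (my reading of the API response, to be verified); all values below are made up for illustration and the function names are hypothetical.

```python
def known_codes(entry):
    """Main code plus alternative codes of a GRSciColl entry."""
    codes = {entry["code"]}
    codes.update(alt["code"] for alt in entry.get("alternativeCodes", []))
    return codes


def known_identifiers(entry):
    """Identifiers listed on the entry, plus its GBIF UUID."""
    ids = {entry["key"]}
    ids.update(item["identifier"] for item in entry.get("identifiers", []))
    return ids


def find_mismatches(record, institution, collection):
    """Return the record fields whose values are unknown to GRSciColl."""
    checks = [
        ("institutionCode", known_codes(institution)),
        ("collectionCode", known_codes(collection)),
        ("institutionID", known_identifiers(institution)),
        ("collectionID", known_identifiers(collection)),
    ]
    problems = []
    for field, accepted in checks:
        value = record.get(field)
        # Empty fields are skipped: they cannot contradict the entry.
        if value and value not in accepted:
            problems.append(field)
    return problems


# Illustrative data only, not real GRSciColl entries.
institution = {"key": "00000000-0000-0000-0000-000000000001", "code": "XYZ",
               "alternativeCodes": [{"code": "XYZM"}], "identifiers": []}
collection = {"key": "00000000-0000-0000-0000-000000000002", "code": "FISH",
              "identifiers": [{"type": "LSID", "identifier": "urn:lsid:example.org:col:1"}]}
record = {"institutionCode": "XYZM", "collectionCode": "Fishes",
          "collectionID": "urn:lsid:example.org:col:1"}

print(find_mismatches(record, institution, collection))
```

Here only collectionCode is reported, because "Fishes" is neither the code nor an alternative code of the collection entry.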

The GRSciColl pages might have zero or many identifiers. So which identifier should be chosen? We have an FAQ answering that question. Here is the relevant part:

There can be several identifiers to choose from, and GBIF recommends in priority order:

  1. An in-house generated LSID if available (for example: urn:lsid:biocol.org:col:34984),

  2. A GRSciColl ID or GRSciColl URI (for example: http://grscicoll.org/institution/south-african-institute-aquatic-biodiversity)

  3. If no others exist, please use the GBIF UUID from the page URL (for example: a90ba963-9569-4b96-8d56-452aa7b83f75 for the URL https://www.gbif.org/grscicoll/institution/a90ba963-9569-4b96-8d56-452aa7b83f75)
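The priority order above can be expressed as a small selection function. This is a sketch under assumptions: the identifier type names (LSID, GRSCICOLL_ID, GRSCICOLL_URI) and the shape of the entry dictionary are what I expect from the API and should be checked, and the function name is my own.

```python
# Tiers in the recommended order; the two GRSciColl types share one tier.
PRIORITY_TIERS = [
    {"LSID"},
    {"GRSCICOLL_ID", "GRSCICOLL_URI"},
]


def choose_identifier(entry):
    """Pick the identifier to publish for a GRSciColl entry."""
    identifiers = entry.get("identifiers", [])
    for tier in PRIORITY_TIERS:
        for item in identifiers:
            if item.get("type") in tier:
                return item["identifier"]
    # No LSID or GRSciColl identifier: fall back to the GBIF UUID.
    return entry["key"]


# Entry built from the example values quoted in the list above.
entry = {
    "key": "a90ba963-9569-4b96-8d56-452aa7b83f75",
    "identifiers": [
        {"type": "GRSCICOLL_URI",
         "identifier": "http://grscicoll.org/institution/south-african-institute-aquatic-biodiversity"},
    ],
}
print(choose_identifier(entry))
```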

The screenshot below illustrates where to find them.

[Screenshot: where to find the GRSciColl identifiers to choose from]

I also made a silent video to illustrate the steps I mentioned above.

NB: The version of GRSciColl that we use to interpret the occurrence records is refreshed once a week. This means that if you make a change in GRSciColl, it can take up to a week before it is reflected in the occurrence interpretation.

Note that GBIF administrators can add default values to datasets. In this case, adding the relevant default values for the institution and collection identifiers, then reinterpreting the data will link the occurrences to GRSciColl.

II. Linking occurrences to GRSciColl from the API-savvy GRSciColl mediator/editor side: occurrenceMapping

The GRSciColl API lets you read and add occurrence mappings (occurrenceMapping). A mapping is a way to formally link occurrences in a dataset to a GRSciColl collection and/or institution.

  • When you create a mapping, you can decide to map all the occurrences of a given dataset to a GRSciColl entry or a subset based on the collection and institution codes associated with the occurrences.
  • You can add as many mappings as necessary to a GRSciColl entry: an entry can be mapped to occurrences from several datasets.

A. Linking occurrences to institutions

1) Example of several datasets mapped to an institution:

B. Linking occurrences to collections

1) Mapping an entire dataset:

2) Mapping part of a dataset filtered by the collection code:

3) Mapping part of a dataset filtered by the institution code (i.e. the code of the collection's parent institution), using parentCode:
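The three cases above differ only in the fields sent with the mapping. The sketch below builds the request body and an unsent POST request. The endpoint path and the datasetKey and code field names are assumptions based on my reading of the registry API (parentCode is the field named above); the function names and the keys in the example are hypothetical. Adding a mapping requires GRSciColl editor credentials, which are left out here.

```python
import json
from urllib.request import Request

API = "https://api.gbif.org/v1/grscicoll"  # assumed base URL


def mapping_url(entity_type, entity_key):
    """URL of the occurrenceMapping resource of an institution or collection."""
    if entity_type not in ("institution", "collection"):
        raise ValueError("entity_type must be 'institution' or 'collection'")
    return f"{API}/{entity_type}/{entity_key}/occurrenceMapping"


def mapping_payload(dataset_key, code=None, parent_code=None):
    """Body of a mapping: whole dataset, or a subset filtered by codes."""
    payload = {"datasetKey": dataset_key}
    if code is not None:
        payload["code"] = code  # filter on the entry's own code
    if parent_code is not None:
        payload["parentCode"] = parent_code  # filter on the institution code
    return payload


def build_request(entity_type, entity_key, payload):
    """Prepare (but do not send) the POST request; add authentication before use."""
    return Request(
        mapping_url(entity_type, entity_key),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# 1) entire dataset, 2) filtered by collection code, 3) filtered by parentCode
whole = mapping_payload("dataset-uuid")
by_collection = mapping_payload("dataset-uuid", code="FISH")
by_institution = mapping_payload("dataset-uuid", parent_code="XYZ")
request = build_request("collection", "collection-uuid", by_collection)
```

Sending the request would be done with urllib.request.urlopen once an Authorization header has been added.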

See an example of how to use the API to generate occurrence mappings in Python here: https://github.com/ManonGros/Small-scripts-using-GBIF-API/blob/master/map_occ_to_grscicoll.ipynb