How to link occurrence records or datasets to GRSciColl entries, or vice versa
- through institution and collection codes and identifiers
- through the occurrence mapping
The GRSciColl lookup service is used to find and link GRSciColl collections and institutions to occurrences. Its algorithm is explained here: https://github.com/gbif/registry/blob/master/registry-service/README.md#grscicoll-lookup-explained. Anyone can use the lookup service to check institution and collection codes and identifiers in GRSciColl. During occurrence interpretation, the system uses the publisher country to help choose a match in GRSciColl when there is more than one candidate.
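As a minimal sketch of how such a check could be scripted, the helper below builds a lookup URL from the query parameters that appear in the example URLs on this page (institutionCode, collectionCode, datasetKey). The function name build_lookup_url is hypothetical, and the sketch only assembles the URL; fetching it with any HTTP client should return the JSON description of the match.

```python
from urllib.parse import urlencode

LOOKUP_URL = "https://api.gbif.org/v1/grscicoll/lookup"


def build_lookup_url(institution_code=None, collection_code=None, dataset_key=None):
    """Build a GRSciColl lookup URL from whichever values are available."""
    candidates = [
        ("institutionCode", institution_code),
        ("collectionCode", collection_code),
        ("datasetKey", dataset_key),
    ]
    # Keep only the parameters that were actually supplied.
    params = [(name, value) for name, value in candidates if value]
    if not params:
        raise ValueError("at least one code or key is required")
    return f"{LOOKUP_URL}?{urlencode(params)}"


url = build_lookup_url(
    institution_code="MNHN",
    collection_code="IK",
    dataset_key="b5cdf587-3342-48ec-9130-ba1281d7166f",
)
print(url)
```

Opening the printed URL in a browser is enough to see what the lookup service returns for that combination of values.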
NB: Although the system attempts to link all the occurrences associated with institution and collection codes and identifiers, only specimen-related occurrences are flagged. Those are the occurrences with the basis of record Preserved Specimen, Fossil Specimen, Living Specimen or Material Sample.
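For publishers who want to estimate which of their records can be flagged, the rule above could be sketched as follows. The upper-case spellings and the normalisation step are assumptions made for illustration; adapt them to however your dataset writes the basis of record.

```python
import re

# The four specimen-related values named above, in upper-case enum style
# (assumption: your data may spell them differently).
SPECIMEN_BASES = {
    "PRESERVED_SPECIMEN",
    "FOSSIL_SPECIMEN",
    "LIVING_SPECIMEN",
    "MATERIAL_SAMPLE",
}


def normalise_basis(value):
    """Turn 'Preserved Specimen' or 'PreservedSpecimen' into 'PRESERVED_SPECIMEN'."""
    # Insert an underscore at lower-to-upper case boundaries.
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", "_", value.strip())
    # Collapse whitespace and hyphens into underscores, then upper-case.
    return re.sub(r"[\s\-]+", "_", spaced).upper()


def is_specimen_related(basis_of_record):
    """Return True if the record is of a kind that can receive GRSciColl flags."""
    return normalise_basis(basis_of_record) in SPECIMEN_BASES


print(is_specimen_related("Preserved Specimen"))
print(is_specimen_related("HumanObservation"))
```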
I. Linking occurrences to GRSciColl from the publisher side: institutionCode, collectionCode, institutionID, collectionID
This blogpost covers the issue as well as gives some background about GRSciColl: https://data-blog.gbif.org/post/grscicoll-flags/. Don't hesitate to forward it to data publishers.
Here is a part adapted from the blogpost mentioned above:
The way to get an exact match to GRSciColl is to use identifiers in the institutionID and collectionID fields. Here is how to do it:
- Check what kind of identifiers and codes are on the institution and collection pages.
- In the dataset:
  - Make sure that the values in the collectionCode and institutionCode fields correspond to codes or alternative codes on your institution and collection GRSciColl pages.
  - Make sure that the values in the collectionID and institutionID fields correspond to the identifiers on your institution and collection GRSciColl pages.
- Don't forget to push the changes to GBIF by publishing the updated dataset.
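The checks in the list above can be sketched as a small script. Everything here is hypothetical: the dictionaries stand for values copied by hand from the GRSciColl pages, the codes and the collection identifier are placeholders, and the comparison is a plain equality test that may be stricter than what the lookup service actually does.

```python
def find_mismatches(record, institution, collection):
    """Return the Darwin Core fields whose value is absent from the GRSciColl pages."""
    checks = [
        ("institutionCode", institution["codes"]),
        ("institutionID", institution["identifiers"]),
        ("collectionCode", collection["codes"]),
        ("collectionID", collection["identifiers"]),
    ]
    problems = []
    for field, accepted in checks:
        # A missing field yields None, which is never in the accepted set.
        if record.get(field) not in accepted:
            problems.append(field)
    return problems


# Placeholder values standing in for what the GRSciColl pages show.
institution = {
    "codes": {"XYZ", "XYZ-ALT"},
    "identifiers": {"urn:lsid:biocol.org:col:34984"},
}
collection = {
    "codes": {"FISH"},
    "identifiers": {"hypothetical-collection-id"},
}

record = {
    "institutionCode": "XYZ",
    "institutionID": "urn:lsid:biocol.org:col:34984",
    "collectionCode": "FISH",
    # collectionID is missing from this record.
}

print(find_mismatches(record, institution, collection))
```

Fields reported by the helper are the ones to correct in the dataset before republishing.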
The GRSciColl pages might have zero or many identifiers. So which identifier should be chosen? We have an FAQ answering that question. Here is the relevant part:
There can be several identifiers to choose from, and GBIF recommends in priority order:
An in-house generated LSID if available (for example: urn:lsid:biocol.org:col:34984),
A GRSciColl ID or GRSciColl URI (for example: http://grscicoll.org/institution/south-african-institute-aquatic-biodiversity)
If no others exist, please use the GBIF UUID from the page URL (for example: a90ba963-9569-4b96-8d56-452aa7b83f75 for the URL https://www.gbif.org/grscicoll/institution/a90ba963-9569-4b96-8d56-452aa7b83f75)
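One possible reading of this priority order, as a short sketch: the helper names are hypothetical, the string tests used to recognise an LSID or a GRSciColl URI are assumptions, and identifiers of any other kind are ranked after the two named kinds, with the GBIF UUID used only when the page lists no identifier at all.

```python
def identifier_rank(identifier):
    """Lower rank means higher priority in the FAQ's order."""
    value = identifier.lower()
    if value.startswith("urn:lsid:"):
        return 0  # LSID
    if "grscicoll.org/" in value:
        return 1  # GRSciColl ID or URI
    return 2  # any other identifier listed on the page


def choose_identifier(page_identifiers, gbif_uuid):
    """Pick the preferred identifier, falling back to the GBIF UUID."""
    # sorted() is stable, so identifiers of equal rank keep their page order.
    ranked = sorted(page_identifiers, key=identifier_rank)
    return ranked[0] if ranked else gbif_uuid


uuid_from_url = "a90ba963-9569-4b96-8d56-452aa7b83f75"
identifiers = [
    "http://grscicoll.org/institution/south-african-institute-aquatic-biodiversity",
    "urn:lsid:biocol.org:col:34984",
]

print(choose_identifier(identifiers, uuid_from_url))
print(choose_identifier([], uuid_from_url))
```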
I am adding the screenshot below to illustrate where to find them.
I also made a silent video to illustrate the steps I mentioned above.
NB: The version of GRSciColl that we use to interpret the occurrence record is refreshed once a week. This means that if you made any change in GRSciColl, you will have to wait a bit before it translates into a change in the occurrence interpretation.
Note that GBIF administrators can add default values to datasets. In this case, adding the relevant default values for the institution and collection identifiers, then reinterpreting the data will link the occurrences to GRSciColl.
II. Linking occurrences to GRSciColl from the API-savvy GRSciColl mediator/editor side: occurrenceMapping
The GRSciColl API will allow you to access and add occurrenceMapping. This is a way to formally link occurrences in a dataset to a GRSciColl collection and/or institution.
- When you create a mapping, you can decide to map all the occurrences of a given dataset to a GRSciColl entry or a subset based on the collection and institution codes associated with the occurrences.
- You can add as many mappings as necessary to a GRSciColl entry: an entry can be mapped with occurrences from several datasets.
- The institution concerned: https://www.gbif.org/grscicoll/institution/241f1116-604b-4815-a615-5375ba9fc9ef
- The mapping information of the institution can be accessed via the API: http://api.gbif.org/v1/grscicoll/institution/241f1116-604b-4815-a615-5375ba9fc9ef/occurrenceMapping
  - In this case, you can see 4 mappings, all of them partial. Datasets with given keys (datasetKey) are filtered by the code provided, which corresponds to the institution code in this case.
  - For example, in mapping #143, from the dataset https://www.gbif.org/dataset/60bd5746-f0cc-4a32-9cfd-a3c76e4f2b72 only occurrences with institution code = IBER are mapped to the institution: https://www.gbif.org/occurrence/search?dataset_key=60bd5746-f0cc-4a32-9cfd-a3c76e4f2b72&institution_code=iber&advanced=1
  - You can see that because there is a code in the mapping, i.e. the occurrenceMapping posted with the API was { "code": "IBER", "datasetKey": "60bd5746-f0cc-4a32-9cfd-a3c76e4f2b72" }
- What the lookup service returns when querying for the datasetKey combined with the institution code: https://api.gbif.org/v1/grscicoll/lookup?institutionCode=IBER&datasetKey=60bd5746-f0cc-4a32-9cfd-a3c76e4f2b72
- An example of occurrence linked as the result of the mapping: https://www.gbif.org/occurrence/3412613301. The match is exact (because of the mapping), although the lookup service matches a different institution when queried for the institution code IBER alone: https://api.gbif.org/v1/grscicoll/lookup?institutionCode=IBER.
- The collection concerned: https://www.gbif.org/grscicoll/collection/dbc640f7-0584-4a31-8e90-5eaea7cf0956
- The mapping information of the collection can be accessed via the API: https://api.gbif.org/v1/grscicoll/collection/7cd4211d-6944-49f4-86b6-1c9123e8c55b/occurrenceMapping.
  - In this case, all the occurrences associated with the dataset https://www.gbif.org/dataset/b5cdf587-3342-48ec-9130-ba1281d7166f are mapped to the collection.
  - You can see that because there is no code in the mapping, the occurrenceMapping posted with the API was { "datasetKey": "71d0dff0-f762-11e1-a439-00145eb45e9a" }
- What the lookup service returns when querying for the datasetKey combined with the collection and institution codes: https://api.gbif.org/v1/grscicoll/lookup?institutionCode=MNHN&collectionCode=IK&datasetKey=b5cdf587-3342-48ec-9130-ba1281d7166f
- An example of occurrence linked as the result of the mapping: https://www.gbif.org/occurrence/3497363492. Because of the mapping, the match is exact although no collection identifier is provided.
- The collection concerned: https://www.gbif.org/grscicoll/collection/56d07a8b-e5be-45f2-80b3-f73254e92cd4
- The mapping information of the collection can be accessed via the API: http://api.gbif.org/v1/grscicoll/collection/56d07a8b-e5be-45f2-80b3-f73254e92cd4/occurrenceMapping
  - In this case, there are 7 mappings, of which #149 and #148 are mappings of parts of two different datasets filtered by the code IBER, which is the collection code in this case.
  - For example, in mapping #149, the occurrences associated with the dataset https://www.gbif.org/dataset/56d07a8b-e5be-45f2-80b3-f73254e92cd4 that have collection code IBER are mapped to the collection.
  - You can see that because there is a code in the mapping, the occurrenceMapping posted with the API was { "code": "IBER", "datasetKey": "50e48d61-aaca-4af9-9ef3-d2104aba3b8b" }
- What the lookup service returns when querying for the datasetKey combined with the institution and collection codes: https://api.gbif.org/v1/grscicoll/lookup?institutionCode=IBER&collectionCode=IBER&datasetKey=50e48d61-aaca-4af9-9ef3-d2104aba3b8b (NOTE: For this to work properly, the dataset should also be mapped to the institution. See previous point).
- An example of occurrence linked as the result of the mapping: https://www.gbif.org/occurrence/3866283305. Thanks to the mapping, the match is exact although no collection identifier is provided in the dataset.
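The two kinds of mapping body shown in the examples above (with and without code) could be assembled with a helper along these lines. This is only a sketch: build_mapping is a hypothetical name, the UUID check is an added safeguard rather than something the API is known to require on the client side, and the same dataset key is reused in both calls purely for illustration.

```python
import json
import uuid


def build_mapping(dataset_key, code=None):
    """Build an occurrenceMapping body for a whole dataset or a code-filtered part."""
    # Raises ValueError if the key is not a well-formed UUID.
    mapping = {"datasetKey": str(uuid.UUID(dataset_key))}
    if code:
        # With a code, only occurrences carrying that code are mapped.
        mapping["code"] = code
    return mapping


partial = build_mapping("60bd5746-f0cc-4a32-9cfd-a3c76e4f2b72", code="IBER")
whole = build_mapping("60bd5746-f0cc-4a32-9cfd-a3c76e4f2b72")

print(json.dumps(partial))
print(json.dumps(whole))
```

Serialising with json.dumps also avoids hand-written JSON mistakes such as a missing comma between the two keys.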
3) Mapping part of a dataset filtered by the institution code (i.e. the parent of the collection): parentCode
- The collection concerned: https://www.gbif.org/grscicoll/collection/514ab348-f935-4e54-8b94-a9b9b85e67d7
- The mapping information of the collection can be accessed via the API: https://api.gbif.org/v1/grscicoll/collection/514ab348-f935-4e54-8b94-a9b9b85e67d7/occurrenceMapping
  - One can see 2 mappings: #174 and #175.
  - In the case of mapping #175, only occurrences associated with dataset https://www.gbif.org/dataset/bf3f09bd-3af6-45be-a2c4-bd5c285cab8a and institution code SOMF are mapped to the collection: https://www.gbif.org/occurrence/charts?dataset_key=bf3f09bd-3af6-45be-a2c4-bd5c285cab8a&institution_code=SOMF
  - You can see that because the parentCode in the mapping is SOMF, i.e. the occurrenceMapping posted with the API was { "parentCode": "SOMF", "datasetKey": "bf3f09bd-3af6-45be-a2c4-bd5c285cab8a" }
- What the lookup service returns when querying for the datasetKey combined with the collection and institution codes: https://api.gbif.org/v1/grscicoll/lookup?institutionCode=SOMF&collectionCode=SOMF&datasetKey=bf3f09bd-3af6-45be-a2c4-bd5c285cab8a
- An example of occurrence linked as the result of the mapping: https://www.gbif.org/occurrence/4404461436. Thanks to the mapping, the match is exact even though no collection identifier is provided.
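To summarise how the three kinds of collection mapping described on this page select occurrences, here is a sketch of the filtering logic as read from the examples. It is not the service's actual implementation: the function name is hypothetical, and the exact, case-sensitive comparison is an assumption.

```python
def collection_mapping_applies(mapping, occurrence):
    """Return True if a collection-level mapping covers the occurrence."""
    if mapping["datasetKey"] != occurrence["datasetKey"]:
        return False
    # "code" restricts the mapping to one collection code.
    if "code" in mapping and mapping["code"] != occurrence.get("collectionCode"):
        return False
    # "parentCode" restricts the mapping to one institution code.
    if "parentCode" in mapping and mapping["parentCode"] != occurrence.get("institutionCode"):
        return False
    # With neither filter, the whole dataset is mapped.
    return True


mapping_175 = {
    "parentCode": "SOMF",
    "datasetKey": "bf3f09bd-3af6-45be-a2c4-bd5c285cab8a",
}
occurrence = {
    "datasetKey": "bf3f09bd-3af6-45be-a2c4-bd5c285cab8a",
    "institutionCode": "SOMF",
    "collectionCode": "SOMF",
}

print(collection_mapping_applies(mapping_175, occurrence))
```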
See an example of how to use the API to generate occurrence mappings in Python here: https://github.com/ManonGros/Small-scripts-using-GBIF-API/blob/master/map_occ_to_grscicoll.ipynb
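For readers who prefer a standard-library outline, a POST request for a mapping could be prepared roughly as below. The endpoint path follows the occurrenceMapping URLs shown on this page; HTTP Basic authentication with a GBIF account that has editing rights is an assumption, the credentials are placeholders, and the linked notebook remains the reference. The request is only built, not sent.

```python
import base64
import json
import urllib.request

API = "https://api.gbif.org/v1/grscicoll"


def build_mapping_request(entity_type, entity_key, mapping, username, password):
    """Prepare (without sending) a POST that adds an occurrenceMapping."""
    if entity_type not in ("institution", "collection"):
        raise ValueError("entity_type must be 'institution' or 'collection'")
    url = f"{API}/{entity_type}/{entity_key}/occurrenceMapping"
    # Assumption: the API accepts HTTP Basic credentials.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(mapping).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


mapping = {"parentCode": "SOMF", "datasetKey": "bf3f09bd-3af6-45be-a2c4-bd5c285cab8a"}
request = build_mapping_request(
    "collection",
    "514ab348-f935-4e54-8b94-a9b9b85e67d7",
    mapping,
    username="your_gbif_username",  # placeholder
    password="your_gbif_password",  # placeholder
)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it; omitted so the sketch has no side effects.
```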