Institutions and Collections
Please explain and provide examples for institutionCode, collectionCode, datasetName, institutionID, collectionID, and datasetID
The original question posed to the DwC Hour Team, "How is the term collectionCode supposed to be used? Are there any existing standards recommendations?", presents an opportunity to clarify collectionCode and related terms.
When we share data or metadata, we need ways to state exactly what we are referring to, so that others can understand our data and figure out where to go to find out more. Machines need globally unique identifiers for certain fields so that computer programs can link data.
Darwin Core provides several terms to help people (and machines) distinguish data sets: institutionCode, collectionCode, datasetName, and their related identifiers institutionID, collectionID, and datasetID. The institutionCode is meant to hold the official acronym for an organization, such as "MVZ" for the institution "Museum of Vertebrate Zoology". This acronym, along with a catalog number, is commonly used to identify cataloged material in scientific publications.
Practices vary within and among institutions in how cataloging is done and how specimens are identified. In one institution, the catalog number might contain information to designate which collection in that institution the specimen belongs to, for example "Herp 2371", while in another the catalog number might not contain this information, for example "2371". The collectionCode is meant to allow institutions that follow the latter practice to distinguish specimens from different collections within that institution when sharing with the rest of the world. Thus, institutionCode = "MVZ", collectionCode = "Herp", catalogNumber = "2371" is sufficient to identify the specimen of interest from among the many at the Museum of Vertebrate Zoology with catalog number "2371".
collectionCode, when used together with institutionCode and catalogNumber, can uniquely identify a specimen in a given collection.
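As a minimal sketch of how the three terms combine, the following Python snippet joins them into a single key. The `SpecimenKey` type, the `triplet` helper, the ":" separator, and the "Mamm" collection code are assumptions made for illustration; none of them is prescribed by Darwin Core or taken from MVZ practice.

```python
from typing import NamedTuple


class SpecimenKey(NamedTuple):
    """The three Darwin Core terms discussed above, held together."""
    institutionCode: str
    collectionCode: str
    catalogNumber: str


def triplet(key: SpecimenKey, sep: str = ":") -> str:
    # The ":" separator is an assumption for this sketch, not a rule.
    return sep.join(key)


# Two specimens share catalog number "2371" within one institution;
# only collectionCode tells them apart. "Mamm" is a made-up code.
herp = SpecimenKey("MVZ", "Herp", "2371")
mamm = SpecimenKey("MVZ", "Mamm", "2371")

print(triplet(herp))
print(triplet(mamm))
```

Without collectionCode, both records would collapse to the same institutionCode and catalogNumber pair.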
The datasetName allows institutions to further separate subsets of data, or to name them explicitly. For example, the University of British Columbia Beaty Biodiversity Museum (institutionCode = "UBCBBM") has the Cowan Tetrapod Collection (collectionCode = "CTC"), within which are several distinct data sets, including one with datasetName = "Cowan Tetrapod Collection - Avian". As another example, the University of Kansas (institutionCode = "KU") has a herpetological collection (collectionCode = "KUH") as a single data set, the name of which is spelled out in datasetName = "University of Kansas Biodiversity Institute Herpetology Collection".
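To illustrate how datasetName subdivides a collection, here is a small Python sketch that counts records per (institutionCode, collectionCode, datasetName) combination. The codes and the two dataset names quoted in the examples come from the text. The catalog numbers, the "Cowan Tetrapod Collection - Mammalian" name, and the `count_by_dataset` helper are hypothetical.

```python
from collections import defaultdict

# Codes and the Avian / KU dataset names come from the examples above.
# Catalog numbers and the "- Mammalian" dataset name are invented.
records = [
    {"institutionCode": "UBCBBM", "collectionCode": "CTC",
     "datasetName": "Cowan Tetrapod Collection - Avian",
     "catalogNumber": "1"},
    {"institutionCode": "UBCBBM", "collectionCode": "CTC",
     "datasetName": "Cowan Tetrapod Collection - Avian",
     "catalogNumber": "2"},
    {"institutionCode": "UBCBBM", "collectionCode": "CTC",
     "datasetName": "Cowan Tetrapod Collection - Mammalian",
     "catalogNumber": "1"},
    {"institutionCode": "KU", "collectionCode": "KUH",
     "datasetName": "University of Kansas Biodiversity Institute "
                    "Herpetology Collection",
     "catalogNumber": "1"},
]


def count_by_dataset(records):
    """Count records for each institution / collection / dataset combination."""
    counts = defaultdict(int)
    for rec in records:
        key = (rec["institutionCode"], rec["collectionCode"], rec["datasetName"])
        counts[key] += 1
    return dict(counts)


counts = count_by_dataset(records)
for (institution, collection, dataset), n in sorted(counts.items()):
    print(f"{institution} / {collection} / {dataset}: {n}")
```

In this sketch the CTC collection splits into two data sets, while the KUH collection maps to exactly one.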
The corresponding identifier fields institutionID, collectionID, and datasetID are meant to contain globally unique and persistent identifiers for the three corresponding concepts. The first two of these terms, institutionID and collectionID, would best be populated with references to entries in a registry of institutions and collections, such as the Global Registry of Biodiversity Repositories (http://grbio.org). For example, institutionCode = "NHMO", institutionID = "http://grbio.org/cool/2knt-7f1r", collectionCode = "BI", collectionID = "http://grbio.org/cool/wes0-t2ie".
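One way to catch a common mistake, putting a short code where an identifier belongs, is a simple syntactic check. The sketch below assumes the identifiers are HTTP(S) URIs like the registry references in the example. That is only one possible form of a globally unique identifier, and the helper names `looks_like_http_uri` and `id_problems` are hypothetical. The check does not test whether an identifier resolves or persists.

```python
from urllib.parse import urlparse


def looks_like_http_uri(value: str) -> bool:
    """Return True if value parses as an http(s) URI with a host part."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def id_problems(record: dict) -> list:
    """List the ID terms whose values do not look like HTTP URIs."""
    return [
        term
        for term in ("institutionID", "collectionID")
        if not looks_like_http_uri(record.get(term, ""))
    ]


# Values taken from the example above.
good = {
    "institutionCode": "NHMO",
    "institutionID": "http://grbio.org/cool/2knt-7f1r",
    "collectionCode": "BI",
    "collectionID": "http://grbio.org/cool/wes0-t2ie",
}

# A record that repeats the code in the ID field instead of an identifier.
bad = dict(good, institutionID="NHMO")

print(id_problems(good))
print(id_problems(bad))
```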
The datasetID is best populated with an identifier for a published data set in which the record can be found. As such, a publication reference such as a Digital Object Identifier (DOI) is a good candidate, for example datasetID = "https://doi.org/10.15468/aomfnb" for records in the 2015 eBird Observation Dataset (see http://www.gbif.org/dataset/4fa7b334-ce0d-4e88-aaae-2e0c138d049e).
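A consumer of such records might want to recover the bare DOI from a datasetID. The following is a rough sketch under stated assumptions: the list of resolver prefixes is an assumption about common spellings, the `extract_doi` helper is hypothetical, and the check only confirms the general "10.<registrant>/<suffix>" shape, not that the DOI is registered.

```python
from typing import Optional

# Prefixes commonly seen in front of a DOI; this list is an assumption.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def extract_doi(dataset_id: str) -> Optional[str]:
    """Return the bare DOI in dataset_id, or None if it does not look like one."""
    value = dataset_id.strip()
    lowered = value.lower()
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            value = value[len(prefix):]
            break
    # DOI names begin with "10." and contain a "/" before the suffix.
    if value.startswith("10.") and "/" in value:
        return value
    return None


print(extract_doi("https://doi.org/10.15468/aomfnb"))
print(extract_doi("4fa7b334-ce0d-4e88-aaae-2e0c138d049e"))
```

The second call uses the GBIF dataset key from the URL above. It is a valid identifier in its own right but not a DOI, so the sketch returns None for it.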