
Institutions and Collections

Debbie Paul edited this page Sep 5, 2017 · 2 revisions

Please explain and provide examples for institutionCode, collectionCode, datasetName, institutionID, collectionID, and datasetID

The original question posed to the DwC Hour Team presents an opportunity to clarify collectionCode and related terms:

How is the term "collectionCode" supposed to be used? Are there any existing standards recommendations?

Distinguishing Data sets

When we share data or metadata, we need a way to state exactly what we are referring to, so that others can understand our data and figure out where to go to find out more. Machines need globally unique identifiers for certain fields so that computer programs can link data.

institutionCode and collectionCode

Darwin Core provides several terms to help people (and machines) distinguish data sets: institutionCode, collectionCode, datasetName, and their related identifiers institutionID, collectionID, and datasetID.

The institutionCode is meant to hold the official acronym for an organization, such as "MVZ" for the institution "Museum of Vertebrate Zoology".

This acronym, along with a catalog number, is commonly used to identify cataloged material in scientific publications.

Practices vary within and among institutions in how cataloging is done and how specimens are identified. In one institution, the catalog number might contain information designating which collection in that institution the specimen belongs to, for example "Herp 2371", while in another, the catalog number might not contain this information, for example "2371". The collectionCode is meant to allow institutions that follow the latter practice to distinguish specimens from different collections within the institution when sharing data with the rest of the world. Thus, institutionCode = "MVZ", collectionCode = "Herp", catalogNumber = "2371" is sufficient to identify the specimen of interest from among the many at the Museum of Vertebrate Zoology with catalog number "2371".

collectionCode, when used together with institutionCode and catalogNumber, can uniquely identify a specimen in a given collection.
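As a minimal sketch, assuming records are plain Python dicts keyed by Darwin Core term names, the three terms can be joined into a composite key. The helper name specimen_key and the ":" separator are hypothetical choices for illustration; Darwin Core does not prescribe how the three values are combined. The second collection code "Bird" is likewise a hypothetical example.

```python
from typing import Mapping


def specimen_key(record: Mapping[str, str]) -> str:
    """Join institutionCode, collectionCode and catalogNumber into one key.

    The ':' separator is an illustrative assumption, not a Darwin Core rule.
    A missing collectionCode becomes an empty segment, so records from
    institutions that do not use collection codes still produce a key.
    """
    parts = (
        record.get("institutionCode", ""),
        record.get("collectionCode", ""),
        record.get("catalogNumber", ""),
    )
    return ":".join(p.strip() for p in parts)


# Two MVZ specimens that share catalog number "2371" but belong to
# different collections ("Bird" is a hypothetical second collection code).
herp = {"institutionCode": "MVZ", "collectionCode": "Herp", "catalogNumber": "2371"}
bird = {"institutionCode": "MVZ", "collectionCode": "Bird", "catalogNumber": "2371"}

print(specimen_key(herp))                        # MVZ:Herp:2371
print(specimen_key(herp) != specimen_key(bird))  # True
```

Without the collectionCode segment, both records above would reduce to the same "MVZ::2371" key, which is exactly the ambiguity collectionCode is meant to resolve.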

datasetName

The datasetName allows institutions to further separate subsets of data, or to name them explicitly. For example, the University of British Columbia Beaty Biodiversity Museum (institutionCode = "UBCBBM") has the Cowan Tetrapod Collection (collectionCode = "CTC"), within which are several distinct data sets, including one with datasetName = "Cowan Tetrapod Collection - Avian". As another example, the University of Kansas (institutionCode = "KU") has a herpetological collection (collectionCode = "KUH") as a single data set, the name of which is spelled out in datasetName = "University of Kansas Biodiversity Institute Herpetology Collection".
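The sketch below, assuming records are Python dicts, groups records by institutionCode, collectionCode and datasetName, using the two examples above. The catalog numbers are hypothetical placeholders, not real specimens, and group_by_dataset is an illustrative helper name.

```python
from collections import defaultdict

# Records built from the examples above; catalog numbers are hypothetical.
records = [
    {"institutionCode": "UBCBBM", "collectionCode": "CTC",
     "datasetName": "Cowan Tetrapod Collection - Avian", "catalogNumber": "101"},
    {"institutionCode": "UBCBBM", "collectionCode": "CTC",
     "datasetName": "Cowan Tetrapod Collection - Avian", "catalogNumber": "102"},
    {"institutionCode": "KU", "collectionCode": "KUH",
     "datasetName": "University of Kansas Biodiversity Institute Herpetology Collection",
     "catalogNumber": "101"},
]


def group_by_dataset(recs):
    """Group catalog numbers by (institutionCode, collectionCode, datasetName)."""
    groups = defaultdict(list)
    for rec in recs:
        key = (rec["institutionCode"], rec["collectionCode"], rec.get("datasetName", ""))
        groups[key].append(rec["catalogNumber"])
    return dict(groups)


for (inst, coll, name), numbers in group_by_dataset(records).items():
    print(f"{inst} / {coll} / {name}: {numbers}")
```

Running this prints one line per data set: the two Cowan Tetrapod Collection - Avian records share a group, and the KU herpetology record forms its own.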

institutionID, collectionID, datasetID

The corresponding identifier fields institutionID, collectionID, and datasetID are meant to contain globally unique and persistent identifiers for the three corresponding concepts. The first two of these terms, institutionID and collectionID, are best populated with references to entries in a registry of institutions and collections, such as the Global Registry of Biodiversity Repositories (http://grbio.org). For example: institutionCode = "NHMO", institutionID = "http://grbio.org/cool/2knt-7f1r", collectionCode = "BI", collectionID = "http://grbio.org/cool/wes0-t2ie".
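A simple syntactic check can catch the common mistake of putting an acronym into an ID field. The sketch below assumes identifiers are expressed as http(s) URIs, as in the GRBio example; Darwin Core itself does not require that form, so this is only a heuristic. The helper name looks_like_http_uri is hypothetical, and the check says nothing about whether an identifier resolves or is actually persistent.

```python
from urllib.parse import urlparse


def looks_like_http_uri(value: str) -> bool:
    """Heuristic: True if value is an absolute http(s) URI with a host part.

    This only inspects syntax; it does not test that the identifier
    resolves, nor that it is globally unique or persistent.
    """
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


record = {
    "institutionCode": "NHMO",
    "institutionID": "http://grbio.org/cool/2knt-7f1r",
    "collectionCode": "BI",
    "collectionID": "http://grbio.org/cool/wes0-t2ie",
}

for term in ("institutionID", "collectionID"):
    print(term, looks_like_http_uri(record[term]))  # True for both

# A bare acronym belongs in institutionCode, not in institutionID.
print(looks_like_http_uri(record["institutionCode"]))  # False
```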

The datasetID is best populated with an identifier for a published data set in which the record can be found. As such, a publication reference such as a Digital Object Identifier (DOI) is a good candidate, for example datasetID = "https://doi.org/10.15468/aomfnb" for records in the 2015 eBird Observation Dataset (see http://www.gbif.org/dataset/4fa7b334-ce0d-4e88-aaae-2e0c138d049e).
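When datasetID holds a DOI, consumers may receive it as a resolver URL, a "doi:" prefixed string, or a bare DOI. The sketch below, assuming only those three forms, normalizes them to the bare DOI so records can be compared. The function name doi_from_dataset_id and the accepted host list are illustrative assumptions, not part of Darwin Core.

```python
from typing import Optional
from urllib.parse import unquote, urlparse

# Resolver hosts accepted by this sketch (an assumption, not an exhaustive list).
DOI_HOSTS = {"doi.org", "dx.doi.org"}


def doi_from_dataset_id(dataset_id: str) -> Optional[str]:
    """Return the bare DOI (e.g. '10.15468/aomfnb') from a datasetID value.

    Accepts https/http resolver URLs on DOI_HOSTS, a 'doi:' prefix, or a
    bare DOI. Returns None for anything that does not look like a DOI.
    """
    value = dataset_id.strip()
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc.lower() in DOI_HOSTS:
        # Resolver URL: the DOI is the path, possibly percent-encoded.
        value = unquote(parsed.path.lstrip("/"))
    elif value.lower().startswith("doi:"):
        value = value[4:]
    # DOIs start with the '10.' directory indicator and contain a '/'.
    return value if value.startswith("10.") and "/" in value else None


print(doi_from_dataset_id("https://doi.org/10.15468/aomfnb"))  # 10.15468/aomfnb
print(doi_from_dataset_id("doi:10.15468/aomfnb"))              # 10.15468/aomfnb
```

A GBIF dataset page URL such as the one cited above is not a DOI, so this sketch returns None for it.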