Technical information needed (or useful) in Darwin Core? #170

Open
DimEvil opened this issue Mar 2, 2021 · 10 comments

Comments

@DimEvil

DimEvil commented Mar 2, 2021

We publish our data through a set of well-defined workflows, drawing on well-defined data systems, databases, or projects/applications. I think this could be valuable information to include when publishing a dataset, especially when it comes to filtering data on a data aggregator website (e.g. GBIF).
For example, we publish several datasets coming from a database we call NBN, a project named MICA, another database named VIS, and plenty of other systems. But at the moment there is no simple way for me to get all the data originating from one of these databases at the record level from an aggregator. It would be very nice if I could query GBIF for all the occurrences coming from the MICA project or all the records originating from the NBN database.
Which DwC term can be used for this information? collectionCode looks promising but I'm not convinced
(collectionCode: The name, acronym, coden, or initialism identifying the collection or data set from which the record was derived.). Or the explanation is not clear...

@pzermoglio
Member

For projects, and only if the goal is to search on GBIF, you can use the project ID available in the IPT metadata tabs. The GBIF portal now allows searching by project ID, although so far only for a single ID, not a list.

@ManonGros

Note that on GBIF, the collectionCode is now used to link occurrences to GRSciColl entries (the same for institutionCode). This means that if you use a code that already exists in GRSciColl, the occurrences will be displayed on the corresponding page (see this example). More information on how this matching is done here.
But if you would like to have a specific GRSciColl entry linked to some occurrence, this is the way to go.

@qgroom
Member

qgroom commented Mar 3, 2021

I wonder if some of the Darwin Core extensions might help

For example the Literature References extension could link an observation to any citable resource

Also Web links (apparently under development)

I will also mention the issue within the Agent Extension Task Group. Perhaps a project can be considered an "agent" and an appropriate action would be used to link the observation.

@DimEvil
Author

DimEvil commented Mar 3, 2021

Hi,
We do use the projectID, but this does not fulfil my needs. I think that how the data is published (the workflow), or the name of the original database (which can feed a lot of different datasets), can be seen as a property of the data.
GRSciColl is more about scientific collections (I'm thinking of specimens and the name of a collection). I'm thinking about the name of a database, for example.
I'm not 100% sure this would be a big gain in information for worldwide users, but it is definitely a gain for the institution itself.

For example: I want all the occurrences coming from the VIS database (We published 4 VIS-datasets):

In GBIF I can do this: https://www.gbif.org/dataset/search?q=VIS and it gives me a list of datasets with VIS in the title
I can search for occurrences and look for where 'VIS' is available in the title, mark them and I.

If the acronym 'VIS' were a property of the data, I could search occurrences, specify 'collectionCode' VIS, and immediately see all the occurrences originating from the VIS database.

And thanks for all the answers so far!
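The idea of treating the source database as a record-level property can be sketched in a few lines. This is only an illustration, assuming each occurrence carries a `collectionCode` value naming its source database; the rows, dataset names, and the `from_source` helper are hypothetical and not taken from any published dataset.

```python
from collections import Counter

# Hypothetical occurrence rows spanning several datasets (illustrative values only).
records = [
    {"occurrenceID": "occ-1", "datasetName": "VIS - dataset A", "collectionCode": "VIS"},
    {"occurrenceID": "occ-2", "datasetName": "VIS - dataset B", "collectionCode": "VIS"},
    {"occurrenceID": "occ-3", "datasetName": "Other dataset", "collectionCode": "NBN"},
]


def from_source(rows, code):
    """Return the rows whose collectionCode matches `code`, ignoring case."""
    return [r for r in rows if r.get("collectionCode", "").upper() == code.upper()]


# Select every record from the VIS database, regardless of dataset.
vis = from_source(records, "VIS")

# Count how many selected records each dataset contributed.
datasets = Counter(r["datasetName"] for r in vis)
```

With the source recorded per occurrence, one filter selects records across all datasets, without first locating the datasets by title.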

@tucotuco
Member

tucotuco commented May 7, 2021

I wonder, @ManonGros, if the network capabilities of the GBIF registry might be a good solution for @DimEvil?

@DimEvil
Author

DimEvil commented May 7, 2021

Hi, I now use VIS as a 'virtual' collectionCode, and it gives me exactly what I want:
https://www.gbif.org/occurrence/search?collection_code=VIS&occurrence_status=present returns all records from the VIS database across several datasets.

But I still think that technical information about the datasets would be useful in DwC.
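For scripted use, the same search could be built against the GBIF web API rather than the portal. The sketch below only constructs the request URL. The endpoint `https://api.gbif.org/v1/occurrence/search` and the parameter names `collectionCode` and `occurrenceStatus` reflect my understanding of the public GBIF API and should be checked against its documentation; `occurrence_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed GBIF occurrence search endpoint (verify against the GBIF API docs).
API = "https://api.gbif.org/v1/occurrence/search"


def occurrence_search_url(collection_code, status="PRESENT", limit=20):
    """Build an occurrence search URL filtered by collectionCode."""
    params = {
        "collectionCode": collection_code,  # source database used as a 'virtual' collection
        "occurrenceStatus": status,         # assumed enum value, mirroring the portal filter
        "limit": limit,                     # page size
    }
    return f"{API}?{urlencode(params)}"


url = occurrence_search_url("VIS")
```

The resulting URL can be fetched with any HTTP client; paging through results is left out of the sketch.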

@peterdesmet
Member

@DimEvil what technical information do you want to express in addition to the source collection/database system?

@tucotuco networks are indeed a good way to collect datasets related to a project/community and have the advantage that one dataset can belong to multiple networks. See also the suggestion to make registration with a network easier: gbif/ipt#986 (comment)

@DimEvil
Author

DimEvil commented May 7, 2021

@peterdesmet I'm providing technical information in collectionCode now, which is not 100% correct; I would rather do this correctly. Undoubtedly there is more valuable information possible, but this needs some thinking, I suppose.

@peterdesmet
Member

I think your use of the term is correct: collectionCode (... identifying the collection or data set from which the record was derived) is imo a suitable term to indicate the source database for non-specimen records. Additional technical information regarding provenance or data standardization steps is imo best expressed in the "Method steps" in the metadata, e.g. https://www.gbif.org/dataset/8a5cbaec-2839-4471-9e1d-98df301095dd#methodology

@DimEvil
Author

DimEvil commented May 7, 2021

I think collectionCode was originally intended to define the physical collection a specimen belongs to, not the digital collection. This works when you have defined only a physical collection or only a 'digital' collection. For example, at the RBINS museum: what if the specimen is in the vertebrate collection (collectionCode = VERTEBRATEN) and digitally in the DARWIN database?
