New Term - parentMeasurementID #362

Closed
guillaumebody opened this issue Jul 22, 2021 · 7 comments

Comments

@guillaumebody

New term : parentMeasurementID

  • Submitter: Guillaume Body, Anne-Sophie Archambeau, Sophie Pamerlon
  • Efficacy Justification (why is this term necessary?): Estimated records are a wide group of data that share similar information, mostly on statistical precision: confidence interval, standard deviation, distribution. These measurements describe the precision of other measurements (the main estimated value). To correctly describe this relation, the DwC standard needs to nest measurements within other measurements, just as events can be nested within each other.
  • Demand Justification (name at least two organizations that independently need this term): European Food Safety Authority (enetwild project), French Office of Biodiversity, potentially GEO BON (all essential biodiversity variables are statistically estimated)

Proposed attributes of the new term:

  • Term name (in lowerCamelCase for properties, UpperCamelCase for classes): parentMeasurementID

  • Organized in Class (e.g., Occurrence, Event, Location, Taxon): MeasurementOrFact

  • Definition of the term (normative): An identifier for the broader Measurement that groups this and potentially other Measurements or fact

  • Usage comments (recommendations regarding content, etc., not normative): Use a globally unique identifier for a dwc:MeasurementOrFact or an identifier for a dwc:MeasurementOrFact that is specific to the data set.

  • Examples (not normative): 9c752d22-b09a-11e8-96f8-529269fb1459 ; E1_E1_O1_M1

  • Note: for correct identification of the record, the basisOfRecord should include a new value: "statistical estimation"

@tucotuco
Member

tucotuco commented Jul 22, 2021

This looks like a valuable generic way to extend MeasurementOrFacts.

The definition suggests ("group this and potentially other Measurements or fact") that the term might be used in ways other than the use case described in the Efficacy Justification (measurements of measurements). Do you envision other uses? And can you give examples?

2021-07-27 I retract the following opinion based on this commentary. - JRW

I am a bit concerned about the note. In implementation in Darwin Core Archives, the basisOfRecord term is only usable in Occurrence Core records, and has a recommended vocabulary. It does not seem as if there is a viable way to use basisOfRecord here, however, "statistical estimation" might be plausible as a part of the vocabulary used in dwc:measurementType. I say, "part of" because it would not be sufficient on its own, it would have to be "statistical estimation of something".

@dr-shorthair

Are measurements that share a common parent all siblings?

@guillaumebody
Author

> This looks like a valuable generic way to extend MeasurementOrFacts.
>
> The definition suggests ("group this and potentially other Measurements or fact") that the term might be used in ways other than the use case described in the Efficacy Justification (measurements of measurements). Do you envision other uses? And can you give examples?
>
> I am a bit concerned about the note. In implementation in Darwin Core Archives, the basisOfRecord term is only usable in Occurrence Core records, and has a recommended vocabulary. It does not seem as if there is a viable way to use basisOfRecord here, however, "statistical estimation" might be plausible as a part of the vocabulary used in dwc:measurementType. I say, "part of" because it would not be sufficient on its own, it would have to be "statistical estimation of something".

> Are measurements that share a common parent all siblings?

This term would make it possible to record sibling measurements of a parent one.
For instance, a roe deer density estimation could be recorded as follows:
Event 1: the area of the study
Occurrence 1: the species and the time period, with the basisOfRecord "statistical estimation"
Measurement 1: measurementType = density ; measurementValue = 15 ; measurementUnit = individual per kilometer square
Measurement 1-1: measurementType = standard deviation ; measurementValue = 3.2 ; measurementUnit = individual per kilometer square
Measurement 1-2: measurementType = distribution ; measurementValue = gaussian
Measurement 1-3: measurementType = confidence interval ; measurementValue = 9|21 ; measurementUnit = individual per kilometer square
Measurement 1-3-1: measurementType = confidence level ; measurementValue = 95 ; measurementUnit = percentage

Measurements 1-1, 1-2 and 1-3 are indeed siblings and describe the parent one, the density estimation itself. If you remove the measurements introduced by this new term, you are left with what Darwin Core currently allows.
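
The nesting in the example above can be sketched in Python. This is only an illustration, not part of the proposal: the identifiers M1, M1-1, ... and the helper names `children_by_parent`, `siblings` and `without_nested` are hypothetical, and each record is a plain dictionary keyed by Darwin Core term names plus the proposed parentMeasurementID.

```python
from collections import defaultdict

UNIT = "individual per kilometer square"

# Hypothetical flat records mirroring the roe deer example.
# parentMeasurementID is the proposed term; None marks a top-level measurement.
measurements = [
    {"measurementID": "M1", "parentMeasurementID": None,
     "measurementType": "density", "measurementValue": "15", "measurementUnit": UNIT},
    {"measurementID": "M1-1", "parentMeasurementID": "M1",
     "measurementType": "standard deviation", "measurementValue": "3.2", "measurementUnit": UNIT},
    {"measurementID": "M1-2", "parentMeasurementID": "M1",
     "measurementType": "distribution", "measurementValue": "gaussian", "measurementUnit": None},
    {"measurementID": "M1-3", "parentMeasurementID": "M1",
     "measurementType": "confidence interval", "measurementValue": "9|21", "measurementUnit": UNIT},
    {"measurementID": "M1-3-1", "parentMeasurementID": "M1-3",
     "measurementType": "confidence level", "measurementValue": "95", "measurementUnit": "percentage"},
]


def children_by_parent(records):
    """Map each parent ID (None for top level) to the IDs of its direct children."""
    index = defaultdict(list)
    for rec in records:
        index[rec["parentMeasurementID"]].append(rec["measurementID"])
    return dict(index)


def siblings(records, measurement_id):
    """Return the IDs of the other measurements that share the same parent."""
    parent = next(r["parentMeasurementID"] for r in records
                  if r["measurementID"] == measurement_id)
    return [r["measurementID"] for r in records
            if r["parentMeasurementID"] == parent
            and r["measurementID"] != measurement_id]


def without_nested(records):
    """Keep only top-level measurements, i.e. what can be expressed without the new term."""
    return [r for r in records if r["parentMeasurementID"] is None]
```

Under these assumptions, `siblings(measurements, "M1-1")` lists M1-2 and M1-3, and `without_nested(measurements)` keeps only the density record.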

The definition is very similar to that of parentEventID, and the use is indeed similar, except that it applies to a measurement or fact instead of an Event.
In this density estimation dataset, no human or machine has directly observed a roe deer. Those observations would be found in the raw data dataset. Here, the "presence" of roe deer is only the result of running statistical software. It is even clearer if you think about a dataset based on "probability of presence", such as the results of a habitat suitability statistical procedure. It also makes it possible to differentiate "expert knowledge" of density, which is "human observation", from statistical estimation, without changing the measurement value: "density".

@tucotuco
Member

Thank you for this example @guillaumebody. Now that I see better what you are trying to do I retract my comment. The Occurrence records in the Occurrence extension can each bear a basisOfRecord, so the remaining issue would be to create a new class term proposal for something like StatisticalEstimation to accompany the existing types of Occurrence (PreservedSpecimen, LivingSpecimen, FossilSpecimen, MachineObservation, HumanObservation, MaterialCitation).

@albenson-usgs

The OBIS Secretariat and nodes have reviewed the proposal and while we do not have an immediate use case to apply it to, we can see it being a valuable addition to the MoF extensions. If ratified as a new term, OBIS will ensure it's added to the extended measurement or fact extension.

@pieterprovoost

pieterprovoost commented May 3, 2023

Hi all, I would like to bring to your attention gbif/rs.gbif.org#103 which proposes to add dwc:relatedResourceID (or rather dwc:resourceID) to the ExtendedMeasurementOrFact extension. As @albenson-usgs pointed out, adding this term to the MeasurementOrFact extension would probably address the parent measurement issue discussed here as well.

@guillaumebody
Author

guillaumebody commented May 3, 2023

> Hi all, I would like to bring to your attention gbif/rs.gbif.org#103 which proposes to add dwc:relatedResourceID (or rather dwc:resourceID) to the ExtendedMeasurementOrFact extension. As @albenson-usgs pointed out, adding this term to the MeasurementOrFact extension would probably address the parent measurement issue discussed here as well.

Hi Pieter,
This term would indeed technically do the job. In my view, however, there is a clear difference between "relatedID" and "parentID".

The parentID (whether for Event, Occurrence, Measurement, ...) is a clear indication of nested records, a "within" term. Through relatedResourceID, you can link very different pieces of information that have very different relationships. Merging both will inevitably end in confusion.

For instance, you could have estimates of population density from 2 methods: method 1 giving 10 (95% CI 8-12) and method 2 giving 12 (95% CI 9-15).

| measurementID | parentMeasurementID | relatedResourceID | measurementType | measurementValue | measurementUnit |
|---|---|---|---|---|---|
| uuid_1 | | uuid_2 | density | 10 | individual per kilometer square |
| uuid_11 | uuid_1 | | x_0.025 | 8 | individual per kilometer square |
| uuid_12 | uuid_1 | | x_0.975 | 12 | individual per kilometer square |
| uuid_2 | | uuid_1 | density | 12 | individual per kilometer square |
| uuid_21 | uuid_2 | | x_0.025 | 9 | individual per kilometer square |
| uuid_22 | uuid_2 | | x_0.975 | 15 | individual per kilometer square |

If needed, you can add a reciprocal relatedResourceID between uuid_1 and uuid_2 to indicate that they are estimates of the same quantity, or a relatedResourceID pointing to the probability density graph of each estimate, without mixing it with the structure of your data. Of course, a generic parentResourceID would work well alongside a generic relatedResourceID.
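
The separation argued for here can be sketched in Python, using the six records of the table. This is an illustration under assumptions: the tuple layout and the helper names `estimates_with_bounds` and `related_pairs` are hypothetical, and the unit column is left out for brevity. Parent links are used only to rebuild the nested structure, while related links are collected separately.

```python
# Hypothetical rows taken from the table:
# (measurementID, parentMeasurementID, relatedResourceID, measurementType, measurementValue)
rows = [
    ("uuid_1", None, "uuid_2", "density", 10.0),
    ("uuid_11", "uuid_1", None, "x_0.025", 8.0),
    ("uuid_12", "uuid_1", None, "x_0.975", 12.0),
    ("uuid_2", None, "uuid_1", "density", 12.0),
    ("uuid_21", "uuid_2", None, "x_0.025", 9.0),
    ("uuid_22", "uuid_2", None, "x_0.975", 15.0),
]


def estimates_with_bounds(rows):
    """Group each top-level estimate with its nested quantiles via parentMeasurementID only."""
    result = {}
    for mid, parent, _related, mtype, value in rows:
        if parent is None:
            # Top-level record: the estimate itself.
            result.setdefault(mid, {})["value"] = value
        else:
            # Nested record: attach it to its parent under its measurementType.
            result.setdefault(parent, {})[mtype] = value
    return result


def related_pairs(rows):
    """Collect relatedResourceID links as unordered pairs, independent of the nesting."""
    return {frozenset((mid, rel)) for mid, _parent, rel, _type, _value in rows
            if rel is not None}
```

With these rows, `estimates_with_bounds(rows)["uuid_1"]` holds the value 10.0 with its two quantiles, and `related_pairs(rows)` contains the single pair linking uuid_1 and uuid_2, so the two kinds of relationship never interfere.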
