Adds `research-dataset-schema.yaml` and testing data #12

jsheunis · 2023-12-19T22:49:16Z

This adds a new schema that aims to model a generic research dataset.

This schema draft models a generic research dataset for
the purpose of making it findable and understandable for
humans, for example via an online data catalog.

A (supposedly) valid JSON data document is added, as well as an invalid one, for testing purposes. The Makefile is updated to add testing and document/output generation for the new schema.

TODO 1: solve errors from `make validate-examples-research-dataset-schema`:

/Library/Developer/CommandLineTools/usr/bin/make validate-valid-examples-research-dataset-schema validate-invalid-examples-research-dataset-schema
linkml-validate -s src/linkml/research-dataset-schema.yaml src/examples/research-dataset-schema/*
INFO:root:Importing datasets as datasets from source /Users/jsheunis/Documents/psyinf/datalad-concepts/src/linkml/research-dataset-schema.yaml; base_dir=/Users/jsheunis/Documents/psyinf/datalad-concepts/src/linkml
INFO:root:Importing linkml:types as /Users/jsheunis/opt/miniconda3/envs/linkml/lib/python3.11/site-packages/linkml_runtime/linkml_model/model/schema/types from source /Users/jsheunis/Documents/psyinf/datalad-concepts/src/linkml/research-dataset-schema.yaml; base_dir=None
INFO:root:Importing types as types from source /Users/jsheunis/Documents/psyinf/datalad-concepts/src/linkml/research-dataset-schema.yaml; base_dir=/Users/jsheunis/Documents/psyinf/datalad-concepts/src/linkml
INFO:root:Using SchemaView with im=None
INFO:root:Importing datasets as datasets from source /Users/jsheunis/Documents/psyinf/datalad-concepts/src/linkml/research-dataset-schema.yaml; base_dir=/Users/jsheunis/Documents/psyinf/datalad-concepts/src/linkml
INFO:root:Importing linkml:types as /Users/jsheunis/opt/miniconda3/envs/linkml/lib/python3.11/site-packages/linkml_runtime/linkml_model/model/schema/types from source /Users/jsheunis/Documents/psyinf/datalad-concepts/src/linkml/research-dataset-schema.yaml; base_dir=None
INFO:root:Importing types as types from source /Users/jsheunis/Documents/psyinf/datalad-concepts/src/linkml/research-dataset-schema.yaml; base_dir=/Users/jsheunis/Documents/psyinf/datalad-concepts/src/linkml
[ERROR] [src/examples/research-dataset-schema/ResearchDataset-penguins.json/0] Additional properties are not allowed ('date_modified', 'version' were unexpected) in /
[ERROR] [src/examples/research-dataset-schema/ResearchDataset-penguins.json/0] {'checksum_md5': 'e7e2be6b203a221949f05e02fcefd853', 'content_url': 'https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.219.3&entityid=002f3893385f710df69eeebe893144ff', 'path_posix': 'raw/adelie.csv', 'size_in_bytes': 23755} is not of type 'string' in /has_part/0
[ERROR] [src/examples/research-dataset-schema/ResearchDataset-penguins.json/0] {'checksum_md5': '1549566fb97afa879dc9446edcf2015f', 'content_url': 'https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.3&entityid=e03b43c924f226486f2f0ab6709d2381', 'path_posix': 'raw/gentoo.csv', 'size_in_bytes': 11263} is not of type 'string' in /has_part/1
[ERROR] [src/examples/research-dataset-schema/ResearchDataset-penguins.json/0] {'checksum_md5': 'e4b0710c69297031d63866ce8b888f25', 'content_url': 'https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.221.2&entityid=fe853aa8f7a59aa84cdd3197619ef462', 'path_posix': 'raw/chinstrap.csv', 'size_in_bytes': 18872} is not of type 'string' in /has_part/2
[ERROR] [src/examples/research-dataset-schema/ResearchDataset-penguins.json/0] {'email': '[email protected]', 'name': 'Allison Horst'} is not of type 'string' in /data_controller
make[1]: *** [validate-valid-examples-research-dataset-schema] Error 1
make: *** [validate-examples-research-dataset-schema] Error 2

TODO 2: address warnings in `make check-research-dataset-schema`:

[Check src/linkml/research-dataset-schema.yaml]
Run linter
INFO:root:Importing linkml:units as /Users/jsheunis/opt/miniconda3/envs/linkml/lib/python3.11/site-packages/linkml_runtime/linkml_model/model/schema/units from source /Users/jsheunis/opt/miniconda3/envs/linkml/lib/python3.11/site-packages/linkml_runtime/linkml_model/model/schema/meta.yaml; base_dir=None
INFO:root:Importing linkml:mappings as /Users/jsheunis/opt/miniconda3/envs/linkml/lib/python3.11/site-packages/linkml_runtime/linkml_model/model/schema/mappings from source /Users/jsheunis/opt/miniconda3/envs/linkml/lib/python3.11/site-packages/linkml_runtime/linkml_model/model/schema/meta.yaml; base_dir=None
INFO:root:Importing linkml:types as /Users/jsheunis/opt/miniconda3/envs/linkml/lib/python3.11/site-packages/linkml_runtime/linkml_model/model/schema/types from source /Users/jsheunis/opt/miniconda3/envs/linkml/lib/python3.11/site-packages/linkml_runtime/linkml_model/model/schema/meta.yaml; base_dir=None
INFO:root:Importing linkml:annotations as /Users/jsheunis/opt/miniconda3/envs/linkml/lib/python3.11/site-packages/linkml_runtime/linkml_model/model/schema/annotations from source /Users/jsheunis/opt/miniconda3/envs/linkml/lib/python3.11/site-packages/linkml_runtime/linkml_model/model/schema/meta.yaml; base_dir=None
INFO:root:Importing linkml:extensions as /Users/jsheunis/opt/miniconda3/envs/linkml/lib/python3.11/site-packages/linkml_runtime/linkml_model/model/schema/extensions from source /Users/jsheunis/opt/miniconda3/envs/linkml/lib/python3.11/site-packages/linkml_runtime/linkml_model/model/schema/meta.yaml; base_dir=None
INFO:root:Importing datasets as datasets from source /Users/jsheunis/Documents/psyinf/datalad-concepts/src/linkml/research-dataset-schema.yaml; base_dir=/Users/jsheunis/Documents/psyinf/datalad-concepts/src/linkml
INFO:root:Importing linkml:types as /Users/jsheunis/opt/miniconda3/envs/linkml/lib/python3.11/site-packages/linkml_runtime/linkml_model/model/schema/types from source /Users/jsheunis/Documents/psyinf/datalad-concepts/src/linkml/research-dataset-schema.yaml; base_dir=None
INFO:root:Importing types as types from source /Users/jsheunis/Documents/psyinf/datalad-concepts/src/linkml/research-dataset-schema.yaml; base_dir=/Users/jsheunis/Documents/psyinf/datalad-concepts/src/linkml
/Users/jsheunis/Documents/psyinf/datalad-concepts/src/linkml/research-dataset-schema.yaml
  warning  Enum 'authorType' does not have recommended slot 'description'  (recommended)
  warning  Enum has name 'authorType'  (standard_naming)
  warning  Permissible value of Enum 'authorType' has name 'Person'  (standard_naming)
  warning  Permissible value of Enum 'authorType' has name 'Organization'  (standard_naming)
  warning  Schema maps prefix 'afo' to namespace 'http://purl.allotrope.org/ontologies/result#' instead of using prefix 'afr'  (canonical_prefixes)
  warning  Schema maps prefix 'bibo' to namespace 'https://purl.org/ontology/bibo/' instead of namespace 'http://purl.org/ontology/bibo/'  (canonical_prefixes)
  warning  Schema maps prefix 'obo' to namespace 'https://purl.obolibrary.org/obo/' instead of namespace 'http://purl.obolibrary.org/obo/'  (canonical_prefixes)

✖ Found 7 problems in 1 schema
make: *** [check-research-dataset-schema] Error 1
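
The `standard_naming` warnings above appear to flag names that deviate from the usual LinkML conventions (UpperCamelCase for enum names, snake_case for permissible values). As a rough illustration, and not the linter's actual implementation, such a check could be sketched with two regular expressions. The function name `naming_problems` and both patterns are assumptions made for this example.

```python
import re

# Assumed conventions: enum names in UpperCamelCase,
# permissible values in snake_case. Patterns are illustrative only.
CAMEL_CASE = re.compile(r"^[A-Z][a-zA-Z0-9]*$")
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def naming_problems(enum_name, permissible_values):
    """Return a list of human-readable naming complaints."""
    problems = []
    if not CAMEL_CASE.match(enum_name):
        problems.append(f"Enum has name '{enum_name}'")
    for value in permissible_values:
        if not SNAKE_CASE.match(value):
            problems.append(
                f"Permissible value of Enum '{enum_name}' has name '{value}'"
            )
    return problems


# The names from the warnings above yield three complaints ...
print(naming_problems("authorType", ["Person", "Organization"]))
# ... while a renamed enum with lower-case values yields none.
print(naming_problems("AuthorType", ["person", "organization"]))
```

Under these assumptions, renaming the enum to `AuthorType` and lower-casing its values would clear the three naming warnings; the missing `description` and the prefix warnings would need separate fixes.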


mih · 2023-12-20T08:55:33Z

Thanks! Looks like the linter run produced some good recommendations already!

jsheunis · 2023-12-20T15:16:45Z

Indeed :) Now I have to figure out why the problem of [something that is an object] is not of type 'string' occurs.

mih · 2023-12-20T20:39:35Z

src/linkml/research-dataset-schema.yaml

+    inlined_as_list: true
+    range: string


I believe this does not work, and it is exactly the case that I outlined previously. The instruction says that objects will be inlined. But at the same time the range is set to string, a basic type. One of the two needs to change, I think.

jsheunis · 2023-12-20

Thanks, this seems like it could be the issue. My original expectation was that the range referred to whatever was inside the list (when using inlined_as_list), so that might be the mistake. Will report back soon.

It actually looks like this is not the problem. Rather, it looks like the specification of a union of ranges via any_of, together with possibly faulty data, is what's causing the issue. I have:

    slots:
      has_part:
        slot_uri: dcterms:hasPart
        multivalued: true
        inlined_as_list: true
        any_of:
          - range: File
          - range: ResearchDataset
        description: >-
          Linked entities that form part of a dataset such as files or other (sub)datasets

and then data:

    "has_part": [
        {
            "checksum_md5": "e7e2be6b203a221949f05e02fcefd853",
            "content_url": "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.219.3&entityid=002f3893385f710df69eeebe893144ff",
            "path_posix": "raw/adelie.csv",
            "size_in_bytes": 23755
        },
        {
            "checksum_md5": "1549566fb97afa879dc9446edcf2015f",
            "content_url": "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.3&entityid=e03b43c924f226486f2f0ab6709d2381",
            "path_posix": "raw/gentoo.csv",
            "size_in_bytes": 11263
        },
        {
            "checksum_md5": "e4b0710c69297031d63866ce8b888f25",
            "content_url": "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.221.2&entityid=fe853aa8f7a59aa84cdd3197619ef462",
            "path_posix": "raw/chinstrap.csv",
            "size_in_bytes": 18872
        }
    ]

When I set this up I was thinking "how will the validator know whether an object in the list is a File or a ResearchDataset?", and I'm guessing this is the issue. Maybe it's expecting an ID as a string? Will investigate this further.
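
To make the question concrete: with a union of ranges and no explicit type marker in the data, about the only thing a validator can do is try each candidate class and see which ones accept the object's keys. The sketch below illustrates that idea in plain Python. It is not how linkml-validate works internally, and the slot sets are assumptions: the `File` slots are taken from the example record, the `ResearchDataset` slots are a partial guess based on names that appear in this PR.

```python
# Hypothetical slot sets; File's come from the example record,
# ResearchDataset's are an assumed, partial list for illustration only.
ALLOWED_SLOTS = {
    "File": {"checksum_md5", "content_url", "path_posix", "size_in_bytes"},
    "ResearchDataset": {"version", "date_modified", "has_part",
                        "author", "data_controller"},
}


def candidate_classes(obj):
    """Return the classes whose allowed slots cover every key of obj."""
    if not isinstance(obj, dict):
        return []  # a plain string or number matches no class
    return [
        cls for cls, slots in ALLOWED_SLOTS.items()
        if set(obj) <= slots
    ]


part = {
    "checksum_md5": "e7e2be6b203a221949f05e02fcefd853",
    "path_posix": "raw/adelie.csv",
    "size_in_bytes": 23755,
}
print(candidate_classes(part))  # ['File']
print(candidate_classes({}))    # ['File', 'ResearchDataset'] (ambiguous)
```

Structural matching like this works only as long as the classes have distinguishable slots; an object that fits several classes stays ambiguous unless the data carries an explicit type marker.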

When I change the schema to:

    slots:
      has_part:
        slot_uri: dcterms:hasPart
        multivalued: true
        inlined_as_list: true
        range: File
        description: >-
          Linked entities that form part of a dataset such as files or other (sub)datasets

and keep the same data, validation succeeds.

mih · 2023-12-21

Yeah, but the difference is File being a class, not a type, which makes inlining valid. So you resolved that particular flaw from the other end.
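
The class-versus-type distinction can be illustrated with a small stand-in for a per-item range check. This is a simplified sketch, not LinkML's implementation: it assumes each list item is compared against the slot's range, that a basic type such as `string` accepts only scalar values, and that a class range accepts an inlined object. The registries `BASIC_TYPES` and `CLASSES` and the function `check_items` are hypothetical; the message format imitates the validator output quoted in the PR description.

```python
# Minimal stand-in for a per-item range check; not LinkML's implementation.
BASIC_TYPES = {"string": str, "integer": int}
CLASSES = {"File"}  # class ranges accept inlined objects (dicts)


def check_items(items, range_name):
    """Return one error message per list item that does not fit the range."""
    errors = []
    for index, item in enumerate(items):
        if range_name in BASIC_TYPES:
            ok = isinstance(item, BASIC_TYPES[range_name])
        elif range_name in CLASSES:
            ok = isinstance(item, dict)
        else:
            raise ValueError(f"unknown range: {range_name}")
        if not ok:
            errors.append(
                f"{item!r} is not of type '{range_name}' in /has_part/{index}"
            )
    return errors


has_part = [{"path_posix": "raw/adelie.csv", "size_in_bytes": 23755}]
print(check_items(has_part, "string"))  # one error: an object where a string is expected
print(check_items(has_part, "File"))    # []
```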

mih · 2023-12-21T08:43:29Z

src/examples/research-dataset-schema/ResearchDataset-penguins.json

+    ],
+    "author": [
+        {
+            "author_type": "Person",


Consider using metatype from https://github.com/psychoinformatics-de/datalad-concepts/blob/main/src/linkml/typing.yaml
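
The idea behind this suggestion, as far as it can be read from the thread, is that each record carries an explicit type marker, so a consumer does not have to infer the class from a domain-specific field such as `author_type`. A minimal sketch of such a lookup follows. The key name `meta_type` and the registry `KNOWN_CLASSES` are hypothetical and not taken from `typing.yaml`, which may name and constrain the slot differently.

```python
# Hypothetical key name and class registry; the actual slot defined in
# typing.yaml may be named and valued differently.
TYPE_KEY = "meta_type"
KNOWN_CLASSES = {"Person", "Organization"}


def resolve_class(record):
    """Return the class named by the record's type marker, or raise."""
    declared = record.get(TYPE_KEY)
    if declared not in KNOWN_CLASSES:
        raise ValueError(f"missing or unknown {TYPE_KEY}: {declared!r}")
    return declared


author = {"meta_type": "Person", "name": "Allison Horst"}
print(resolve_class(author))  # Person
```

A generic marker of this kind would also address the earlier question of telling a File from a ResearchDataset inside `has_part`, since the same lookup applies to any class.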

mih · 2024-03-10T15:16:26Z

With #87 I finally caught up with this PR. The mainline now has a schema that can do all of this, and is not constrained to research applications.

The example matching the penguins data record in here can be found at https://github.com/psychoinformatics-de/datalad-concepts/blob/main/src/examples/dataset-version/DatasetVersionObject-penguins.yaml

Closing... Thanks for setting the mark. It helped a lot.

jsheunis force-pushed the research-dataset branch from 0e507d0 to 87244ec, December 19, 2023 22:53

mih reviewed Dec 20, 2023

mih mentioned this pull request Dec 20, 2023

Adds the penguin dataset as a crc1451-dataset-compatible json document #1

Closed

mih reviewed Dec 21, 2023

mih mentioned this pull request Dec 21, 2023

Concept of a File necessary? #14

Closed

jsheunis added 4 commits December 22, 2023 08:47

adds yaml format of example data

2c10193

adds metatype to data

95d84ec

tweaks towards successful validation, including the use of the metatype mixin and removing default_range from the schema

9021529

add missing ranges

b381cb4

jsheunis mentioned this pull request Dec 22, 2023

Extend dataset class with common/standard properties and relationships #4

Closed

jsheunis added 3 commits December 22, 2023 13:27

use only yaml data

5d0e870

remove identifier designation from 'path_posix' slot in order to let validation pass

7df07ec

change author type to lower case after changing enum options

134fb7c

jsheunis mentioned this pull request Feb 22, 2024

Elements of a non-datalad dataset schema #46

Closed

mih closed this Mar 10, 2024

