Adds research-dataset-schema.yaml and testing data #12
This adds a new schema that aims to model a generic research dataset. A (supposedly) valid JSON data document is added, as well as an invalid document, for testing purposes. The Makefile is updated to add testing and document/output generation for the new schema.
0e507d0 to 87244ec
Thanks! Looks like the linter run produced some good recommendations already!
Indeed :) Now I have to figure out why the problem of
    inlined_as_list: true
    range: string
I believe this does not work, and it is exactly the case that I outlined previously. The instruction says that objects will be inlined. But at the same time the range is set to `string`, a basic type. One of the two needs to change, I think.
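To make the concern concrete, here is a minimal Python sketch of the check being described: it flags any slot that asks for inlining while its range is not one of the schema's classes. This is a hypothetical helper, not part of LinkML, and the slot name `some_slot` is made up because the diff context does not show which slot the two lines belong to.

```python
def find_inlining_conflicts(slots, class_names):
    """Return names of slots that request inlining but whose range is not a class.

    Hypothetical helper for illustration only; this is not a LinkML API.
    """
    conflicts = []
    for name, slot in slots.items():
        wants_inlining = slot.get("inlined_as_list") or slot.get("inlined")
        slot_range = slot.get("range")
        # Inlining only makes sense when the range names a class (an object).
        if wants_inlining and slot_range is not None and slot_range not in class_names:
            conflicts.append(name)
    return conflicts


# "some_slot" is an assumed name; "has_part" mirrors the slot discussed in this PR.
slots = {
    "some_slot": {"multivalued": True, "inlined_as_list": True, "range": "string"},
    "has_part": {"multivalued": True, "inlined_as_list": True, "range": "File"},
}
classes = {"File", "ResearchDataset"}

print(find_inlining_conflicts(slots, classes))  # ['some_slot']
```

Either dropping the inlining instruction or pointing the range at a class makes the flagged slot consistent.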
Thanks, this seems like it could be the issue. My original expectation was that the range referred to whatever was inside the list (when using `inlined_as_list`), so that might be the mistake. Will report back soon.
It actually looks like this is not the problem. Rather, it looks like the specification of a union of ranges via `any_of`, together with possibly faulty data, is what's causing the issue. I have:
    slots:
      has_part:
        slot_uri: dcterms:hasPart
        multivalued: true
        inlined_as_list: true
        any_of:
          - range: File
          - range: ResearchDataset
        description: >-
          Linked entities that form part of a dataset
          such as files or other (sub)datasets
and then data:
    "has_part": [
      {
        "checksum_md5": "e7e2be6b203a221949f05e02fcefd853",
        "content_url": "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.219.3&entityid=002f3893385f710df69eeebe893144ff",
        "path_posix": "raw/adelie.csv",
        "size_in_bytes": 23755
      },
      {
        "checksum_md5": "1549566fb97afa879dc9446edcf2015f",
        "content_url": "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.3&entityid=e03b43c924f226486f2f0ab6709d2381",
        "path_posix": "raw/gentoo.csv",
        "size_in_bytes": 11263
      },
      {
        "checksum_md5": "e4b0710c69297031d63866ce8b888f25",
        "content_url": "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.221.2&entityid=fe853aa8f7a59aa84cdd3197619ef462",
        "path_posix": "raw/chinstrap.csv",
        "size_in_bytes": 18872
      }
    ]
When I set this up I was thinking "how will the validator know whether an object in the list is a `File` or a `ResearchDataset`?", and I'm guessing this is the issue. Maybe it's expecting an ID as a string? Will investigate this further.
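The question of how a validator could tell the two classes apart can be illustrated with a small Python sketch. It is not how LinkML resolves `any_of`; it only shows one possible approach, structural matching on slot names. The `File` slots are taken from the example data above, while the `ResearchDataset` slots are an assumption based on the slots mentioned in this PR.

```python
# Slot names per class. File's slots come from the example data in this thread;
# ResearchDataset's slots are an assumption made for illustration.
CLASS_SLOTS = {
    "File": {"checksum_md5", "content_url", "path_posix", "size_in_bytes"},
    "ResearchDataset": {"author", "has_part"},
}


def matching_classes(obj):
    """Return every candidate class whose slot set covers all keys of obj."""
    return [name for name, allowed in CLASS_SLOTS.items() if set(obj) <= allowed]


record = {
    "checksum_md5": "e7e2be6b203a221949f05e02fcefd853",
    "path_posix": "raw/adelie.csv",
    "size_in_bytes": 23755,
}

print(matching_classes(record))  # ['File']
print(matching_classes({}))      # ['File', 'ResearchDataset']
```

The second call shows the weak spot of this approach: an object with no distinguishing keys matches both classes, so without an explicit type designator the choice stays ambiguous.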
When I change the schema to:
    slots:
      has_part:
        slot_uri: dcterms:hasPart
        multivalued: true
        inlined_as_list: true
        range: File
        description: >-
          Linked entities that form part of a dataset
          such as files or other (sub)datasets
and keep the same data, validation succeeds.
Yeah, but the difference is `File` being a class, not a type, which makes inlining valid. So you resolved that particular flaw from the other end.
    ],
    "author": [
      {
        "author_type": "Person",
Consider using `metatype` from https://github.com/psychoinformatics-de/datalad-concepts/blob/main/src/linkml/typing.yaml
…pe mixin and removing default_range from the schema
With #87 I finally caught up with this PR. The mainline now has a schema that can do all of this, and is not constrained to research applications. The example matching the penguins data record in here can be found at
https://github.com/psychoinformatics-de/datalad-concepts/blob/main/src/examples/dataset-version/DatasetVersionObject-penguins.yaml
Closing... Thanks for setting the mark. It helped a lot.
TODO 1: solve errors from `make validate-examples-research-dataset-schema`
TODO 2: address warnings in `make check-research-dataset-schema`