Concept of a File necessary? #14
Comments
https://github.com/psychoinformatics-de/datalad-concepts/blob/main/src/linkml/datalad-datasets.yaml tries to avoid the concept of The closest equivalent is a With such a concept, the majority of metadata is attributed to the file content, and
Thanks for the pointer. I looked at One thought I had was whether these concepts of
The location of any of these is temporary. All classes are drafts. If we find a second use case for them now, it makes sense to move them elsewhere.
There's something that I don't quite grasp how to map onto existing concepts and procedures yet. I also don't know yet exactly what my question is, so I'm putting down a progression of thoughts. Let's look at the concept of a
The important part is that the resulting file list is in a format that will validate against the So the question becomes, is the format that the users provide their file lists in (with help from a machine or not) the exact same as the format that the schema defines? Or is there some translation layer in between? Or can the schema be defined in such a way, using classes that inherit from superclasses, that the translation of a complicated structure to a flat list is implicitly dealt with inside the schema? Using our existing work, can Something else to keep in mind is the high likelihood that automated processes will run on top of the schema to generate e.g. online forms. And a form that asks you to enter a flat list of files is much more desirable than a form that asks you to enter a directory, then several directory items, etc.
From my POV the needs and solutions you describe are "front-end". To put it bluntly, the input convenience is bought by ignoring the true nature of the underlying concepts. If a tool facilitates the entry of metadata on an unversioned data "archive", it can take shortcuts and it can use a simplified schema (geared towards simplicity and usage such as form generation). But this would be different from the structure and terminology used for a generic data model, which must also be able to capture more complex cases (such as nesting, versioning, redundant availability), yet still yield a sensible, homogeneous representation. In short: yes, a translation/mapping is needed. This should not be an uncommon need, so it needs to be, and will be, well supported.
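A minimal sketch of what such a translation layer could look like, assuming a front-end that collects a flat list of file records and a generic model that nests files inside directories. The function name `nest_file_list` and the keys `path`, `byte_size`, `type`, `name`, and `items` are hypothetical and are not taken from the datalad-concepts schema.

```python
from pathlib import PurePosixPath


def nest_file_list(flat_records):
    """Map a flat list of file records onto a nested directory tree.

    Each input record is a dict with a POSIX-style relative 'path' and an
    optional 'byte_size'. All key names are illustrative assumptions.
    """
    root = {"type": "directory", "name": "", "items": {}}
    for record in flat_records:
        parts = PurePosixPath(record["path"]).parts
        node = root
        # Walk (and create on demand) every intermediate directory.
        for dirname in parts[:-1]:
            node = node["items"].setdefault(
                dirname, {"type": "directory", "name": dirname, "items": {}}
            )
        # The last path component is the file itself.
        node["items"][parts[-1]] = {
            "type": "file",
            "name": parts[-1],
            "byte_size": record.get("byte_size"),
        }
    return root


flat = [
    {"path": "README", "byte_size": 3},
    {"path": "sub-01/anat/T1w.nii.gz", "byte_size": 10},
]
tree = nest_file_list(flat)
```

The user-facing input stays a flat list, while the nested structure is derived mechanically. Whether the nested form is the right target is exactly the modeling question discussed in this thread.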
psychoinformatics-de/datalad-schema#15 brings another case like this: a model of a Git commit. From the Git data model perspective things are simple. A commit is
A fairly sensible model could be a flat set of properties for each of these aspects. However, those would have quite complex (or narrow) semantics. psychoinformatics-de/datalad-schema#15 uses a PROV-inspired approach. Rather than direct properties, it records the provenance of a commit as two activities (the authoring of the new state vs. the committing). This yields a more complex data structure, but each element has simpler (more generically understood) semantics.
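To make the contrast concrete, here is a rough sketch of the two shapes, assuming only the author and committer identity and timestamp are of interest. All class and field names (`FlatCommit`, `ProvCommit`, `Activity`, `Agent`, `to_prov`, and so on) are hypothetical and do not reproduce the classes proposed in psychoinformatics-de/datalad-schema#15.

```python
from dataclasses import dataclass, field


@dataclass
class FlatCommit:
    """Flat model: one narrowly defined property per aspect."""
    gitsha: str
    author_name: str
    author_date: str
    committer_name: str
    committer_date: str


@dataclass
class Agent:
    name: str


@dataclass
class Activity:
    """Generic building block: who did what kind of thing, and when."""
    kind: str
    agent: Agent
    ended_at: str


@dataclass
class ProvCommit:
    """PROV-inspired model: a commit described by the activities behind it."""
    gitsha: str
    activities: list = field(default_factory=list)


def to_prov(flat: FlatCommit) -> ProvCommit:
    # Authoring the new state and committing it are separate activities,
    # possibly performed by different agents at different times.
    return ProvCommit(
        gitsha=flat.gitsha,
        activities=[
            Activity("authoring", Agent(flat.author_name), flat.author_date),
            Activity("committing", Agent(flat.committer_name), flat.committer_date),
        ],
    )


flat_commit = FlatCommit(
    "abc123", "Alice", "2023-01-01T10:00:00", "Bob", "2023-01-02T09:00:00"
)
prov_commit = to_prov(flat_commit)
```

The flat variant needs a dedicated property for every aspect, while the second variant reuses the same generic `Activity` and `Agent` elements for both, at the cost of one more level of nesting.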
#31 brings some changes in this regard. It follows the DCAT model, which distinguishes abstract/conceptual resources from the concrete distributions that realize them. For DataLad we can keep that distinction to express how one and the same file can be available from multiple remotes. The DCAT notion is more flexible: it allows a resource's nature to change considerably (file formats, etc.) between distributions. For DataLad we do not need this flexibility, but it does not hurt to have the base model offer this expressiveness.
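The resource/distribution split could be sketched as follows. This is an illustration under assumptions, not the schema from #31: the class names mirror DCAT terminology, while the fields `remote` and `access_url`, the helper `available_from`, and the example URLs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Distribution:
    """One concrete way to obtain the content of a resource."""
    remote: str       # hypothetical label for the hosting remote
    access_url: str   # hypothetical location of this copy


@dataclass
class Resource:
    """The abstract item, independent of where copies of it live."""
    identifier: str
    distributions: list = field(default_factory=list)

    def available_from(self):
        # Sorted, de-duplicated remote labels that can serve this resource.
        return sorted({d.remote for d in self.distributions})


# One and the same file, available from two remotes.
t1w = Resource(
    identifier="sub-01/anat/T1w.nii.gz",
    distributions=[
        Distribution("origin", "https://example.org/store/T1w.nii.gz"),
        Distribution("backup", "https://example.com/mirror/T1w.nii.gz"),
    ],
)
```

Here every distribution carries the identical content. The more general DCAT reading, where distributions may differ in format, would simply add more fields to `Distribution`.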
It is not necessary, as far as I can see now. Closes #14
This is a common concept, and seems to suggest itself naturally. #58 also includes it.
However, it comes with problems too, in particular in the datalad context.
File
?File
and its content? If not, what about two files with different names and identical content?