Support no-prefix metadata records (one record per directory) #50

Closed
mih opened this issue Jul 10, 2023 · 4 comments · Fixed by #82

Comments

@mih
Contributor

mih commented Jul 10, 2023

#48 made me think about multi-record organizations. I think it should be useful to support schemes like

<root> / <dataset-id> / <dataset-version> / <tabby-collection>

This would make a common prefix unnecessary. A record would have files like

  • dataset.tsv
  • authors.tsv
  • authors.override.json

and all files of a record (and only those) would be contained in a (versioned) directory.
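The proposed layout can be sketched in a few lines of Python. This is only an illustration of the directory scheme, not tabby code: the helper names `record_dir` and `record_files`, and the example root, ID, and version values, are assumptions made up for this sketch.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Component files of one record, using the example names listed above.
COMPONENTS = ["dataset.tsv", "authors.tsv", "authors.override.json"]


def record_dir(root: str, dataset_id: str, dataset_version: str) -> PurePosixPath:
    """Return the versioned directory that holds all files of one record."""
    return PurePosixPath(root) / dataset_id / dataset_version


def record_files(root: str, dataset_id: str, dataset_version: str) -> list[PurePosixPath]:
    """Return the un-prefixed file paths of one record."""
    directory = record_dir(root, dataset_id, dataset_version)
    return [directory / name for name in COMPONENTS]


# Hypothetical values, chosen only for illustration.
for path in record_files("metadata", "ds-0001", "1.0"):
    print(path)
```

Because every record lives in its own directory, two versions of the same dataset never share a file, and no common file name prefix is needed to tell them apart.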

@christian-monch
Contributor

I think that is a useful structure

@christian-monch
Contributor

Does this imply serializing a dataset tree (git repos with submodules) into a collection of directories that are structured as described above?

IIUC that would require the possibility for a sub-module entry in a tabby-record. Or did I misunderstand the intention here?

@mih
Contributor Author

mih commented Jul 10, 2023

Yes, your conclusion is correct. tabby must be able to express something like hasPart, linking to a version-level dataset metadata record.

@mih
Contributor Author

mih commented Jul 17, 2023

I implemented a no-prefix tabby file collection, i.e., one where there would be a plain dataset.tsv file.

I found this to be cumbersome and full of corner cases. For example, conversion to XLSX would require inventing a file name (because the component names would vanish, and without a prefix there would be nothing left apart from .xlsx).

I conclude that making this "simplification" actually leads to more complicated code and, worse, to more complicated 3rd-party handling code.
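The XLSX corner case can be made concrete with a small sketch. It assumes a naming rule in which the prefix alone becomes the workbook name; the function `xlsx_filename` is hypothetical and not part of tabby.

```python
def xlsx_filename(prefix: str) -> str:
    """Derive a workbook name from a record prefix (hypothetical rule).

    Assumption: the component names (dataset, authors, ...) do not appear
    in the workbook file name, so only the prefix is left to build it from.
    """
    return f"{prefix}.xlsx"


# With a prefix the result is a usable file name.
print(xlsx_filename("demo"))

# Without a prefix only the extension remains, so a name must be invented.
print(repr(xlsx_filename("")))
```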

This realization does not impact the general organization of

<root> / <dataset-id> / <dataset-version> / <tabby-collection>

except that any file in <tabby-collection> does carry a name prefix with a _ delimiter.
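A minimal sketch of how such a prefixed file name could be taken apart. It assumes that the prefix itself contains no underscore, which the discussion does not state; `split_prefixed` is a hypothetical helper, not a tabby function.

```python
from __future__ import annotations


def split_prefixed(filename: str) -> tuple[str, str]:
    """Split '<prefix>_<component>' at the first underscore.

    Assumption: the prefix contains no underscore of its own.
    """
    prefix, delimiter, component = filename.partition("_")
    if not delimiter or not prefix:
        raise ValueError(f"no prefix found in {filename!r}")
    return prefix, component


print(split_prefixed("demo_dataset.tsv"))
print(split_prefixed("demo_authors.override.json"))
```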

@mih mih closed this as completed Jul 17, 2023
mih added a commit that referenced this issue Jul 19, 2023
In preparation for #79 and despite the conclusion in #50, this
change adds support for a simplified set of files that form a tabby
record.

The only thing that is simplified is that the common prefix is removed
from all filenames. The demo record is now also included in this format.

This layout is what we would like to put into a ZIP file container.

The prefix continues to exist (this was the main concern in #50), but is
now the name of the parent directory.

In #55 this simplifies the setup for the self-description of a dataset. All
files could go into `.datalad/tabby/self/` and have short names like:

- `dataset.tsv`
- `dataset.override.json`
- ...

There is no particular additional markup necessary to distinguish the
single-item-dir format from the prefixed layout. The absence of an
underscore char is evidence enough.

Closes #50 (for real)
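The detection rule described in the commit message above could look roughly like the following. This is a sketch under stated assumptions, not the code from the commit: `record_prefix` is a hypothetical name, and the rule "an underscore in the file name means prefixed layout" is taken literally.

```python
from pathlib import PurePosixPath


def record_prefix(path: PurePosixPath) -> str:
    """Return the record prefix for a tabby file in either layout.

    Prefixed layout:        <prefix>_<component>, prefix is part of the file name.
    Single-item-dir layout: no underscore, prefix is the parent directory name.
    """
    name = path.name
    if "_" in name:
        # Prefixed layout: everything before the first underscore.
        return name.partition("_")[0]
    # Single-item-dir layout: the directory name takes the role of the prefix.
    return path.parent.name


print(record_prefix(PurePosixPath("records/demo_dataset.tsv")))
print(record_prefix(PurePosixPath(".datalad/tabby/self/dataset.tsv")))
```

One consequence of this rule is that component names in the single-item-dir layout must not contain an underscore themselves, otherwise a file would be mistaken for a prefixed one.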