Essential metadata features #53

Closed
mih opened this issue Jul 10, 2023 · 2 comments
Comments

@mih
Contributor

mih commented Jul 10, 2023

Trying to condense what is really really needed (and only that) for tabby records to be useful.

Starting point:

  • each table describes one or more entities (depending on the table layout) of a particular type -- the same type for all entities defined in the same table

We must be able to:

  1. define @type for each entity
  2. define an IRI (@id) per entity
  3. define a vocabulary covering all terms used to declare properties and their values

These are, respectively, possible via:

  1. Declaration in table, or via override of @type with a defined class/term
  2. Declaration of a table property as @type:@id in the JSON-LD context, or via an override (that potentially combines multiple properties into a (more) unique label) -- the latter seems to be a more straightforward approach
  3. Declaration in JSON-LD context (see "Howto annotate that a property uses a controlled vocabulary", #54)
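
To make the three declarations concrete, here is a minimal sketch of a record with a JSON-LD context, written as a Python dict so it can be serialized with the standard library. The property names (`name`, `license`), the choice of schema.org as vocabulary, and the `example.org` IRI are assumptions for illustration only; none of them is prescribed by tabby.

```python
import json

# Hypothetical context for a table describing dataset entities.
context = {
    # (3) vocabulary covering all terms that are not declared explicitly
    "@vocab": "https://schema.org/",
    # (2) values of this property are interpreted as IRIs, not plain strings
    "license": {"@id": "https://schema.org/license", "@type": "@id"},
}

# Hypothetical record built from one table; all values are made up.
record = {
    "@context": context,
    "@type": "Dataset",  # (1) type of the entity
    "@id": "https://example.org/datasets/demo",  # IRI of the entity
    "name": "Demo dataset",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

print(json.dumps(record, indent=2))
```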

In particular the ability to generate unique, deterministic IRIs for all entities is a necessity (e.g., for cross-linking information). Use cases where this is critical are:

  • Describe one and the same set of entities, manifested as a set of files with different "data models" (e.g. as a file, and also as a dicom image, or as the outcome of an activity). Suitable identifiers would be a file content hash, but also the scoped identifier of a relative file path in a DatasetVersion (in its simplest form something like the concatenation of a version-independent dataset-id, plus a version label/id, plus the relative path).
  • Cross-linking the origin of multiple files as the outcome of the exact same process/activity
  • Cross-linking entity properties that have been obtained from different measurements contained in different components of a longitudinal dataset
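
The two kinds of file identifiers mentioned above (content hash, and dataset-id plus version plus relative path) could be sketched roughly as follows. The `https://example.org/` base IRIs, the function names, and the use of SHA-256 and percent-encoding are assumptions made for this example, not an agreed-upon scheme.

```python
import hashlib
from urllib.parse import quote

# Hypothetical base IRIs; any stable namespace would do.
HASH_BASE = "https://example.org/sha256/"
DATASET_BASE = "https://example.org/ds/"


def content_iri(content: bytes) -> str:
    """Identify a file by its content: identical bytes give an identical IRI."""
    return HASH_BASE + hashlib.sha256(content).hexdigest()


def scoped_file_iri(dataset_id: str, version: str, relpath: str) -> str:
    """Concatenate dataset id, version label, and relative (POSIX) path."""
    return (
        DATASET_BASE
        + quote(dataset_id, safe="")
        + "/"
        + quote(version, safe="")
        + "/"
        + quote(relpath)  # keeps "/" as the path separator
    )


print(content_iri(b"some file content"))
print(scoped_file_iri("abc", "v1", "sub-01/anat/T1w.nii.gz"))
```

Both functions are pure, so the same input always yields the same IRI, which is the property needed for cross-linking records produced independently.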

Things to try:

  • Is it possible to use something like Z03:x0024 as a value in a @type:@id property, plus a @vocab declaration, such that it resolves to a "scoped" identifier? IOW represent x0024 as an identifier that is controlled by the entity Z03 (which may be a project that is the origin of several datasets that all share a common identifier namespace for these entities)?
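
As far as I understand JSON-LD, a value like Z03:x0024 is treated as a compact IRI and expanded through a prefix declared in the context, while @vocab alone only affects terms (and values of @type:@vocab properties). The following toy function imitates that prefix lookup; it is not a JSON-LD processor, and the namespace IRI for Z03 is a made-up placeholder.

```python
def expand_iri(value: str, context: dict) -> str:
    """Toy expansion of a compact IRI ``prefix:suffix`` via a context prefix."""
    prefix, sep, suffix = value.partition(":")
    # "https://..." also contains a colon, but its suffix starts with "//",
    # so it is left untouched, as are values with an unknown prefix.
    if sep and not suffix.startswith("//") and prefix in context:
        return context[prefix] + suffix
    return value


# Hypothetical namespace controlled by the project Z03.
context = {"Z03": "https://example.org/projects/Z03/"}

print(expand_iri("Z03:x0024", context))
```

With such a prefix, x0024 becomes an identifier in a namespace owned by Z03, which all datasets of that project could share.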
@christian-monch
Contributor

christian-monch commented Jul 11, 2023

> 1. define @type for each entity

One base assumption: we are working with a JSON-LD application in mind.

If we do not (yet) want to support user-defined types, what about the following rules:

  1. All properties that we define have a defined type, e.g. name properties of author-entities have the type https://schema.org/name
  2. Other properties are not allowed

If we do want to support user-defined types, what about something like this:

  1. (As above) all properties that we define have a defined type, e.g. name properties of author-entities have the type https://schema.org/name
  2. A user can provide self-defined properties, but:
    2.1. they must not conflict with defined properties,
    2.2. the user has to provide a @context definition for the entity (or a set of entities) that allows the user-defined property names to be fully expanded and their types deduced. This expansion is performed for each entity before the implicit expansion that we define.
  3. All other properties have to be fully expanded names, e.g. http://schema.org/name, from which their types can be deduced

> 2. define an IRI (@id) per entity

If we do not support user-defined types, the entity IDs could be automatically created by using distinctive elements to define a fixed mapping (the created IRIs would probably not be resolvable, but that is not required by JSON-LD). For example:

  • For author-entities: http://sfb-1451.de/#authors?email=<email>?name=<name>
  • For datasets: http://sfb-1451.de/#<project>/dataset?identifier=<identifier>?version=<version>
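
A sketch of how such a fixed mapping could be implemented. It follows the two templates literally (including the second "?"); the function names and the percent-encoding of the inserted values are my assumptions, added so that characters like "@" or spaces cannot break the IRI.

```python
from urllib.parse import quote

BASE = "http://sfb-1451.de/#"


def author_iri(email: str, name: str) -> str:
    """Build an author IRI from the distinctive elements email and name."""
    return f"{BASE}authors?email={quote(email, safe='')}?name={quote(name, safe='')}"


def dataset_iri(project: str, identifier: str, version: str) -> str:
    """Build a dataset IRI from project, identifier, and version."""
    return (
        f"{BASE}{quote(project, safe='')}/dataset"
        f"?identifier={quote(identifier, safe='')}"
        f"?version={quote(version, safe='')}"
    )


print(author_iri("jane@example.org", "Jane Doe"))
print(dataset_iri("Z03", "ds1", "1.0"))
```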

For user-defined properties, we should provide a way for the user to add IDs. This could be done, for example, by allowing a generic ID column that has entries in the form @tabby-id-<id>.

The @tabby-id-<id> approach could also generally be used to add custom, i.e. user-provided IDs.

@mih
Contributor Author

mih commented Jul 21, 2023

The current best-practice list at http://docs.datalad.org/projects/tabby/en/latest/best-practices.html should cover all of this.

@mih mih closed this as completed Jul 21, 2023