Improvements on non-atomic value validation #179

Open
nicolasblumenroehr opened this issue Nov 22, 2023 · 4 comments
Labels
enhancement New feature or request More information needed Assigned when not enough information is there to fix or further discuss an issue. question Further information is requested


@nicolasblumenroehr
Contributor

Record validation for nested structures could be standardized. Currently, validation seems to depend entirely on the validation schema in the DTR. For the checksum type, for example, a valid value entry would be: "{'md5sum': '723140e4864011bdf1fbc66698a0f041'}". However, if the DTR provides no validation schema for a PID-Info Type, nested structures are not validated at all. Furthermore, it would be more reasonable to use the PIDs of the keys within such structures instead of their names, as only the PID is persistent and recognizable by clients. The checksum example would then look like this: "{'21.T11148/ef277087753e8ba2e606': '723140e4864011bdf1fbc66698a0f041'}"
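To make the proposal concrete, here is a minimal sketch of translating a name-keyed value into a PID-keyed one. The `NAME_TO_PID` table and the `keys_to_pids` helper are hypothetical and not part of the Typed PID Maker; only the md5sum PID is taken from this issue, and a real mapping would presumably come from the DTR.

```python
import ast

# Hypothetical lookup table: attribute name -> PID of the sub type.
# Only the md5sum PID appears in this issue; a real table would come from the DTR.
NAME_TO_PID = {"md5sum": "21.T11148/ef277087753e8ba2e606"}


def keys_to_pids(raw_value: str) -> dict:
    """Parse a record value and replace known attribute names with their PIDs."""
    # The values quoted in this issue use single quotes, so ast.literal_eval
    # is used here instead of json.loads (which requires double quotes).
    value = ast.literal_eval(raw_value)
    if not isinstance(value, dict):
        raise ValueError("expected a non-atomic (object) value")
    unknown = [key for key in value if key not in NAME_TO_PID]
    if unknown:
        raise ValueError(f"no PID known for attribute(s): {unknown}")
    return {NAME_TO_PID[key]: item for key, item in value.items()}


converted = keys_to_pids("{'md5sum': '723140e4864011bdf1fbc66698a0f041'}")
print(converted)
```

The translation is only possible when every name resolves to exactly one PID, which is the information that gets lost when records store names alone.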

@ThomasJejkal
Contributor

Actually, I think it's not a fault of the Typed PID Maker (or something that can or should be handled there), as it seems to use the validation schema provided by the DTR. If you take a look at checksum, it consists of different possible attributes, which are data types themselves. Scrolling to the end, where the validation schema can be found, shows that the schema contains a set of definitions keyed by the PIDs of the sub types (with slashes replaced by '_' for technical reasons), but the expected properties are the plain names of the particular types, e.g., md5sum.
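A rough sketch of that naming convention follows. The schema shape below is a heavily simplified assumption for illustration, not the actual DTR output, and `pid_to_definition_key` and `resolve_ref` are hypothetical helpers. One plausible reason for the replacement is that `/` separates segments in JSON Pointer style `$ref` values, but that is a guess.

```python
def pid_to_definition_key(pid: str) -> str:
    """Apply the convention described above: slashes in the PID become '_'."""
    return pid.replace("/", "_")


# Assumed, simplified schema shape: definitions keyed by the mangled PID,
# properties keyed by the plain attribute name.
schema = {
    "definitions": {
        pid_to_definition_key("21.T11148/ef277087753e8ba2e606"): {"type": "string"},
    },
    "properties": {
        "md5sum": {"$ref": "#/definitions/21.T11148_ef277087753e8ba2e606"},
    },
}


def resolve_ref(root: dict, ref: str) -> dict:
    """Follow a local '#/a/b' reference by splitting on '/' (no escaping handled)."""
    if not ref.startswith("#/"):
        raise ValueError("only local references are supported in this sketch")
    node = root
    for part in ref[2:].split("/"):
        node = node[part]
    return node


resolved = resolve_ref(schema, schema["properties"]["md5sum"]["$ref"])
print(resolved)
```

With an unmodified PID in the reference, the split on `/` would produce an extra path segment and the lookup would fail, so the mangled key keeps the reference resolvable.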

I'm not sure why this is the case. It may also be for technical reasons, because an object with a key like

{'21.T11148/ef277087753e8ba2e606': '723140e4864011bdf1fbc66698a0f041'}

may cause problems if you want to select the value via JSONPath (as dots are typically interpreted as attribute separators). Maybe that's the reason, or maybe it's just a logical mistake in the DTR.
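The dot problem can be illustrated with a deliberately naive path resolver. This is not a real JSONPath library; `split_dot_path` and `naive_dot_lookup` are hypothetical helpers that only mimic dot notation. JSONPath also defines a bracket notation with quoted names (for example `$['21.T11148/ef277087753e8ba2e606']`) that avoids the ambiguity, though whether a given tool supports it would need to be checked.

```python
def split_dot_path(path: str) -> list:
    """Split a '$.a.b' style path into segments on every dot."""
    body = path[2:] if path.startswith("$.") else path
    return body.split(".")


def naive_dot_lookup(data: dict, path: str):
    """Walk the segments one by one; raises KeyError for a missing segment."""
    node = data
    for segment in split_dot_path(path):
        node = node[segment]
    return node


pid_key = "21.T11148/ef277087753e8ba2e606"
record = {pid_key: "723140e4864011bdf1fbc66698a0f041"}

# The dot inside the PID prefix is treated as a separator,
# so the key is split into two segments that do not exist in the record.
segments = split_dot_path("$." + pid_key)

try:
    naive_dot_lookup(record, "$." + pid_key)
    dot_lookup_works = True
except KeyError:
    dot_lookup_works = False

# Addressing the key as one whole string works as expected.
direct_value = record[pid_key]
print(segments, dot_lookup_works, direct_value)
```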

However, in this particular case I would argue that for something as general as a checksum including its algorithm, a simpler type like etag might be the better choice, in order not to go too deep into fine-grained type definitions.

@nicolasblumenroehr
Contributor Author

Right, the TPM uses the validation schema of the DTR entry. But if no validation schema is provided, which may happen since the schema is no longer created automatically as it used to be, there is no validation at all. In this case, for multi-level data types, you could provide anything in the record value. I just wanted to point out that this might be an issue; maybe a validation schema should be a prerequisite for record validation. Regarding the property names of the nested data types, I see this is again a DTR issue, but it causes big trouble for nested records, as the PID of the sub data types is lost and the name is essentially just a human-readable representation.
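As a sketch of what "schema as a prerequisite" could mean, the snippet below contrasts a lenient mode, where a missing schema lets anything pass, with a strict mode that rejects the value. Everything here is hypothetical: the `SCHEMAS` table, the type identifiers, and the `validate` function are stand-ins, and plain Python callables replace real JSON schemas fetched from the DTR.

```python
from typing import Callable, Dict


class MissingSchemaError(Exception):
    """Raised when a type has no validation schema and one is required."""


# Hypothetical stand-in for DTR-provided schemas: type identifier -> check.
# The identifiers are placeholders, not real PIDs.
SCHEMAS: Dict[str, Callable[[object], bool]] = {
    "example/with-schema": lambda value: isinstance(value, dict) and "md5sum" in value,
}


def validate(type_id: str, value: object, require_schema: bool = True) -> bool:
    """Validate a value; optionally treat a missing schema as an error."""
    check = SCHEMAS.get(type_id)
    if check is None:
        if require_schema:
            raise MissingSchemaError(f"no validation schema for {type_id}")
        # Lenient mode reproduces the gap described above: anything passes.
        return True
    return check(value)


lenient_result = validate("example/without-schema", "anything at all", require_schema=False)

try:
    validate("example/without-schema", "anything at all")
    strict_rejected = False
except MissingSchemaError:
    strict_rejected = True

print(lenient_result, strict_rejected)
```

Raising a dedicated error in strict mode would let a caller distinguish "invalid value" from "type cannot be validated at all", which are different problems for the user to fix.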

@ThomasJejkal
Contributor

So far I've assumed that validation schemas are created automatically by the DTR for all DataTypes. Do you have an example of a DataType without any validation information?

@Pfeil Pfeil changed the title Record validation Validation of records with nested records (nested profiles) and without provided schemas Aug 30, 2024
@Pfeil Pfeil added enhancement New feature or request question Further information is requested More information needed Assigned when not enough information is there to fix or further discuss an issue. labels Aug 30, 2024
@Pfeil
Member

Pfeil commented Aug 30, 2024

I read multiple ideas out of this:

  1. Instead of a PID as the value of an attribute, a record submitted for creation could contain the record of another FAIR DO to create(?)
    a. Not sure if this was really the idea or if I misunderstood things
  2. For validation, instead of using the schema (which uses names), we could do the full validation manually. This would require even more requests, which means validation would need to become much faster (see Potential timeout error #136 and Type-Api support and validation speedup #218)
    a. potentially possible, but requires performance improvements that can be hard to achieve according to Type-Api support and validation speedup #218
    b. potentially incompatible with DTR schemas if we are not careful
  3. In general, for non-atomic values, use PIDs instead of attribute names
    a. implies incompatibility with DTR schemas, but I like the idea
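Ideas 2 and 3 combined could look roughly like the sketch below: each key of a non-atomic value is treated as a PID and checked against a per-type validator. The `VALIDATORS_BY_PID` registry is a hypothetical local stand-in for the per-type requests to the DTR mentioned above, and the 32-character hex pattern for md5sum is an assumption, not taken from the type definition.

```python
import re

# Hypothetical local registry standing in for per-type lookups in the DTR.
# The pattern for md5sum (32 lowercase hex characters) is an assumption.
VALIDATORS_BY_PID = {
    "21.T11148/ef277087753e8ba2e606": re.compile(r"[0-9a-f]{32}").fullmatch,
}


def validate_non_atomic(value: dict) -> list:
    """Return a list of problems; an empty list means the value is valid."""
    problems = []
    for pid, attr_value in value.items():
        validator = VALIDATORS_BY_PID.get(pid)
        if validator is None:
            problems.append(f"unknown attribute PID: {pid}")
        elif not isinstance(attr_value, str) or validator(attr_value) is None:
            problems.append(f"invalid value for {pid}: {attr_value!r}")
    return problems


by_pid = validate_non_atomic(
    {"21.T11148/ef277087753e8ba2e606": "723140e4864011bdf1fbc66698a0f041"}
)
by_name = validate_non_atomic({"md5sum": "723140e4864011bdf1fbc66698a0f041"})
print(by_pid, by_name)
```

In a real implementation each registry lookup would be a network request unless cached, which is where the performance concerns from #136 and #218 come in.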

@Pfeil Pfeil changed the title Validation of records with nested records (nested profiles) and without provided schemas Improvements on non-atomic value validation Aug 30, 2024