Allow tags to contain URI fragments #188

Open
drdavella opened this issue Jan 7, 2019 · 0 comments

After some experimentation with asdf-format/asdf#446 and some discussions with @Cadair about other potential use cases, I think it would be useful if tags were allowed to contain URI fragments with implementation-specific annotations.

Let's start with a concrete example for illustration. Imagine there is a schema definition for a type called Foo. The associated tag label is "tag:example.org:custom/foo-1.0.0". My instances of Foo are serialized using this schema definition.

However, imagine that I also have a type called Bar which is a subclass of Foo. I would like to serialize Bar and validate it using the Foo schema, but I want to be able to round-trip instances of Bar. How can I do this without adding a new schema definition?

The solution I am suggesting here is to allow my implementation to add a URI fragment to the tag label that indicates that the Foo schema should be used for validation purposes, but I have actually serialized a subclass of this type. Such a tag label might look like this:

"tag:example.org:custom/foo-1.0.0#subclass=Bar".
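To make the write side concrete, here is a minimal sketch of how an implementation might choose the tag label when serializing. The `BASE_TAG` attribute and the `tag_for` helper are hypothetical names invented for this illustration; they are not part of the asdf API.

```python
class Foo:
    # Hypothetical: the tag whose schema describes Foo.
    BASE_TAG = "tag:example.org:custom/foo-1.0.0"


class Bar(Foo):
    # Bar has no schema of its own; it inherits BASE_TAG from Foo.
    pass


def tag_for(obj):
    """Return the tag label to write for obj (illustrative only)."""
    cls = type(obj)
    if "BASE_TAG" in vars(cls):
        # The class declares the tag itself, so no annotation is needed.
        return cls.BASE_TAG
    # The class only inherits the tag: record the concrete subclass
    # in a URI fragment so it can be restored on read.
    return f"{cls.BASE_TAG}#subclass={cls.__name__}"


print(tag_for(Foo()))
print(tag_for(Bar()))
```

An instance of `Foo` keeps the plain tag, while an instance of `Bar` gets the same tag with the `subclass=Bar` fragment appended.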

Basically, what I am proposing is that the validation machinery should ignore all URI fragments and only use the tag itself when resolving schemas. However, implementations may take URI fragments into account when processing types. In this case, my implementation would recognize the subclass=Bar portion of the tag as indicating that a different subclass should be used when restoring this type.
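A rough sketch of the read side, assuming the fragment uses `key=value` pairs as in the example above. The `SCHEMAS`, `TYPES`, and `SUBCLASSES` registries and the helper names are hypothetical and only stand in for whatever lookup an implementation already has.

```python
from urllib.parse import parse_qsl


class Foo:
    pass


class Bar(Foo):
    pass


# Hypothetical registries, keyed by the tag without any fragment.
SCHEMAS = {"tag:example.org:custom/foo-1.0.0": {"type": "object"}}
TYPES = {"tag:example.org:custom/foo-1.0.0": Foo}
SUBCLASSES = {"Bar": Bar}


def split_tag(tag):
    """Split a tag into its base and a dict of fragment annotations."""
    base, _, fragment = tag.partition("#")
    return base, dict(parse_qsl(fragment))


def schema_for(tag):
    # Validation ignores the fragment entirely.
    base, _ = split_tag(tag)
    return SCHEMAS[base]


def type_for(tag):
    # Type restoration may use the fragment; unknown or missing
    # annotations fall back to the type registered for the base tag.
    base, annotations = split_tag(tag)
    return SUBCLASSES.get(annotations.get("subclass"), TYPES[base])
```

With this split, `schema_for` returns the Foo schema for both the plain and the annotated tag, while `type_for` returns `Bar` only when the `subclass=Bar` annotation is present and recognized.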

It is important to note that the YAML spec has some relevant language about this:

YAML does not mandate any special relationship between different tags that begin with the same substring. Tags ending with URI fragments (containing “#”) are no exception; tags that share the same base URI but differ in their fragment part are considered to be different, independent tags. By convention, fragments are used to identify different “variants” of a tag, while “/” is used to define nested tag “namespace” hierarchies. However, this is merely a convention, and each tag may employ its own rules. For example, Perl tags may use “::” to express namespace hierarchies, Java tags may use “.”, etc.

I take this to mean that tags containing URI fragments are valid in YAML, although they do not necessarily indicate any particular relationship between tags. However, it also seems to imply that it would be perfectly reasonable for ASDF to encode such relationships using URI fragments.

Let's return to the example of the subclass Bar. In all likelihood, I want to serialize Bar instead of Foo because Bar contains some properties that are not adequately represented by Foo. When I serialize my Bar instance, I may be storing attributes that aren't described by the Foo schema. If a different implementation reads my file, and if it is free to ignore the "subclass=Bar" annotation, it may miss these additional properties entirely. This could be problematic.

However, I would argue that this situation is already possible in ASDF. If a schema does not explicitly set additionalProperties: false, then any implementation is free to add properties that the schema will not validate. So I would argue that we're not any worse off by allowing subclasses to be encoded in this way. Naturally, it follows that subclasses can only be encoded this way for types whose schemas allow additional properties (additionalProperties: true, which is the default).
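The point about additionalProperties can be illustrated with a toy validator. This is a deliberately simplified stand-in for JSON Schema validation, not the asdf validation machinery: it only checks `required` and a boolean `additionalProperties`, and the schemas and property names are made up for the example.

```python
def validate(instance, schema):
    """Toy check covering only 'required' and boolean 'additionalProperties'."""
    declared = schema.get("properties", {})
    missing = [key for key in schema.get("required", []) if key not in instance]
    if missing:
        raise ValueError(f"missing required properties: {missing}")
    # An absent additionalProperties keyword permits extra properties.
    if schema.get("additionalProperties", True) is False:
        extra = sorted(set(instance) - set(declared))
        if extra:
            raise ValueError(f"unexpected properties: {extra}")


# Hypothetical Foo schemas: one open (the default), one closed.
OPEN_FOO = {"properties": {"name": {}}, "required": ["name"]}
CLOSED_FOO = {**OPEN_FOO, "additionalProperties": False}

# A serialized Bar carrying an attribute that Foo's schema does not describe.
bar_node = {"name": "example", "bar_only": 42}

validate(bar_node, OPEN_FOO)  # passes: the extra property is simply not checked
```

Validating the same node against `CLOSED_FOO` raises, which is why the subclass annotation only makes sense for schemas that leave additional properties open.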

There are a few other things to consider:

  • This is possible to handle in the Python implementation, but will it make other potential implementations more difficult? As far as I can tell, the C++ prototype should be able to handle this change.
  • Should specific fragment keywords such as "subclass" be recognized by the standard? Perhaps the standard could define a few such keywords while leaving implementations free to define others that are not explicitly in the standard.