Replace usage of $ref with tag #268

Open
eslavich opened this issue Jun 29, 2020 · 3 comments

@eslavich
Contributor

Typically when a core schema references another, we tack on the entire referenced schema with $ref. Here's an example from asdf-1.1.0:

type: object
properties:
  asdf_library:
    description: |
      Describes the ASDF library that produced the file.
    $ref: "software-1.0.0"

Here's a loose outline of how the library handles this schema:

  • Traverse ASDF tree for validation
  • Recognize object tagged tag:stsci.edu:asdf/core/asdf-1.1.0
  • Validate against schema for asdf-1.1.0, which includes the entirety of the software-1.0.0 schema. Due to the $ref, the nested software-1.0.0 object is validated here.
  • Continue traversing ASDF tree
  • Recognize object tagged tag:stsci.edu:asdf/core/software-1.0.0
  • Validate against schema for software-1.0.0
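
The outline above can be sketched with a toy model that counts how often each object is validated. This is not the asdf library's code: `Node`, `REF_SCHEMAS`, `validate_against`, and `validate_tree` are hypothetical names, and a schema is reduced to a mapping from property name to the tag of the schema it pulls in with $ref.

```python
from collections import Counter
from dataclasses import dataclass, field

ASDF_TAG = "tag:stsci.edu:asdf/core/asdf-1.1.0"
SOFTWARE_TAG = "tag:stsci.edu:asdf/core/software-1.0.0"


@dataclass
class Node:
    """A tagged object in a toy ASDF tree (hypothetical, for illustration)."""
    tag: str
    children: dict = field(default_factory=dict)


# Toy stand-in for schemas: for each tag, the properties whose schema is
# pulled in with $ref, mapped to the tag of the referenced schema.
REF_SCHEMAS = {
    ASDF_TAG: {"asdf_library": SOFTWARE_TAG},
    SOFTWARE_TAG: {},
}


def validate_against(schema_tag, node, counts):
    """Validate a node's content, following $ref into nested objects."""
    counts[node.tag] += 1
    for prop, ref_tag in REF_SCHEMAS[schema_tag].items():
        if prop in node.children:
            # $ref embeds the referenced schema, so the child is validated here.
            validate_against(ref_tag, node.children[prop], counts)


def validate_tree(node, counts):
    """Traverse the tree and validate every object whose tag is recognized."""
    if node.tag in REF_SCHEMAS:
        validate_against(node.tag, node, counts)
    for child in node.children.values():
        validate_tree(child, counts)


tree = Node(ASDF_TAG, {"asdf_library": Node(SOFTWARE_TAG)})
counts = Counter()
validate_tree(tree, counts)
print(counts[SOFTWARE_TAG])  # 2: once via $ref, once via its own tag
```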

Notice how the software-1.0.0 object is being validated twice? Due to an extension we made to JSON schema in our yaml-schema/draft-01 metaschema, we have another way to express this relationship:

type: object
properties:
  asdf_library:
    description: |
      Describes the ASDF library that produced the file.
    tag: tag:stsci.edu:asdf/core/software-1.0.0

This schema validates the tag of the object assigned to asdf_library but not the object's content. The content of the software-1.0.0 object will still be validated when the library recognizes its tag.
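
As a rough sketch of that behavior, assuming a toy model rather than the library's actual validator: the parent schema only compares the child's tag, and the child's content is validated once when traversal reaches it. `TAG_SCHEMAS`, `make_node`, and `validate_tree` are hypothetical names introduced for illustration.

```python
from collections import Counter

ASDF_TAG = "tag:stsci.edu:asdf/core/asdf-1.1.0"
SOFTWARE_TAG = "tag:stsci.edu:asdf/core/software-1.0.0"

# Toy stand-in for schemas: property name -> tag required by the `tag` keyword.
TAG_SCHEMAS = {
    ASDF_TAG: {"asdf_library": SOFTWARE_TAG},
    SOFTWARE_TAG: {},
}


def make_node(tag, **children):
    """Build a toy tagged object (hypothetical helper)."""
    return {"tag": tag, "children": children}


def validate_tree(node, counts, errors):
    """Validate each tagged object once; `tag` only compares the child's tag."""
    schema = TAG_SCHEMAS.get(node["tag"])
    if schema is not None:
        counts[node["tag"]] += 1  # content validation, once per object
        for prop, required_tag in schema.items():
            child = node["children"].get(prop)
            if child is not None and child["tag"] != required_tag:
                errors.append(f"{prop}: expected {required_tag}, got {child['tag']}")
    for child in node["children"].values():
        validate_tree(child, counts, errors)


# A well-formed tree: the software object is validated a single time.
counts, errors = Counter(), []
good = make_node(ASDF_TAG, asdf_library=make_node(SOFTWARE_TAG))
validate_tree(good, counts, errors)

# A tree whose asdf_library carries a different (made-up) tag is rejected.
bad_errors = []
bad = make_node(ASDF_TAG, asdf_library=make_node("tag:example.org:other-1.0.0"))
validate_tree(bad, Counter(), bad_errors)
```

In this model the mismatched tag is reported by the parent's schema, while the content of a correctly tagged child is left to the child's own schema.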

Should we switch the core schemas to the second form? Since validation is the slowest part of reading a file, avoiding validating each object twice (or more, if references are nested more than one level deep) may improve performance significantly.

@jdavies-st
Contributor

😮 🎉

Worth trying in a PR.

The 2nd form also looks like it does not depend on the referenced tag schema to be co-located with the first. Can one use an id instead of a tag? Are they interchangeable?

type: object
properties:
  asdf_library:
    description: |
      Describes the ASDF library that produced the file.
    tag: http://stsci.edu/schemas/asdf/core/software-1.0.0

Is this something that could increase performance for asdf-standard 1.5 type schemas as well? I.e. would it be worthwhile doing a 1.6?

@eslavich
Contributor Author

> The 2nd form also looks like it does not depend on the referenced tag schema to be co-located with the first.

The $ref property is evaluated against the URI scope of the enclosing schema, so you can use the absolute id of the target schema or a path relative to the current scope. tag is always absolute (just because that's the way we implemented it).
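
To illustrate the relative-versus-absolute point, here is a small sketch using Python's `urllib.parse.urljoin` as a stand-in for URI reference resolution; it is not the library's resolver. The base id `http://stsci.edu/schemas/asdf/core/asdf-1.1.0` is assumed here to be the id of the enclosing schema.

```python
from urllib.parse import urljoin

# Assumed id (URI scope) of the enclosing asdf-1.1.0 schema.
base_id = "http://stsci.edu/schemas/asdf/core/asdf-1.1.0"

# A relative $ref is resolved against the enclosing schema's URI scope.
relative = urljoin(base_id, "software-1.0.0")

# An absolute $ref resolves to itself, regardless of the scope.
absolute = urljoin(base_id, "http://stsci.edu/schemas/asdf/core/software-1.0.0")

print(relative)
print(relative == absolute)
```

Both forms end up pointing at the same schema id, whereas a tag value is compared as written and is never resolved against a scope.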

> Can one use an id instead of a tag? Are they interchangeable?

I don't think they're interchangeable -- the tag property validates that the object has exactly the specified YAML tag.

@eslavich
Contributor Author

> Is this something that could increase performance for asdf-standard 1.5 type schemas as well? I.e. would it be worthwhile doing a 1.6?

I think this would increase performance of the 1.x schemas. It may or may not be worthwhile to release another 1.x ASDF Standard just for that improvement. Now that we have the option to disable validate_on_read in the library, it's probably less important.
