
Add tests to validate example files and check that they are in sync #57

Open
duncandewhurst opened this issue Aug 24, 2022 · 18 comments

@duncandewhurst
Collaborator

Since draft 6, JSON Schema includes an `examples` keyword:

The examples keyword is a place to provide an array of examples that validate against the schema. This isn’t used for validation, but may help with explaining the effect and purpose of the schema to a reader.
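To illustrate: `examples` is annotation-only, but a test can still check each example against its own subschema. A minimal stdlib-only sketch (the property name and values are invented; a real test would use the jsonschema library):

```python
# Sketch: check that each entry in a subschema's `examples` matches the
# subschema's declared `type`. This is only a toy type check, not full
# JSON Schema validation.

TYPE_MAP = {"string": str, "number": (int, float), "integer": int,
            "boolean": bool, "array": list, "object": dict}

def examples_match_type(schema):
    """Return True if every entry in `examples` matches the declared type."""
    expected = TYPE_MAP.get(schema.get("type"))
    if expected is None:
        return True
    return all(isinstance(e, expected) for e in schema.get("examples", []))

# A hypothetical literal property with an example attached:
name_schema = {
    "type": "string",
    "description": "The name of the network.",
    "examples": ["Ghana Fibre Network"],
}

assert examples_match_type(name_schema)
```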

@duncandewhurst
Collaborator Author

duncandewhurst commented Sep 6, 2022

I think that using this feature would help to keep examples in sync with changes to the schema.

Suggested approach:

  • Manually author examples for literal properties
  • Use a pre-commit script to:
    • Generate/update examples for objects and arrays based on the examples for literal properties
    • Generate standalone example files for each publication format based on the examples in the schema
  • Author a new jsoninclude-code directive to render examples as a codeblock (based on jsoninclude-quote)
  • Use sphinx panels to allow readers to switch between the jsonschema table and the jsoninclude-code example for each component on the reference page.

For the example to be useful, it will need to include at least two nodes, so we'll need to think about how to handle that in the pre-commit script.
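The bottom-up generation step could be sketched roughly as follows. The schema here is invented, and the sketch ignores $ref reuse and does not vary repeated array items, which is where the real complexity lies:

```python
def build_example(schema):
    """Recursively build an example instance from per-property `examples`."""
    if schema.get("type") == "object":
        return {
            name: build_example(subschema)
            for name, subschema in schema.get("properties", {}).items()
        }
    if schema.get("type") == "array":
        # Emit two items so the example network has at least two nodes.
        # A real script would need to vary them to keep the example coherent.
        return [build_example(schema.get("items", {})) for _ in range(2)]
    # Literal property: take the first manually authored example.
    return schema.get("examples", [None])[0]

# A hypothetical fragment of the network schema:
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "examples": ["Ghana Fibre Network"]},
        "nodes": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"id": {"type": "string", "examples": ["1"]}},
            },
        },
    },
}

example = build_example(schema)
```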

@lgs85
Contributor

lgs85 commented Sep 6, 2022

+1 to the principle/overall approach

An alternative approach could be to author one or two 'complete' example networks, and use a pre-commit script to pull properties from these into the reference/guidance where appropriate, rather than from an examples field. This would enable us to maintain some fuller, coherent examples for illustrative purposes (visualisation, file conversion etc.), but I think Duncan's approach lends itself better to keeping everything up to date in terms of workflow, due to the proximity of the examples to the property descriptions.

@duncandewhurst
Collaborator Author

This would enable us to maintain some fuller, coherent examples for illustrative purposes (visualisation, file conversion etc)

That's what I meant by 'Generate standalone example files for each publication format based on the examples in the schema' - so the repository would have a full example file in each publication format, but the 'single source of truth' for the examples would be the example fields in the schema.

Author a new jsoninclude-code directive to render examples as a codeblock (based on jsoninclude-quote)

I realised that jsoninclude already does this.

@duncandewhurst
Collaborator Author

duncandewhurst commented Sep 7, 2022

Having looked into this a bit further, the script to generate the examples 'bottom up' starts to get quite complicated because it has to deal with definitions that are reused in different parts of the schema. It's also harder to make the example coherent when only editing literal properties.

@lgs85, I think your proposed approach looks better. I've tried to flesh it out a bit below:

  • Generate a 'blank' json file
  • Fill in the blank json file with a complete, coherent example
  • Manually add examples to guidance documentation using jsoninclude
  • Set up a pre-commit script to:
    • Generate example .geojson and CSV files
    • Populate examples throughout the schema
    • Add/update examples in the reference documentation using jsoninclude and sphinx-panels
  • Set up tests to:
    • strictly validate the example against the schema, i.e. identify any missing or additional fields
    • check that the .geojson and CSV file examples are in sync
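The 'in sync' check in the last bullet could be sketched as a comparison of node ids. This is a hypothetical sketch; the field names `networks`, `nodes` and `id`, and the placement of the id under each feature's `properties`, are assumptions about the OFDS structure:

```python
def nodes_in_sync(package, nodes_geojson):
    """Return True if the GeoJSON nodes file contains exactly the nodes in
    the network package, compared by node id."""
    package_ids = {
        node["id"]
        for network in package["networks"]
        for node in network.get("nodes", [])
    }
    geojson_ids = {
        feature["properties"]["id"] for feature in nodes_geojson["features"]
    }
    return package_ids == geojson_ids

# Toy data standing in for network-package.json and nodes.geojson:
package = {"networks": [{"nodes": [{"id": "1"}, {"id": "2"}]}]}
nodes_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"id": "1"}, "geometry": None},
        {"type": "Feature", "properties": {"id": "2"}, "geometry": None},
    ],
}

assert nodes_in_sync(package, nodes_geojson)
```

A fuller check would compare whole node objects rather than just ids, but the shape of the test is the same.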

I've started to test out how this would work in https://github.com/Open-Telecoms-Data/open-fibre-data-standard/tree/examples, and you can preview the changes to the schema reference page here:

Edit: I've renamed the issue since it has become a bit broader than just including examples in the schema.

@duncandewhurst duncandewhurst changed the title Consider including examples in the schema Add example data Sep 7, 2022
@lgs85
Contributor

lgs85 commented Sep 7, 2022

Sounds good. It's a fairly big job @duncandewhurst so let's discuss some details and we can both work on this.

@duncandewhurst
Collaborator Author

For the alpha, we need to:

  * [ ]  Add/update examples in the reference documentation using `jsoninclude` and sphinx-panels

This could be done manually or by updating the pre-commit script.

@lgs85
Contributor

lgs85 commented Sep 14, 2022

I think that once #102 is merged we can move this issue to the beta project and milestone.

@duncandewhurst duncandewhurst modified the milestones: Alpha, Beta Sep 14, 2022
@duncandewhurst duncandewhurst changed the title Add example data Update pre-commit script to populate examples in schema and generate reference documentation Oct 17, 2022
@lgs85 lgs85 removed their assignment Oct 17, 2022
@duncandewhurst
Collaborator Author

  • Set up a pre-commit script to:
    • Generate example .geojson and CSV files
    • Populate examples throughout the schema
    • Add/update examples in the reference documentation using jsoninclude and sphinx-panels

This is done in #158.

The outstanding task is to set up the tests proposed in #57 (comment)

@duncandewhurst duncandewhurst changed the title Update pre-commit script to populate examples in schema and generate reference documentation Add tests to validate example files and check that they are in sync Oct 27, 2022
@odscjames
Collaborator

GeoJSON and CSV - we can't test them directly against the schema; we'll have to convert both to JSON and then test the JSON. Is this ok? It runs the risk that problems in the conversion process will cause issues here, but we'll just have to make sure we get the conversion process right :-)
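A rough sketch of that conversion step for nodes, assuming each feature's properties map back to a node and its geometry belongs under the node's `location` field (an assumption about the OFDS structure):

```python
def geojson_to_nodes(feature_collection):
    """Convert a GeoJSON FeatureCollection back into a list of node objects,
    re-attaching each geometry under the node's `location` field so the
    result can be validated against the JSON schema."""
    nodes = []
    for feature in feature_collection["features"]:
        node = dict(feature["properties"])
        node["location"] = feature["geometry"]
        nodes.append(node)
    return nodes

# Toy FeatureCollection standing in for nodes.geojson:
fc = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"id": "1", "name": "Accra"},
            "geometry": {"type": "Point", "coordinates": [-0.17, 5.62]},
        }
    ],
}

nodes = geojson_to_nodes(fc)
```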

@duncandewhurst
Collaborator Author

duncandewhurst commented Dec 15, 2022

To break it down a bit, I think we should test the following. I've indicated the top priorities:

  • examples/json
    • [Priority 1] Strictly validate /network-package.json against the schema, i.e. check that every field in the schema is present in the example file and that there are no additional fields in the example file.
    • With the exception of /network-package-invalid.json, /nodes-endpoint.json and /spans-endpoint.json, validate all other /*.json files against the schema.
  • examples/geojson
    • [Priority 2] Check that /nodes.json and /spans.json are in sync with examples/json/network-package.json
    • Convert /api-response.geojson and /multiple-networks.geojson to JSON and validate the resulting files against the schema.
  • examples/csv
    • [Priority 3] Check that /*.csv files are in sync with examples/json/network-package.json
    • Check that /template/*.csv files are in sync with schema/network-schema.json
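The Priority 3 template check could be sketched by deriving the expected header row from the schema. The '/' path separator and the flattening rules here are simplified assumptions, not the standard's actual rules:

```python
import csv
import io

def expected_columns(schema):
    """Flatten a (sub)schema's properties into CSV column names, joining
    nested object paths with '/'."""
    columns = []
    for name, prop in schema.get("properties", {}).items():
        if prop.get("type") == "object":
            columns += [f"{name}/{sub}" for sub in expected_columns(prop)]
        else:
            columns.append(name)
    return columns

# A hypothetical fragment of the node schema:
node_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "name": {"type": "string"},
        "location": {"type": "object",
                     "properties": {"type": {"type": "string"}}},
    },
}

# A template CSV whose header row should match the schema-derived columns:
template_csv = "id,name,location/type\n"
header = next(csv.reader(io.StringIO(template_csv)))
assert header == expected_columns(node_schema)
```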

@odscjames
Collaborator

[Priority 1] Strictly validate /network-package.json against the schema, check that every field in the schema is present in the example file

Interesting - do you know of a place we do this elsewhere? No worries if not, we can look into this.

and that there are no additional fields in the example file.

Waiting for review in #233

I'm also suggesting we could test that the intentionally broken files actually return some errors when validated :-)

@duncandewhurst
Collaborator Author

[Priority 1] Strictly validate /network-package.json against the schema, check that every field in the schema is present in the example file

Interesting - do you know of a place we do this elsewhere? No worries if not, we can look into this.

Not as far as I'm aware. Unless there is an option in the jsonschema library, I think it would be a case of creating a copy of the schema and making all fields required - I guess that can be done programmatically as part of the test.

And also suggesting we could test that the broken files actually return some errors when tested :-)

Sure, but I would prioritise everything in #57 (comment) first.

@odscjames
Collaborator

odscjames commented Dec 20, 2022

[Priority 2] Check that /nodes.json and /spans.json are in sync with examples/json/network-package.json
[Priority 3] Check that /*.csv files are in sync with examples/json/network-package.json

Both of these happen in pre-commit, so #226 will accomplish both of them.

@odscjames
Collaborator

Check that /template/*.csv files are in sync with schema/network-schema.json

How is this generated currently? If it was in the pre-commit #226 would test this too.

@duncandewhurst
Collaborator Author

Yep, that's done in pre-commit, too.

@duncandewhurst
Collaborator Author

Not as far as I'm aware. Unless there is an option in the jsonschema library, I think it would be a case of creating a copy of the schema and making all fields required - I guess that can be done programmatically as part of the test.

Sharing a short script that I used to do this for OC4IDS.

import json

from jsonschema import Draft4Validator

def make_strict(schema):
    """
    Recursively make all properties required and disallow additional
    properties, so that validation reports missing and extra fields.
    """
    if schema.get('type') == 'array':
        schema['items'] = make_strict(schema.get('items', {}))
        return schema

    if schema.get('type') != 'object':
        return schema

    for key, value in schema.get('properties', {}).items():
        schema['properties'][key] = make_strict(value)

    for key, value in schema.get('definitions', {}).items():
        schema['definitions'][key] = make_strict(value)

    # Draft 4 requires `required` to be a non-empty array, so only set it
    # when there are properties to require.
    if schema.get('properties'):
        schema['required'] = list(schema['properties'])

    schema['additionalProperties'] = False
    return schema

with open('schema/project-level/project-schema.json') as f:
    schema = json.load(f)

schema = make_strict(schema)

with open('docs/examples/example.json') as f:
    project_package = json.load(f)

validator = Draft4Validator(schema)

for error in sorted(validator.iter_errors(project_package['projects'][0]), key=str):
    print(f"{'/'.join(str(part) for part in error.absolute_path)}: {error.message}")

@odscjames
Collaborator

The compiletojsonschema library has an option to disallow additional fields everywhere, which is also used for testing, and "every field required" could be a good new option to add there. https://compiletojsonschema.readthedocs.io/en/latest/cli.html#set-additional-properties-false-everywhere

@duncandewhurst
Collaborator Author

Sounds good! I created an issue: OpenDataServices/compile-to-json-schema#34
