Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate CST translation to/from Avro, JSON schema, and Proto #351

Open
criccomini opened this issue Aug 4, 2023 · 2 comments
Open

Validate CST translation to/from Avro, JSON schema, and Proto #351

criccomini opened this issue Aug 4, 2023 · 2 comments

Comments

@criccomini
Copy link
Contributor

This is a uber-ticket to track work for CST validation when converting between (both to and from) Recap schemas and other serdes (Avro, JSON schema, Proto).

The idea is to define a test directory like:

  • /tests/converters/0000
    • from.avsc
    • from.proto
    • from.json
    • recap.yaml
  • /tests/converters/0001
    • to.avsc
    • to.proto
    • to.json
    • recap.yaml
  • /tests/converters/0002
    • to.avsc
    • to.proto
    • to.json
    • recap.yaml
    • from.avsc

The test harness would iterate over every test subdir and verify that using a serde's converter and types.py's to_dict/from_dict would result in the expected structures.

For example, in 0000, the test harness would load the from.avsc into a dict, then use AvroConverter.to_recap to convert the Avro schema to a Recap StructType (the Recap AST). It'd then use recap.types.to_dict to convert the Recap AST to the Recap CST (dict). Finally, it'd load the recap.yaml and compare it to the Recap CST--they should be identical. It'd then do the same thing for the .proto and .json files.

In the second directory example it'd do the inverse. In the third directory example (0002) it'd do both to and from checks for the Avro schemas. The reason for splitting to/from is that you might not get the same schema when converting to Recap as when converting from Recap since some conversions are lossy or require coercion.

@mjperrone LMK if this makes sense.

@mjperrone
Copy link
Contributor

I think building out a test suite like that will be very valuable. There are a lot of good ideas expressed in this. It was a little hard for me to parse.

I think you should put more explicit meaning into the names and directories of the tests, and group the tests semantically.

  • "from" vs "to" tests. I think using input and output and expected in the file names inside the directory will make it more clear.
  • I think the directory should identify which conversion direction is happening.
  • .schema.json to differentiate from .json
  • Instead of numbers, tests named based on what they're trying to test. This I'm not so convinced on. It may be worth it to just put a readme.md in each test directory to explain what each test is trying to cover.
    /tests/converters/into-recap/regex-constraints/
        input.avsc
        input.proto
        input.schema.json
        expected.recap.cst.yaml
    /tests/converters/out-of-recap/min-max-constraints-1/
        expected.avsc
        expected.proto
        expected.json
        recap.cst.yaml

The third test semantic I couldn't wrap my head around.

@criccomini
Copy link
Contributor Author

I like all your suggestions. I'll incorporate them.

Regarding the third test semantic, what I was trying to emphasize was that a single test could validate both avsc -> Recap and Recap -> avsc. The first two tests just validate one direction or the other. I don't think this is strictly required, since you could just have two tests, but I thought it might be more convenient.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants