-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AVRO-3666 [Java] Use the new schema parser #2642
Conversation
This undoes the split schema parsing to allow forward references, which is to be handles by the SchemaParser & ParseContext classes. It uses the new ParseContext for the classic schema parser to accommodate this. Next step: use the new SchemaParser and resolve unresolved / forward references after parsing. This will also resolve "forward" references that were parsed in subsequent files.
By resolving references after parsing, we allow both forward references within a file as between subsequent files. This change also includes using the new SchemaParser everywhere, as using it is the best way to flush out bugs.
Also includes changes necessary to debug.
lang/java/avro/src/main/java/org/apache/avro/JsonSchemaParser.java
Dismissed
Show dismissed
Hide dismissed
The wrong exclusion was removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments (not yet enough time to see all)
NameValidator saved = Schema.getNameValidator(); | ||
try { | ||
Schema.setNameValidator(nameValidator); // Ensure we use the same validation. | ||
HashMap<String, Schema> result = new LinkedHashMap<>(oldSchemas); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why LinkedHashMap here; where it needs to keep order ?
Map<String, Schema> result = new HashMap<>(oldSchemas);
is not enough ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of the things I'm not sure about: how to provide the parsed schemas. Ideally, a this would be a Set<Schema>
(meaning a HashMap
would suffice here).
However, current code uses a List<Schema>
, and this is generally the more prevalent collection type.
This ensures the SchemaParser never returns unresolved schemata.
lang/java/compiler/src/test/java/org/apache/avro/compiler/schema/TestSchemas.java
Fixed
Show fixed
Hide fixed
lang/java/avro/src/main/java/org/apache/avro/JsonSchemaParser.java
Dismissed
Show dismissed
Hide dismissed
lang/java/avro/src/main/java/org/apache/avro/JsonSchemaParser.java
Dismissed
Show dismissed
Hide dismissed
lang/java/avro/src/main/java/org/apache/avro/ParseContext.java
Dismissed
Show dismissed
Hide dismissed
lang/java/avro/src/main/java/org/apache/avro/ParseContext.java
Dismissed
Show dismissed
Hide dismissed
lang/java/avro/src/main/java/org/apache/avro/ParseContext.java
Dismissed
Show dismissed
Hide dismissed
lang/java/compiler/src/main/java/org/apache/avro/compiler/idl/SchemaResolver.java
Dismissed
Show dismissed
Hide dismissed
The JSON schema parser is quite complex (it is a large method). This change splits it in multiple methods, naming the various stages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to review this, but don't know where to start. The schema parser is very fundamental and there are 150+ files to review. My biggest concern is about breaking public API's. We had this in the Avro 1.8 to 1.9 version because we had to remove codehaus Jackson from the public API, and it was very hard to get this downstream into other projects.
These are valid concerns, so I'll address them each. Breaking API's:
PR size:
|
This change reduces the PR size, but does require some extra work after merging: the new SchemaParser class is hardly used, and the (now) obsolete Schema.Parser class is used heavily.
lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SchemaTask.java
Fixed
Show fixed
Hide fixed
lang/java/compiler/src/test/java/org/apache/avro/specific/TestSpecificData.java
Fixed
Show fixed
Hide fixed
lang/java/compiler/src/test/java/org/apache/avro/compiler/specific/TestSpecificCompiler.java
Fixed
Show fixed
Hide fixed
lang/java/compiler/src/test/java/org/apache/avro/compiler/specific/TestSpecificCompiler.java
Fixed
Show fixed
Hide fixed
lang/java/maven-plugin/src/test/java/org/apache/avro/mojo/TestInduceMojo.java
Fixed
Show fixed
Hide fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good @opwvhk. Last time while going over it, it looked like a lot of public APIs were changed, it those are unreleased as you already mentioned in a comment. I also ran this branch against the Iceberg test-suite, and no issues there. Thanks for working on this! 🚀
@@ -18,25 +18,36 @@ | |||
package org.apache.avro; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a new file, so it is okay to break the APIs here
@opwvhk Could you resolve the merge conflicts? |
Thanks for working on this @opwvhk 🙌 |
Hi Team, Did 1.12.0 or 1.11.4 got released ? I see CVEs fixes related commons-compress 1.26.0 are already in place. Seems to be release is pending. If not released, what would be the ETA ? cc: @Fokko |
A release is near, see the public mailing list: https://lists.apache.org/thread/qg70g7d9j5odkf8fxnnm342mm2kj997l I would say this month since the release process always takes some time in open source. |
@Fokko Any update on the release ? |
@g1rjeevan I'm working on both 1.11.4 and 1.12.0 releases preparation. I need to merge one pending change. I hope to submit the release to vote next week. |
* AVRO-3666: Redo schema parsing code This undoes the split schema parsing to allow forward references, which is to be handles by the SchemaParser & ParseContext classes. It uses the new ParseContext for the classic schema parser to accommodate this. Next step: use the new SchemaParser and resolve unresolved / forward references after parsing. This will also resolve "forward" references that were parsed in subsequent files. * AVRO-3666: Resolve references after parsing By resolving references after parsing, we allow both forward references within a file as between subsequent files. This change also includes using the new SchemaParser everywhere, as using it is the best way to flush out bugs. * AVRO-3666: Remove wrong test * AVRO-1535: Fix aliases as well * AVRO-3666: Re-enable disabled test Also includes changes necessary to debug. * AVRO-3666: Fix RAT exclusion The wrong exclusion was removed. * AVRO-3666: Remove unused field * AVRO-3666: Introduce SchemaParser.ParseResult This ensures the SchemaParser never returns unresolved schemata. * AVRO-3666: Use SchemaParser for documentation * AVRO-3666: Refactor after review * AVRO-3666: Fix javadoc * AVRO-3666: Fix merge bug * AVRO-3666: Fix CodeQL warnings * AVRO-3666: Increase test coverage * AVRO-3666: Fix tests * AVRO-3666: Refactor schema parsing for readability The JSON schema parser is quite complex (it is a large method). This change splits it in multiple methods, naming the various stages. * AVRO-3666: rename method to avoid confusion * AVRO-3666: Reduce PR size This change reduces the PR size, but does require some extra work after merging: the new SchemaParser class is hardly used, and the (now) obsolete Schema.Parser class is used heavily. * AVRO-3666: Reduce PR size more * AVRO-3666: Reduce PR size again * AVRO-3666: Spotless * Update lang/java/avro/src/main/java/org/apache/avro/Schema.java Co-authored-by: Fokko Driesprong <[email protected]> * AVRO-3666: Spotless --------- Co-authored-by: Fokko Driesprong <[email protected]>
What is the purpose of the change
This change updates schema parsing so forward references are handles in a uniform way, regardless of the parser used (JSON, IDL, ...).
This change is the missing bit for AVRO-3666, re-enabling some disabled tests (and more changes to debug). This also improves the schema resolving visitor copied from the original IDL parser (it didn't handle circular references in logical types), and aliases pointing to the null namespace (this was lost when querying aliases).
Verifying this change
This change updates a lot of tests to use the new schema parser (or the internal JSON parser), but does not alter tests. As such, the change is covered by existing schema parsing tests.
Documentation
yes/ no)docs / JavaDocs / not documented)