Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVRO-3825: [Java] Disallow invalid namespaces #2430

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

sarutak
Copy link
Member

@sarutak sarutak commented Aug 9, 2023

What is the purpose of the change

According to the specification, each portion of a namespace separated by dot should be [a-zA-Z_][a-zA-Z0-9_].
https://avro.apache.org/docs/1.11.1/specification/#names

The name portion of the fullname of named types, record field names, and enum symbols must:

    start with [A-Za-z_]
    subsequently contain only [A-Za-z0-9_]

A namespace is a dot-separated sequence of such names. The empty string may also be used as a namespace to indicate the null namespace. Equality of names (including field names and enum symbols) as well as fullnames is case-sensitive.

The null namespace may not be used in a dot-separated sequence of names. So the grammar for a namespace is:

  <empty> | <name>[(<dot><name>)*]

But the current Java binding accept invalid namespaces.
This PR aims to fix this issue.

Verifying this change

Added new test and it passed on my laptop.

Documentation

No new features added.

@github-actions github-actions bot added the Java Pull Requests for Java binding label Aug 9, 2023
@sarutak
Copy link
Member Author

sarutak commented Aug 9, 2023

Hmm, the Java binding seems to generate namespaces which contain $ for inner classes/enums.

[INFO] Running org.apache.avro.protobuf.TestProtobuf
Error:  Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.372 s <<< FAILURE! - in org.apache.avro.protobuf.TestProtobuf
Error:  org.apache.avro.protobuf.TestProtobuf.nestedEnum  Time elapsed: 0.018 s  <<< ERROR!
java.lang.RuntimeException: org.apache.avro.SchemaParseException: Illegal character in: Test$M
	at org.apache.avro.protobuf.ProtobufData.getSchema(ProtobufData.java:187)
	at org.apache.avro.protobuf.TestProtobuf.nestedEnum(TestProtobuf.java:115)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at java.util.ArrayList.forEach(ArrayList.java:1259)
	at java.util.ArrayList.forEach(ArrayList.java:1259)
Caused by: org.apache.avro.SchemaParseException: Illegal character in: Test$M
	at org.apache.avro.Schema.validateName(Schema.java:1641)
	at org.apache.avro.Schema.validateSpace(Schema.java:1652)
	at org.apache.avro.Schema.access$700(Schema.java:96)
	at org.apache.avro.Schema$Name.<init>(Schema.java:720)
	at org.apache.avro.Schema.createEnum(Schema.java:233)
	at org.apache.avro.protobuf.ProtobufData.getSchema(ProtobufData.java:322)
	at org.apache.avro.protobuf.ProtobufData.getSchema(ProtobufData.java:183)

@sarutak
Copy link
Member Author

sarutak commented Aug 9, 2023

@KalleOlaviNiemitalo
Copy link
Contributor

This use of the dollar sign is apparently defined in Java language specification §13.1 (The Form of a Binary), so all Java compilers should generate the same binary name Test$M. However, for local or anonymous classes, the JLS requires "a non-empty sequence of digits", and I think different compilers could choose different digits. Application developers then should not use such classes for Avro data, as it would be difficult from them to ensure that the name stays the same.

I'd prefer keeping dollar signs disallowed in the Avro specification. Schemas that use dollar signs won't be fully portable to other languages, even if the Java implementation of Avro allows them. It might be useful to have a strict mode that flags them in the Java implementation too, but I don't know how that should be configured.

@sarutak
Copy link
Member Author

sarutak commented Aug 9, 2023

This use of the dollar sign is apparently defined in Java language specification §13.1 (The Form of a Binary), so all Java compilers should generate the same binary name Test$M. However, for local or anonymous classes, the JLS requires "a non-empty sequence of digits", and I think different compilers could choose different digits. Application developers then should not use such classes for Avro data, as it would be difficult from them to ensure that the name stays the same.

I know that $ is automatically generated for inner classes and it's ideal that users don't use such classes. But protobuf generates such classes and the Java binding supports a feature that generates a schema from protobuf-generated classes.

@opwvhk
Copy link
Contributor

opwvhk commented Aug 17, 2023

On a side note, this change may lead to more previously accepted schemata to fail parsing.

PR #2448 includes an addition to the specification that describes how to fix this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Foo.java, M.java and Test.java are generated automatically from test.proto.

/**
* <code>required int32 a = 1;</code>
*/
public int getA() {

Check notice

Code scanning / CodeQL

Missing Override annotation

This method overrides [BarInnerOrBuilder.getA](1); it is advisable to add an Override annotation.
/**
* <code>required int32 a = 1;</code>
*/
public boolean hasA() {

Check notice

Code scanning / CodeQL

Missing Override annotation

This method overrides [BarInnerOrBuilder.hasA](1); it is advisable to add an Override annotation.
/**
* <code>required int32 a = 1;</code>
*/
public int getA() {

Check notice

Code scanning / CodeQL

Missing Override annotation

This method overrides [BarInnerOrBuilder.getA](1); it is advisable to add an Override annotation.
/**
* <code>required int32 a = 1;</code>
*/
public boolean hasA() {

Check notice

Code scanning / CodeQL

Missing Override annotation

This method overrides [BarInnerOrBuilder.hasA](1); it is advisable to add an Override annotation.
}

@java.lang.Override
public Builder mergeFrom(com.google.protobuf.Message other) {

Check notice

Code scanning / CodeQL

Confusing overloading of methods

Method Builder.mergeFrom(..) could be confused with overloaded method [Builder.mergeFrom](1), since dispatch depends on static types. Method Builder.mergeFrom(..) could be confused with overloaded method [mergeFrom](2), since dispatch depends on static types.
}

@java.lang.Override
public Builder mergeFrom(com.google.protobuf.Message other) {

Check notice

Code scanning / CodeQL

Confusing overloading of methods

Method Builder.mergeFrom(..) could be confused with overloaded method [Builder.mergeFrom](1), since dispatch depends on static types. Method Builder.mergeFrom(..) could be confused with overloaded method [mergeFrom](2), since dispatch depends on static types.
if (extensionRegistry == null) {
throw new java.lang.NullPointerException();
}
int mutable_bitField0_ = 0;

Check notice

Code scanning / CodeQL

Unread local variable

Variable 'int mutable_bitField0_' is never read.
@sarutak sarutak force-pushed the disallow-invalid-namespaces branch from e8d31ad to 8238b35 Compare August 17, 2023 21:02
@sarutak
Copy link
Member Author

sarutak commented Aug 19, 2023

cc: @RyanSkraba

@locomotif
Copy link

Hi Folks - Glad we have a PR to address this issue, it surfaced recently, and was wondering if there are plans to land this fix sometime in the near future. Thanks! 🙂

@sarutak
Copy link
Member Author

sarutak commented Oct 9, 2023

I'll update this PR after this PR is merged because this PR modifies a .proto file and need to regenerate .class files for test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Java Pull Requests for Java binding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants