AVRO-3966: [Java] Fix default value serialisation for fixed and bytes #2823
What is the purpose of the change
Default values for `fixed` and `bytes` schemas are converted to JSON incorrectly (according to the documentation).
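The Avro specification states that default values for `bytes` and `fixed` are JSON strings in which Unicode code points 0-255 map to unsigned byte values 0-255. Below is a minimal Java sketch of that mapping. It is a plain standalone helper, not Avro's internal code, and the names `BytesDefaultCodec`, `toJsonString` and `fromJsonValue` are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not Avro's API: illustrates the spec's byte <-> code point mapping.
public class BytesDefaultCodec {

    // Encodes bytes as a JSON string literal in which every byte becomes a
    // backslash-u escape of the code point with the same value (0-255).
    static String toJsonString(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 6 + 2);
        sb.append('"');
        for (byte b : bytes) {
            // b & 0xFF turns the signed Java byte into an unsigned value 0-255.
            sb.append(String.format("\\u%04X", b & 0xFF));
        }
        sb.append('"');
        return sb.toString();
    }

    // Converts an already-unescaped JSON string value back into bytes.
    // ISO-8859-1 maps code points 0-255 one-to-one onto byte values 0-255.
    static byte[] fromJsonValue(String value) {
        return value.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] bytes = {0x2F, 0x00, 0x11, (byte) 0xFF};
        System.out.println(toJsonString(bytes));

        // A JSON parser would unescape the literal into these four chars.
        String parsed = new String(new char[] {0x2F, 0x00, 0x11, 0xFF});
        System.out.println(Arrays.equals(bytes, fromJsonValue(parsed)));
    }
}
```

Running `main` prints `"\u002F\u0000\u0011\u00FF"` and then `true`. Escaping every byte keeps the output plain ASCII, at the cost of readability for bytes that happen to be printable.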
Also, it is possible to store any Unicode string as the default for `bytes`, even one containing characters outside the single-byte range. Such a default would be serialised into Avro format as `"?????"`.
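Here is a short Java sketch of how a non-Latin `bytes` default can degrade to `?` characters. It assumes the string is turned into bytes with `String.getBytes(StandardCharsets.ISO_8859_1)`, the charset this PR settles on. `String.getBytes(Charset)` replaces every unmappable character with the charset's replacement byte (`?` for ISO-8859-1), so the data loss is silent. The `isValidBytesDefault` check is a hypothetical illustration of the kind of validation the PR describes, not Avro's implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo, not Avro code: shows silent '?' substitution for bytes defaults.
public class UnmappableDefaultDemo {

    // Returns true only if every character fits in ISO-8859-1 (code points
    // 0-255), i.e. converting the default to bytes would be lossless.
    static boolean isValidBytesDefault(String value) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(value);
    }

    public static void main(String[] args) {
        // Cyrillic "Privet": six characters, all above code point 0xFF.
        String defaultValue = "\u041F\u0440\u0438\u0432\u0435\u0442";

        // getBytes(Charset) does not throw here: unmappable characters are
        // replaced with the charset's replacement byte '?' (0x3F).
        byte[] bytes = defaultValue.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1)); // ??????

        System.out.println(isValidBytesDefault(defaultValue)); // false
        System.out.println(isValidBytesDefault("caf\u00E9"));  // true
    }
}
```

Rejecting such a default up front is preferable to storing `?` bytes, because the original characters cannot be recovered from the serialised schema.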
In this PR:
- `String` to `byte[]` conversion now uses the `ISO_8859_1` charset (the one used during `byte[]` to JSON conversion and when reading Avro into Java objects);
- `bytesDefault` and `fixedDefault` now ensure that this conversion is performed;
- `byte[]` to JSON conversion itself is now done according to the documentation, by translating bytes to u-escaped sequences (e.g. `[2F, 00, 11, FF]` into `\u002F\u0000\u0011\u00FF`).

Verifying this change
This change is already covered by existing tests, such as:
DISCUSSION IS NEEDED
In `TestGenericData` I had to rewrite the test `toStringEscapesControlCharsInBytes`. The new behaviour encodes `"a\nb"` as `"\u0061\u000A\u0062"`, which breaks compatibility to some extent (more precisely, it hurts readability).
It is still possible to leave printable ASCII characters (from `U+0020` to `U+007E`) unescaped and keep full compatibility with the previous version, but then the behaviour would differ from the one described in the documentation.

Documentation