From John Karp's original description of [the issue]:

> By default in Perl, a string is a sequence of bytes, values 0-255.
> However, if a Unicode character is included that cannot be represented
> with a single byte, the string gets 'upgraded' to a non-byte-based
> Unicode string allowing ordinals outside that range. When string
> operations are done with byte and non-byte Unicode strings, the result
> is always non-byte, with the byte string first 'upgraded'. Upgrading
> consists of utf8 encoding and setting a utf8 flag on the string. ('utf8'
> is a variant of UTF-8 used by Perl)
>
> The Perl Avro API is accepting these Unicode strings as-is for the
> 'bytes' type. This is a problem because
>
> 1. values >255 are not valid as bytes, and any encoding is their job
>
> 2. As Avro assembles the serialized data, Perl 'upgrades' all the data,
>    having the effect of utf8 encoding our serialized binary data.
>
> The correct behavior is for the Avro Perl API to attempt to downgrade
> the string, and if this fails because it contained values >255 then to
> raise an error. (The behavior of 'string' won't change, it will still
> take Unicode strings as expected.)

This change, based on the one submitted for that ticket, adds these
behaviours and tests to exercise them.

[the issue]: https://issues.apache.org/jira/browse/AVRO-1517
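
The downgrade-or-fail behaviour described above can be sketched with Perl's built-in `utf8::downgrade`. This is only an illustration, not the code from this commit: the helper name `to_bytes_or_die` and its error message are hypothetical, and the actual encoder may be structured differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not the Avro module's actual code): return a copy of
# $value that holds only byte values 0-255, or die if that is impossible.
sub to_bytes_or_die {
    my ($value) = @_;
    my $bytes = $value;    # work on a copy so the caller's string is untouched

    # With a true second argument, utf8::downgrade returns false instead of
    # dying when the string contains a character above 255.
    utf8::downgrade($bytes, 1)
        or die "'bytes' value contains a character above 255\n";
    return $bytes;
}

# A byte-range string that was upgraded still downgrades cleanly.
my $upgraded = "\x{e9}";
utf8::upgrade($upgraded);
my $ok = to_bytes_or_die($upgraded);
print "utf8 flag after downgrade: ", (utf8::is_utf8($ok) ? "set" : "clear"), "\n";

# A string with a code point above 255 cannot be downgraded, so it is rejected.
my $wide   = "\x{263A}";
my $result = eval { to_bytes_or_die($wide); 1 };
print "wide string rejected: ", ($result ? "no" : "yes"), "\n";
```

Downgrading a copy rather than the caller's scalar is a design choice made for this sketch; it keeps the check free of side effects at the cost of one extra string copy.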
Showing 4 changed files with 54 additions and 14 deletions.