ASCII85. Four zero bytes encoded as !!!!! instead of z #21

Open
distlibs opened this issue Jul 2, 2021 · 3 comments

Comments


distlibs commented Jul 2, 2021

Four zero bytes are encoded as !!!!! instead of z.

require __DIR__ . "/vendor/autoload.php";

use Tuupola\Base85;

$ascii85 = new Base85([
    "characters" => Base85::ASCII85,
    "compress.spaces" => false,
    "compress.zeroes" => true
]);
print $ascii85->encode("\0\0\0\0"); // !!!!!

I tested with https://cryptii.com/pipes/ascii85-encoding


I tested with Python too. base64.a85encode outputs z for four zero bytes.
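For reference, both outputs describe the same group. In ASCII85 each 4-byte group is read as a big-endian 32-bit number and written as five base-85 digits, each offset by 33 (the ! character). Four zero bytes give the value 0, so all five digits are !, and z is the one-character shorthand for exactly that group. A minimal sketch of the per-group step, with a hypothetical helper name (this is not the library's code):

```php
<?php
// Hypothetical helper: encode one complete 4-byte group as five ASCII85
// characters, without the "z" shorthand. Assumes 64-bit PHP so the
// unsigned 32-bit value fits in an int.
function encodeGroup(string $group): string
{
    // Read the four bytes as an unsigned big-endian 32-bit integer.
    $value = unpack("N", $group)[1];
    $chars = "";
    for ($i = 0; $i < 5; $i++) {
        // Take the lowest base-85 digit, offset it by 33 ("!"), and prepend it.
        $chars = chr($value % 85 + 33) . $chars;
        $value = intdiv($value, 85);
    }
    return $chars;
}

print encodeGroup("\0\0\0\0") . "\n"; // !!!!!
```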

distlibs changed the title from "ASCII85. Four zero bytes encoded as !!!!! instead on z" to "ASCII85. Four zero bytes encoded as !!!!! instead of z" Jul 2, 2021
Owner

tuupola commented Jul 2, 2021

This is intentional: z compression is not applied to the final block. The input string is padded with 0x00 bytes to a multiple of 4, and the decoder needs to be able to tell whether the final four zero bytes are padding or actual data.

For example, if the data is:
0xaabbccddee

The padded four-byte blocks would be:
0xaabbccdd
0xee000000

print $ascii85->encode(hex2bin("aabbccddee"));
/* Wk6L2mJ */
print bin2hex($ascii85->decode("Wk6L2mJ"));
/* aabbccddee */
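The padding and truncation steps in this example can be reproduced with a small, self-contained encoder. This is a minimal sketch, not the library's implementation: the function name ascii85Encode is hypothetical, and it leaves out the z shorthand entirely so that only padding and truncation are visible. A final group of n data bytes keeps only its first n + 1 characters.

```php
<?php
// Hypothetical minimal ASCII85 encoder: no "z" shorthand, no delimiters.
// Assumes 64-bit PHP so unsigned 32-bit values fit in an int.
function ascii85Encode(string $data): string
{
    $length = strlen($data);
    if ($length === 0) {
        return "";
    }
    // Pad with 0x00 so the length is a multiple of 4.
    $padding = (4 - $length % 4) % 4;
    $padded = $data . str_repeat("\0", $padding);

    $output = "";
    foreach (str_split($padded, 4) as $group) {
        $value = unpack("N", $group)[1];
        $chars = "";
        for ($i = 0; $i < 5; $i++) {
            $chars = chr($value % 85 + 33) . $chars;
            $value = intdiv($value, 85);
        }
        $output .= $chars;
    }

    // Drop one character per padding byte: n data bytes keep n + 1 characters.
    return substr($output, 0, strlen($output) - $padding);
}

print ascii85Encode(hex2bin("aabbccddee")) . "\n"; // Wk6L2mJ
print ascii85Encode(hex2bin("aabbccdd00")) . "\n"; // Wk6L2!!
```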

If, however, the data is:
0xaabbccdd00

The padded four-byte blocks would be:
0xaabbccdd
0x00000000

With the current behaviour, z compression is not applied to the last block:

print $ascii85->encode(hex2bin("aabbccdd00"));
/* Wk6L2!! */
print bin2hex($ascii85->decode("Wk6L2!!"));
/* aabbccdd00 */
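A short calculation shows why n + 1 characters, such as the two characters !! above, are enough to recover n data bytes. Assume, as is common for ASCII85 decoders, that the missing 4 - n characters are filled in with u (digit 84) before decoding. Let $B$ be the value of the $n$ data bytes, so the zero-padded group is $V = B \cdot 256^{4-n}$, and let $L$ be the value of the $4-n$ low base-85 digits the encoder dropped, with $0 \le L \le 85^{4-n} - 1$. Since $84\,(1 + 85 + \dots + 85^{3-n}) = 85^{4-n} - 1$, the decoder reconstructs

$$
V' = (V - L) + \left(85^{4-n} - 1\right), \qquad 0 \le V' - V \le 85^{4-n} - 1 < 256^{4-n}.
$$

Because $V$ is a multiple of $256^{4-n}$,

$$
\left\lfloor \frac{V'}{256^{4-n}} \right\rfloor = B,
$$

so the first $n$ bytes of the decoded group are exactly the original data and the remaining $4-n$ bytes are discarded. In this example $n = 1$ and $B = \mathtt{0x00}$.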

However, if z compression were also applied to the last block, the decoder could no longer tell which zero bytes are padding and which are data. You can test this by commenting out these lines.

print $ascii85->encode(hex2bin("aabbccdd00"));
/* Wk6L2z */
print bin2hex($ascii85->decode("Wk6L2z"));
/* aabbccdd00000000 */

You can also see that the Cryptii page gives the wrong result for the aabbccdd00 input.
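The ambiguity can be reproduced with a minimal decoder sketch. It assumes that z always expands to four zero bytes and that a short final group of k characters is filled in with u (digit 84) and yields k - 1 bytes. The function name ascii85Decode is hypothetical and input validation is omitted.

```php
<?php
// Hypothetical minimal ASCII85 decoder. "z" is expanded to "!!!!!" (four zero
// bytes) up front; invalid input is not detected.
function ascii85Decode(string $text): string
{
    $text = str_replace("z", "!!!!!", $text);
    $length = strlen($text);
    if ($length === 0) {
        return "";
    }
    // Fill a short final group with "u" (digit 84).
    $padding = (5 - $length % 5) % 5;
    $text .= str_repeat("u", $padding);

    $output = "";
    foreach (str_split($text, 5) as $group) {
        $value = 0;
        for ($i = 0; $i < 5; $i++) {
            $value = $value * 85 + (ord($group[$i]) - 33);
        }
        $output .= pack("N", $value); // big-endian 32-bit
    }
    // Drop one byte per padding character.
    return substr($output, 0, strlen($output) - $padding);
}

print bin2hex(ascii85Decode("Wk6L2!!")) . "\n"; // aabbccdd00
print bin2hex(ascii85Decode("Wk6L2z")) . "\n";  // aabbccdd00000000
```

With this decoder, Wk6L2!! round-trips to aabbccdd00, while Wk6L2z comes back as eight bytes, because nothing in z records how many of its zero bytes were padding.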

Author

distlibs commented Jul 3, 2021

Where did you find that "the z compression is not added to the last block"? I would like to read about it.

Owner

tuupola commented Jul 3, 2021

It is described at least in the Adobe documents "Document management — Portable document format — Part 1: PDF 1.7" and "PostScript® Language Reference, third edition". The relevant parts are:

"If the length of the data to be encoded is not a multiple of 4 bytes, the last, partial group of 4 shall be used to produce a last, partial group of 5 output characters. Given n (1, 2, or 3) bytes of binary data, the encoder shall first append 4 - n zero bytes to make a complete group of 4. It shall encode this group in the usual way, but shall not apply the special z case. Finally, it shall write only the first n + 1 characters of the resulting group of 5. These characters shall be immediately followed by the ~> EOD marker."

and

"If the ASCII85Encode filter is closed when the number of characters written to it is not a multiple of 4, it uses the characters of the last, partial 4-tuple to produce a last, partial 5-tuple of output. Given n (1, 2, or 3) bytes of binary data, it first appends 4 − n zero bytes to make a complete 4-tuple. Then, it encodes the 4-tuple in the usual way, but without applying the z special case. Finally, it writes the first n + 1 bytes of the resulting 5-tuple. Those bytes are followed immediately by the ~> EOD marker. This information is sufficient to correctly encode the number of final bytes and the values of those bytes."
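The rule in these quotes can be sketched as an encoder that applies z only to a complete group of four zero bytes, never to a partial final group, and then terminates the output with ~>. The function name ascii85EncodePdf is hypothetical and line wrapping is left out.

```php
<?php
// Hypothetical encoder following the quoted rule. Assumes 64-bit PHP.
function ascii85EncodePdf(string $data): string
{
    $output = "";
    $length = strlen($data);
    for ($offset = 0; $offset < $length; $offset += 4) {
        $group = substr($data, $offset, 4);
        $n = strlen($group); // 1 to 4 data bytes
        // Append 4 - n zero bytes to make a complete group.
        $value = unpack("N", str_pad($group, 4, "\0"))[1];

        // The special z case applies only to a complete all-zero group.
        if ($n === 4 && $value === 0) {
            $output .= "z";
            continue;
        }

        $chars = "";
        for ($i = 0; $i < 5; $i++) {
            $chars = chr($value % 85 + 33) . $chars;
            $value = intdiv($value, 85);
        }
        // Write only the first n + 1 characters (all five for a complete group).
        $output .= substr($chars, 0, $n + 1);
    }
    // Terminate with the ~> EOD marker.
    return $output . "~>";
}

print ascii85EncodePdf(hex2bin("aabbccdd00")) . "\n"; // Wk6L2!!~>
print ascii85EncodePdf("\0\0\0\0") . "\n";            // z~>
```

The quoted restriction concerns only the last, partial group, so in this sketch a complete group of four zero bytes, including one at the end of the input, is written as z.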
