PAD-FIXED-LENGTH-STRING is wrong for non-8-bit string encodings #2

Open
heegaiximephoomeeghahyaiseekh opened this issue Dec 1, 2016 · 0 comments

heegaiximephoomeeghahyaiseekh commented Dec 1, 2016

lisp-binary::pad-fixed-length-string pads the string to the target length in characters and only then encodes it. This works for single-byte encodings such as ASCII and Latin-1, but fails for encodings such as UTF-8, where the byte length of the encoded string can differ from its character count, so the output is not the requested number of bytes.
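To make the mismatch concrete, here is a small language-neutral sketch in Python. It is not lisp-binary's code; `pad_then_encode` is a hypothetical helper that only mimics the pad-first, encode-second order described above.

```python
# Hypothetical helper: pad to a character count, then encode.
def pad_then_encode(text, width, encoding, pad=" "):
    return text.ljust(width, pad).encode(encoding)

# Single-byte encoding: 8 characters become 8 bytes.
print(len(pad_then_encode("héllo", 8, "latin-1")))  # 8
# UTF-8: "é" takes two bytes, so the same 8 characters become 9 bytes.
print(len(pad_then_encode("héllo", 8, "utf-8")))    # 9
```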

Padding the string after encoding might not work either: the padding bytes would have to respect the rules of each encoding, so the writer generator would have to be aware of those rules.

One possible fix is a binary search for the number of padding characters that, added to the string before encoding, produces the desired encoded length. In the worst case, this could require re-encoding the same string dozens of times.
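A minimal sketch of that binary search, written in Python rather than Lisp so it runs with only the standard library. The function name and parameters are hypothetical, and it assumes the encoded length never shrinks as padding characters are added and that every padding character encodes to at least one byte.

```python
def pad_to_encoded_length(text, target, encoding="utf-8", pad=" "):
    """Return text plus padding, encoded to exactly `target` bytes.

    Raises ValueError when no pad count gives exactly `target` bytes.
    """
    def encoded_len(count):
        return len((text + pad * count).encode(encoding))

    if encoded_len(0) > target:
        raise ValueError("text is already longer than the target")

    # Assumption: each pad character adds at least one byte, so `target`
    # pad characters are always enough to reach or pass the target.
    lo, hi = 0, target
    while lo < hi:
        mid = (lo + hi) // 2
        if encoded_len(mid) < target:
            lo = mid + 1
        else:
            hi = mid

    # `lo` is the smallest pad count whose encoding reaches the target.
    encoded = (text + pad * lo).encode(encoding)
    if len(encoded) != target:
        raise ValueError("no pad count yields the target length")
    return encoded


print(pad_to_encoded_length("héllo", 8))  # b'h\xc3\xa9llo  '
```

Each loop iteration re-encodes the whole string, so the number of encodings grows roughly with the logarithm of the target length. A multi-byte padding character can overshoot the target, in which case the sketch raises an error instead of returning a wrong-sized result.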

heegaiximephoomeeghahyaiseekh pushed a commit that referenced this issue Dec 1, 2016
MAKE-FIXED-LENGTH-STRING, which encodes as well as pads the
string. It may still have problems in encodings with variable-length
characters, in which it's possible to choose a string and
padding character which can't add up to the required length.

MAKE-FIXED-LENGTH-STRING also provides the option to truncate
overlong input strings, but this functionality might also fail
in variable-length character encodings.
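
The truncation caveat can be illustrated with another Python sketch (again hypothetical, not MAKE-FIXED-LENGTH-STRING itself). Cutting the encoded bytes at the limit can split a character, while cutting at a character boundary can leave the result shorter than the limit, so padding is still needed afterwards.

```python
def truncate_encoded(text, limit, encoding="utf-8"):
    """Encode the longest prefix of `text` that fits in `limit` bytes."""
    encoded = b""
    for end in range(len(text) + 1):
        candidate = text[:end].encode(encoding)
        if len(candidate) > limit:
            break
        encoded = candidate
    return encoded


# Slicing the bytes directly leaves half of "ñ" behind (invalid UTF-8).
print("añb".encode("utf-8")[:2])   # b'a\xc3'
# Cutting at a character boundary is valid but one byte short of the limit.
print(truncate_encoded("añb", 2))  # b'a'
```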