Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TNEF decoder works wrong with attachments' long filenames #5646

Closed
igor-krein opened this issue Feb 15, 2017 · 3 comments
Closed

TNEF decoder works wrong with attachments' long filenames #5646

igor-krein opened this issue Feb 15, 2017 · 3 comments
Assignees
Milestone

Comments

@igor-krein
Copy link

I've commented on the similar issue, but it is closed already and I am not sure somebody would see it, so I better open a new one. I am copy-pasting my comment here:

Well, this solution [currently, it is a line 257 in roundcubemail/program/lib/Roundcube/rcube_tnef_decoder.php which looks like $value = str_replace("\0", '', $value);] is a bad solution, because it doesn't work in some cases. It works well with "English" filenames, so all you English-speaking guys usually don't notice anything wrong, but when an attachment has a Russian or Arabic or some Japanese name, I am pretty sure it would look broken. At least, this was the case with Hebrew attachments I dealt with recently, and I don't particularly see there would be different behavior with other non-Latin cases.

What you do here is just strip zero bytes (which are the "excess" bytes in case of Latin alphabet) and pretend that now you got the UTF-8 encoded string, which is simply not true (more than that, in rcube_message.php there is a fix_attachment_name function which actually tries to undo what is being broken here!). As far as I understand Transport Neutral Encapsulation Format (TNEF) Data Algorithm, long filenames are UTF-16LE encoded. It means that instead of zero-bytes that are used in Latin, there would be 0x05-bytes in Hebrew, for example. And stripping zeros from Hebrew string means that you strip nothing but the null terminator. When you try to read it as a UTF-8 string, all you got is gibberish.

I think, what really needs to be done here is simple string converting from UTF-16LE to UTF-8, but not with the help of Roundcube utf16_to_utf8 function(!), since in UTF-16LE two-byte pairs are in reverse order. Standard iconv should work.

You may also pre-check if the string is UTF-16 encoded (in TNEF such strings have two-byte null terminators, while UTF-8 strings get only one zero byte in the end).

@alecpl
Copy link
Member

alecpl commented Feb 16, 2017

I don't think it's as simple as that. I have a winmail.dat sample that encodes attachment name using different charset than UTF-16LE. It looks that there might be some attributes that could help with detecting the charset, but even TNEF specification is not clear about this.

@igor-krein
Copy link
Author

igor-krein commented Feb 16, 2017

Yes, specification is not clear about charset of long filenames attachments, but in other places UTF-16LE is mentioned as somewhat default encoding. Also, as I said, they are adding 2-byte null terminator in UTF-16 and 1-byte in UTF-8, so it is pretty easy to distinguish one from another. But there are no order bytes, alas.

I understand that it would be much better to know for sure, but... Anyway, the way it works now is not appropriate.

BTW, do you happen to know what charset is used in your sample?

@alecpl
Copy link
Member

alecpl commented Jun 27, 2017

Fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants