TNEF decoder works wrong with attachments' long filenames #5646

igor-krein · 2017-02-15T08:07:20Z

I've commented on the similar issue, but it is closed already and I am not sure somebody would see it, so I better open a new one. I am copy-pasting my comment here:

Well, this solution [currently, it is a line 257 in roundcubemail/program/lib/Roundcube/rcube_tnef_decoder.php which looks like $value = str_replace("\0", '', $value);] is a bad solution, because it doesn't work in some cases. It works well with "English" filenames, so all you English-speaking guys usually don't notice anything wrong, but when an attachment has a Russian or Arabic or some Japanese name, I am pretty sure it would look broken. At least, this was the case with Hebrew attachments I dealt with recently, and I don't particularly see there would be different behavior with other non-Latin cases.

What you do here is just strip zero bytes (which are the "excess" bytes in case of Latin alphabet) and pretend that now you got the UTF-8 encoded string, which is simply not true (more than that, in rcube_message.php there is a fix_attachment_name function which actually tries to undo what is being broken here!). As far as I understand Transport Neutral Encapsulation Format (TNEF) Data Algorithm, long filenames are UTF-16LE encoded. It means that instead of zero-bytes that are used in Latin, there would be 0x05-bytes in Hebrew, for example. And stripping zeros from Hebrew string means that you strip nothing but the null terminator. When you try to read it as a UTF-8 string, all you got is gibberish.

I think, what really needs to be done here is simple string converting from UTF-16LE to UTF-8, but not with the help of Roundcube utf16_to_utf8 function(!), since in UTF-16LE two-byte pairs are in reverse order. Standard iconv should work.

You may also pre-check if the string is UTF-16 encoded (in TNEF such strings have two-byte null terminators, while UTF-8 strings get only one zero byte in the end).

alecpl · 2017-02-16T09:19:25Z

I don't think it's as simple as that. I have a winmail.dat sample that encodes attachment name using different charset than UTF-16LE. It looks that there might be some attributes that could help with detecting the charset, but even TNEF specification is not clear about this.

igor-krein · 2017-02-16T16:28:11Z

Yes, specification is not clear about charset of long filenames attachments, but in other places UTF-16LE is mentioned as somewhat default encoding. Also, as I said, they are adding 2-byte null terminator in UTF-16 and 1-byte in UTF-8, so it is pretty easy to distinguish one from another. But there are no order bytes, alas.

I understand that it would be much better to know for sure, but... Anyway, the way it works now is not appropriate.

BTW, do you happen to know what charset is used in your sample?

…5799)

alecpl · 2017-06-27T13:29:13Z

Fixed.

alecpl added bug C: MIME parsing labels Feb 16, 2017

alecpl modified the milestones: 1.2.4, later Feb 16, 2017

alecpl mentioned this issue Jun 16, 2017

Viewing email attachment from TNEF format causes Apache Internal Error 500 #5799

Closed

alecpl modified the milestones: 1.3.1, later Jun 18, 2017

alecpl self-assigned this Jun 19, 2017

alecpl added a commit that referenced this issue Jun 27, 2017

Fix decoding non-ascii attachment names from TNEF attachments (#5646, #…

0bef84c

…5799)

alecpl added a commit that referenced this issue Jun 27, 2017

Fix decoding non-ascii attachment names from TNEF attachments (#5646, #…

b4c877b

…5799)

alecpl closed this as completed Jun 27, 2017

petri mentioned this issue Mar 2, 2018

tnef_attachement.name is not unicode koodaamo/tnefparse#7

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TNEF decoder works wrong with attachments' long filenames #5646

TNEF decoder works wrong with attachments' long filenames #5646

igor-krein commented Feb 15, 2017

alecpl commented Feb 16, 2017

igor-krein commented Feb 16, 2017 •

edited

Loading

alecpl commented Jun 27, 2017

TNEF decoder works wrong with attachments' long filenames #5646

TNEF decoder works wrong with attachments' long filenames #5646

Comments

igor-krein commented Feb 15, 2017

alecpl commented Feb 16, 2017

igor-krein commented Feb 16, 2017 • edited Loading

alecpl commented Jun 27, 2017

igor-krein commented Feb 16, 2017 •

edited

Loading