Encoding Problem While Importing from a Russian Thunderbird CSV file to Addressbook #4694

Closed
rcubetrac opened this issue Nov 6, 2014 · 9 comments

Comments

@rcubetrac

Reported by alexey dv on 6 Nov 2014 11:20 UTC as Trac ticket #1490135

If I understand correctly, the CSV file has an invalid header.

Migrated-From: http://trac.roundcube.net/ticket/1490135

@rcubetrac

Comment by alexey dv on 6 Nov 2014 12:06 UTC

Bug: calling rcube_charset::detect() from the import function in program/lib/Roundcube/rcube_csv2vcard.php returns the wrong encoding for thunder.csv: ISO-8859-1 instead of Windows-1251.

@rcubetrac

Comment by @alecpl on 6 Nov 2014 14:28 UTC

What is your language in Roundcube UI? Is charset correctly detected if you switch to ru_RU?

@rcubetrac

Comment by alexey dv on 6 Nov 2014 15:47 UTC

My language is RU_ru.

The call $rcube->get_user_language() returns RU_ru.

$prio = Array ( [0] => UTF-8 [1] => WINDOWS-1251 [2] => KOI8-R )

mb_detect_encoding($string, $encodings) does not detect the cp1251 encoding of $string.

@rcubetrac

Comment by @alecpl on 7 Nov 2014 09:27 UTC

It looks like mb_detect_encoding() is not very reliable. https://bugs.php.net/bug.php?id=38138.
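
For illustration only, here is a minimal sketch of one way to avoid relying on mb_detect_encoding(): validate the string strictly as UTF-8 first and, if that fails, fall back to the first single-byte charset in the language-specific priority list. The function name guess_charset_sketch and the ISO-8859-1 last-resort default are assumptions made for this example; this is not Roundcube code and not necessarily what the eventual fix does.

```php
<?php
// Hypothetical helper for illustration; not part of Roundcube.
function guess_charset_sketch(string $string, array $prio): string
{
    // UTF-8 has a strict byte structure, so a validity check is dependable.
    if (mb_check_encoding($string, 'UTF-8')) {
        return 'UTF-8';
    }

    // Any byte sequence is "valid" in a single-byte charset, so these cannot
    // be verified; simply trust the first non-UTF-8 entry of the priority list.
    foreach ($prio as $charset) {
        if (strtoupper($charset) !== 'UTF-8') {
            return $charset;
        }
    }

    // Assumed last-resort default for this sketch.
    return 'ISO-8859-1';
}

// "Привет" encoded as Windows-1251.
$cp1251  = "\xCF\xF0\xE8\xE2\xE5\xF2";
$charset = guess_charset_sketch($cp1251, ['UTF-8', 'WINDOWS-1251', 'KOI8-R']);
$utf8    = mb_convert_encoding($cp1251, 'UTF-8', $charset);
```

The trade-off is that the order of the priority list alone decides between single-byte candidates such as WINDOWS-1251 and KOI8-R, so a KOI8-R file would still be misread here.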

@rcubetrac

Comment by @alecpl on 7 Nov 2014 10:44 UTC

Fixed in a7a778c. Still not perfect, however.

@rcubetrac

Status changed by @alecpl on 7 Nov 2014 10:44 UTC

new => closed

@rcubetrac

Milestone changed by @alecpl on 7 Nov 2014 10:44 UTC

later => 1.1-beta

@rcubetrac

Comment by alexey dv on 7 Nov 2014 15:43 UTC

Thanks.

@rcubetrac

Comment by alexey dv on 13 Nov 2014 19:11 UTC

Perhaps detect_cyr_charset (http://www.opennet.ru/base/dev/charset_autodetect.txt.html) or something similar could be used to choose among the single-byte Cyrillic charsets?

Its result could then be used in the $prio variable:

case 'ru_RU': 
    $prio = array('UTF-8', static::detect_cyr_charset($string, 'WINDOWS-1251')); 
    break;
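
To make the idea concrete, here is a rough sketch of the kind of heuristic such a function could use. It is not the detect_cyr_charset implementation from the linked article; the name guess_cyr_charset_sketch and the byte-counting rule are assumptions for illustration. It relies on the fact that lowercase Cyrillic letters occupy bytes 0xE0-0xFF in Windows-1251 but 0xC0-0xDF in KOI8-R, and that ordinary Russian text is mostly lowercase.

```php
<?php
// Hypothetical heuristic for illustration; not the opennet.ru function.
// Intended for strings already known not to be valid UTF-8.
function guess_cyr_charset_sketch(string $string, string $default = 'WINDOWS-1251'): string
{
    $low  = 0; // bytes 0xC0-0xDF: uppercase in Windows-1251, lowercase in KOI8-R
    $high = 0; // bytes 0xE0-0xFF: lowercase in Windows-1251, uppercase in KOI8-R

    $len = strlen($string);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($string[$i]);
        if ($byte >= 0xC0 && $byte <= 0xDF) {
            $low++;
        } elseif ($byte >= 0xE0) {
            $high++;
        }
    }

    // No evidence either way (e.g. ASCII-only input): keep the caller's default.
    if ($low === $high) {
        return $default;
    }

    // Mostly-lowercase text puts the majority of letters in the "lowercase" range.
    return $high > $low ? 'WINDOWS-1251' : 'KOI8-R';
}

$cp1251 = "\xCF\xF0\xE8\xE2\xE5\xF2"; // "Привет" in Windows-1251
$koi8r  = "\xF0\xD2\xC9\xD7\xC5\xD4"; // "Привет" in KOI8-R

$guess1 = guess_cyr_charset_sketch($cp1251);
$guess2 = guess_cyr_charset_sketch($koi8r);
```

A heuristic like this can be fooled by short strings or text written entirely in capitals, so it is best treated as a hint rather than a guarantee.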
