DotCoop data has mojibake text #98

Open
wu-lee opened this issue May 25, 2022 · 4 comments

@wu-lee
Contributor

wu-lee commented May 25, 2022

I'm seeing mojibake text like this in incoming CSV data: "Alliance Coopératives Cameroun", which propagates into the map dialogs. This is not the only case. Something upstream is mangling the encoding.

@ColmMassey
Collaborator

I'm seeing mojibake text like this in incoming CSV data: "Alliance Coopératives Cameroun", which propagates into the map dialogs. This is not the only case. Something upstream is mangling the encoding.

Where is the best place to catch that?

@wu-lee
Contributor Author

wu-lee commented May 26, 2022

Ideally they would give us non-mangled data, so we don't have to fix it ourselves, but as you say this might not happen very quickly. Otherwise we might demangle it ourselves, but it's not as simple as you might think.

Searching, I see there's a mojibake decoder here, which can helpfully identify how to decode the case I spotted.

https://www.linestarve.com/tools/mojibake/?mojibake=Alliance+Coop%C3%83%C2%A9ratives+Cameroun&unescape_html=auto&remove_terminal_escapes=True&fix_encoding=True&restore_byte_a0=True&replace_lossy_sequences=True&decode_inconsistent_utf8=True&fix_c1_controls=True&fix_latin_ligatures=True&fix_character_width=True&uncurl_quotes=True&fix_surrogates=True&remove_control_chars=True&normalization=NFC

However, some experiments show it's not simply a matter of reading the file with the right encoding - there are actually extra bytes injected, so I get mojibake in any case. It got very fiddly and I gave up for the moment.
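A minimal sketch of what "extra bytes injected" could look like, assuming the sample above is the common double-encoding case: UTF-8 bytes decoded as Latin-1 and then written out as UTF-8 again. That assumption matches the `%C3%83%C2%A9` sequence in the decoder URL, but the thread does not confirm it holds for every affected row. The helper name `demangle` is hypothetical.

```python
def demangle(text: str) -> str:
    """Undo one round of 'UTF-8 read as Latin-1' mojibake.

    Assumption: the text was double-encoded exactly once. If the
    round trip fails, the text is returned unchanged.
    """
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not Latin-1 representable, or not valid UTF-8 afterwards:
        # most likely the text was never mangled this way.
        return text


# The mangled sample, written with escapes so it survives copy and paste.
mangled = "Alliance Coop\u00c3\u00a9ratives Cameroun"

# On disk the accented letter occupies four bytes instead of two.
assert mangled.encode("utf-8")[13:17] == b"\xc3\x83\xc2\xa9"

print(demangle(mangled))  # Alliance Coopératives Cameroun
```

This only reverses one specific mangling. Text damaged in other ways (for example via Windows-1252, or with characters already lost) would pass through unchanged, which may be part of why it got fiddly.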

@ColmDC
Contributor

ColmDC commented Mar 24, 2023

Is this being addressed in the new system? @wu-lee

@wu-lee
Contributor Author

wu-lee commented Mar 29, 2023

Nope.

Currently data is trusted throughout Mykomaps and the sausage factory. Mykomaps should have some protection in any case, but maybe the best place to gatekeep this is when data comes in from 3rd party sources?
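One way the gatekeeping idea might look, as a rough sketch only: scan incoming CSV text and report fields that appear to be double-encoded, so they can be rejected or repaired before reaching the rest of the pipeline. The function name `find_suspect_fields`, the column names, and the sample rows are all hypothetical, and the check assumes the same single "UTF-8 read as Latin-1" mangling as the sample in this issue.

```python
import csv
import io


def find_suspect_fields(csv_text: str):
    """Yield (record_number, column, value, repaired) for fields that
    look like UTF-8 text that was decoded as Latin-1 and re-encoded."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for record_number, row in enumerate(reader, start=1):
        for column, value in row.items():
            if not isinstance(value, str):
                continue  # missing or surplus fields are not checked here
            try:
                repaired = value.encode("latin-1").decode("utf-8")
            except (UnicodeEncodeError, UnicodeDecodeError):
                continue  # the round trip failed, so treat the field as clean
            if repaired != value:
                yield record_number, column, value, repaired


# Hypothetical incoming data: the first record is mangled, the second is clean.
sample = (
    "Name,Country\n"
    "Alliance Coop\u00c3\u00a9ratives Cameroun,CM\n"
    "Caf\u00e9 Coop,FR\n"
)

for record_number, column, value, repaired in find_suspect_fields(sample):
    print(f"record {record_number}, column {column!r}: {value!r} -> {repaired!r}")
```

Reporting rather than silently repairing keeps the decision with whoever owns the import, which seems to fit the idea of gatekeeping at the point where third-party data comes in.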
