Should tos.hyp move to UTF-8? #95

mikrosk · 2019-09-04T06:11:03Z

When one tries to edit a tos.hyp file in the web interface (and/or fork the repository and open a PR from it), the following message is shown:

We’ve detected the file encoding as windows-1252. When you commit changes we will transcode it to UTF-8.

GitHub wants all web-based edits to be in UTF-8, and that also applies to websites (I had to re-encode the whole https://github.com/mikrosk/ct60tos/tree/gh-pages in that way).

If we did that, we would need to re-encode the files back into Atari encoding before compiling the hypertext.
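The re-encoding step could look roughly like the following minimal Python sketch. It is not part of the tos.hyp build. It covers only a handful of characters whose Atari ST code points are well known (the German umlauts, ß and é); a real converter would need the full Atari ST table for 0x80 to 0xFF. The function names `to_atari` and `utf8_file_to_atari` are hypothetical. Characters without a mapping raise an error instead of being silently dropped.

```python
# Partial Unicode -> Atari ST character set table.
# Assumption: only a small subset is listed; the full table is not reproduced.
ATARI_ST = {
    "ü": 0x81, "é": 0x82, "ä": 0x84, "Ä": 0x8E,
    "ö": 0x94, "Ö": 0x99, "Ü": 0x9A, "ß": 0x9E,
}


def to_atari(text: str) -> bytes:
    """Encode a str as Atari ST bytes; raise ValueError on unmapped characters."""
    out = bytearray()
    for pos, ch in enumerate(text):
        if ord(ch) < 0x80:
            out.append(ord(ch))  # the ASCII range passes through unchanged
        elif ch in ATARI_ST:
            out.append(ATARI_ST[ch])
        else:
            raise ValueError(f"no Atari ST mapping for {ch!r} at index {pos}")
    return bytes(out)


def utf8_file_to_atari(src: str, dst: str) -> None:
    """Read a UTF-8 source file and write it out in Atari ST encoding."""
    with open(src, encoding="utf-8", newline="") as f:  # keep line endings as-is
        text = f.read()
    with open(dst, "wb") as f:
        f.write(to_atari(text))
```

For example, `to_atari("Größe")` returns `b"Gr\x94\x9ee"`, while `to_atari("€")` raises `ValueError` because the euro sign has no Atari ST code point.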


th-otto · 2019-09-04T06:29:15Z

Theoretically, there are several possibilities:

Recode them to UTF-8 and tell UDO about the new input encoding. That should still produce STG files in Atari encoding. However, I fear some tables would then not be formatted correctly, since, as Ulf Dunkel claims, UDO does not work internally with UTF-8.
Recode them to UTF-8 and also output UTF-8. My compiler can cope with that and still produce binary files in Atari encoding that can be viewed with ST-Guide (unless, of course, you use characters that cannot be encoded in the Atari ST encoding). But UDO might have the same problems with tables as above.
Use UDO's universal charset feature, where international characters are written as ASCII sequences, similar to TeX.

Actually, I would prefer to leave them in the current encoding. It's GitHub's fault if it cannot cope with different encodings, and it only matters if you try to edit the files through the web interface. You can always clone the repo and edit them locally instead.

mikrosk · 2019-09-04T06:34:23Z

> Actually, I would prefer to leave them in the current encoding. It's GitHub's fault if it cannot cope with different encodings, and it only matters if you try to edit the files through the web interface. You can always clone the repo and edit them locally instead.

True. However, I'm asking on behalf of other people who can read and spot errors but don't feel up to setting up git, branches, etc. GitHub lets you easily edit a file and create a PR from it without any hassle. That's my main motivation here.

th-otto · 2019-09-04T06:39:24Z

PS: the files are encoded in the Atari character set, not cp1252.

PPS: if you want, you can try one or the other option with single files. The input encoding can be changed at any time. Just make sure you set it back at the end of the file, or before including any other file.
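The rule above (reset the input encoding at the end of a file and before every include) can be checked mechanically. The following Python sketch assumes that the switch is written as `!code_source [name]`, that other files are pulled in with `!include`, and that the file's normal encoding is called `tos`; all three names should be verified against the UDO manual. The function `check_encoding_resets` is hypothetical.

```python
import re

# Assumptions: "!code_source [name]" switches the input encoding and
# "!include" pulls in another file; verify both against the UDO manual.
CODE_SOURCE = re.compile(r"^\s*!code_source\s*\[([^\]]+)\]")
INCLUDE = re.compile(r"^\s*!include\b")


def check_encoding_resets(lines, default="tos"):
    """Return (line_no, message) tuples for encoding switches not undone in time.

    `default` is a placeholder name for the file's normal input encoding.
    """
    problems = []
    current = default
    for no, line in enumerate(lines, start=1):
        m = CODE_SOURCE.match(line)
        if m:
            current = m.group(1).strip()
        elif INCLUDE.match(line) and current != default:
            problems.append((no, f"!include while input encoding is {current!r}"))
    if current != default:
        problems.append((len(lines), f"file ends with input encoding {current!r}"))
    return problems
```

For example, a file that switches to `utf8`, includes another file and only then switches back is reported at the `!include` line.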

PPPS: unconditionally transcoding the files is a no-go. That will break any source where strings are encoded in the local platform's encoding. Besides that, as mentioned above, they are not encoded in cp1252.
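To illustrate why treating these files as cp1252 goes wrong, here is a small Python sketch that decodes a few Atari ST bytes with Python's built-in `cp1252` codec. It only shows Python's behaviour; GitHub's own transcoder is not reproduced here. The Atari ST byte values used are ä = 0x84, ö = 0x94, Ä = 0x8E and ß = 0x9E.

```python
# Bytes as they would appear in an Atari ST-encoded source file:
# ä = 0x84, ö = 0x94, Ä = 0x8E, ß = 0x9E in the Atari ST character set.
atari_bytes = b"\x84 \x94 \x8e \x9e"

# Misreading them as Windows-1252 silently yields different characters.
misread = atari_bytes.decode("cp1252")
print(misread)  # „ ” Ž ž

# ü is 0x81 on the Atari ST; Python's cp1252 codec has no character there.
try:
    b"\x81".decode("cp1252")
except UnicodeDecodeError as exc:
    print("cannot decode 0x81:", exc.reason)
```

So a cp1252-based conversion either replaces umlauts with typographic quotes and unrelated letters, or fails outright on bytes such as 0x81.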

mikrosk · 2019-09-04T06:57:00Z

I guess GitHub's encoding detection doesn't take Atari encodings into account. ;)

> Unconditionally transcoding the files is a no-go. That will break any source where strings are encoded in the local platform's encoding. Besides that, as mentioned above, they are not encoded in cp1252.

I guess it's their policy: if you want web content hosted by us, it must be in UTF-8, period. I don't blame them, but it certainly doesn't make our Atari life easier.

th-otto · 2019-09-04T06:59:13Z

Edit: there would be another option: use UDO macros for all non-ASCII characters, as is done here (tos.hyp/config.u, line 114 in 82c4e0e):

!ifdest [html,hh]

That way, only a single file would contain non-ASCII characters.
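A rough Python sketch of that conversion follows. It assumes each character gets a macro defined once in the central file and is referenced elsewhere as `(!name)`. The macro names below (`auml`, `ouml`, `szlig`, ...) are hypothetical and are not taken from config.u. The function `to_macros` is also hypothetical and refuses to pass through any non-ASCII character that has no macro.

```python
# Hypothetical macro names; real ones would have to match the definitions
# in the central configuration file.
MACROS = {
    "ä": "auml", "ö": "ouml", "ü": "uuml",
    "Ä": "Auml", "Ö": "Ouml", "Ü": "Uuml", "ß": "szlig",
}


def to_macros(text: str) -> str:
    """Replace known non-ASCII characters with UDO macro calls of the form (!name)."""
    out = []
    for ch in text:
        if ch in MACROS:
            out.append(f"(!{MACROS[ch]})")
        elif ord(ch) >= 0x80:
            raise ValueError(f"no macro defined for {ch!r}")
        else:
            out.append(ch)
    return "".join(out)
```

For example, `to_macros("Größe")` returns `"Gr(!ouml)(!szlig)e"`, leaving only plain ASCII in the converted file.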

th-otto · 2019-09-04T07:00:51Z

It's not only a matter of Atari. Any Windows program that does not use wide Unicode character strings is also affected.

th-otto · 2019-09-12T09:08:47Z

I've just changed a bunch of files to use macros for the non-ASCII characters. But I encountered another problem: those characters also appear in verbatim environments (listings, examples, etc.), and macros are not replaced there. So after conversion, you have to check the new output carefully, which is quite a lot of work.
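Part of that checking could be narrowed down with a script that lists non-ASCII characters remaining inside verbatim blocks. The following Python sketch assumes the blocks are delimited by `!begin_verbatim`/`!end_verbatim` or `!begin_sourcecode`/`!end_sourcecode`; those environment names should be checked against the UDO manual, and the function `non_ascii_in_verbatim` is hypothetical.

```python
import re

# Assumed UDO environment names whose contents are copied verbatim.
BEGIN = re.compile(r"^\s*!begin_(verbatim|sourcecode)\b")
END = re.compile(r"^\s*!end_(verbatim|sourcecode)\b")


def non_ascii_in_verbatim(lines):
    """Yield (line_no, line) for lines inside verbatim blocks that contain non-ASCII."""
    inside = False
    for no, line in enumerate(lines, start=1):
        if BEGIN.match(line):
            inside = True
        elif END.match(line):
            inside = False
        elif inside and any(ord(ch) >= 0x80 for ch in line):
            yield no, line
```

Because only the byte value matters for this check, an Atari-encoded file can be read with `encoding="latin-1"`, which maps every byte to the code point of the same value.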

xdelatour mentioned this issue on Jul 14, 2023 in "Fix typos (wrong character encoding)" #141 (merged).
