Should tos.hyp move to UTF-8? #95

Open
mikrosk opened this issue Sep 4, 2019 · 7 comments

Comments

@mikrosk
Member

mikrosk commented Sep 4, 2019

When you try to edit a tos.hyp file on GitHub (directly, or via fork + PR), the following message appears:

We’ve detected the file encoding as windows-1252. When you commit changes we will transcode it to UTF-8.

GitHub wants all web-based edits to be in UTF-8. That also applies to websites (I had to re-encode the whole of https://github.com/mikrosk/ct60tos/tree/gh-pages that way).

If we did that, we would need to re-encode the files back into Atari encoding before compiling the hypertext.
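The re-encoding step mentioned above could look roughly like the following. This is a minimal Python sketch, not part of the tos.hyp build: the names `ATARI_MAP` and `utf8_to_atari` are hypothetical, and the table covers only a handful of German characters from the Atari ST character set, so a real converter would need the full table.

```python
# Partial Unicode -> Atari ST byte table (illustrative subset only).
ATARI_MAP = {
    "\u00fc": 0x81,  # u with diaeresis
    "\u00e4": 0x84,  # a with diaeresis
    "\u00c4": 0x8E,  # A with diaeresis
    "\u00f6": 0x94,  # o with diaeresis
    "\u00d6": 0x99,  # O with diaeresis
    "\u00dc": 0x9A,  # U with diaeresis
    "\u00df": 0x9E,  # sharp s
}


def utf8_to_atari(data: bytes) -> bytes:
    """Convert UTF-8 bytes to Atari ST bytes using the partial table above."""
    out = bytearray()
    for ch in data.decode("utf-8"):
        if ord(ch) < 0x80:
            out.append(ord(ch))  # plain ASCII passes through unchanged
        elif ch in ATARI_MAP:
            out.append(ATARI_MAP[ch])
        else:
            # Refuse to guess: unmapped characters must be handled explicitly.
            raise ValueError(f"no Atari mapping for {ch!r}")
    return bytes(out)
```

Raising on unmapped characters, rather than substituting a placeholder, makes it obvious when an edit introduces something the Atari encoding cannot represent.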

@th-otto
Contributor

th-otto commented Sep 4, 2019

Theoretically, there are several possibilities:

  • Recode them to UTF-8 and tell UDO about the new input encoding. That should still produce STG files in Atari encoding. However, I fear some tables will not be formatted correctly then, since UDO does not work internally with UTF-8, as Ulf Dunkel claims.

  • Recode them to UTF-8 and also output UTF-8. My compiler can cope with that and still produce binary files in Atari encoding that can be viewed with ST-Guide (unless you use characters that can't be encoded in the Atari ST encoding, of course). But UDO might have the same problems with tables as above.

  • Use UDO's universal charset feature, where you write ASCII sequences for international characters, similar to TeX.

Actually, I would prefer to leave them in the current encoding. It's GitHub's fault if it cannot cope with different encodings, and it only matters if you try to edit a file through the web interface. You can always clone the repo and edit it locally instead.

@mikrosk
Member Author

mikrosk commented Sep 4, 2019

> Actually, I would prefer to leave them in the current encoding. It's GitHub's fault if it cannot cope with different encodings, and it only matters if you try to edit a file through the web interface. You can always clone the repo and edit it locally instead.

True. However, I'm asking on behalf of other people who can read and spot errors but don't feel up to setting up git, branches etc. GitHub lets you edit a file and create a PR out of it without any hassle. So that's my main motivation here.

@th-otto
Contributor

th-otto commented Sep 4, 2019

PS: the files are encoded in the Atari character set, not cp1252.

PPS.: if you want, you can try one or the other option with single files. The input encoding can be changed at any time. Just make sure you set it back at the end of the file, or before including any other file.

PPPS.: unconditionally transcoding the files is a no-go. That will break any source where strings are encoded in the local platform's encoding. Besides that, as mentioned above, they are not encoded in cp1252.
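To illustrate why a blind cp1252-to-UTF-8 transcode damages such files, here is a small Python sketch. It assumes the Atari ST character set places "ä" at byte 0x84 and "ü" at byte 0x81; the sample bytes are made up for illustration and are not taken from tos.hyp.

```python
# "a with diaeresis" as stored in an Atari-encoded source file (assumed: 0x84).
atari_bytes = b"\x84"

# Misreading the byte as windows-1252 yields a different character
# (U+201E, a low double quotation mark), which is then written out as UTF-8.
misread = atari_bytes.decode("cp1252")
transcoded = misread.encode("utf-8")

print(repr(misread), transcoded)

# Byte 0x81 ("u with diaeresis" on Atari) has no assignment at all in
# Python's cp1252 codec, so a strict decode fails outright.
try:
    b"\x81".decode("cp1252")
    undefined_in_cp1252 = False
except UnicodeDecodeError:
    undefined_in_cp1252 = True
```

After such a transcode the original byte is gone from the file, so the Atari character can only be recovered by someone who knows which wrong code page was assumed.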

@mikrosk
Member Author

mikrosk commented Sep 4, 2019

I guess GitHub's encoding detection doesn't account for Atari encodings. ;)

> unconditionally transcoding the files is a no-go. That will break any source where strings are encoded in the local platform's encoding. Besides that, as mentioned above, they are not encoded in cp1252.

I guess it's their policy: if you want web content hosted by us, it must be in UTF-8, period. I don't blame them, but sure, it doesn't make our Atari life easier.

@th-otto
Contributor

th-otto commented Sep 4, 2019

Edit: there would be another option: use UDO macros for all non-ASCII characters, as is done here:

!ifdest [html,hh]

That way, only a single file would have non-ASCII characters.

@th-otto
Contributor

th-otto commented Sep 4, 2019

It's not only a matter of Atari. Any Windows program that does not use wide Unicode character strings is also affected.

@th-otto
Contributor

th-otto commented Sep 12, 2019

I've just changed a bunch of files to use macros for the non-ASCII characters. But I encountered another problem: those characters also appear in verbatim environments (listings, examples etc.), and are not replaced there. So after conversion, you have to check the new output carefully, and that is quite a lot of work.
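One way to narrow down that manual check might be to list every line that still contains non-ASCII bytes after the conversion, noting whether it sits inside a verbatim environment. The Python sketch below assumes the environments are delimited by `!begin_verbatim` and `!end_verbatim`; the markers are parameters, so adjust them to whatever the sources actually use. The function name `find_non_ascii` is hypothetical.

```python
def find_non_ascii(lines, begin=b"!begin_verbatim", end=b"!end_verbatim"):
    """Return (line_number, in_verbatim, raw_line) for lines with bytes >= 0x80.

    `lines` is an iterable of bytes objects, one per source line.
    """
    hits = []
    in_verbatim = False
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if stripped.startswith(begin):
            in_verbatim = True
        elif stripped.startswith(end):
            in_verbatim = False
        elif any(byte >= 0x80 for byte in line):
            # Iterating over bytes yields ints, so this tests for non-ASCII.
            hits.append((number, in_verbatim, line))
    return hits
```

Reading a file in binary mode and passing `data.splitlines()` to the function avoids having to guess the encoding first; the entries flagged as being inside a verbatim block are the ones a macro replacement would not have covered.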
