BOM, encoding and test EXPath-file-writeText3-002 #70
Also EXPath-file-appendText3-002.
There was email correspondence on this subject at the time; see, for example, https://lists.w3.org/Archives/Public/public-expath/2012Jul/0005.html. It seemed to reach a level of consensus, though I don't think this was well captured in the final spec. You're free, of course, to interpret the spec any way you like, but we will achieve better interoperability between implementations if implementors respect the test suite as defining a consensus interpretation. While the relevant RFCs certainly make UTF-16 without a BOM legal, I think there is a strong presumption that the default serialization for UTF-16 should (a) be big-endian and (b) have a BOM, and I would encourage you to follow these conventions. Michael Kay
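For illustration, here is a minimal Python sketch of the convention described above: big-endian byte order with a leading BOM. It is not EXPath code, and the function name `encode_utf16_default` is hypothetical. Note that Python's own `"utf-16"` codec does write a BOM, but in the platform's native byte order (little-endian on x86), so an implementation that simply delegated to it would not produce what the test expects.

```python
import codecs


def encode_utf16_default(text: str) -> bytes:
    """Encode text as UTF-16 using the suggested default convention:
    big-endian byte order, preceded by a byte order mark (BOM)."""
    # codecs.BOM_UTF16_BE is b'\xfe\xff'; the "utf-16-be" codec itself
    # never writes a BOM, so we prepend it explicitly.
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


if __name__ == "__main__":
    # Print the bytes as space-separated hex (bytes.hex(sep) needs Python 3.8+).
    print(encode_utf16_default("Hi").hex(" "))
```

Run as a script, this prints `fe ff 00 48 00 69`: the BOM followed by each character's high byte first.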
There you wrote:
contrary to EXPath-file-appendText3-002, despite a remark on it.
But I only wanted to deal with HTML.
The Wikipedia article on UTF-16 says: "If the BOM is missing, RFC 2781 (https://tools.ietf.org/html/rfc2781) says that big-endian encoding should be assumed. (In practice, due to Windows using little-endian order by default, many applications similarly assume little-endian encoding by default.)" So it looks as if WhatWG are playing their usual game: ignore standards, just endorse the bugs in existing products. But we're concerned here with writing text, not reading it. All the specs seem to agree that when writing, the most important thing is to include a BOM so that the reader knows what the endianness actually is. Michael Kay
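On the reading side, the RFC 2781 rule quoted above can be sketched in Python as follows: honour a BOM if there is one, otherwise fall back to big-endian. This is a hypothetical helper, not part of any spec or test suite; the `default_big_endian` flag is an assumption added only to show what happens when a reader assumes little-endian instead.

```python
import codecs


def decode_utf16(data: bytes, default_big_endian: bool = True) -> str:
    """Decode UTF-16 bytes, honouring a leading BOM if present.

    Without a BOM, RFC 2781 says big-endian should be assumed; passing
    default_big_endian=False mimics readers that assume little-endian."""
    if data.startswith(codecs.BOM_UTF16_BE):
        # FE FF: big-endian; strip the BOM before decoding.
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        # FF FE: little-endian; strip the BOM before decoding.
        return data[2:].decode("utf-16-le")
    # No BOM: the byte order has to be guessed.
    return data.decode("utf-16-be" if default_big_endian else "utf-16-le")
```

Decoding `b"\x00H\x00i"` gives `"Hi"` under the RFC default, but `"\u4800\u6900"` if the reader assumes little-endian. With a BOM in front, both readers agree, which is the point about writers above.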
Well, I'm sure my message wasn't the last word on the subject, but it's hard to reconstruct the decisions at this distance.
Glory be, everything WhatWG does is weird. Michael Kay
There is also JSON. There, the BOM is forbidden: https://tools.ietf.org/html/rfc7159#section-8.1
It has been a while. It seems times are changing, and newer standards have a different opinion.
Actually, not quite. It says that "implementations" shall not add a BOM. It doesn't say what an "implementation" is. With normal separation of concerns, the JSON output will be written as characters, and the encoding to UTF-16 will be done by a library that has no idea that the text it is encoding is JSON, and therefore is under no obligation to conform to the JSON specification. Michael Kay
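The separation of concerns described here might look like this in a minimal Python sketch. Both function names are hypothetical: the JSON layer only produces a string, and a generic encoding layer, which has no idea the text is JSON, applies its own BOM policy (here, assumed to be the big-endian-with-BOM convention for "utf-16").

```python
import codecs
import json


def serialize_json(value) -> str:
    # The JSON layer produces characters only; it knows nothing about bytes or BOMs.
    return json.dumps(value)


def write_text(text: str, encoding: str) -> bytes:
    # Generic text-encoding layer: it does not know (or care) that the text is JSON.
    if encoding.lower() == "utf-16":
        # Assumed policy for plain "utf-16": big-endian with a leading BOM.
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    # Any other encoding is delegated to Python's codec unchanged.
    return text.encode(encoding)


if __name__ == "__main__":
    print(write_text(serialize_json({"a": 1}), "utf-8"))
    print(write_text(serialize_json([1]), "utf-16"))
```

Under this split, whether a BOM appears is decided entirely by the encoding layer, which is the argument that the JSON rule does not bind it.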
The test EXPath-file-writeText3-002 assumes that the encoding "utf-16" is written as big-endian with a BOM.
It could just as well mean little-endian, and either byte order could be written with or without a BOM.
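To make the ambiguity concrete, this small Python sketch lists the four byte sequences that could each be described as "UTF-16" output for the same text. It is illustrative only, and `utf16_variants` is a hypothetical name.

```python
import codecs


def utf16_variants(text: str) -> dict:
    """Return the four byte sequences that could all be called "UTF-16" for text."""
    be = text.encode("utf-16-be")  # big-endian, no BOM
    le = text.encode("utf-16-le")  # little-endian, no BOM
    return {
        "BE with BOM": codecs.BOM_UTF16_BE + be,
        "BE without BOM": be,
        "LE with BOM": codecs.BOM_UTF16_LE + le,
        "LE without BOM": le,
    }


if __name__ == "__main__":
    for name, data in utf16_variants("A").items():
        print(f"{name:15} {data.hex(' ')}")
```

A test that compares the written file byte-for-byte against only the first of these will reject the other three, even though each is a legal reading of "utf-16".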