Character encodings may not be respected #4
Comments
MacJapanese is closely related to Shift-JIS but they're not the same, see here. That said, it may be close enough to be a workable solution. I think your suggestions are all good. If the encoding could be remembered between mounts that would be great. Or maybe the user could add something to the filename that would clue fusehfs in to the required encoding? That way nothing would need to be stored on disk. I'll try manual mounting soon.
It looks like iconv doesn't support MacJapanese but CoreFoundation does; see https://developer.apple.com/documentation/coreservices/1399915-encoding_variants_for_macjapanes. The downside of rewriting the character encoding code using CF is that it would make fusehfs less portable to Linux. I'm guessing something is stored on disk indicating encoding. Will need to investigate.
I don't believe there is, but I don't have any references to cite. The language of the host OS is responsible for interpreting the filenames according to its default encoding. Very old school. That said, I would be interested to see what you find!
There's a text encoding hint in the Finder Info of the Master Directory Block. Maybe it's related? There are HFS Encoding kexts for various macOS versions from Sierra to Catalina. I think there might be source code at least for converting MacJapanese? The documentation archive has a note about an Encoding popup in the Finder's Get Info window for an HFS Standard volume, and there is more about text encodings for HFS Plus.
Looks like if it exists at all, it's stored in the Finder info word in the MDB?
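To make that easier to investigate, here is a minimal Python sketch that dumps the raw Finder info words from the MDB of an HFS Standard image. It assumes the classic layout as I understand it: the MDB starts 1024 bytes into the volume, begins with the signature word 0x4244, and keeps `drFndrInfo` as eight big-endian 32-bit values at offset 92. The function name and the `volume_start` parameter are made up for illustration, and nothing in this thread establishes which word (if any) carries an encoding hint, so the sketch only prints them all.

```python
import struct

MDB_OFFSET = 1024        # assumed: MDB begins 1024 bytes into the volume
MDB_SIGNATURE = 0x4244   # 'BD', the HFS Standard signature word
FNDR_INFO_OFFSET = 92    # assumed offset of drFndrInfo inside the MDB
FNDR_INFO_WORDS = 8      # eight 32-bit big-endian values


def read_mdb_finder_info(image: bytes, volume_start: int = 0) -> tuple:
    """Return the eight raw Finder info words from an HFS Standard MDB.

    volume_start is the byte offset of the HFS volume inside the image;
    it is non-zero when the image carries a partition map.
    """
    base = volume_start + MDB_OFFSET
    (signature,) = struct.unpack_from(">H", image, base)
    if signature != MDB_SIGNATURE:
        raise ValueError(f"not an HFS Standard MDB (signature 0x{signature:04X})")
    return struct.unpack_from(f">{FNDR_INFO_WORDS}L", image, base + FNDR_INFO_OFFSET)


def format_finder_info(words) -> str:
    """Render the words as hex so unexpected values are easy to spot."""
    return " ".join(f"{w:08X}" for w in words)
```

Comparing the output for a MacRoman disc against a MacJapanese one would show whether any of the words actually differ.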
Interesting! Tcl has good conversion routines and encoding tables which were written by Apple themselves.
Following on from issue #2
Previously (hmm, I'm trying to think when exactly? a long time ago!) I could set my Mac to Japanese and reboot, mount an HFS disc that uses MacJapanese character encoding and see the filenames as intended. Reboot was essential, login was not enough.
Such foreign discs are tricky as they contain filenames in multiple character sets. Files may have been copied from other discs or downloaded from the internet, so a single disc could contain many different encodings. Encodings are not stored anywhere: they have to be set manually, worked out using heuristics or some other map, or simply assumed.
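As a rough illustration of the heuristic option, here is a Python sketch that tries a list of candidate encodings per filename and keeps the first one that decodes cleanly. Python has no MacJapanese codec, so `shift_jis` stands in for it here; that is an approximation, not an exact match. The function name and the default candidate order are my own assumptions.

```python
def decode_hfs_name(raw: bytes, candidates=("shift_jis", "mac_roman")):
    """Decode a raw HFS filename, returning (text, encoding_used).

    Candidates are tried in order with strict decoding. shift_jis is only
    a stand-in for MacJapanese, so some Apple-specific characters will not
    map correctly.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: fall back to the last candidate and mark
    # undecodable bytes with U+FFFD instead of failing.
    last = candidates[-1]
    return raw.decode(last, errors="replace"), last
```

Because the choice is made per filename, a disc with mixed MacRoman and MacJapanese names can still come out readable. The weakness is that some MacRoman byte sequences are also valid Shift-JIS, so a guess can be wrong, and a manual override would still be needed.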
I have found that Tcl has good support for Apple encodings, most of them written by Apple themselves back in the mid-1990s when this stuff was still very much current. Though macOS should respect and display the original encoding if the characters are correct and the system language is set correctly.
Another gotcha is that bugs and omissions in Japanese input methods (helper apps that assist with typing such complex scripts, which use multiple alphabets) allowed non-displayable characters to be typed in filenames! Example: when renaming a file in Finder, pressing the Delete key on an Extended Keyboard would insert an invisible character rather than deleting anything. This means that filenames can be quite dirty and contain invalid characters, which I guess should be resolved in some way or simply ignored?
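One possible way to handle the dirty names, sketched in Python: after decoding, report or drop anything in the Unicode control category (Cc). This assumes the stray characters decode to control characters, which I have not verified against a real disc. Both function names are hypothetical.

```python
import unicodedata


def find_control_chars(name: str):
    """List (index, code point) for each control character in a filename."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(name)
        if unicodedata.category(ch) == "Cc"
    ]


def strip_control_chars(name: str) -> str:
    """Return the filename with all control characters removed."""
    return "".join(ch for ch in name if unicodedata.category(ch) != "Cc")
```

Stripping changes the name, so two files could collide after cleaning. Reporting the positions first and leaving the decision to the user may be the safer default.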
If you need more details please ask as I have the info in my notes that I can dig out.
Anecdotes
Sample HFS images
Disk images that have a mix of MacRoman and MacJapanese:
This one looks 99% MacRoman, with a minimal file or two with MacJapanese names:
I have hundreds more discs of this type.