Cannot handle certain normalized forms of unicode. Example: é (e + U+0301) vs é (U+00E9) #16
Comments
I cannot reproduce your issue. |
According to
Local to test. crypted-wrapped local is where I discovered the issue. I do not know how to do the loopback fs example but am willing to try if you show me. I did install sshfs from Maybe my assertion that this is FUSE-T and not rclone is incorrect. I created a forum post with some additional detail. Included in that is the full log where I noticed the lines:
I wonder if those are causing issues. I will wait for the rclone developers to confirm if/how to disable that config but can you also test to see if enabling them causes the problem? And help deduce if it is (a) the appropriate flag and (b) the appropriate response? |
Does FUSE-T use macOS UTF-8 NFD form for UTF-8 internally like OSXFUSE does? If not then the If not then this makes it subtly incompatible with OSXFUSE. This issue shows the NFD problem macfuse/macfuse#585 and the reason why rclone applies that If FUSE-T only shows UTF-8 NFC in its external interfaces then that is probably a good design decision, but it is different to what OSXFUSE does. |
It looks like an encoding issue, although I think FUSE-T handles it correctly as tested with sshfs and loopback fs. I will look into the issue though. |
Unicode conversion is handled by this go module: https://pkg.go.dev/unicode/utf8 |
For what it's worth, adding the I am not sure where this bug belongs at this point. Does FUSE-T treat incompatibility with OSXFUSE as a bug? Should a more modern (and awesome by the way!!!) project inherit the technical debt of its predecessor? |
Just checked and It's possible to normalize strings as UTF-8 NFD as opposed to NFC, but I'm not sure it should be a default option. macOS Finder is happy with whatever encoding I throw at it. I can add a mount flag "-o utf8-nfd", would that be helpful? |
I think OSXFUSE needs to have the NFD form as that is what the kernel interfaces of macOS are expecting. I'm guessing since FUSE-T plugs into the NFS layer that the NFS layer is dealing with the NFC to NFD translation for you?
This should be the default if you want 100% compatibility with OSXFUSE I think. I'm not sure that is a good idea though as NFD encoding is a pain to deal with. We can write that you'll need |
I don't think NFD is needed for macOS anymore, perhaps that was true for older macOS but now you can create a file containing whatever encoding and it would be shown fine. |
I don't know a great deal about this, but I found a nice explainer here: https://eclecticlight.co/2021/05/08/explainer-unicode-normalization-and-apfs/ So I think you are right for APFS but HFS+ volumes will still require NFD. I've no idea on the relative popularity of these things (not a mac user) so maybe it is irrelevant now. |
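Which behaviour a given volume or mount actually shows can be probed directly. The following is a minimal sketch, not part of FUSE-T or rclone: it writes the same name in NFC and in NFD into a directory and reports what the directory listing returns. The helper names `form_of` and `probe_directory` and the `probe-` file prefix are made up for illustration.

```python
import os
import tempfile
import unicodedata


def form_of(name):
    """Classify a name as 'NFC', 'NFD', 'both' (e.g. pure ASCII) or 'mixed'."""
    is_nfc = unicodedata.normalize("NFC", name) == name
    is_nfd = unicodedata.normalize("NFD", name) == name
    if is_nfc and is_nfd:
        return "both"
    if is_nfc:
        return "NFC"
    if is_nfd:
        return "NFD"
    return "mixed"


def probe_directory(path):
    """Create one NFC-named and one NFD-named file, then list what comes back.

    Byte paths are used so the exact UTF-8 sequences reach the filesystem.
    The probe files are left in place; point this at a scratch directory.
    """
    base = os.fsencode(path)
    nfc = "probe-caf\u00e9.txt"                 # precomposed U+00E9
    nfd = unicodedata.normalize("NFD", nfc)     # e + U+0301
    for name in (nfc, nfd):
        with open(os.path.join(base, name.encode("utf-8")), "wb") as handle:
            handle.write(b"x")

    report = []
    for raw in sorted(os.listdir(base)):
        name = raw.decode("utf-8", errors="replace")
        if name.startswith("probe-"):
            report.append((raw, form_of(name)))
    return report


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        for raw, form in probe_directory(scratch):
            print(form, raw)
```

Two entries (one NFC, one NFD) suggest the filesystem treats the forms as distinct names; a single entry suggests it normalizes or compares names normalization-insensitively. Running it against both the source directory and the mount would show where the two diverge.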
This is still an issue on macOS.
For some reason Here my test served by rclone via mount (FUSE-T) Original data NFC and NFD encoded folder and file:
This is also what I see in mount with NFC file cannot be opened, NFD one works. You can also see that NFC and NFD have different icons - NFC one is generic for When I try This is a bit of a surprise as I would expect that this option should produce consistent NFD as in the example below:
Now looking for possible solution I was thinking that:
FUSE-T uses NFS, so maybe the issue comes from this problem. Would it be possible to add this as an optional mount flag "-o nfc"? With these new flags I could try to find a working rclone/fuse-t solution. |
I can definitely add an "-o nfc" mount flag. Would that be enough to make rclone work, or is nfc-nfd/nfd-nfc conversion also needed? |
It would be great if possible - I think that really needed are: "-o nfc" And this would be nice to have: "-o utf8-nfc" |
I'm a bit confused. There are two peers: macos and user So there would be four cases:
Is this analysis correct? |
Yes, your logic is correct and I wish it worked like that. However, reality seems to be different and I do not have all the answers yet. In theory The new flags would be a different approach to the old problem, which I hope we can at last tackle completely. We might have to make some changes on the rclone side as well; I am not sure about that yet. |
And where it gets really confusing is that with no conversion all files are accessible in bash/zsh. It is similar to the issue with NFS I referred to earlier. |
Ok, let's start with "-o nfc" option for now. Later I will add more options if needed. "-o nfs" will just passthrough characters between both ends. |
Another question is how to programmatically distinguish between OSXFUSE and FUSE-T. OSXFUSE will slowly become history, but for now a lot of people still use it. Ideally rclone could detect which one is in use and apply the correct options. |
Thank you for adding Now the only problem is with NFD files already in the filesystem. I can't assume that they will always be NFC, as users can upload them by means other than the mount. For this I hope |
What would |
cloud ---(NFC,NFD )---> rclone ---(NFC,NFD )---> fuse-t ---(NFD) ---> macOS |
Afraid it won't work. You can't mix both NFC and NFD. With -o nfc flag macos converts everything to NFC meaning fuse-t won't know if the original file is NFC or NFD. What can be done in case -o nfc is not specified is to convert the rclone side to NFD. |
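To make the "convert the rclone side to NFD" idea concrete, here is a minimal sketch of such a translation layer. It is not rclone's or FUSE-T's code; the class `NfdNameMap` and its methods are hypothetical. Names are shown to the mount in NFD and mapped back to whatever form the backend stores. It also shows why mixing forms is a problem: two backend names that differ only in normalization collapse to the same presented name.

```python
import unicodedata


class NfdNameMap:
    """Hypothetical helper: present backend names as NFD and map them back."""

    def __init__(self, backend_names):
        self._to_backend = {}
        for name in backend_names:
            shown = unicodedata.normalize("NFD", name)
            self._to_backend.setdefault(shown, []).append(name)

    def listing(self):
        """Names as they would be handed to the NFD-expecting side."""
        return sorted(self._to_backend)

    def resolve(self, requested):
        """Map a requested name (in any form) back to the stored backend name."""
        key = unicodedata.normalize("NFD", requested)
        matches = self._to_backend.get(key, [])
        if not matches:
            raise FileNotFoundError(requested)
        if len(matches) > 1:
            # NFC and NFD variants of the same name cannot be told apart here.
            raise LookupError("ambiguous name: %r" % (matches,))
        return matches[0]


if __name__ == "__main__":
    names = NfdNameMap(["caf\u00e9.txt", "re\u0301sume\u0301.txt"])
    print(names.listing())
    print(names.resolve("cafe\u0301.txt") == "caf\u00e9.txt")
```

The ambiguity branch is a deliberate design choice in this sketch: a real implementation would need some policy for a directory that holds both forms of one name.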
Thanks for your help. Indeed now there are two options - with But still if possible and easy to add I think |
I am using FUSE-T with rclone. I am, however, 99% sure this is a FUSE-T issue and not rclone.
I have a file with the following character: é. That is e + U+0301, or UTF-8 encoded: e\xcc\x81. This consistently breaks listing the directory. If I change it to the normalized form é, which is U+00E9 or \xc3\xa9 in UTF-8, it works fine.

Rclone handles the name just fine in listing and I even see it in the rclone logs. I have the following files to demonstrate (I added around it to see if anything listed first):
Source Dir:
Mount Dir:
The rclone -vv log is: rclog.log. This seems to indicate rclone isn't having the issue but rather FUSE-T.

Are there other tests I can do to assist? Additional logging? It is 100% reproducible on my machine so just let me know.
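For reference, the two byte sequences quoted in this report can be reproduced with Python's standard `unicodedata` module. This is only an illustration of the two normalization forms, not a reproduction of the mount behaviour.

```python
import unicodedata

decomposed = "e\u0301"    # NFD: 'e' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e9"    # NFC: U+00E9 LATIN SMALL LETTER E WITH ACUTE

print(decomposed.encode("utf-8"))    # b'e\xcc\x81'
print(precomposed.encode("utf-8"))   # b'\xc3\xa9'

# The two strings render the same but are different code point sequences.
print(decomposed == precomposed)     # False

# Normalizing either one to the other's form makes them compare equal.
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)   # True
```

Any layer that compares file names byte for byte will therefore treat the two spellings as different files unless it normalizes first.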